Introduction to Time Series with Pandas¶
Most of our data will have a datetime index, so let's learn how to work with this kind of data in pandas!
Python Datetime Review¶
In the course introduction section we discussed Python datetime objects.
from datetime import datetime
# To illustrate the order of arguments
my_year = 2017
my_month = 1
my_day = 2
my_hour = 13
my_minute = 30
my_second = 15
# January 2nd, 2017
my_date = datetime(my_year, my_month, my_day)
# The time defaults to 00:00:00
my_date
# January 2nd, 2017 at 13:30:15
my_date_time = datetime(my_year, my_month, my_day, my_hour, my_minute, my_second)
my_date_time
You can access any component of a datetime object as an attribute:
my_date.day
my_date_time.hour
NumPy Datetime Arrays¶
We mentioned that NumPy handles dates more efficiently than Python's datetime format.
The NumPy data type is called datetime64 to distinguish it from Python's datetime.
In this section we'll show how to set up datetime arrays in NumPy. These will become useful later on in the course.
For more on NumPy datetimes, see https://docs.scipy.org/doc/numpy-1.15.4/reference/arrays.datetime.html
import numpy as np
# CREATE AN ARRAY FROM THREE DATES
np.array(['2016-03-15', '2017-05-24', '2018-08-09'], dtype='datetime64')
We can also specify a different unit of precision, such as [h] for hours or [Y] for years.
np.array(['2016-03-15', '2017-05-24', '2018-08-09'], dtype='datetime64[h]')
np.array(['2016-03-15', '2017-05-24', '2018-08-09'], dtype='datetime64[Y]')
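As a minimal sketch of how the unit code changes what gets stored, the snippet below builds the same three dates at day and year precision. The variable names (dates, by_day, by_year) are illustrative and not part of the original notebook.

```python
import numpy as np

dates = ['2016-03-15', '2017-05-24', '2018-08-09']

# Same input strings, two different precisions
by_day = np.array(dates, dtype='datetime64[D]')
by_year = np.array(dates, dtype='datetime64[Y]')

print(by_day.dtype)              # the unit is part of the dtype
print(by_year)                   # month and day are dropped at year precision
print(by_year[1] - by_year[0])   # differences are measured in the array's unit
```

At year precision the month and day are discarded, so pick the coarsest unit only when that detail is not needed.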
NumPy Date Ranges¶
Just as np.arange(start,stop,step) can be used to produce an array of evenly-spaced integers, we can pass a dtype argument to obtain an array of dates. Remember that the stop date is exclusive.
# AN ARRAY OF DATES FROM 6/1/18 TO 6/22/18 SPACED ONE WEEK APART
np.arange('2018-06-01', '2018-06-23', 7, dtype='datetime64[D]')
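To make the exclusive stop concrete, here is a small sketch comparing two stop dates one day apart. The names weekly and weekly_short are illustrative only.

```python
import numpy as np

# The stop (June 23) lies just past the last weekly step, so June 22 is included
weekly = np.arange('2018-06-01', '2018-06-23', 7, dtype='datetime64[D]')

# The stop (June 22) coincides with a weekly step, so June 22 is excluded
weekly_short = np.arange('2018-06-01', '2018-06-22', 7, dtype='datetime64[D]')

print(len(weekly), weekly[-1])
print(len(weekly_short), weekly_short[-1])
```

If the last date you want falls exactly on a step, set the stop at least one unit past it.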
If we omit the step value, the array contains every value at the dtype's precision (one per day for [D], one per year for [Y]).
# AN ARRAY OF DATES FOR EVERY YEAR FROM 1968 TO 1975
np.arange('1968', '1976', dtype='datetime64[Y]')
Pandas Datetime Index¶
We'll usually deal with time series as a datetime index when working with pandas dataframes. Fortunately pandas has a lot of functions and methods to work with time series!
For more on the pandas DatetimeIndex visit https://pandas.pydata.org/pandas-docs/stable/timeseries.html
import pandas as pd
The simplest way to build a DatetimeIndex is with the pd.date_range() function:
# THE WEEK OF JULY 8TH, 2018
idx = pd.date_range('7/8/2018', periods=7, freq='D')
idx
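pd.date_range() also accepts an end date in place of periods. The sketch below, using illustrative variable names, builds the same week both ways. Note that unlike np.arange, the end date here is included by default.

```python
import pandas as pd

# Seven daily timestamps, specified by count
by_periods = pd.date_range('2018-07-08', periods=7, freq='D')

# The same week, specified by its (inclusive) end date
by_end = pd.date_range(start='2018-07-08', end='2018-07-14', freq='D')

print(by_periods.equals(by_end))
```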
Another way is to convert incoming text with the pd.to_datetime() function:
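# Older pandas parses each string independently; pandas 2.0+ needs format='mixed' for mixed formats like these
idx = pd.to_datetime(['Jan 01, 2018','1/2/18','03-Jan-2018',None])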
idx
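When the incoming text follows a single known layout, passing an explicit format string removes any ambiguity about day and month order. This is a sketch with made-up input strings; the separator and the names raw, parsed, and with_bad are assumptions for illustration.

```python
import pandas as pd

# Hypothetical day--month--year strings with an unusual separator
raw = ['02--01--2018', '03--01--2018']
parsed = pd.to_datetime(raw, format='%d--%m--%Y')
print(parsed)

# errors='coerce' turns unparseable entries into NaT instead of raising
with_bad = pd.to_datetime(['2018-01-01', 'not a date'], errors='coerce')
print(with_bad.isna())
```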
A third way is to pass a list or an array of datetime objects to the pd.DatetimeIndex() constructor:
# Create a NumPy datetime array
some_dates = np.array(['2016-03-15', '2017-05-24', '2018-08-09'], dtype='datetime64[D]')
some_dates
# Convert to an index
idx = pd.DatetimeIndex(some_dates)
idx
Notice that even though the dates came into pandas with day-level precision, pandas stores them at a finer precision in case we need it later on. Versions before 2.0 always use nanoseconds (datetime64[ns]); newer versions may choose a coarser unit such as seconds.
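To check the stored precision on your own installation, inspect the dtype before and after conversion. This is a minimal sketch with illustrative names; the exact unit printed for the index depends on the installed pandas version, so no output is shown.

```python
import numpy as np
import pandas as pd

day_array = np.array(['2016-03-15', '2017-05-24'], dtype='datetime64[D]')
converted = pd.DatetimeIndex(day_array)

print(day_array.dtype)   # day precision on the NumPy side
print(converted.dtype)   # a finer unit on the pandas side (version dependent)
```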
To set an existing column as the index, use .set_index(). The line below assumes a DataFrame df that already has a 'Date' column:
df.set_index('Date',inplace=True)
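A self-contained sketch of that pattern follows. The DataFrame sales and its 'Units' column are hypothetical stand-ins for real data; the 'Date' column is converted with pd.to_datetime() first so that the resulting index is a DatetimeIndex rather than an index of strings.

```python
import pandas as pd

# Hypothetical data whose dates arrive as plain strings
sales = pd.DataFrame({'Date': ['2018-01-01', '2018-01-02', '2018-01-03'],
                      'Units': [10, 12, 9]})

# Convert the column to datetimes, then promote it to the index
sales['Date'] = pd.to_datetime(sales['Date'])
sales = sales.set_index('Date')

print(type(sales.index).__name__)
print(sales.loc[pd.Timestamp('2018-01-02'), 'Units'])
```

Reassigning the result of set_index() does the same job as inplace=True and keeps the step easy to chain.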
Pandas Datetime Analysis¶
# Create some random data
data = np.random.randn(3,2)
cols = ['A','B']
print(data)
# Create a DataFrame with our random data, our date index, and our columns
df = pd.DataFrame(data, index=idx, columns=cols)
df
Now we can run some typical queries against the DataFrame's index:
df.index
# Latest date in the index
df.index.max()
# Integer position of the latest date
df.index.argmax()
# Earliest date in the index
df.index.min()
# Integer position of the earliest date
df.index.argmin()
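Because argmax() and argmin() return integer positions rather than labels, they pair naturally with .iloc. The sketch below uses a small, deliberately unsorted index with illustrative names (dates, frame) to show the difference.

```python
import pandas as pd

# An unsorted index, so positions and chronological order differ
dates = pd.DatetimeIndex(['2017-05-24', '2018-08-09', '2016-03-15'])
frame = pd.DataFrame({'A': [1.0, 2.0, 3.0]}, index=dates)

latest_pos = frame.index.argmax()     # position of 2018-08-09
earliest_pos = frame.index.argmin()   # position of 2016-03-15

print(latest_pos, earliest_pos)
print(frame.iloc[latest_pos])                   # row at the latest date
print(frame.index.max() - frame.index.min())    # time span covered, as a Timedelta
```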