Time Resampling¶
Let's learn how to resample time series data! This will be useful later on in the course!
import pandas as pd
%matplotlib inline
Import the data¶
For this exercise we'll look at Starbucks stock data from 2015 to 2018 which includes daily closing prices and trading volumes.
df = pd.read_csv('../Data/starbucks.csv', index_col='Date', parse_dates=True)
Note: the above code is a faster way of doing the following:
df = pd.read_csv('../Data/starbucks.csv')
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date',inplace=True)
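To check that the two loading approaches give the same result without needing the course data file, here is a minimal sketch that reads a tiny in-memory CSV both ways. The rows and the `Volume` column name are made-up stand-ins, not the real Starbucks data; only `Date` and `Close` are names taken from this notebook.

```python
import io
import pandas as pd

# Hypothetical stand-in for starbucks.csv: invented values, assumed 'Volume' column name.
csv_text = """Date,Close,Volume
2015-01-02,38.0,6900000
2015-01-05,37.3,11600000
2015-01-06,37.0,7700000
"""

# One-step load: parse the Date column and use it as the index.
one_step = pd.read_csv(io.StringIO(csv_text), index_col='Date', parse_dates=True)

# Three-step load: read, convert the Date column, then set it as the index.
three_step = pd.read_csv(io.StringIO(csv_text))
three_step['Date'] = pd.to_datetime(three_step['Date'])
three_step = three_step.set_index('Date')

# Both frames should end up with a DatetimeIndex holding the same dates.
print(type(one_step.index).__name__, type(three_step.index).__name__)
print(list(one_step.index) == list(three_step.index))
```

Having a DatetimeIndex matters because .resample() groups rows by the timestamps in the index.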
df.head()
# Our index
df.index
When calling .resample(), you first need to pass in a rule parameter, then you need to call some sort of aggregation function.
The rule parameter describes the frequency with which to apply the aggregation function (daily, monthly, yearly, etc.)
It is passed in using an "offset alias" - refer to the table below. [reference]
The aggregation function is needed because, due to resampling, we need some sort of mathematical rule to join the rows (mean, sum, count, etc.)
ALIAS | DESCRIPTION |
---|---|
B | business day frequency |
C | custom business day frequency (experimental) |
D | calendar day frequency |
W | weekly frequency |
M | month end frequency |
SM | semi-month end frequency (15th and end of month) |
BM | business month end frequency |
CBM | custom business month end frequency |
MS | month start frequency |
SMS | semi-month start frequency (1st and 15th) |
BMS | business month start frequency |
CBMS | custom business month start frequency |
Q | quarter end frequency |
BQ | business quarter end frequency |
QS | quarter start frequency |
BQS | business quarter start frequency |
A | year end frequency |
BA | business year end frequency |
AS | year start frequency |
BAS | business year start frequency |
BH | business hour frequency |
H | hourly frequency |
T, min | minutely frequency |
S | secondly frequency |
L, ms | milliseconds |
U, us | microseconds |
N | nanoseconds |
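As a quick illustration of how the rule and the aggregation function work together, here is a small sketch on a synthetic daily series (the values are invented and are not the Starbucks data). It uses the 'W' and 'MS' aliases from the table. Note that newer pandas releases (2.2 and later, as far as I know) rename some of the aliases listed above, for example 'M' to 'ME' and 'A' to 'YE', so the older spellings may trigger a warning there.

```python
import pandas as pd

# Synthetic daily data: 59 consecutive days covering January and February 2018.
idx = pd.date_range('2018-01-01', periods=59, freq='D')
s = pd.Series(range(59), index=idx)

# rule='MS' groups the rows by calendar month and labels each group
# with the first day of that month; the aggregation collapses each group to one value.
monthly_mean = s.resample(rule='MS').mean()
monthly_sum = s.resample(rule='MS').sum()
monthly_count = s.resample(rule='MS').count()

# rule='W' groups the same rows into weeks instead (weeks end on Sunday by default).
weekly_mean = s.resample(rule='W').mean()

print(monthly_mean)
print(monthly_count)
print(weekly_mean.head())
```

The same 59 rows produce two rows under 'MS' and nine under 'W'; only the rule changed, while the aggregation decides what single value represents each group.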
# Yearly Means
df.resample(rule='A').mean()
Resampling rule 'A' takes all of the data points in a given year, applies the aggregation function (in this case we calculate the mean), and reports the result as the last day of that year.
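The year-end labelling can be seen on a small synthetic series. This sketch passes a `pd.offsets.YearEnd()` offset object as the rule instead of the 'A' string; that is my substitution, chosen because the offset object should behave the same way across pandas versions, and the data is invented for illustration.

```python
import pandas as pd

# Synthetic daily data for 2016 and 2017: every day in 2016 is 1.0, every day in 2017 is 3.0.
idx = pd.date_range('2016-01-01', '2017-12-31', freq='D')
values = [1.0 if day.year == 2016 else 3.0 for day in idx]
s = pd.Series(values, index=idx)

# A year-end rule: one group per calendar year, labelled with December 31 of that year.
yearly = s.resample(pd.offsets.YearEnd()).mean()

print(yearly)
```

Each output row carries the date of the last day of its year, even though the value summarizes the whole year.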
Custom Resampling Functions¶
We're not limited to pandas built-in summary functions (min/max/mean etc.). We can define our own function:
def first_day(entry):
    """
    Returns the first instance of the period, regardless of sampling rate.
    """
    if len(entry):  # handles the case of missing data
        return entry.iloc[0]  # .iloc makes the positional lookup explicit
df.resample(rule='A').apply(first_day)
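To see what the length check buys us, here is a small sketch that applies the same kind of custom function to a synthetic series with a gap (no February rows). The data is invented, and the function uses `.iloc[0]` for explicit positional access.

```python
import pandas as pd

def first_day(entry):
    """Return the first value in the period, or None if the period is empty."""
    if len(entry):  # skip periods that contain no rows
        return entry.iloc[0]

# Synthetic data with rows in January and March 2018 but none in February.
idx = pd.to_datetime(['2018-01-02', '2018-01-03', '2018-03-01', '2018-03-02'])
s = pd.Series([10.0, 11.0, 30.0, 31.0], index=idx)

# Monthly groups: the function is called once per month, including the empty one.
result = s.resample('MS').apply(first_day)

print(result)
```

The empty month comes back as a missing value rather than raising an error. For this particular case, the built-in `.first()` aggregation should give a similar result; the custom function mainly shows the general pattern.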
Plotting¶
df['Close'].resample('A').mean().plot.bar(title='Yearly Mean Closing Price for Starbucks');
Pandas treats each sample as its own trace, and by default assigns different colors to each one. If you want, you can pass a color argument to assign your own color collection, or to set a uniform color. For example, color='#1f77b4' sets a uniform "steel blue" color.
Also, the above code can be broken into two lines for improved readability.
title = 'Yearly Mean Closing Price for Starbucks'
df['Close'].resample('A').mean().plot.bar(title=title,color=['#1f77b4']);
title = 'Monthly Max Closing Price for Starbucks'
df['Close'].resample('M').max().plot.bar(figsize=(16,6), title=title,color='#1f77b4');
That is it! Up next we'll learn about time shifts!