{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"___\n",
"\n",
"Copyright Pierian Data\n",
"\n",
"For more information, visit us at www.pieriandata.com"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Time Resampling\n",
"\n",
"Let's learn how to resample time series data! This will be useful later on in the course."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import pandas as pd\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Import the data\n",
"For this exercise we'll look at Starbucks stock data from 2015 to 2018, which includes daily closing prices and trading volumes."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"df = pd.read_csv('../Data/starbucks.csv', index_col='Date', parse_dates=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note: the above code is a more concise way of doing the following:\n",
"\n",
"    df = pd.read_csv('../Data/starbucks.csv')\n",
"    df['Date'] = pd.to_datetime(df['Date'])\n",
"    df.set_index('Date', inplace=True)"
]
},
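{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a hedged illustration of the equivalence noted above, the sketch below reads the same small sample both ways from an in-memory string instead of the course CSV file. The names `csv_text`, `one_step`, and `multi_step` are placeholders introduced here, and the two rows are copied from the head of the Starbucks data.\n",
"\n",
"```python\n",
"import io\n",
"import pandas as pd\n",
"\n",
"# Two sample rows in CSV form (a stand-in for ../Data/starbucks.csv)\n",
"csv_text = '''Date,Close,Volume\n",
"2015-01-02,38.0061,6906098\n",
"2015-01-05,37.2781,11623796\n",
"'''\n",
"\n",
"# One step: parse the Date column and use it as the index while reading\n",
"one_step = pd.read_csv(io.StringIO(csv_text), index_col='Date', parse_dates=True)\n",
"\n",
"# Several steps: read, convert the column, then set it as the index\n",
"multi_step = pd.read_csv(io.StringIO(csv_text))\n",
"multi_step['Date'] = pd.to_datetime(multi_step['Date'])\n",
"multi_step = multi_step.set_index('Date')\n",
"\n",
"# Both approaches should produce a DatetimeIndex holding the same dates\n",
"print(type(one_step.index).__name__, type(multi_step.index).__name__)\n",
"print(list(one_step.index) == list(multi_step.index))\n",
"```\n",
"\n",
"Either route gives a frame indexed by date, which is what `.resample()` needs."
]
},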
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Close Volume\n",
"Date \n",
"2015-01-02 38.0061 6906098\n",
"2015-01-05 37.2781 11623796\n",
"2015-01-06 36.9748 7664340\n",
"2015-01-07 37.8848 9732554\n",
"2015-01-08 38.4961 13170548"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## resample()\n",
"\n",
"A common operation with time series data is resampling based on the time series index. Let's see how to use the resample() method. [[reference](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.resample.html)]"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"DatetimeIndex(['2015-01-02', '2015-01-05', '2015-01-06', '2015-01-07',\n",
" '2015-01-08', '2015-01-09', '2015-01-12', '2015-01-13',\n",
" '2015-01-14', '2015-01-15',\n",
" ...\n",
" '2018-12-17', '2018-12-18', '2018-12-19', '2018-12-20',\n",
" '2018-12-21', '2018-12-24', '2018-12-26', '2018-12-27',\n",
" '2018-12-28', '2018-12-31'],\n",
" dtype='datetime64[ns]', name='Date', length=1006, freq=None)"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Our index\n",
"df.index"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When calling `.resample()` you first need to pass in a **rule** parameter, then you need to call some sort of aggregation function.\n",
"\n",
"The **rule** parameter describes the frequency with which to apply the aggregation function (daily, monthly, yearly, etc.) \n",
"It is passed in using an \"offset alias\" - refer to the table below. [[reference](http://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases)]\n",
"\n",
"The aggregation function is needed because, due to resampling, we need some sort of mathematical rule to join the rows (mean, sum, count, etc.)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Time Series Offset Aliases**\n",
"\n",
"| Alias | Description |\n",
"|---|---|\n",
"| B | business day frequency |\n",
"| C | custom business day frequency (experimental) |\n",
"| D | calendar day frequency |\n",
"| W | weekly frequency |\n",
"| M | month end frequency |\n",
"| SM | semi-month end frequency (15th and end of month) |\n",
"| BM | business month end frequency |\n",
"| CBM | custom business month end frequency |\n",
"| MS | month start frequency |\n",
"| SMS | semi-month start frequency (1st and 15th) |\n",
"| BMS | business month start frequency |\n",
"| CBMS | custom business month start frequency |\n",
"| Q | quarter end frequency |\n",
"| BQ | business quarter end frequency |\n",
"| QS | quarter start frequency |\n",
"| BQS | business quarter start frequency |\n",
"| A | year end frequency |\n",
"| BA | business year end frequency |\n",
"| AS | year start frequency |\n",
"| BAS | business year start frequency |\n",
"| BH | business hour frequency |\n",
"| H | hourly frequency |\n",
"| T, min | minutely frequency |\n",
"| S | secondly frequency |\n",
"| L, ms | milliseconds |\n",
"| U, us | microseconds |\n",
"| N | nanoseconds |"
]
},
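{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the rule-plus-aggregation pattern described above, using made-up daily values rather than the Starbucks data. The `toy` frame and its `value` column are hypothetical; `W` (weekly) is one of the aliases from the table.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical daily data: 14 consecutive days starting Monday 2015-01-05\n",
"idx = pd.date_range('2015-01-05', periods=14, freq='D')\n",
"toy = pd.DataFrame({'value': range(1, 15)}, index=idx)\n",
"\n",
"# rule='W' groups the rows into weeks; the aggregation decides how each group collapses\n",
"weekly_sum = toy.resample(rule='W').sum()\n",
"weekly_mean = toy.resample(rule='W').mean()\n",
"weekly_count = toy.resample(rule='W').count()\n",
"\n",
"print(weekly_sum)\n",
"print(weekly_mean)\n",
"print(weekly_count)\n",
"```\n",
"\n",
"The same two weekly groups yield different results depending on the aggregation chosen. Newer pandas releases rename some of the aliases in the table (pandas 2.2, for instance, deprecates `A` in favor of `YE` and `M` in favor of `ME`), so it is worth checking the offset alias reference for the installed version."
]
},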
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Close Volume\n",
"Date \n",
"2015-12-31 50.078100 8.649190e+06\n",
"2016-12-31 53.891732 9.300633e+06\n",
"2017-12-31 55.457310 9.296078e+06\n",
"2018-12-31 56.870005 1.122883e+07"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Yearly Means\n",
"df.resample(rule='A').mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Resampling rule 'A' takes all of the data points in a given year, applies the aggregation function (in this case we calculate the mean), and reports the result as the last day of that year."
]
},
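{
"cell_type": "markdown",
"metadata": {},
"source": [
"The labeling behavior described above can be seen at a smaller scale. The sketch below is an illustration only: it takes the first three closing prices from the head of the data and uses the `W` and `MS` rules instead of `A` to keep the example short. The names `close`, `weekly`, and `month_start` are introduced here.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# First three closing prices from the head of the Starbucks data\n",
"close = pd.Series(\n",
"    [38.0061, 37.2781, 36.9748],\n",
"    index=pd.to_datetime(['2015-01-02', '2015-01-05', '2015-01-06']),\n",
"    name='Close',\n",
")\n",
"\n",
"# 'W' bins end on Sunday, so each label is a Sunday even though no row has that date\n",
"weekly = close.resample('W').mean()\n",
"\n",
"# 'MS' labels each bin with the first calendar day of the month\n",
"month_start = close.resample('MS').mean()\n",
"\n",
"print(weekly.index.strftime('%Y-%m-%d').tolist())\n",
"print(month_start.index.strftime('%Y-%m-%d').tolist())\n",
"```\n",
"\n",
"In both cases the label marks an edge of the resampling period, not a date that necessarily appears in the data."
]
},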
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Custom Resampling Functions\n",
"\n",
"We're not limited to pandas built-in summary functions (min/max/mean etc.). We can define our own function:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def first_day(entry):\n",
"    \"\"\"\n",
"    Returns the first instance of the period, regardless of sampling rate.\n",
"    \"\"\"\n",
"    if len(entry):  # handles the case of missing data\n",
"        return entry.iloc[0]  # .iloc selects by position, independent of the index labels"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Close Volume\n",
"Date \n",
"2015-12-31 38.0061 6906098\n",
"2016-12-31 55.0780 13521544\n",
"2017-12-31 53.1100 7809307\n",
"2018-12-31 56.3243 7215978"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.resample(rule='A').apply(first_day)"
]
},
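{
"cell_type": "markdown",
"metadata": {},
"source": [
"As another hedged example of a custom aggregation, the sketch below computes the spread between the highest and lowest value in each period. The helper `price_range` and the `toy` series are hypothetical and use made-up numbers, not the Starbucks data.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical helper: spread between the highest and lowest value in a period\n",
"def price_range(values):\n",
"    if len(values):  # skip periods that contain no rows\n",
"        return values.max() - values.min()\n",
"\n",
"# Made-up daily values covering two full weeks (Monday 2015-01-05 to Sunday 2015-01-18)\n",
"idx = pd.date_range('2015-01-05', periods=14, freq='D')\n",
"toy = pd.Series([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7], index=idx)\n",
"\n",
"# apply() calls the helper once per weekly group\n",
"weekly_range = toy.resample('W').apply(price_range)\n",
"print(weekly_range)\n",
"```\n",
"\n",
"Any function that reduces a group of values to a single value can be passed to `.apply()` in this way."
]
},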
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Plotting"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Figure: bar chart, 'Yearly Mean Closing Price for Starbucks']"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"df['Close'].resample('A').mean().plot.bar(title='Yearly Mean Closing Price for Starbucks');"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Pandas treats each sample as its own trace, and by default assigns different colors to each one. If you want, you can pass a `color` argument to assign your own color collection, or to set a uniform color. For example, `color='#1f77b4'` sets a uniform \"steel blue\" color.\n",
"\n",
"Also, the above code can be broken into two lines for improved readability."
]
},
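{
"cell_type": "markdown",
"metadata": {},
"source": [
"A rough sketch of the two ways to use the `color` argument mentioned above. The `yearly` series holds the yearly means from the earlier resample output, rounded to two decimals; the side-by-side layout and the extra hex colors are choices made for this illustration only.\n",
"\n",
"```python\n",
"import matplotlib.pyplot as plt\n",
"import pandas as pd\n",
"\n",
"# Yearly mean closing prices, rounded from the resample output shown earlier\n",
"yearly = pd.Series([50.08, 53.89, 55.46, 56.87], index=['2015', '2016', '2017', '2018'])\n",
"title = 'Yearly Mean Closing Price for Starbucks'\n",
"\n",
"# Two axes side by side so the plots do not draw over each other\n",
"fig, (ax_uniform, ax_varied) = plt.subplots(1, 2, figsize=(12, 4))\n",
"\n",
"# A single color string paints every bar the same\n",
"yearly.plot.bar(ax=ax_uniform, title=title, color='#1f77b4')\n",
"\n",
"# A list of colors assigns one color per bar\n",
"yearly.plot.bar(ax=ax_varied, title=title, color=['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728'])\n",
"```\n",
"\n",
"Passing `ax=` is optional in the notebook cells, where each plot gets its own figure."
]
},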
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Figure: bar chart, 'Yearly Mean Closing Price for Starbucks']"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"title = 'Yearly Mean Closing Price for Starbucks'\n",
"df['Close'].resample('A').mean().plot.bar(title=title,color=['#1f77b4']);"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Figure: bar chart, 'Monthly Max Closing Price for Starbucks']"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"title = 'Monthly Max Closing Price for Starbucks'\n",
"df['Close'].resample('M').max().plot.bar(figsize=(16,6), title=title,color='#1f77b4');"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"That is it! Up next we'll learn about time shifts!"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 1
}