{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "___\n", "\n", "\n", "___\n", "
Copyright Pierian Data
\n", "
For more information, visit us at www.pieriandata.com
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Time Resampling\n", "\n", "Let's learn how to resample time series data. This will be useful later on in the course!" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import pandas as pd\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Import the data\n", "For this exercise we'll look at Starbucks stock data from 2015 to 2018, which includes daily closing prices and trading volumes." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df = pd.read_csv('../Data/starbucks.csv', index_col='Date', parse_dates=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note: the code above is a more concise way of doing the following:\n", "
df = pd.read_csv('../Data/starbucks.csv')\n",
    "df['Date'] = pd.to_datetime(df['Date'])\n",
    "df.set_index('Date',inplace=True)
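
The equivalence claimed in this note can be checked on a small in-memory CSV. This is a minimal sketch that assumes only pandas: the three rows are the first rows of the Starbucks data (as shown by `df.head()` below), and `io.StringIO` stands in for the real file path.

```python
import io

import pandas as pd

# Hypothetical stand-in for ../Data/starbucks.csv: its first three rows
csv_text = '''Date,Close,Volume
2015-01-02,38.0061,6906098
2015-01-05,37.2781,11623796
2015-01-06,36.9748,7664340
'''

# One step: parse the Date column as datetimes and make it the index
df_one = pd.read_csv(io.StringIO(csv_text), index_col='Date', parse_dates=True)

# Three steps: read, convert the Date column, then set it as the index
df_three = pd.read_csv(io.StringIO(csv_text))
df_three['Date'] = pd.to_datetime(df_three['Date'])
df_three.set_index('Date', inplace=True)

# Raises an AssertionError if the two DataFrames differ in any way
pd.testing.assert_frame_equal(df_one, df_three)
print(df_one.index)
```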
" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Close Volume\n", "Date \n", "2015-01-02 38.0061 6906098\n", "2015-01-05 37.2781 11623796\n", "2015-01-06 36.9748 7664340\n", "2015-01-07 37.8848 9732554\n", "2015-01-08 38.4961 13170548" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## resample()\n", "\n", "A common operation with time series data is resampling based on the time series index. Let's see how to use the `resample()` method. [[reference](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.resample.html)]" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DatetimeIndex(['2015-01-02', '2015-01-05', '2015-01-06', '2015-01-07',\n", " '2015-01-08', '2015-01-09', '2015-01-12', '2015-01-13',\n", " '2015-01-14', '2015-01-15',\n", " ...\n", " '2018-12-17', '2018-12-18', '2018-12-19', '2018-12-20',\n", " '2018-12-21', '2018-12-24', '2018-12-26', '2018-12-27',\n", " '2018-12-28', '2018-12-31'],\n", " dtype='datetime64[ns]', name='Date', length=1006, freq=None)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Our index\n", "df.index" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When calling `.resample()`, you first pass in a **rule** parameter and then call an aggregation function on the result.\n", "\n", "The **rule** parameter sets the frequency of the new time bins (daily, monthly, yearly, etc.) over which the aggregation function is applied.
\n", "The rule is passed in as an \"offset alias\"; refer to the table below. [[reference](http://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases)]\n", "\n", "The aggregation function is needed because resampling groups several rows into each new time bin, so we need a way of combining them into a single value (mean, sum, count, etc.)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
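A minimal sketch of the rule-then-aggregate pattern described above, using a small made-up Series instead of the Starbucks data (the dates and values are hypothetical). Because 2015-01-01 is a Thursday and the 'W' rule groups rows into weeks ending on Sunday, the first weekly bin holds four days and the second holds six.

```python
import pandas as pd

# Hypothetical data: ten consecutive calendar days of made-up prices,
# starting Thursday 2015-01-01
idx = pd.date_range('2015-01-01', periods=10, freq='D')
prices = pd.Series(range(10), index=idx, dtype=float)

# Step 1: pass a rule. 'W' groups rows into weeks ending on Sunday.
# This returns a Resampler object, not a finished result.
weekly = prices.resample('W')

# Step 2: an aggregation function decides how the rows in each bin combine
summary = pd.DataFrame({
    'mean': weekly.mean(),
    'sum': weekly.sum(),
    'count': weekly.count(),
})
print(summary)
```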
**TIME SERIES OFFSET ALIASES**

| ALIAS | DESCRIPTION |
|-------|-------------|
| B | business day frequency |
| C | custom business day frequency (experimental) |
| D | calendar day frequency |
| W | weekly frequency |
| M | month end frequency |
| SM | semi-month end frequency (15th and end of month) |
| BM | business month end frequency |
| CBM | custom business month end frequency |
| MS | month start frequency |
| SMS | semi-month start frequency (1st and 15th) |
| BMS | business month start frequency |
| CBMS | custom business month start frequency |
| Q | quarter end frequency |
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
| ALIAS | DESCRIPTION |
|-------|-------------|
| BQ | business quarter end frequency |
| QS | quarter start frequency |
| BQS | business quarter start frequency |
| A | year end frequency |
| BA | business year end frequency |
| AS | year start frequency |
| BAS | business year start frequency |
| BH | business hour frequency |
| H | hourly frequency |
| T, min | minutely frequency |
| S | secondly frequency |
| L, ms | milliseconds |
| U, us | microseconds |
| N | nanoseconds |
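
The same offset aliases are accepted anywhere pandas expects a frequency, including `resample()` and `pd.date_range()`. Below is a minimal sketch on made-up data where every value is 1, so summing counts the days in each bin. It sticks to 'MS', 'QS' and 'B', which behave the same across pandas versions; in pandas 2.2 and later, several end-of-period aliases from the table, such as 'M', 'Q' and 'A', are deprecated in favor of 'ME', 'QE' and 'YE'.

```python
import pandas as pd

# Hypothetical data: the value 1 for every calendar day from Jan to Mar 2015
idx = pd.date_range('2015-01-01', '2015-03-31', freq='D')
ones = pd.Series(1, index=idx)

# 'MS' (month start): one bin per month, labelled with the 1st of the month,
# so summing the ones counts the days in each month
days_per_month = ones.resample('MS').sum()

# 'QS' (quarter start): January to March fall into one bin labelled 2015-01-01
days_per_quarter = ones.resample('QS').sum()

# 'B' (business day): the same aliases also work in pd.date_range;
# Saturdays and Sundays are skipped
business_days = pd.date_range('2015-01-01', '2015-01-10', freq='B')

print(days_per_month)
print(days_per_quarter)
print(business_days)
```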
" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Close Volume\n", "Date \n", "2015-12-31 50.078100 8.649190e+06\n", "2016-12-31 53.891732 9.300633e+06\n", "2017-12-31 55.457310 9.296078e+06\n", "2018-12-31 56.870005 1.122883e+07" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Yearly Means\n", "df.resample(rule='A').mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Resampling rule 'A' takes all of the data points in a given year, applies the aggregation function (in this case the mean), and labels the result with the last day of that year. (In pandas 2.2 and later, 'A' is deprecated in favor of 'YE'.)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Custom Resampling Functions\n", "\n", "We're not limited to pandas' built-in summary functions (min, max, mean, etc.); we can also define our own:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def first_day(entry):\n", "    \"\"\"\n", "    Returns the first value in each resampling period, regardless of sampling rate.\n", "    \"\"\"\n", "    if len(entry):  # empty periods (missing data) fall through and return None\n", "        return entry.iloc[0]" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Close Volume\n", "Date \n", "2015-12-31 38.0061 6906098\n", "2016-12-31 55.0780 13521544\n", "2017-12-31 53.1100 7809307\n", "2018-12-31 56.3243 7215978" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.resample(rule='A').apply(first_day)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plotting" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "df['Close'].resample('A').mean().plot.bar(title='Yearly Mean Closing Price for Starbucks');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pandas treats each sample as its own trace, and by default assigns different colors to each one. If you want, you can pass a color argument to assign your own color collection, or to set a uniform color. For example, color='#1f77b4' sets a uniform \"steel blue\" color.\n", "\n", "Also, the above code can be broken into two lines for improved readability." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "title = 'Yearly Mean Closing Price for Starbucks'\n", "df['Close'].resample('A').mean().plot.bar(title=title,color=['#1f77b4']);" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "title = 'Monthly Max Closing Price for Starbucks'\n", "df['Close'].resample('M').max().plot.bar(figsize=(16,6), title=title,color='#1f77b4');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That is it! Up next we'll learn about time shifts!" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 1 }