{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "___\n", "\n", "\n", "___\n", "
Copyright Pierian Data
\n", "
For more information, visit us at www.pieriandata.com
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction to Time Series with Pandas\n", "\n", "Most of our data will have a datetime index, so let's learn how to deal with this sort of data with pandas!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Python Datetime Review\n", "In the course introduction section we discussed Python datetime objects." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from datetime import datetime" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# To illustrate the order of arguments\n", "my_year = 2017\n", "my_month = 1\n", "my_day = 2\n", "my_hour = 13\n", "my_minute = 30\n", "my_second = 15" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# January 2nd, 2017\n", "my_date = datetime(my_year,my_month,my_day)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "datetime.datetime(2017, 1, 2, 0, 0)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Defaults to 0:00\n", "my_date " ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# January 2nd, 2017 at 13:30:15\n", "my_date_time = datetime(my_year,my_month,my_day,my_hour,my_minute,my_second)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "datetime.datetime(2017, 1, 2, 13, 30, 15)" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "my_date_time" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can grab any part of the datetime object you want:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2" ] }, "execution_count": 7, 
"metadata": {}, "output_type": "execute_result" } ], "source": [ "my_date.day" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "13" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "my_date_time.hour" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## NumPy Datetime Arrays\n", "We mentioned that NumPy handles dates more efficiently than Python's datetime format.
\n", "The NumPy data type is called datetime64 to distinguish it from Python's datetime.\n", "\n", "In this section we'll show how to set up datetime arrays in NumPy. These will become useful later on in the course.
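
As a quick illustration of how the two types relate, here is a minimal sketch (the variable names are our own, not part of the course material) that converts a Python datetime into a NumPy datetime64 and then truncates it to day-level precision:

```python
from datetime import datetime
import numpy as np

# A Python datetime object: January 2nd, 2017 at 13:30:15
py_dt = datetime(2017, 1, 2, 13, 30, 15)

# Wrapping it in np.datetime64 gives NumPy's own date type
np_dt = np.datetime64(py_dt)

# Casting to a coarser unit drops the time portion
np_day = np_dt.astype('datetime64[D]')

print(type(np_dt).__name__)
print(np_day)
```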
\n", "For more info on NumPy visit https://docs.scipy.org/doc/numpy-1.15.4/reference/arrays.datetime.html" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['2016-03-15', '2017-05-24', '2018-08-09'], dtype='datetime64[D]')" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# CREATE AN ARRAY FROM THREE DATES\n", "np.array(['2016-03-15', '2017-05-24', '2018-08-09'], dtype='datetime64')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
NOTE: We see the dtype listed as 'datetime64[D]'. This tells us that NumPy applied a day-level date precision.
\n", " If we want, we can pass in a different unit, such as [h] for hour or [Y] for year.
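
To see how the unit in brackets changes what is stored, here is a small sketch (variable names are illustrative) that casts the same dates to coarser units and reads the unit back with np.datetime_data:

```python
import numpy as np

dates = np.array(['2016-03-15', '2017-05-24', '2018-08-09'], dtype='datetime64[D]')

# Cast the day-level dates to month-level and year-level precision
by_month = dates.astype('datetime64[M]')
by_year = dates.astype('datetime64[Y]')

# np.datetime_data returns a (unit, count) tuple for a datetime dtype
print(np.datetime_data(dates.dtype))
print(by_month)
print(by_year)
```

Casting to a coarser unit discards information: once the dates are stored as [Y], the month and day cannot be recovered from the array.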
" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['2016-03-15T00', '2017-05-24T00', '2018-08-09T00'],\n", " dtype='datetime64[h]')" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.array(['2016-03-15', '2017-05-24', '2018-08-09'], dtype='datetime64[h]')" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['2016', '2017', '2018'], dtype='datetime64[Y]')" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.array(['2016-03-15', '2017-05-24', '2018-08-09'], dtype='datetime64[Y]')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## NumPy Date Ranges\n", "Just as np.arange(start,stop,step) can be used to produce an array of evenly-spaced integers, we can pass a dtype argument to obtain an array of dates. Remember that the stop date is exclusive." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['2018-06-01', '2018-06-08', '2018-06-15', '2018-06-22'],\n", " dtype='datetime64[D]')" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# AN ARRAY OF DATES FROM 6/1/18 TO 6/22/18 SPACED ONE WEEK APART\n", "np.arange('2018-06-01', '2018-06-23', 7, dtype='datetime64[D]')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By omitting the step value we can obtain every value based on the precision." 
] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['1968', '1969', '1970', '1971', '1972', '1973', '1974', '1975'],\n", " dtype='datetime64[Y]')" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# AN ARRAY OF DATES FOR EVERY YEAR FROM 1968 TO 1975\n", "np.arange('1968', '1976', dtype='datetime64[Y]')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Pandas Datetime Index\n", "\n", "We'll usually deal with time series as a datetime index when working with pandas dataframes. Fortunately pandas has a lot of functions and methods to work with time series!
\n", "For more on the pandas DatetimeIndex visit https://pandas.pydata.org/pandas-docs/stable/timeseries.html" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The simplest way to build a DatetimeIndex is with the pd.date_range() method:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DatetimeIndex(['2018-07-08', '2018-07-09', '2018-07-10', '2018-07-11',\n", " '2018-07-12', '2018-07-13', '2018-07-14'],\n", " dtype='datetime64[ns]', freq='D')" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# THE WEEK OF JULY 8TH, 2018\n", "idx = pd.date_range('7/8/2018', periods=7, freq='D')\n", "idx" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
DatetimeIndex Frequencies: When we used pd.date_range() above, we passed in the frequency parameter freq='D'. This created a series of 7 dates spaced one day apart. We'll cover this topic in depth in upcoming lectures, but for now, a list of time series offset aliases like 'D' can be found in the pandas time series documentation linked above.
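
As a rough sketch of how the alias changes the result, the example below builds three ranges from the same start date. The aliases 'B' (business day) and 'MS' (month start) are chosen here only for illustration and are not used elsewhere in this notebook:

```python
import pandas as pd

# Seven consecutive calendar days starting July 8th, 2018 (a Sunday)
daily = pd.date_range('2018-07-08', periods=7, freq='D')

# Business days only: the range starts on the first weekday on or after the start date
business = pd.date_range('2018-07-08', periods=5, freq='B')

# The first day of each month on or after the start date
month_starts = pd.date_range('2018-07-08', periods=3, freq='MS')

print(daily[-1])
print(business[0])
print(month_starts[0])
```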
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another way is to convert incoming text with the pd.to_datetime() method:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', 'NaT'], dtype='datetime64[ns]', freq=None)" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "idx = pd.to_datetime(['Jan 01, 2018','1/2/18','03-Jan-2018',None])\n", "idx" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A third way is to pass a list or an array of datetime objects into the pd.DatetimeIndex() method:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['2016-03-15', '2017-05-24', '2018-08-09'], dtype='datetime64[D]')" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a NumPy datetime array\n", "some_dates = np.array(['2016-03-15', '2017-05-24', '2018-08-09'], dtype='datetime64[D]')\n", "some_dates" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DatetimeIndex(['2016-03-15', '2017-05-24', '2018-08-09'], dtype='datetime64[ns]', freq=None)" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Convert to an index\n", "idx = pd.DatetimeIndex(some_dates)\n", "idx" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that even though the dates came into pandas with a day-level precision, pandas assigns a nanosecond-level precision with the expectation that we might want this later on.\n", "\n", "To set an existing column as the index, use .set_index()
\n", ">df.set_index('Date',inplace=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Pandas Datetime Analysis" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[-1.64971705 1.07943894]\n", " [ 0.4587492 -0.04201784]\n", " [-1.2793774 -1.85383771]]\n" ] } ], "source": [ "# Create some random data\n", "data = np.random.randn(3,2)\n", "cols = ['A','B']\n", "print(data)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
                   A         B
2016-03-15 -1.649717  1.079439
2017-05-24  0.458749 -0.042018
2018-08-09 -1.279377 -1.853838
\n", "
" ], "text/plain": [ " A B\n", "2016-03-15 -1.649717 1.079439\n", "2017-05-24 0.458749 -0.042018\n", "2018-08-09 -1.279377 -1.853838" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a DataFrame with our random data, our date index, and our columns\n", "df = pd.DataFrame(data,idx,cols)\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can perform a typical analysis of our DataFrame" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DatetimeIndex(['2016-03-15', '2017-05-24', '2018-08-09'], dtype='datetime64[ns]', freq=None)" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.index" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Timestamp('2018-08-09 00:00:00')" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Latest Date Value\n", "df.index.max()" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Latest Date Index Location\n", "df.index.argmax()" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Timestamp('2016-03-15 00:00:00')" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Earliest Date Value\n", "df.index.min()" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Earliest Date Index Location\n", "df.index.argmin()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
NOTE: Normally we would find the index label of a column's minimum or maximum by running .idxmin() or .idxmax() on df['column'], since calling .argmin() and .argmax() on a Series was deprecated in older pandas versions. On the index itself, .argmin() and .argmax() still return integer positions.
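
The difference between a label and a position can be sketched as follows. This is a minimal example that assumes fixed values in place of the random data above, so the results are reproducible:

```python
import pandas as pd

idx = pd.DatetimeIndex(['2016-03-15', '2017-05-24', '2018-08-09'])

# Fixed values (an assumption for this sketch) instead of np.random.randn
df = pd.DataFrame({'A': [1.5, -0.3, 0.7], 'B': [0.2, 2.4, -1.1]}, index=idx)

# idxmax returns the index label (a Timestamp) of the largest value in the column
label_of_max = df['A'].idxmax()

# argmax on the underlying NumPy array returns the integer position instead
position_of_max = df['A'].to_numpy().argmax()

print(label_of_max)
print(position_of_max)
print(df.index.argmax())
```

Here .idxmax() answers the question of when the maximum occurred, while .argmax() answers where it sits in the data.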
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Great, let's move on!" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 1 }