___

Copyright Pierian Data For more information, visit us at www.pieriandata.com

Time Series with Pandas Project Exercise¶

For this exercise, answer the questions below given the dataset: https://fred.stlouisfed.org/series/UMTMVS

This dataset is the Value of Manufacturers' Shipments for All Manufacturing Industries.

Import any necessary libraries.

In [42]:

# CODE HERE

In [43]:

import numpy as np
import pandas as pd
%matplotlib inline

Read in the data UMTMVS.csv file from the Data folder

In [44]:

# CODE HERE

In [45]:

df = pd.read_csv('../Data/UMTMVS.csv')

Check the head of the data

In [46]:

# CODE HERE

In [47]:

df.head()

Out[47]:

	DATE	UMTMVS
0	1992-01-01	209438.0
1	1992-02-01	232679.0
2	1992-03-01	249673.0
3	1992-04-01	239666.0
4	1992-05-01	243231.0

Set the DATE column as the index.

In [48]:

# CODE HERE

In [49]:

df = df.set_index('DATE')

In [50]:

df.head()

Out[50]:

	UMTMVS
DATE
1992-01-01	209438.0
1992-02-01	232679.0
1992-03-01	249673.0
1992-04-01	239666.0
1992-05-01	243231.0

Check the data type of the index.

In [51]:

# CODE HERE

In [52]:

df.index

Out[52]:

Index(['1992-01-01', '1992-02-01', '1992-03-01', '1992-04-01', '1992-05-01',
       '1992-06-01', '1992-07-01', '1992-08-01', '1992-09-01', '1992-10-01',
       ...
       '2018-04-01', '2018-05-01', '2018-06-01', '2018-07-01', '2018-08-01',
       '2018-09-01', '2018-10-01', '2018-11-01', '2018-12-01', '2019-01-01'],
      dtype='object', name='DATE', length=325)

Convert the index to be a datetime index. Note, there are many, many correct ways to do this!

In [53]:

# CODE HERE

In [54]:

df.index = pd.to_datetime(df.index)

In [55]:

df.index

Out[55]:

DatetimeIndex(['1992-01-01', '1992-02-01', '1992-03-01', '1992-04-01',
               '1992-05-01', '1992-06-01', '1992-07-01', '1992-08-01',
               '1992-09-01', '1992-10-01',
               ...
               '2018-04-01', '2018-05-01', '2018-06-01', '2018-07-01',
               '2018-08-01', '2018-09-01', '2018-10-01', '2018-11-01',
               '2018-12-01', '2019-01-01'],
              dtype='datetime64[ns]', name='DATE', length=325, freq=None)

Plot out the data, choose a reasonable figure size

In [56]:

# CODE HERE

In [69]:

df.plot(figsize=(14,8))

Out[69]:

<matplotlib.axes._subplots.AxesSubplot at 0x1d10ba9bcc0>

What was the percent increase in value from Jan 2009 to Jan 2019?

In [71]:

#CODE HERE

In [76]:

100 * (df.loc['2019-01-01'] - df.loc['2009-01-01']) / df.loc['2009-01-01']

Out[76]:

UMTMVS    38.472149
dtype: float64

What was the percent decrease from Jan 2008 to Jan 2009?

In [ ]:

#CODE HERE

In [77]:

100 * (df.loc['2009-01-01'] - df.loc['2008-01-01']) / df.loc['2008-01-01']

Out[77]:

UMTMVS   -22.022775
dtype: float64

What is the month with the least value after 2005? HINT

In [59]:

#CODE HERE

In [61]:

df.loc['2005-01-01':].idxmin()

Out[61]:

UMTMVS   2009-01-01
dtype: datetime64[ns]

What 6 months have the highest value?

In [68]:

# CODE HERE

In [80]:

df.sort_values(by='UMTMVS',ascending=False).head(5)

Out[80]:

	UMTMVS	Yearly Mean
DATE
2018-08-01	529157.0	490453.500000
2018-10-01	527031.0	496482.333333
2018-06-01	525660.0	483611.000000
2018-03-01	518285.0	474351.250000
2018-09-01	516992.0	493075.583333

How many millions of dollars in value was lost in 2008? (Another way of posing this question is what was the value difference between Jan 2008 and Jan 2009)

In [17]:

# CODE HERE

In [18]:

df.loc['2008-01-01'] - df.loc['2009-01-01']

Out[18]:

UMTMVS    95206.0
dtype: float64

Create a bar plot showing the average value in millions of dollars per year

In [19]:

# CODE HERE

In [20]:

df.resample('Y').mean().plot.bar(figsize=(15,8))

Out[20]:

<matplotlib.axes._subplots.AxesSubplot at 0x1d10a074588>

What year had the biggest increase in mean value from the previous year's mean value? (Lots of ways to get this answer!)

HINT for a useful method

In [21]:

# CODE HERE

In [22]:

yearly_data = df.resample('Y').mean()
yearly_data_shift = yearly_data.shift(1)

In [23]:

yearly_data.head()

Out[23]:

	UMTMVS
DATE
1992-12-31	242002.000000
1993-12-31	251708.083333
1994-12-31	269842.666667
1995-12-31	289973.083333
1996-12-31	299765.666667

In [24]:

change = yearly_data - yearly_data_shift

In [25]:

change['UMTMVS'].idxmax()

Out[25]:

Timestamp('2011-12-31 00:00:00', freq='A-DEC')

Plot out the yearly rolling mean on top of the original data. Recall that this is monthly data and there are 12 months in a year!

In [26]:

# CODE HERE

In [78]:

df['Yearly Mean'] = df['UMTMVS'].rolling(window=12).mean()
df[['UMTMVS','Yearly Mean']].plot(figsize=(12,5)).autoscale(axis='x',tight=True);

BONUS QUESTION (HARD).

Some month in 2008 the value peaked for that year. How many months did it take to surpass that 2008 peak? (Since it crashed immediately after this peak) There are many ways to get this answer. NOTE: I get 70 months as my answer, you may get 69 or 68, depending on whether or not you count the start and end months. Refer to the video solutions for full explanation on this.

In [91]:

#CODE HERE

In [97]:

df = pd.read_csv('../Data/UMTMVS.csv',index_col='DATE',parse_dates=True)

In [98]:

df.head()

Out[98]:

	UMTMVS
DATE
1992-01-01	209438.0
1992-02-01	232679.0
1992-03-01	249673.0
1992-04-01	239666.0
1992-05-01	243231.0

In [99]:

df2008 = df.loc['2008-01-01':'2009-01-01']

In [100]:

df2008.idxmax()

Out[100]:

UMTMVS   2008-06-01
dtype: datetime64[ns]

In [101]:

df2008.max()

Out[101]:

UMTMVS    510081.0
dtype: float64

In [105]:

df_post_peak = df.loc['2008-06-01':]

In [106]:

df_post_peak[df_post_peak>=510081].dropna()

Out[106]:

	UMTMVS
DATE
2008-06-01	510081.0
2014-03-01	513700.0
2014-06-01	516935.0
2014-09-01	512988.0
2018-03-01	518285.0
2018-05-01	515105.0
2018-06-01	525660.0
2018-08-01	529157.0
2018-09-01	516992.0
2018-10-01	527031.0

In [108]:

len(df.loc['2008-06-01':'2014-03-01'])

Out[108]:

202 KiB

Raw Blame History

Time Series with Pandas Project Exercise¶

GREAT JOB!¶

202 KiB Raw Blame History

Time Series with Pandas Project Exercise¶

GREAT JOB!¶

202 KiB

Raw Blame History