Time Series with Pandas Project Exercise
For this exercise, answer the questions below given the dataset: https://fred.stlouisfed.org/series/UMTMVS
This dataset is the Value of Manufacturers' Shipments for All Manufacturing Industries.
Import any necessary libraries.
# CODE HERE
import numpy as np
import pandas as pd
%matplotlib inline
Read in the UMTMVS.csv file from the Data folder.
# CODE HERE
df = pd.read_csv('../Data/UMTMVS.csv')
Check the head of the data
# CODE HERE
df.head()
Set the DATE column as the index.
# CODE HERE
df = df.set_index('DATE')
df.head()
Check the data type of the index.
# CODE HERE
df.index
Convert the index to be a datetime index. Note, there are many, many correct ways to do this!
# CODE HERE
df.index = pd.to_datetime(df.index)
df.index
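As one illustration of the "many correct ways" mentioned above, the sketch below compares converting the index after reading with letting `read_csv` parse the dates directly. The three-row `csv_text` sample is made up for illustration and only mimics the two-column layout of UMTMVS.csv.

```python
import io
import pandas as pd

# Hypothetical three-row sample in the same layout as UMTMVS.csv
# (the values are invented, not taken from the real dataset).
csv_text = "DATE,UMTMVS\n1992-01-01,100\n1992-02-01,110\n1992-03-01,120\n"

# Option 1: read first, set the index, then convert it explicitly.
df_a = pd.read_csv(io.StringIO(csv_text)).set_index('DATE')
df_a.index = pd.to_datetime(df_a.index)

# Option 2: let read_csv set the index and parse the dates in one step.
df_b = pd.read_csv(io.StringIO(csv_text), index_col='DATE', parse_dates=True)

# Both approaches should end up with a DatetimeIndex holding the same dates.
print(type(df_a.index).__name__, type(df_b.index).__name__)
print(list(df_a.index) == list(df_b.index))
```

Option 2 is the form used later in the bonus question; either should work for the rest of the exercise.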
Plot the data, choosing a reasonable figure size.
# CODE HERE
df.plot(figsize=(14,8))
What was the percent increase in value from Jan 2009 to Jan 2019?
#CODE HERE
100 * (df.loc['2019-01-01'] - df.loc['2009-01-01']) / df.loc['2009-01-01']
What was the percent decrease from Jan 2008 to Jan 2009?
#CODE HERE
100 * (df.loc['2009-01-01'] - df.loc['2008-01-01']) / df.loc['2008-01-01']
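Both percent-change questions use the same formula, 100 * (new - old) / old, so a negative result indicates a decrease. The sketch below wraps it in a small helper; `percent_change` is a hypothetical name and the values are made up rather than taken from UMTMVS.

```python
import pandas as pd

# Made-up values on a month-start index; not the real UMTMVS figures.
idx = pd.date_range('2009-01-01', periods=3, freq='MS')
s = pd.Series([200.0, 210.0, 250.0], index=idx, name='UMTMVS')

def percent_change(series, start, end):
    """Percent change from the value at `start` to the value at `end`."""
    old = series.loc[pd.Timestamp(start)]
    new = series.loc[pd.Timestamp(end)]
    return 100 * (new - old) / old

increase = percent_change(s, '2009-01-01', '2009-03-01')  # 200 -> 250
decrease = percent_change(s, '2009-03-01', '2009-01-01')  # 250 -> 200
print(increase, decrease)
```

Selecting the column as a Series first returns a single number, whereas `df.loc[...]` on the whole DataFrame returns a one-element Series.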
What is the month with the least value after 2005?
#CODE HERE
df.loc['2005-01-01':].idxmin()
What 6 months have the highest value?
# CODE HERE
df.sort_values(by='UMTMVS',ascending=False).head(6)
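An alternative worth knowing is `DataFrame.nlargest`, which returns the top rows without writing the sort explicitly. The sketch below uses a small made-up frame (the name `demo` and its values are illustrative only) and checks that both approaches pick the same six rows.

```python
import pandas as pd

# Eight made-up monthly values; not the real UMTMVS data.
idx = pd.date_range('2018-01-01', periods=8, freq='MS')
demo = pd.DataFrame({'UMTMVS': [5, 9, 1, 7, 3, 8, 6, 2]}, index=idx)

# Six largest values via nlargest...
top6 = demo.nlargest(6, 'UMTMVS')

# ...and via an explicit descending sort followed by head(6).
same = demo.sort_values(by='UMTMVS', ascending=False).head(6)

print(top6['UMTMVS'].tolist())
print(top6.equals(same))
```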
How many millions of dollars in value were lost in 2008? (Another way of posing this question: what was the difference in value between Jan 2008 and Jan 2009?)
# CODE HERE
df.loc['2008-01-01'] - df.loc['2009-01-01']
Create a bar plot showing the average value in millions of dollars per year
# CODE HERE
# 'Y' resamples to calendar-year bins; pandas 2.2+ prefers the alias 'YE'
df.resample('Y').mean().plot.bar(figsize=(15,8))
What year had the biggest increase in mean value from the previous year's mean value? (Lots of ways to get this answer!)
# CODE HERE
yearly_data = df.resample('Y').mean()
yearly_data_shift = yearly_data.shift(1)
yearly_data.head()
change = yearly_data - yearly_data_shift
change['UMTMVS'].idxmax()
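Since there are lots of ways to get this answer, here is one more sketch: `diff()` does the shift-and-subtract in a single call. It groups by `index.year` instead of resampling, which is a swapped-in technique, so the result is an integer year rather than a year-end timestamp. The data below is made up for illustration.

```python
import pandas as pd

# Three made-up years of monthly data: flat at 100, then 130, then 140.
idx = pd.date_range('2000-01-01', periods=36, freq='MS')
values = [100.0] * 12 + [130.0] * 12 + [140.0] * 12
demo = pd.DataFrame({'UMTMVS': values}, index=idx)

# Mean value per calendar year, indexed by the integer year.
yearly_mean = demo['UMTMVS'].groupby(demo.index.year).mean()

# diff() subtracts the previous year's mean; the first year is NaN.
change = yearly_mean.diff()

# idxmax skips the NaN and returns the year with the largest increase.
best_year = change.idxmax()
print(best_year, change.loc[best_year])
```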
Plot out the yearly rolling mean on top of the original data. Recall that this is monthly data and there are 12 months in a year!
# CODE HERE
df['Yearly Mean'] = df['UMTMVS'].rolling(window=12).mean()
df[['UMTMVS','Yearly Mean']].plot(figsize=(12,5)).autoscale(axis='x',tight=True);
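To see what `rolling(window=12)` does on monthly data, the sketch below applies it to a short made-up series. Each result is the mean of the current month and the 11 before it, so the first 11 entries have no complete window and come out as NaN.

```python
import pandas as pd

# Fourteen made-up monthly values: 1.0, 2.0, ..., 14.0.
idx = pd.date_range('2000-01-01', periods=14, freq='MS')
s = pd.Series(range(1, 15), index=idx, dtype='float64')

# 12-month rolling mean: needs 12 observations before it produces a value.
rolling = s.rolling(window=12).mean()

print(rolling.isna().sum())   # number of leading months without a full window
print(rolling.dropna())       # first value is the mean of 1..12
```

This is why the 'Yearly Mean' line in the plot starts roughly a year after the original data.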
BONUS QUESTION (HARD).
In some month of 2008 the value peaked for that year, then crashed immediately afterwards. How many months did it take to surpass that 2008 peak? There are many ways to get this answer. NOTE: I get 70 months as my answer; you may get 69 or 68, depending on whether or not you count the start and end months. Refer to the video solutions for a full explanation.
#CODE HERE
df = pd.read_csv('../Data/UMTMVS.csv',index_col='DATE',parse_dates=True)
df.head()
# Restrict to the 12 months of 2008 (an end label of '2009-01-01' would also include Jan 2009)
df2008 = df.loc['2008-01-01':'2008-12-01']
df2008.idxmax()
df2008.max()
df_post_peak = df.loc['2008-06-01':]
df_post_peak[df_post_peak>=510081].dropna()
len(df.loc['2008-06-01':'2014-03-01'])
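The steps above hard-code the peak value and the recovery date after reading them off the output. As a hedged sketch, the same logic can be written without hard-coded numbers. The series below is made up (it is not the UMTMVS data), and "surpass" is taken to mean strictly greater than the peak, which is an assumption.

```python
import pandas as pd

# Two made-up years: a peak in Apr 2008, a crash, then a slow recovery.
idx = pd.date_range('2008-01-01', periods=24, freq='MS')
values = [10, 12, 15, 20, 18, 14, 9, 8, 7, 6, 5, 5,
          6, 8, 10, 13, 16, 19, 21, 22, 23, 24, 25, 26]
s = pd.Series(values, index=idx, name='UMTMVS')

# Locate the 2008 peak.
year_2008 = s.loc['2008']
peak_date = year_2008.idxmax()
peak_value = year_2008.max()

# First month after the peak whose value strictly exceeds it.
after_peak = s.loc[peak_date:]
recovery_date = after_peak[after_peak > peak_value].index[0]

# Two ways of counting, which differ by one.
months_between = ((recovery_date.year - peak_date.year) * 12
                  + (recovery_date.month - peak_date.month))
months_inclusive = len(s.loc[peak_date:recovery_date])

print(peak_date.date(), recovery_date.date(), months_between, months_inclusive)
```

The gap between `months_between` and `months_inclusive` is the same start-and-end counting ambiguity described in the note above.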