{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"___\n",
"\n",
"Copyright Pierian Data\n",
"\n",
"For more information, visit us at www.pieriandata.com"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Pandas Time Series Exercise Set #1 - Solution\n",
"\n",
"For this set of exercises we'll use a dataset containing monthly milk production values in pounds per cow from January 1962 to December 1975.\n",
"\n",
"IMPORTANT NOTE! Make sure you don't run the cells directly above the example output shown; otherwise you will end up writing over the example output!"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"168\n",
" Date Production\n",
"0 1962-01 589\n",
"1 1962-02 561\n",
"2 1962-03 640\n",
"3 1962-04 656\n",
"4 1962-05 727\n"
]
}
],
"source": [
"# RUN THIS CELL\n",
"import pandas as pd\n",
"%matplotlib inline\n",
"\n",
"df = pd.read_csv('../Data/monthly_milk_production.csv', encoding='utf8')\n",
"title = \"Monthly milk production: pounds per cow. Jan '62 - Dec '75\"\n",
"\n",
"print(len(df))\n",
"print(df.head())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So df has 168 records (14 years of 12 monthly values) and 2 columns."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1. What is the current data type of the Date column?\n",
"HINT: We show how to list column dtypes in the first set of DataFrame lectures."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# CODE HERE\n"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Date object\n",
"Production int64\n",
"dtype: object"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# DON'T WRITE HERE\n",
"df.dtypes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2. Change the Date column to a datetime format"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"Date datetime64[ns]\n",
"Production int64\n",
"dtype: object"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# DON'T WRITE HERE\n",
"df['Date']=pd.to_datetime(df['Date'])\n",
"df.dtypes"
]
},
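{
"cell_type": "markdown",
"metadata": {},
"source": [
"A hedged aside: every Date value shown above has the year-month shape `1962-01`. Assuming the whole column follows that pattern, passing an explicit `format` to `pd.to_datetime` makes the conversion strict, so a malformed entry raises an error instead of being guessed. The Series `sample_dates` below is illustrative and leaves `df` untouched."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Illustrative year-month strings in the same shape as the Date column\n",
"sample_dates = pd.Series(['1962-01', '1962-02', '1962-03'])\n",
"\n",
"# '%Y-%m' means four-digit year, dash, two-digit month; the day defaults to the 1st\n",
"parsed = pd.to_datetime(sample_dates, format='%Y-%m')\n",
"print(parsed.dtype)\n",
"print(parsed.tolist())"
]
},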
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3. Set the Date column to be the new index"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Production\n",
"Date \n",
"1962-01-01 589\n",
"1962-02-01 561\n",
"1962-03-01 640\n",
"1962-04-01 656\n",
"1962-05-01 727"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# DON'T WRITE HERE\n",
"df.set_index('Date',inplace=True)\n",
"df.head()"
]
},
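{
"cell_type": "markdown",
"metadata": {},
"source": [
"Steps 2 and 3 can also be folded into the read itself. This is a sketch, assuming the CSV has the same `Date,Production` header seen above: `index_col='Date'` makes Date the index and `parse_dates=True` asks pandas to parse that index as dates. It reads the five rows printed earlier from an in-memory `io.StringIO` buffer instead of the real file, and stores the result in `sample_df` so `df` is not overwritten."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import io\n",
"import pandas as pd\n",
"\n",
"# The five rows shown by df.head() above, standing in for the CSV file\n",
"rows = ['Date,Production', '1962-01,589', '1962-02,561', '1962-03,640', '1962-04,656', '1962-05,727']\n",
"csv_text = '\\n'.join(rows)\n",
"\n",
"# index_col picks the index column; parse_dates=True parses that index as datetimes\n",
"sample_df = pd.read_csv(io.StringIO(csv_text), index_col='Date', parse_dates=True)\n",
"print(sample_df.index.dtype)\n",
"print(sample_df)"
]
},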
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4. Plot the DataFrame with a simple line plot. What do you notice about the plot?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Figure: line plot of Production over Date]"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# DON'T WRITE HERE\n",
"df.plot();\n",
"\n",
"# THE PLOT SHOWS CONSISTENT SEASONALITY, AS WELL AS AN UPWARD TREND"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5. Add a column called 'Month' that takes the month value from the index\n",
"HINT: Date is now the index, so use df.index; df['Date'] won't work.\n",
"\n",
"BONUS: See if you can obtain the name of the month instead of a number! "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Production Month\n",
"Date \n",
"1962-01-01 589 1\n",
"1962-02-01 561 2\n",
"1962-03-01 640 3\n",
"1962-04-01 656 4\n",
"1962-05-01 727 5"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# DON'T WRITE HERE\n",
"df['Month']=df.index.month\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Production Month\n",
"Date \n",
"1962-01-01 589 January\n",
"1962-02-01 561 February\n",
"1962-03-01 640 March\n",
"1962-04-01 656 April\n",
"1962-05-01 727 May"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# BONUS SOLUTION:\n",
"df['Month']=df.index.strftime('%B')\n",
"df.head()"
]
},
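{
"cell_type": "markdown",
"metadata": {},
"source": [
"An alternative to `strftime('%B')` for the bonus is `DatetimeIndex.month_name()` (pandas 0.23 and later), which returns English month names by default, whereas `%B` depends on the active locale. A minimal sketch on a small illustrative index, stored in `sample_months` so `df` is untouched:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Five month-start dates, like the index built in step 3\n",
"idx = pd.date_range('1962-01-01', periods=5, freq='MS')\n",
"\n",
"# month gives the number 1-12; month_name() gives the full English name\n",
"sample_months = pd.DataFrame({'MonthNum': idx.month, 'MonthName': idx.month_name()}, index=idx)\n",
"print(sample_months)"
]
},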
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 6. Create a BoxPlot that groups by the Month field"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Figure: box plots of Production grouped by Month]"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# DON'T WRITE HERE\n",
"df.boxplot(by='Month',figsize=(12,5));"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Great Job!"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 1
}