{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"___\n",
"\n",
"<a href='http://www.pieriandata.com'><img src='../Pierian_Data_Logo.png'/></a>\n",
"___\n",
"<center><em>Copyright by Pierian Data Inc.</em></center>\n",
"<center><em>For more information, visit us at <a href='http://www.pieriandata.com'>www.pieriandata.com</a></em></center>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Time Methods"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Python Datetime Review\n",
"\n",
"Python's standard library, independent of pandas, includes a `datetime` module:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from datetime import datetime"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# To illustrate the order of arguments\n",
"my_year = 2017\n",
"my_month = 1\n",
"my_day = 2\n",
"my_hour = 13\n",
"my_minute = 30\n",
"my_second = 15"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# January 2nd, 2017\n",
"my_date = datetime(my_year,my_month,my_day)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"datetime.datetime(2017, 1, 2, 0, 0)"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Defaults to 0:00\n",
"my_date "
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# January 2nd, 2017 at 13:30:15\n",
"my_date_time = datetime(my_year,my_month,my_day,my_hour,my_minute,my_second)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"datetime.datetime(2017, 1, 2, 13, 30, 15)"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"my_date_time"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can access any component of a datetime object as an attribute:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"my_date.day"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"13"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"my_date_time.hour"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"# Pandas\n",
"\n",
"# Converting to datetime\n",
"\n",
"Stored data sets often hold the time component as a string. Pandas can convert such strings to datetime objects with `pd.to_datetime()`."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"myser = pd.Series(['Nov 3, 2000', '2000-01-01', None])"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 Nov 3, 2000\n",
"1 2000-01-01\n",
"2 None\n",
"dtype: object"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"myser"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Nov 3, 2000'"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"myser[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### pd.to_datetime()\n",
"\n",
"https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#converting-to-timestamps"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 2000-11-03\n",
"1 2000-01-01\n",
"2 NaT\n",
"dtype: datetime64[ns]"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.to_datetime(myser)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Timestamp('2000-11-03 00:00:00')"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.to_datetime(myser)[0]"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"obvi_euro_date = '31-12-2000'"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Timestamp('2000-12-31 00:00:00')"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.to_datetime(obvi_euro_date) "
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# 10th of Dec OR 12th of October?\n",
"# We may need to tell pandas\n",
"euro_date = '10-12-2000'"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Timestamp('2000-10-12 00:00:00')"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.to_datetime(euro_date) "
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Timestamp('2000-12-10 00:00:00')"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.to_datetime(euro_date,dayfirst=True) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Custom Time String Formatting\n",
"\n",
"Sometimes dates come in a non-standard format. In that case you can tell pandas the exact format with the `format` argument. Specifying the format can also speed up the conversion, so it may be worth doing even when pandas can parse the strings on its own."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A full table of codes can be found here: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"style_date = '12--Dec--2000'"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Timestamp('2000-12-12 00:00:00')"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.to_datetime(style_date, format='%d--%b--%Y')"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"strange_date = '12th of Dec 2000'"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Timestamp('2000-12-12 00:00:00')"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.to_datetime(strange_date)"
]
},
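{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same `format` string also works on a whole Series. The sketch below uses a made-up Series, `demo_dates` (an assumption for illustration, not part of the course data). `errors='coerce'` is an optional `pd.to_datetime` argument that returns `NaT` for entries that do not match the format instead of raising an error."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical example data: the last entry is deliberately invalid\n",
"demo_dates = pd.Series(['12--Dec--2000', '25--Jan--2001', 'not a date'])\n",
"\n",
"# errors='coerce' turns values that do not match the format into NaT\n",
"pd.to_datetime(demo_dates, format='%d--%b--%Y', errors='coerce')"
]
},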
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data\n",
"\n",
"Retail Sales: Beer, Wine, and Liquor Stores\n",
"\n",
"Units: Millions of Dollars, Not Seasonally Adjusted\n",
"\n",
"Frequency: Monthly\n",
"\n",
"\n",
"U.S. Census Bureau, Retail Sales: Beer, Wine, and Liquor Stores [MRTSSM4453USN], retrieved from FRED, Federal Reserve Bank of St. Louis; https://fred.stlouisfed.org/series/MRTSSM4453USN, July 2, 2020."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"sales = pd.read_csv('RetailSales_BeerWineLiquor.csv')"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>DATE</th>\n",
" <th>MRTSSM4453USN</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1992-01-01</td>\n",
" <td>1509</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1992-02-01</td>\n",
" <td>1541</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1992-03-01</td>\n",
" <td>1597</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1992-04-01</td>\n",
" <td>1675</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1992-05-01</td>\n",
" <td>1822</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>335</th>\n",
" <td>2019-12-01</td>\n",
" <td>6630</td>\n",
" </tr>\n",
" <tr>\n",
" <th>336</th>\n",
" <td>2020-01-01</td>\n",
" <td>4388</td>\n",
" </tr>\n",
" <tr>\n",
" <th>337</th>\n",
" <td>2020-02-01</td>\n",
" <td>4533</td>\n",
" </tr>\n",
" <tr>\n",
" <th>338</th>\n",
" <td>2020-03-01</td>\n",
" <td>5562</td>\n",
" </tr>\n",
" <tr>\n",
" <th>339</th>\n",
" <td>2020-04-01</td>\n",
" <td>5207</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>340 rows × 2 columns</p>\n",
"</div>"
],
"text/plain": [
" DATE MRTSSM4453USN\n",
"0 1992-01-01 1509\n",
"1 1992-02-01 1541\n",
"2 1992-03-01 1597\n",
"3 1992-04-01 1675\n",
"4 1992-05-01 1822\n",
".. ... ...\n",
"335 2019-12-01 6630\n",
"336 2020-01-01 4388\n",
"337 2020-02-01 4533\n",
"338 2020-03-01 5562\n",
"339 2020-04-01 5207\n",
"\n",
"[340 rows x 2 columns]"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sales"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'1992-01-01'"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sales.iloc[0]['DATE']"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"str"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(sales.iloc[0]['DATE'])"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"sales['DATE'] = pd.to_datetime(sales['DATE'])"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>DATE</th>\n",
" <th>MRTSSM4453USN</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1992-01-01</td>\n",
" <td>1509</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1992-02-01</td>\n",
" <td>1541</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1992-03-01</td>\n",
" <td>1597</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1992-04-01</td>\n",
" <td>1675</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1992-05-01</td>\n",
" <td>1822</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>335</th>\n",
" <td>2019-12-01</td>\n",
" <td>6630</td>\n",
" </tr>\n",
" <tr>\n",
" <th>336</th>\n",
" <td>2020-01-01</td>\n",
" <td>4388</td>\n",
" </tr>\n",
" <tr>\n",
" <th>337</th>\n",
" <td>2020-02-01</td>\n",
" <td>4533</td>\n",
" </tr>\n",
" <tr>\n",
" <th>338</th>\n",
" <td>2020-03-01</td>\n",
" <td>5562</td>\n",
" </tr>\n",
" <tr>\n",
" <th>339</th>\n",
" <td>2020-04-01</td>\n",
" <td>5207</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>340 rows × 2 columns</p>\n",
"</div>"
],
"text/plain": [
" DATE MRTSSM4453USN\n",
"0 1992-01-01 1509\n",
"1 1992-02-01 1541\n",
"2 1992-03-01 1597\n",
"3 1992-04-01 1675\n",
"4 1992-05-01 1822\n",
".. ... ...\n",
"335 2019-12-01 6630\n",
"336 2020-01-01 4388\n",
"337 2020-02-01 4533\n",
"338 2020-03-01 5562\n",
"339 2020-04-01 5207\n",
"\n",
"[340 rows x 2 columns]"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sales"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Timestamp('1992-01-01 00:00:00')"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sales.iloc[0]['DATE']"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"pandas._libs.tslibs.timestamps.Timestamp"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(sales.iloc[0]['DATE'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"------"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Attempt to Parse Dates Automatically\n",
"\n",
"**parse_dates** - bool or list of int or names or list of lists or dict, default False\n",
"\n",
"The behavior is as follows:\n",
"\n",
"* boolean. If True -> try parsing the index.\n",
"* list of int or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column.\n",
"* list of lists. e.g. If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column.\n",
"* dict, e.g. {foo : [1, 3]} -> parse columns 1, 3 as date and call result foo\n",
"\n",
"If a column or index cannot be represented as an array of datetimes, say because of an unparseable value or a mixture of timezones, the column or index will be returned unaltered as an object data type. For non-standard datetime parsing, use pd.to_datetime after pd.read_csv. To parse an index or column with a mixture of timezones, specify date_parser to be a partially-applied pandas.to_datetime() with utc=True. See the \"Parsing a CSV with mixed timezones\" section of the pandas IO documentation for more."
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Parse Column at Index 0 as Datetime\n",
"sales = pd.read_csv('RetailSales_BeerWineLiquor.csv',parse_dates=[0])"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>DATE</th>\n",
" <th>MRTSSM4453USN</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1992-01-01</td>\n",
" <td>1509</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1992-02-01</td>\n",
" <td>1541</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1992-03-01</td>\n",
" <td>1597</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1992-04-01</td>\n",
" <td>1675</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1992-05-01</td>\n",
" <td>1822</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>335</th>\n",
" <td>2019-12-01</td>\n",
" <td>6630</td>\n",
" </tr>\n",
" <tr>\n",
" <th>336</th>\n",
" <td>2020-01-01</td>\n",
" <td>4388</td>\n",
" </tr>\n",
" <tr>\n",
" <th>337</th>\n",
" <td>2020-02-01</td>\n",
" <td>4533</td>\n",
" </tr>\n",
" <tr>\n",
" <th>338</th>\n",
" <td>2020-03-01</td>\n",
" <td>5562</td>\n",
" </tr>\n",
" <tr>\n",
" <th>339</th>\n",
" <td>2020-04-01</td>\n",
" <td>5207</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>340 rows × 2 columns</p>\n",
"</div>"
],
"text/plain": [
" DATE MRTSSM4453USN\n",
"0 1992-01-01 1509\n",
"1 1992-02-01 1541\n",
"2 1992-03-01 1597\n",
"3 1992-04-01 1675\n",
"4 1992-05-01 1822\n",
".. ... ...\n",
"335 2019-12-01 6630\n",
"336 2020-01-01 4388\n",
"337 2020-02-01 4533\n",
"338 2020-03-01 5562\n",
"339 2020-04-01 5207\n",
"\n",
"[340 rows x 2 columns]"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sales"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"pandas._libs.tslibs.timestamps.Timestamp"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(sales.iloc[0]['DATE'])"
]
},
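{
"cell_type": "markdown",
"metadata": {},
"source": [
"Columns can also be selected by name rather than by position. The sketch below is an illustration only: `demo_csv` is a hypothetical in-memory CSV (its two rows copy the first two rows shown above) standing in for a file on disk, and `index_col` is used to make the parsed column the index in the same call."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from io import StringIO\n",
"\n",
"# Hypothetical in-memory CSV standing in for a file on disk\n",
"demo_csv = StringIO('DATE,VALUE\\n1992-01-01,1509\\n1992-02-01,1541\\n')\n",
"\n",
"# Parse the DATE column by name and use it as the index in one step\n",
"demo_df = pd.read_csv(demo_csv, parse_dates=['DATE'], index_col='DATE')\n",
"demo_df.index"
]
},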
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Resample"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"A common operation with time series data is resampling based on the time series index. Let's see how to use the resample() method. [[reference](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.resample.html)]"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"RangeIndex(start=0, stop=340, step=1)"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Our index\n",
"sales.index"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Set the DATE column as the index (resample() works on a datetime-like index)"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"sales = sales.set_index(\"DATE\")"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>MRTSSM4453USN</th>\n",
" </tr>\n",
" <tr>\n",
" <th>DATE</th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1992-01-01</th>\n",
" <td>1509</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1992-02-01</th>\n",
" <td>1541</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1992-03-01</th>\n",
" <td>1597</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1992-04-01</th>\n",
" <td>1675</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1992-05-01</th>\n",
" <td>1822</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-12-01</th>\n",
" <td>6630</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-01-01</th>\n",
" <td>4388</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-02-01</th>\n",
" <td>4533</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-03-01</th>\n",
" <td>5562</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-04-01</th>\n",
" <td>5207</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>340 rows × 1 columns</p>\n",
"</div>"
],
"text/plain": [
" MRTSSM4453USN\n",
"DATE \n",
"1992-01-01 1509\n",
"1992-02-01 1541\n",
"1992-03-01 1597\n",
"1992-04-01 1675\n",
"1992-05-01 1822\n",
"... ...\n",
"2019-12-01 6630\n",
"2020-01-01 4388\n",
"2020-02-01 4533\n",
"2020-03-01 5562\n",
"2020-04-01 5207\n",
"\n",
"[340 rows x 1 columns]"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sales"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When calling `.resample()` you first need to pass in a **rule** parameter, then you need to call some sort of aggregation function.\n",
"\n",
"The **rule** parameter describes the frequency with which to apply the aggregation function (daily, monthly, yearly, etc.)<br>\n",
"It is passed in using an \"offset alias\" - refer to the table below. [[reference](http://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases)]\n",
"\n",
"The aggregation function is needed because resampling groups several rows into each new time bin, so we need a mathematical rule (mean, sum, count, etc.) to combine those rows into a single value."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<table style=\"display: inline-block\">\n",
" <caption style=\"text-align: center\"><strong>TIME SERIES OFFSET ALIASES</strong></caption>\n",
"<tr><th>ALIAS</th><th>DESCRIPTION</th></tr>\n",
"<tr><td>B</td><td>business day frequency</td></tr>\n",
"<tr><td>C</td><td>custom business day frequency (experimental)</td></tr>\n",
"<tr><td>D</td><td>calendar day frequency</td></tr>\n",
"<tr><td>W</td><td>weekly frequency</td></tr>\n",
"<tr><td>M</td><td>month end frequency</td></tr>\n",
"<tr><td>SM</td><td>semi-month end frequency (15th and end of month)</td></tr>\n",
"<tr><td>BM</td><td>business month end frequency</td></tr>\n",
"<tr><td>CBM</td><td>custom business month end frequency</td></tr>\n",
"<tr><td>MS</td><td>month start frequency</td></tr>\n",
"<tr><td>SMS</td><td>semi-month start frequency (1st and 15th)</td></tr>\n",
"<tr><td>BMS</td><td>business month start frequency</td></tr>\n",
"<tr><td>CBMS</td><td>custom business month start frequency</td></tr>\n",
"<tr><td>Q</td><td>quarter end frequency</td></tr>\n",
"<tr><td></td><td><font color=white>intentionally left blank</font></td></tr></table>\n",
"\n",
"<table style=\"display: inline-block; margin-left: 40px\">\n",
"<caption style=\"text-align: center\"></caption>\n",
"<tr><th>ALIAS</th><th>DESCRIPTION</th></tr>\n",
"<tr><td>BQ</td><td>business quarter end frequency</td></tr>\n",
"<tr><td>QS</td><td>quarter start frequency</td></tr>\n",
"<tr><td>BQS</td><td>business quarter start frequency</td></tr>\n",
"<tr><td>A</td><td>year end frequency</td></tr>\n",
"<tr><td>BA</td><td>business year end frequency</td></tr>\n",
"<tr><td>AS</td><td>year start frequency</td></tr>\n",
"<tr><td>BAS</td><td>business year start frequency</td></tr>\n",
"<tr><td>BH</td><td>business hour frequency</td></tr>\n",
"<tr><td>H</td><td>hourly frequency</td></tr>\n",
"<tr><td>T, min</td><td>minutely frequency</td></tr>\n",
"<tr><td>S</td><td>secondly frequency</td></tr>\n",
"<tr><td>L, ms</td><td>milliseconds</td></tr>\n",
"<tr><td>U, us</td><td>microseconds</td></tr>\n",
"<tr><td>N</td><td>nanoseconds</td></tr></table>"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>MRTSSM4453USN</th>\n",
" </tr>\n",
" <tr>\n",
" <th>DATE</th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1992-12-31</th>\n",
" <td>1807.250000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1993-12-31</th>\n",
" <td>1794.833333</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1994-12-31</th>\n",
" <td>1841.750000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1995-12-31</th>\n",
" <td>1833.916667</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1996-12-31</th>\n",
" <td>1929.750000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1997-12-31</th>\n",
" <td>2006.750000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1998-12-31</th>\n",
" <td>2115.166667</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1999-12-31</th>\n",
" <td>2206.333333</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2000-12-31</th>\n",
" <td>2375.583333</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2001-12-31</th>\n",
" <td>2468.416667</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2002-12-31</th>\n",
" <td>2491.166667</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2003-12-31</th>\n",
" <td>2539.083333</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2004-12-31</th>\n",
" <td>2682.416667</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2005-12-31</th>\n",
" <td>2797.250000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2006-12-31</th>\n",
" <td>3001.333333</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2007-12-31</th>\n",
" <td>3177.333333</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2008-12-31</th>\n",
" <td>3292.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2009-12-31</th>\n",
" <td>3353.750000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2010-12-31</th>\n",
" <td>3450.083333</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2011-12-31</th>\n",
" <td>3532.666667</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2012-12-31</th>\n",
" <td>3697.083333</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2013-12-31</th>\n",
" <td>3839.666667</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2014-12-31</th>\n",
" <td>4023.833333</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2015-12-31</th>\n",
" <td>4212.500000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2016-12-31</th>\n",
" <td>4434.416667</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2017-12-31</th>\n",
" <td>4602.666667</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2018-12-31</th>\n",
" <td>4830.666667</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-12-31</th>\n",
" <td>4972.750000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020-12-31</th>\n",
" <td>4922.500000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" MRTSSM4453USN\n",
"DATE \n",
"1992-12-31 1807.250000\n",
"1993-12-31 1794.833333\n",
"1994-12-31 1841.750000\n",
"1995-12-31 1833.916667\n",
"1996-12-31 1929.750000\n",
"1997-12-31 2006.750000\n",
"1998-12-31 2115.166667\n",
"1999-12-31 2206.333333\n",
"2000-12-31 2375.583333\n",
"2001-12-31 2468.416667\n",
"2002-12-31 2491.166667\n",
"2003-12-31 2539.083333\n",
"2004-12-31 2682.416667\n",
"2005-12-31 2797.250000\n",
"2006-12-31 3001.333333\n",
"2007-12-31 3177.333333\n",
"2008-12-31 3292.000000\n",
"2009-12-31 3353.750000\n",
"2010-12-31 3450.083333\n",
"2011-12-31 3532.666667\n",
"2012-12-31 3697.083333\n",
"2013-12-31 3839.666667\n",
"2014-12-31 4023.833333\n",
"2015-12-31 4212.500000\n",
"2016-12-31 4434.416667\n",
"2017-12-31 4602.666667\n",
"2018-12-31 4830.666667\n",
"2019-12-31 4972.750000\n",
"2020-12-31 4922.500000"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Yearly Means\n",
"sales.resample(rule='A').mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Resampling rule 'A' takes all of the data points in a given year, applies the aggregation function (in this case we calculate the mean), and labels the result with the last day of that year. Note that 2020 is incomplete in this data set (it runs only through April), so its mean is not directly comparable to the full-year means."
]
},
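{
"cell_type": "markdown",
"metadata": {},
"source": [
"The choice of rule and aggregation function is independent. As a minimal sketch, the cell below builds a small made-up frame, `demo_sales` (hypothetical values, not the retail data), and sums it by quarter with the 'QS' alias from the table above. Be aware that pandas 2.2 and later deprecate some of the older aliases (for example 'A' in favor of 'YE' and 'M' in favor of 'ME'), so the alias you need may depend on your installed version."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical monthly data: six month-start dates with made-up values\n",
"demo_idx = pd.date_range('2020-01-01', periods=6, freq='MS')\n",
"demo_sales = pd.DataFrame({'VALUE': [10, 20, 30, 40, 50, 60]}, index=demo_idx)\n",
"\n",
"# 'QS' groups rows by quarter and labels each group with the quarter's first day\n",
"# Jan-Mar sums to 60, Apr-Jun sums to 150\n",
"demo_sales.resample(rule='QS').sum()"
]
},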
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# .dt Method Calls\n",
"\n",
"Once a column is in a datetime format, you can use a variety of methods and properties through the `.dt` accessor in pandas (a DatetimeIndex exposes the same attributes directly):\n",
"\n",
"https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.dt.html"
]
},
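{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick illustration before applying this to the sales data, the cell below uses a made-up Series, `demo_ser` (hypothetical, for illustration only). Properties such as `.dt.year` and `.dt.month` are accessed without parentheses, while `.dt.day_name()` is a method and needs them."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical example data: three consecutive days\n",
"demo_ser = pd.Series(pd.to_datetime(['2018-01-01', '2018-01-02', '2018-01-03']))\n",
"\n",
"# Returns a Series of day names, indexed like demo_ser\n",
"demo_ser.dt.day_name()"
]
},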
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [],
"source": [
"sales = sales.reset_index()"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>DATE</th>\n",
" <th>MRTSSM4453USN</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1992-01-01</td>\n",
" <td>1509</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1992-02-01</td>\n",
" <td>1541</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1992-03-01</td>\n",
" <td>1597</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1992-04-01</td>\n",
" <td>1675</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1992-05-01</td>\n",
" <td>1822</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>335</th>\n",
" <td>2019-12-01</td>\n",
" <td>6630</td>\n",
" </tr>\n",
" <tr>\n",
" <th>336</th>\n",
" <td>2020-01-01</td>\n",
" <td>4388</td>\n",
" </tr>\n",
" <tr>\n",
" <th>337</th>\n",
" <td>2020-02-01</td>\n",
" <td>4533</td>\n",
" </tr>\n",
" <tr>\n",
" <th>338</th>\n",
" <td>2020-03-01</td>\n",
" <td>5562</td>\n",
" </tr>\n",
" <tr>\n",
" <th>339</th>\n",
" <td>2020-04-01</td>\n",
" <td>5207</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>340 rows × 2 columns</p>\n",
"</div>"
],
"text/plain": [
" DATE MRTSSM4453USN\n",
"0 1992-01-01 1509\n",
"1 1992-02-01 1541\n",
"2 1992-03-01 1597\n",
"3 1992-04-01 1675\n",
"4 1992-05-01 1822\n",
".. ... ...\n",
"335 2019-12-01 6630\n",
"336 2020-01-01 4388\n",
"337 2020-02-01 4533\n",
"338 2020-03-01 5562\n",
"339 2020-04-01 5207\n",
"\n",
"[340 rows x 2 columns]"
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sales"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Help on DatetimeProperties in module pandas.core.indexes.accessors object:\n",
"\n",
"class DatetimeProperties(Properties)\n",
" | Accessor object for datetimelike properties of the Series values.\n",
" | \n",
" | Examples\n",
" | --------\n",
" | >>> s.dt.hour\n",
" | >>> s.dt.second\n",
" | >>> s.dt.quarter\n",
" | \n",
" | Returns a Series indexed like the original Series.\n",
" | Raises TypeError if the Series does not contain datetimelike values.\n",
" | \n",
" | Method resolution order:\n",
" | DatetimeProperties\n",
" | Properties\n",
" | pandas.core.accessor.PandasDelegate\n",
" | pandas.core.base.PandasObject\n",
" | pandas.core.accessor.DirNamesMixin\n",
" | pandas.core.base.NoNewAttributesMixin\n",
" | builtins.object\n",
" | \n",
" | Methods defined here:\n",
" | \n",
" | ceil(self, *args, **kwargs)\n",
" | Perform ceil operation on the data to the specified `freq`.\n",
" | \n",
" | Parameters\n",
" | ----------\n",
" | freq : str or Offset\n",
" | The frequency level to ceil the index to. Must be a fixed\n",
" | frequency like 'S' (second) not 'ME' (month end). See\n",
" | :ref:`frequency aliases <timeseries.offset_aliases>` for\n",
" | a list of possible `freq` values.\n",
" | ambiguous : 'infer', bool-ndarray, 'NaT', default 'raise'\n",
" | Only relevant for DatetimeIndex:\n",
" | \n",
" | - 'infer' will attempt to infer fall dst-transition hours based on\n",
" | order\n",
" | - bool-ndarray where True signifies a DST time, False designates\n",
" | a non-DST time (note that this flag is only applicable for\n",
" | ambiguous times)\n",
" | - 'NaT' will return NaT where there are ambiguous times\n",
" | - 'raise' will raise an AmbiguousTimeError if there are ambiguous\n",
" | times.\n",
" | \n",
" | .. versionadded:: 0.24.0\n",
" | \n",
" | nonexistent : 'shift_forward', 'shift_backward', 'NaT', timedelta, default 'raise'\n",
" | A nonexistent time does not exist in a particular timezone\n",
" | where clocks moved forward due to DST.\n",
" | \n",
" | - 'shift_forward' will shift the nonexistent time forward to the\n",
" | closest existing time\n",
" | - 'shift_backward' will shift the nonexistent time backward to the\n",
" | closest existing time\n",
" | - 'NaT' will return NaT where there are nonexistent times\n",
" | - timedelta objects will shift nonexistent times by the timedelta\n",
" | - 'raise' will raise an NonExistentTimeError if there are\n",
" | nonexistent times.\n",
" | \n",
" | .. versionadded:: 0.24.0\n",
" | \n",
" | Returns\n",
" | -------\n",
" | DatetimeIndex, TimedeltaIndex, or Series\n",
" | Index of the same type for a DatetimeIndex or TimedeltaIndex,\n",
" | or a Series with the same index for a Series.\n",
" | \n",
" | Raises\n",
" | ------\n",
" | ValueError if the `freq` cannot be converted.\n",
" | \n",
" | Examples\n",
" | --------\n",
" | **DatetimeIndex**\n",
" | \n",
" | >>> rng = pd.date_range('1/1/2018 11:59:00', periods=3, freq='min')\n",
" | >>> rng\n",
" | DatetimeIndex(['2018-01-01 11:59:00', '2018-01-01 12:00:00',\n",
" | '2018-01-01 12:01:00'],\n",
" | dtype='datetime64[ns]', freq='T')\n",
" | >>> rng.ceil('H')\n",
" | DatetimeIndex(['2018-01-01 12:00:00', '2018-01-01 12:00:00',\n",
" | '2018-01-01 13:00:00'],\n",
" | dtype='datetime64[ns]', freq=None)\n",
" | \n",
" | **Series**\n",
" | \n",
" | >>> pd.Series(rng).dt.ceil(\"H\")\n",
" | 0 2018-01-01 12:00:00\n",
" | 1 2018-01-01 12:00:00\n",
" | 2 2018-01-01 13:00:00\n",
" | dtype: datetime64[ns]\n",
" | \n",
" | day_name(self, *args, **kwargs)\n",
" | Return the day names of the DateTimeIndex with specified locale.\n",
" | \n",
" | .. versionadded:: 0.23.0\n",
" | \n",
" | Parameters\n",
" | ----------\n",
" | locale : str, optional\n",
" | Locale determining the language in which to return the day name.\n",
" | Default is English locale.\n",
" | \n",
" | Returns\n",
" | -------\n",
" | Index\n",
" | Index of day names.\n",
" | \n",
" | Examples\n",
" | --------\n",
" | >>> idx = pd.date_range(start='2018-01-01', freq='D', periods=3)\n",
" | >>> idx\n",
" | DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03'],\n",
" | dtype='datetime64[ns]', freq='D')\n",
" | >>> idx.day_name()\n",
" | Index(['Monday', 'Tuesday', 'Wednesday'], dtype='object')\n",
" | \n",
" | floor(self, *args, **kwargs)\n",
" | Perform floor operation on the data to the specified `freq`.\n",
" | \n",
" | Parameters\n",
" | ----------\n",
" | freq : str or Offset\n",
" | The frequency level to floor the index to. Must be a fixed\n",
" | frequency like 'S' (second) not 'ME' (month end). See\n",
" | :ref:`frequency aliases <timeseries.offset_aliases>` for\n",
" | a list of possible `freq` values.\n",
" | ambiguous : 'infer', bool-ndarray, 'NaT', default 'raise'\n",
" | Only relevant for DatetimeIndex:\n",
" | \n",
" | - 'infer' will attempt to infer fall dst-transition hours based on\n",
" | order\n",
" | - bool-ndarray where True signifies a DST time, False designates\n",
" | a non-DST time (note that this flag is only applicable for\n",
" | ambiguous times)\n",
" | - 'NaT' will return NaT where there are ambiguous times\n",
" | - 'raise' will raise an AmbiguousTimeError if there are ambiguous\n",
" | times.\n",
" | \n",
" | .. versionadded:: 0.24.0\n",
" | \n",
" | nonexistent : 'shift_forward', 'shift_backward', 'NaT', timedelta, default 'raise'\n",
" | A nonexistent time does not exist in a particular timezone\n",
" | where clocks moved forward due to DST.\n",
" | \n",
" | - 'shift_forward' will shift the nonexistent time forward to the\n",
" | closest existing time\n",
" | - 'shift_backward' will shift the nonexistent time backward to the\n",
" | closest existing time\n",
" | - 'NaT' will return NaT where there are nonexistent times\n",
" | - timedelta objects will shift nonexistent times by the timedelta\n",
" | - 'raise' will raise an NonExistentTimeError if there are\n",
" | nonexistent times.\n",
" | \n",
" | .. versionadded:: 0.24.0\n",
" | \n",
" | Returns\n",
" | -------\n",
" | DatetimeIndex, TimedeltaIndex, or Series\n",
" | Index of the same type for a DatetimeIndex or TimedeltaIndex,\n",
" | or a Series with the same index for a Series.\n",
" | \n",
" | Raises\n",
" | ------\n",
" | ValueError if the `freq` cannot be converted.\n",
" | \n",
" | Examples\n",
" | --------\n",
" | **DatetimeIndex**\n",
" | \n",
" | >>> rng = pd.date_range('1/1/2018 11:59:00', periods=3, freq='min')\n",
" | >>> rng\n",
" | DatetimeIndex(['2018-01-01 11:59:00', '2018-01-01 12:00:00',\n",
" | '2018-01-01 12:01:00'],\n",
" | dtype='datetime64[ns]', freq='T')\n",
" | >>> rng.floor('H')\n",
" | DatetimeIndex(['2018-01-01 11:00:00', '2018-01-01 12:00:00',\n",
" | '2018-01-01 12:00:00'],\n",
" | dtype='datetime64[ns]', freq=None)\n",
" | \n",
" | **Series**\n",
" | \n",
" | >>> pd.Series(rng).dt.floor(\"H\")\n",
" | 0 2018-01-01 11:00:00\n",
" | 1 2018-01-01 12:00:00\n",
" | 2 2018-01-01 12:00:00\n",
" | dtype: datetime64[ns]\n",
" | \n",
" | month_name(self, *args, **kwargs)\n",
" | Return the month names of the DateTimeIndex with specified locale.\n",
" | \n",
" | .. versionadded:: 0.23.0\n",
" | \n",
" | Parameters\n",
" | ----------\n",
" | locale : str, optional\n",
" | Locale determining the language in which to return the month name.\n",
" | Default is English locale.\n",
" | \n",
" | Returns\n",
" | -------\n",
" | Index\n",
" | Index of month names.\n",
" | \n",
" | Examples\n",
" | --------\n",
" | >>> idx = pd.date_range(start='2018-01', freq='M', periods=3)\n",
" | >>> idx\n",
" | DatetimeIndex(['2018-01-31', '2018-02-28', '2018-03-31'],\n",
" | dtype='datetime64[ns]', freq='M')\n",
" | >>> idx.month_name()\n",
" | Index(['January', 'February', 'March'], dtype='object')\n",
" | \n",
" | normalize(self, *args, **kwargs)\n",
" | Convert times to midnight.\n",
" | \n",
" | The time component of the date-time is converted to midnight i.e.\n",
" | 00:00:00. This is useful in cases, when the time does not matter.\n",
" | Length is unaltered. The timezones are unaffected.\n",
" | \n",
" | This method is available on Series with datetime values under\n",
" | the ``.dt`` accessor, and directly on Datetime Array/Index.\n",
" | \n",
" | Returns\n",
" | -------\n",
" | DatetimeArray, DatetimeIndex or Series\n",
" | The same type as the original data. Series will have the same\n",
" | name and index. DatetimeIndex will have the same name.\n",
" | \n",
" | See Also\n",
" | --------\n",
" | floor : Floor the datetimes to the specified freq.\n",
" | ceil : Ceil the datetimes to the specified freq.\n",
" | round : Round the datetimes to the specified freq.\n",
" | \n",
" | Examples\n",
" | --------\n",
" | >>> idx = pd.date_range(start='2014-08-01 10:00', freq='H',\n",
" | ... periods=3, tz='Asia/Calcutta')\n",
" | >>> idx\n",
" | DatetimeIndex(['2014-08-01 10:00:00+05:30',\n",
" | '2014-08-01 11:00:00+05:30',\n",
" | '2014-08-01 12:00:00+05:30'],\n",
" | dtype='datetime64[ns, Asia/Calcutta]', freq='H')\n",
" | >>> idx.normalize()\n",
" | DatetimeIndex(['2014-08-01 00:00:00+05:30',\n",
" | '2014-08-01 00:00:00+05:30',\n",
" | '2014-08-01 00:00:00+05:30'],\n",
" | dtype='datetime64[ns, Asia/Calcutta]', freq=None)\n",
" | \n",
" | round(self, *args, **kwargs)\n",
" | Perform round operation on the data to the specified `freq`.\n",
" | \n",
" | Parameters\n",
" | ----------\n",
" | freq : str or Offset\n",
" | The frequency level to round the index to. Must be a fixed\n",
" | frequency like 'S' (second) not 'ME' (month end). See\n",
" | :ref:`frequency aliases <timeseries.offset_aliases>` for\n",
" | a list of possible `freq` values.\n",
" | ambiguous : 'infer', bool-ndarray, 'NaT', default 'raise'\n",
" | Only relevant for DatetimeIndex:\n",
" | \n",
" | - 'infer' will attempt to infer fall dst-transition hours based on\n",
" | order\n",
" | - bool-ndarray where True signifies a DST time, False designates\n",
" | a non-DST time (note that this flag is only applicable for\n",
" | ambiguous times)\n",
" | - 'NaT' will return NaT where there are ambiguous times\n",
" | - 'raise' will raise an AmbiguousTimeError if there are ambiguous\n",
" | times.\n",
" | \n",
" | .. versionadded:: 0.24.0\n",
" | \n",
" | nonexistent : 'shift_forward', 'shift_backward', 'NaT', timedelta, default 'raise'\n",
" | A nonexistent time does not exist in a particular timezone\n",
" | where clocks moved forward due to DST.\n",
" | \n",
" | - 'shift_forward' will shift the nonexistent time forward to the\n",
" | closest existing time\n",
" | - 'shift_backward' will shift the nonexistent time backward to the\n",
" | closest existing time\n",
" | - 'NaT' will return NaT where there are nonexistent times\n",
" | - timedelta objects will shift nonexistent times by the timedelta\n",
" | - 'raise' will raise an NonExistentTimeError if there are\n",
" | nonexistent times.\n",
" | \n",
" | .. versionadded:: 0.24.0\n",
" | \n",
" | Returns\n",
" | -------\n",
" | DatetimeIndex, TimedeltaIndex, or Series\n",
" | Index of the same type for a DatetimeIndex or TimedeltaIndex,\n",
" | or a Series with the same index for a Series.\n",
" | \n",
" | Raises\n",
" | ------\n",
" | ValueError if the `freq` cannot be converted.\n",
" | \n",
" | Examples\n",
" | --------\n",
" | **DatetimeIndex**\n",
" | \n",
" | >>> rng = pd.date_range('1/1/2018 11:59:00', periods=3, freq='min')\n",
" | >>> rng\n",
" | DatetimeIndex(['2018-01-01 11:59:00', '2018-01-01 12:00:00',\n",
" | '2018-01-01 12:01:00'],\n",
" | dtype='datetime64[ns]', freq='T')\n",
" | >>> rng.round('H')\n",
" | DatetimeIndex(['2018-01-01 12:00:00', '2018-01-01 12:00:00',\n",
" | '2018-01-01 12:00:00'],\n",
" | dtype='datetime64[ns]', freq=None)\n",
" | \n",
" | **Series**\n",
" | \n",
" | >>> pd.Series(rng).dt.round(\"H\")\n",
" | 0 2018-01-01 12:00:00\n",
" | 1 2018-01-01 12:00:00\n",
" | 2 2018-01-01 12:00:00\n",
" | dtype: datetime64[ns]\n",
" | \n",
" | strftime(self, *args, **kwargs)\n",
" | Convert to Index using specified date_format.\n",
" | \n",
" | Return an Index of formatted strings specified by date_format, which\n",
" | supports the same string format as the python standard library. Details\n",
" | of the string format can be found in `python string format\n",
" | doc <https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior>`__.\n",
" | \n",
" | Parameters\n",
" | ----------\n",
" | date_format : str\n",
" | Date format string (e.g. \"%Y-%m-%d\").\n",
" | \n",
" | Returns\n",
" | -------\n",
" | ndarray\n",
" | NumPy ndarray of formatted strings.\n",
" | \n",
" | See Also\n",
" | --------\n",
" | to_datetime : Convert the given argument to datetime.\n",
" | DatetimeIndex.normalize : Return DatetimeIndex with times to midnight.\n",
" | DatetimeIndex.round : Round the DatetimeIndex to the specified freq.\n",
" | DatetimeIndex.floor : Floor the DatetimeIndex to the specified freq.\n",
" | \n",
" | Examples\n",
" | --------\n",
" | >>> rng = pd.date_range(pd.Timestamp(\"2018-03-10 09:00\"),\n",
" | ... periods=3, freq='s')\n",
" | >>> rng.strftime('%B %d, %Y, %r')\n",
" | Index(['March 10, 2018, 09:00:00 AM', 'March 10, 2018, 09:00:01 AM',\n",
" | 'March 10, 2018, 09:00:02 AM'],\n",
" | dtype='object')\n",
" | \n",
" | to_period(self, *args, **kwargs)\n",
" | Cast to PeriodArray/Index at a particular frequency.\n",
" | \n",
" | Converts DatetimeArray/Index to PeriodArray/Index.\n",
" | \n",
" | Parameters\n",
" | ----------\n",
" | freq : str or Offset, optional\n",
" | One of pandas' :ref:`offset strings <timeseries.offset_aliases>`\n",
" | or an Offset object. Will be inferred by default.\n",
" | \n",
" | Returns\n",
" | -------\n",
" | PeriodArray/Index\n",
" | \n",
" | Raises\n",
" | ------\n",
" | ValueError\n",
" | When converting a DatetimeArray/Index with non-regular values,\n",
" | so that a frequency cannot be inferred.\n",
" | \n",
" | See Also\n",
" | --------\n",
" | PeriodIndex: Immutable ndarray holding ordinal values.\n",
" | DatetimeIndex.to_pydatetime: Return DatetimeIndex as object.\n",
" | \n",
" | Examples\n",
" | --------\n",
" | >>> df = pd.DataFrame({\"y\": [1, 2, 3]},\n",
" | ... index=pd.to_datetime([\"2000-03-31 00:00:00\",\n",
" | ... \"2000-05-31 00:00:00\",\n",
" | ... \"2000-08-31 00:00:00\"]))\n",
" | >>> df.index.to_period(\"M\")\n",
" | PeriodIndex(['2000-03', '2000-05', '2000-08'],\n",
" | dtype='period[M]', freq='M')\n",
" | \n",
" | Infer the daily frequency\n",
" | \n",
" | >>> idx = pd.date_range(\"2017-01-01\", periods=2)\n",
" | >>> idx.to_period()\n",
" | PeriodIndex(['2017-01-01', '2017-01-02'],\n",
" | dtype='period[D]', freq='D')\n",
" | \n",
" | to_pydatetime(self)\n",
" | Return the data as an array of native Python datetime objects.\n",
" | \n",
" | Timezone information is retained if present.\n",
" | \n",
" | .. warning::\n",
" | \n",
" | Python's datetime uses microsecond resolution, which is lower than\n",
" | pandas (nanosecond). The values are truncated.\n",
" | \n",
" | Returns\n",
" | -------\n",
" | numpy.ndarray\n",
" | Object dtype array containing native Python datetime objects.\n",
" | \n",
" | See Also\n",
" | --------\n",
" | datetime.datetime : Standard library value for a datetime.\n",
" | \n",
" | Examples\n",
" | --------\n",
" | >>> s = pd.Series(pd.date_range('20180310', periods=2))\n",
" | >>> s\n",
" | 0 2018-03-10\n",
" | 1 2018-03-11\n",
" | dtype: datetime64[ns]\n",
" | \n",
" | >>> s.dt.to_pydatetime()\n",
" | array([datetime.datetime(2018, 3, 10, 0, 0),\n",
" | datetime.datetime(2018, 3, 11, 0, 0)], dtype=object)\n",
" | \n",
" | pandas' nanosecond precision is truncated to microseconds.\n",
" | \n",
" | >>> s = pd.Series(pd.date_range('20180310', periods=2, freq='ns'))\n",
" | >>> s\n",
" | 0 2018-03-10 00:00:00.000000000\n",
" | 1 2018-03-10 00:00:00.000000001\n",
" | dtype: datetime64[ns]\n",
" | \n",
" | >>> s.dt.to_pydatetime()\n",
" | array([datetime.datetime(2018, 3, 10, 0, 0),\n",
" | datetime.datetime(2018, 3, 10, 0, 0)], dtype=object)\n",
" | \n",
" | tz_convert(self, *args, **kwargs)\n",
" | Convert tz-aware Datetime Array/Index from one time zone to another.\n",
" | \n",
" | Parameters\n",
" | ----------\n",
" | tz : str, pytz.timezone, dateutil.tz.tzfile or None\n",
" | Time zone for time. Corresponding timestamps would be converted\n",
" | to this time zone of the Datetime Array/Index. A `tz` of None will\n",
" | convert to UTC and remove the timezone information.\n",
" | \n",
" | Returns\n",
" | -------\n",
" | Array or Index\n",
" | \n",
" | Raises\n",
" | ------\n",
" | TypeError\n",
" | If Datetime Array/Index is tz-naive.\n",
" | \n",
" | See Also\n",
" | --------\n",
" | DatetimeIndex.tz : A timezone that has a variable offset from UTC.\n",
" | DatetimeIndex.tz_localize : Localize tz-naive DatetimeIndex to a\n",
" | given time zone, or remove timezone from a tz-aware DatetimeIndex.\n",
" | \n",
" | Examples\n",
" | --------\n",
" | With the `tz` parameter, we can change the DatetimeIndex\n",
" | to other time zones:\n",
" | \n",
" | >>> dti = pd.date_range(start='2014-08-01 09:00',\n",
" | ... freq='H', periods=3, tz='Europe/Berlin')\n",
" | \n",
" | >>> dti\n",
" | DatetimeIndex(['2014-08-01 09:00:00+02:00',\n",
" | '2014-08-01 10:00:00+02:00',\n",
" | '2014-08-01 11:00:00+02:00'],\n",
" | dtype='datetime64[ns, Europe/Berlin]', freq='H')\n",
" | \n",
" | >>> dti.tz_convert('US/Central')\n",
" | DatetimeIndex(['2014-08-01 02:00:00-05:00',\n",
" | '2014-08-01 03:00:00-05:00',\n",
" | '2014-08-01 04:00:00-05:00'],\n",
" | dtype='datetime64[ns, US/Central]', freq='H')\n",
" | \n",
" | With the ``tz=None``, we can remove the timezone (after converting\n",
" | to UTC if necessary):\n",
" | \n",
" | >>> dti = pd.date_range(start='2014-08-01 09:00', freq='H',\n",
" | ... periods=3, tz='Europe/Berlin')\n",
" | \n",
" | >>> dti\n",
" | DatetimeIndex(['2014-08-01 09:00:00+02:00',\n",
" | '2014-08-01 10:00:00+02:00',\n",
" | '2014-08-01 11:00:00+02:00'],\n",
" | dtype='datetime64[ns, Europe/Berlin]', freq='H')\n",
" | \n",
" | >>> dti.tz_convert(None)\n",
" | DatetimeIndex(['2014-08-01 07:00:00',\n",
" | '2014-08-01 08:00:00',\n",
" | '2014-08-01 09:00:00'],\n",
" | dtype='datetime64[ns]', freq='H')\n",
" | \n",
" | tz_localize(self, *args, **kwargs)\n",
" | Localize tz-naive Datetime Array/Index to tz-aware\n",
" | Datetime Array/Index.\n",
" | \n",
" | This method takes a time zone (tz) naive Datetime Array/Index object\n",
" | and makes this time zone aware. It does not move the time to another\n",
" | time zone.\n",
" | Time zone localization helps to switch from time zone aware to time\n",
" | zone unaware objects.\n",
" | \n",
" | Parameters\n",
" | ----------\n",
" | tz : str, pytz.timezone, dateutil.tz.tzfile or None\n",
" | Time zone to convert timestamps to. Passing ``None`` will\n",
" | remove the time zone information preserving local time.\n",
" | ambiguous : 'infer', 'NaT', bool array, default 'raise'\n",
" | When clocks moved backward due to DST, ambiguous times may arise.\n",
" | For example in Central European Time (UTC+01), when going from\n",
" | 03:00 DST to 02:00 non-DST, 02:30:00 local time occurs both at\n",
" | 00:30:00 UTC and at 01:30:00 UTC. In such a situation, the\n",
" | `ambiguous` parameter dictates how ambiguous times should be\n",
" | handled.\n",
" | \n",
" | - 'infer' will attempt to infer fall dst-transition hours based on\n",
" | order\n",
" | - bool-ndarray where True signifies a DST time, False signifies a\n",
" | non-DST time (note that this flag is only applicable for\n",
" | ambiguous times)\n",
" | - 'NaT' will return NaT where there are ambiguous times\n",
" | - 'raise' will raise an AmbiguousTimeError if there are ambiguous\n",
" | times.\n",
" | \n",
" | nonexistent : 'shift_forward', 'shift_backward, 'NaT', timedelta, default 'raise'\n",
" | A nonexistent time does not exist in a particular timezone\n",
" | where clocks moved forward due to DST.\n",
" | \n",
" | - 'shift_forward' will shift the nonexistent time forward to the\n",
" | closest existing time\n",
" | - 'shift_backward' will shift the nonexistent time backward to the\n",
" | closest existing time\n",
" | - 'NaT' will return NaT where there are nonexistent times\n",
" | - timedelta objects will shift nonexistent times by the timedelta\n",
" | - 'raise' will raise an NonExistentTimeError if there are\n",
" | nonexistent times.\n",
" | \n",
" | .. versionadded:: 0.24.0\n",
" | \n",
" | Returns\n",
" | -------\n",
" | Same type as self\n",
" | Array/Index converted to the specified time zone.\n",
" | \n",
" | Raises\n",
" | ------\n",
" | TypeError\n",
" | If the Datetime Array/Index is tz-aware and tz is not None.\n",
" | \n",
" | See Also\n",
" | --------\n",
" | DatetimeIndex.tz_convert : Convert tz-aware DatetimeIndex from\n",
" | one time zone to another.\n",
" | \n",
" | Examples\n",
" | --------\n",
" | >>> tz_naive = pd.date_range('2018-03-01 09:00', periods=3)\n",
" | >>> tz_naive\n",
" | DatetimeIndex(['2018-03-01 09:00:00', '2018-03-02 09:00:00',\n",
" | '2018-03-03 09:00:00'],\n",
" | dtype='datetime64[ns]', freq='D')\n",
" | \n",
" | Localize DatetimeIndex in US/Eastern time zone:\n",
" | \n",
" | >>> tz_aware = tz_naive.tz_localize(tz='US/Eastern')\n",
" | >>> tz_aware\n",
" | DatetimeIndex(['2018-03-01 09:00:00-05:00',\n",
" | '2018-03-02 09:00:00-05:00',\n",
" | '2018-03-03 09:00:00-05:00'],\n",
" | dtype='datetime64[ns, US/Eastern]', freq='D')\n",
" | \n",
" | With the ``tz=None``, we can remove the time zone information\n",
" | while keeping the local time (not converted to UTC):\n",
" | \n",
" | >>> tz_aware.tz_localize(None)\n",
" | DatetimeIndex(['2018-03-01 09:00:00', '2018-03-02 09:00:00',\n",
" | '2018-03-03 09:00:00'],\n",
" | dtype='datetime64[ns]', freq='D')\n",
" | \n",
" | Be careful with DST changes. When there is sequential data, pandas can\n",
" | infer the DST time:\n",
" | \n",
" | >>> s = pd.to_datetime(pd.Series(['2018-10-28 01:30:00',\n",
" | ... '2018-10-28 02:00:00',\n",
" | ... '2018-10-28 02:30:00',\n",
" | ... '2018-10-28 02:00:00',\n",
" | ... '2018-10-28 02:30:00',\n",
" | ... '2018-10-28 03:00:00',\n",
" | ... '2018-10-28 03:30:00']))\n",
" | >>> s.dt.tz_localize('CET', ambiguous='infer')\n",
" | 0 2018-10-28 01:30:00+02:00\n",
" | 1 2018-10-28 02:00:00+02:00\n",
" | 2 2018-10-28 02:30:00+02:00\n",
" | 3 2018-10-28 02:00:00+01:00\n",
" | 4 2018-10-28 02:30:00+01:00\n",
" | 5 2018-10-28 03:00:00+01:00\n",
" | 6 2018-10-28 03:30:00+01:00\n",
" | dtype: datetime64[ns, CET]\n",
" | \n",
" | In some cases, inferring the DST is impossible. In such cases, you can\n",
" | pass an ndarray to the ambiguous parameter to set the DST explicitly\n",
" | \n",
" | >>> s = pd.to_datetime(pd.Series(['2018-10-28 01:20:00',\n",
" | ... '2018-10-28 02:36:00',\n",
" | ... '2018-10-28 03:46:00']))\n",
" | >>> s.dt.tz_localize('CET', ambiguous=np.array([True, True, False]))\n",
" | 0 2015-03-29 03:00:00+02:00\n",
" | 1 2015-03-29 03:30:00+02:00\n",
" | dtype: datetime64[ns, Europe/Warsaw]\n",
" | \n",
" | If the DST transition causes nonexistent times, you can shift these\n",
" | dates forward or backwards with a timedelta object or `'shift_forward'`\n",
" | or `'shift_backwards'`.\n",
" | \n",
" | >>> s = pd.to_datetime(pd.Series(['2015-03-29 02:30:00',\n",
" | ... '2015-03-29 03:30:00']))\n",
" | >>> s.dt.tz_localize('Europe/Warsaw', nonexistent='shift_forward')\n",
" | 0 2015-03-29 03:00:00+02:00\n",
" | 1 2015-03-29 03:30:00+02:00\n",
" | dtype: datetime64[ns, 'Europe/Warsaw']\n",
" | >>> s.dt.tz_localize('Europe/Warsaw', nonexistent='shift_backward')\n",
" | 0 2015-03-29 01:59:59.999999999+01:00\n",
" | 1 2015-03-29 03:30:00+02:00\n",
" | dtype: datetime64[ns, 'Europe/Warsaw']\n",
" | >>> s.dt.tz_localize('Europe/Warsaw', nonexistent=pd.Timedelta('1H'))\n",
" | 0 2015-03-29 03:30:00+02:00\n",
" | 1 2015-03-29 03:30:00+02:00\n",
" | dtype: datetime64[ns, 'Europe/Warsaw']\n",
" | \n",
" | ----------------------------------------------------------------------\n",
" | Data descriptors defined here:\n",
" | \n",
" | date\n",
" | Returns numpy array of python datetime.date objects (namely, the date\n",
" | part of Timestamps without timezone information).\n",
" | \n",
" | day\n",
" | The month as January=1, December=12.\n",
" | \n",
" | dayofweek\n",
" | The day of the week with Monday=0, Sunday=6.\n",
" | \n",
" | Return the day of the week. It is assumed the week starts on\n",
" | Monday, which is denoted by 0 and ends on Sunday which is denoted\n",
" | by 6. This method is available on both Series with datetime\n",
" | values (using the `dt` accessor) or DatetimeIndex.\n",
" | \n",
" | Returns\n",
" | -------\n",
" | Series or Index\n",
" | Containing integers indicating the day number.\n",
" | \n",
" | See Also\n",
" | --------\n",
" | Series.dt.dayofweek : Alias.\n",
" | Series.dt.weekday : Alias.\n",
" | Series.dt.day_name : Returns the name of the day of the week.\n",
" | \n",
" | Examples\n",
" | --------\n",
" | >>> s = pd.date_range('2016-12-31', '2017-01-08', freq='D').to_series()\n",
" | >>> s.dt.dayofweek\n",
" | 2016-12-31 5\n",
" | 2017-01-01 6\n",
" | 2017-01-02 0\n",
" | 2017-01-03 1\n",
" | 2017-01-04 2\n",
" | 2017-01-05 3\n",
" | 2017-01-06 4\n",
" | 2017-01-07 5\n",
" | 2017-01-08 6\n",
" | Freq: D, dtype: int64\n",
" | \n",
" | dayofyear\n",
" | The ordinal day of the year.\n",
" | \n",
" | days_in_month\n",
" | The number of days in the month.\n",
" | \n",
" | daysinmonth\n",
" | The number of days in the month.\n",
" | \n",
" | freq\n",
" | \n",
" | hour\n",
" | The hours of the datetime.\n",
" | \n",
" | is_leap_year\n",
" | Boolean indicator if the date belongs to a leap year.\n",
" | \n",
" | A leap year is a year, which has 366 days (instead of 365) including\n",
" | 29th of February as an intercalary day.\n",
" | Leap years are years which are multiples of four with the exception\n",
" | of years divisible by 100 but not by 400.\n",
" | \n",
" | Returns\n",
" | -------\n",
" | Series or ndarray\n",
" | Booleans indicating if dates belong to a leap year.\n",
" | \n",
" | Examples\n",
" | --------\n",
" | This method is available on Series with datetime values under\n",
" | the ``.dt`` accessor, and directly on DatetimeIndex.\n",
" | \n",
" | >>> idx = pd.date_range(\"2012-01-01\", \"2015-01-01\", freq=\"Y\")\n",
" | >>> idx\n",
" | DatetimeIndex(['2012-12-31', '2013-12-31', '2014-12-31'],\n",
" | dtype='datetime64[ns]', freq='A-DEC')\n",
" | >>> idx.is_leap_year\n",
" | array([ True, False, False], dtype=bool)\n",
" | \n",
" | >>> dates = pd.Series(idx)\n",
" | >>> dates_series\n",
" | 0 2012-12-31\n",
" | 1 2013-12-31\n",
" | 2 2014-12-31\n",
" | dtype: datetime64[ns]\n",
" | >>> dates_series.dt.is_leap_year\n",
" | 0 True\n",
" | 1 False\n",
" | 2 False\n",
" | dtype: bool\n",
" | \n",
" | is_month_end\n",
" | Indicates whether the date is the last day of the month.\n",
" | \n",
" | Returns\n",
" | -------\n",
" | Series or array\n",
" | For Series, returns a Series with boolean values.\n",
" | For DatetimeIndex, returns a boolean array.\n",
" | \n",
" | See Also\n",
" | --------\n",
" | is_month_start : Return a boolean indicating whether the date\n",
" | is the first day of the month.\n",
" | is_month_end : Return a boolean indicating whether the date\n",
" | is the last day of the month.\n",
" | \n",
" | Examples\n",
" | --------\n",
" | This method is available on Series with datetime values under\n",
" | the ``.dt`` accessor, and directly on DatetimeIndex.\n",
" | \n",
" | >>> s = pd.Series(pd.date_range(\"2018-02-27\", periods=3))\n",
" | >>> s\n",
" | 0 2018-02-27\n",
" | 1 2018-02-28\n",
" | 2 2018-03-01\n",
" | dtype: datetime64[ns]\n",
" | >>> s.dt.is_month_start\n",
" | 0 False\n",
" | 1 False\n",
" | 2 True\n",
" | dtype: bool\n",
" | >>> s.dt.is_month_end\n",
" | 0 False\n",
" | 1 True\n",
" | 2 False\n",
" | dtype: bool\n",
" | \n",
" | >>> idx = pd.date_range(\"2018-02-27\", periods=3)\n",
" | >>> idx.is_month_start\n",
" | array([False, False, True])\n",
" | >>> idx.is_month_end\n",
" | array([False, True, False])\n",
" | \n",
" | is_month_start\n",
" | Indicates whether the date is the first day of the month.\n",
" | \n",
" | Returns\n",
" | -------\n",
" | Series or array\n",
" | For Series, returns a Series with boolean values.\n",
" | For DatetimeIndex, returns a boolean array.\n",
" | \n",
" | See Also\n",
" | --------\n",
" | is_month_start : Return a boolean indicating whether the date\n",
" | is the first day of the month.\n",
" | is_month_end : Return a boolean indicating whether the date\n",
" | is the last day of the month.\n",
" | \n",
" | Examples\n",
" | --------\n",
" | This method is available on Series with datetime values under\n",
" | the ``.dt`` accessor, and directly on DatetimeIndex.\n",
" | \n",
" | >>> s = pd.Series(pd.date_range(\"2018-02-27\", periods=3))\n",
" | >>> s\n",
" | 0 2018-02-27\n",
" | 1 2018-02-28\n",
" | 2 2018-03-01\n",
" | dtype: datetime64[ns]\n",
" | >>> s.dt.is_month_start\n",
" | 0 False\n",
" | 1 False\n",
" | 2 True\n",
" | dtype: bool\n",
" | >>> s.dt.is_month_end\n",
" | 0 False\n",
" | 1 True\n",
" | 2 False\n",
" | dtype: bool\n",
" | \n",
" | >>> idx = pd.date_range(\"2018-02-27\", periods=3)\n",
" | >>> idx.is_month_start\n",
" | array([False, False, True])\n",
" | >>> idx.is_month_end\n",
" | array([False, True, False])\n",
" | \n",
" | is_quarter_end\n",
" | Indicator for whether the date is the last day of a quarter.\n",
" | \n",
" | Returns\n",
" | -------\n",
" | is_quarter_end : Series or DatetimeIndex\n",
" | The same type as the original data with boolean values. Series will\n",
" | have the same name and index. DatetimeIndex will have the same\n",
" | name.\n",
" | \n",
" | See Also\n",
" | --------\n",
" | quarter : Return the quarter of the date.\n",
" | is_quarter_start : Similar property indicating the quarter start.\n",
" | \n",
" | Examples\n",
" | --------\n",
" | This method is available on Series with datetime values under\n",
" | the ``.dt`` accessor, and directly on DatetimeIndex.\n",
" | \n",
" | >>> df = pd.DataFrame({'dates': pd.date_range(\"2017-03-30\",\n",
" | ... periods=4)})\n",
" | >>> df.assign(quarter=df.dates.dt.quarter,\n",
" | ... is_quarter_end=df.dates.dt.is_quarter_end)\n",
" | dates quarter is_quarter_end\n",
" | 0 2017-03-30 1 False\n",
" | 1 2017-03-31 1 True\n",
" | 2 2017-04-01 2 False\n",
" | 3 2017-04-02 2 False\n",
" | \n",
" | >>> idx = pd.date_range('2017-03-30', periods=4)\n",
" | >>> idx\n",
" | DatetimeIndex(['2017-03-30', '2017-03-31', '2017-04-01', '2017-04-02'],\n",
" | dtype='datetime64[ns]', freq='D')\n",
" | \n",
" | >>> idx.is_quarter_end\n",
" | array([False, True, False, False])\n",
" | \n",
" | is_quarter_start\n",
" | Indicator for whether the date is the first day of a quarter.\n",
" | \n",
" | Returns\n",
" | -------\n",
" | is_quarter_start : Series or DatetimeIndex\n",
" | The same type as the original data with boolean values. Series will\n",
" | have the same name and index. DatetimeIndex will have the same\n",
" | name.\n",
" | \n",
" | See Also\n",
" | --------\n",
" | quarter : Return the quarter of the date.\n",
" | is_quarter_end : Similar property for indicating the quarter start.\n",
" | \n",
" | Examples\n",
" | --------\n",
" | This method is available on Series with datetime values under\n",
" | the ``.dt`` accessor, and directly on DatetimeIndex.\n",
" | \n",
" | >>> df = pd.DataFrame({'dates': pd.date_range(\"2017-03-30\",\n",
" | ... periods=4)})\n",
" | >>> df.assign(quarter=df.dates.dt.quarter,\n",
" | ... is_quarter_start=df.dates.dt.is_quarter_start)\n",
" | dates quarter is_quarter_start\n",
" | 0 2017-03-30 1 False\n",
" | 1 2017-03-31 1 False\n",
" | 2 2017-04-01 2 True\n",
" | 3 2017-04-02 2 False\n",
" | \n",
" | >>> idx = pd.date_range('2017-03-30', periods=4)\n",
" | >>> idx\n",
" | DatetimeIndex(['2017-03-30', '2017-03-31', '2017-04-01', '2017-04-02'],\n",
" | dtype='datetime64[ns]', freq='D')\n",
" | \n",
" | >>> idx.is_quarter_start\n",
" | array([False, False, True, False])\n",
" | \n",
" | is_year_end\n",
" | Indicate whether the date is the last day of the year.\n",
" | \n",
" | Returns\n",
" | -------\n",
" | Series or DatetimeIndex\n",
" | The same type as the original data with boolean values. Series will\n",
" | have the same name and index. DatetimeIndex will have the same\n",
" | name.\n",
" | \n",
" | See Also\n",
" | --------\n",
" | is_year_start : Similar property indicating the start of the year.\n",
" | \n",
" | Examples\n",
" | --------\n",
" | This method is available on Series with datetime values under\n",
" | the ``.dt`` accessor, and directly on DatetimeIndex.\n",
" | \n",
" | >>> dates = pd.Series(pd.date_range(\"2017-12-30\", periods=3))\n",
" | >>> dates\n",
" | 0 2017-12-30\n",
" | 1 2017-12-31\n",
" | 2 2018-01-01\n",
" | dtype: datetime64[ns]\n",
" | \n",
" | >>> dates.dt.is_year_end\n",
" | 0 False\n",
" | 1 True\n",
" | 2 False\n",
" | dtype: bool\n",
" | \n",
" | >>> idx = pd.date_range(\"2017-12-30\", periods=3)\n",
" | >>> idx\n",
" | DatetimeIndex(['2017-12-30', '2017-12-31', '2018-01-01'],\n",
" | dtype='datetime64[ns]', freq='D')\n",
" | \n",
" | >>> idx.is_year_end\n",
" | array([False, True, False])\n",
" | \n",
" | is_year_start\n",
" | Indicate whether the date is the first day of a year.\n",
" | \n",
" | Returns\n",
" | -------\n",
" | Series or DatetimeIndex\n",
" | The same type as the original data with boolean values. Series will\n",
" | have the same name and index. DatetimeIndex will have the same\n",
" | name.\n",
" | \n",
" | See Also\n",
" | --------\n",
" | is_year_end : Similar property indicating the last day of the year.\n",
" | \n",
" | Examples\n",
" | --------\n",
" | This method is available on Series with datetime values under\n",
" | the ``.dt`` accessor, and directly on DatetimeIndex.\n",
" | \n",
" | >>> dates = pd.Series(pd.date_range(\"2017-12-30\", periods=3))\n",
" | >>> dates\n",
" | 0 2017-12-30\n",
" | 1 2017-12-31\n",
" | 2 2018-01-01\n",
" | dtype: datetime64[ns]\n",
" | \n",
" | >>> dates.dt.is_year_start\n",
" | 0 False\n",
" | 1 False\n",
" | 2 True\n",
" | dtype: bool\n",
" | \n",
" | >>> idx = pd.date_range(\"2017-12-30\", periods=3)\n",
" | >>> idx\n",
" | DatetimeIndex(['2017-12-30', '2017-12-31', '2018-01-01'],\n",
" | dtype='datetime64[ns]', freq='D')\n",
" | \n",
" | >>> idx.is_year_start\n",
" | array([False, False, True])\n",
" | \n",
" | microsecond\n",
" | The microseconds of the datetime.\n",
" | \n",
" | minute\n",
" | The minutes of the datetime.\n",
" | \n",
" | month\n",
" | The month as January=1, December=12.\n",
" | \n",
" | nanosecond\n",
" | The nanoseconds of the datetime.\n",
" | \n",
" | quarter\n",
" | The quarter of the date.\n",
" | \n",
" | second\n",
" | The seconds of the datetime.\n",
" | \n",
" | time\n",
" | Returns numpy array of datetime.time. The time part of the Timestamps.\n",
" | \n",
" | timetz\n",
" | Returns numpy array of datetime.time also containing timezone\n",
" | information. The time part of the Timestamps.\n",
" | \n",
" | tz\n",
" | Return timezone, if any.\n",
" | \n",
" | Returns\n",
" | -------\n",
" | datetime.tzinfo, pytz.tzinfo.BaseTZInfo, dateutil.tz.tz.tzfile, or None\n",
" | Returns None when the array is tz-naive.\n",
" | \n",
" | week\n",
" | The week ordinal of the year.\n",
" | \n",
" | weekday\n",
" | The day of the week with Monday=0, Sunday=6.\n",
" | \n",
" | Return the day of the week. It is assumed the week starts on\n",
" | Monday, which is denoted by 0 and ends on Sunday which is denoted\n",
" | by 6. This method is available on both Series with datetime\n",
" | values (using the `dt` accessor) or DatetimeIndex.\n",
" | \n",
" | Returns\n",
" | -------\n",
" | Series or Index\n",
" | Containing integers indicating the day number.\n",
" | \n",
" | See Also\n",
" | --------\n",
" | Series.dt.dayofweek : Alias.\n",
" | Series.dt.weekday : Alias.\n",
" | Series.dt.day_name : Returns the name of the day of the week.\n",
" | \n",
" | Examples\n",
" | --------\n",
" | >>> s = pd.date_range('2016-12-31', '2017-01-08', freq='D').to_series()\n",
" | >>> s.dt.dayofweek\n",
" | 2016-12-31 5\n",
" | 2017-01-01 6\n",
" | 2017-01-02 0\n",
" | 2017-01-03 1\n",
" | 2017-01-04 2\n",
" | 2017-01-05 3\n",
" | 2017-01-06 4\n",
" | 2017-01-07 5\n",
" | 2017-01-08 6\n",
" | Freq: D, dtype: int64\n",
" | \n",
" | weekofyear\n",
" | The week ordinal of the year.\n",
" | \n",
" | year\n",
" | The year of the datetime.\n",
" | \n",
" | ----------------------------------------------------------------------\n",
" | Methods inherited from Properties:\n",
" | \n",
" | __init__(self, data, orig)\n",
" | Initialize self. See help(type(self)) for accurate signature.\n",
" | \n",
" | ----------------------------------------------------------------------\n",
" | Data descriptors inherited from pandas.core.accessor.PandasDelegate:\n",
" | \n",
" | __dict__\n",
" | dictionary for instance variables (if defined)\n",
" | \n",
" | __weakref__\n",
" | list of weak references to the object (if defined)\n",
" | \n",
" | ----------------------------------------------------------------------\n",
" | Methods inherited from pandas.core.base.PandasObject:\n",
" | \n",
" | __repr__(self) -> str\n",
" | Return a string representation for a particular object.\n",
" | \n",
" | __sizeof__(self)\n",
" | Generates the total memory usage for an object that returns\n",
" | either a value or Series of values\n",
" | \n",
" | ----------------------------------------------------------------------\n",
" | Methods inherited from pandas.core.accessor.DirNamesMixin:\n",
" | \n",
" | __dir__(self)\n",
" | Provide method name lookup and completion.\n",
" | \n",
" | Notes\n",
" | -----\n",
" | Only provide 'public' methods.\n",
" | \n",
" | ----------------------------------------------------------------------\n",
" | Data and other attributes inherited from pandas.core.accessor.DirNamesMixin:\n",
" | \n",
" | __annotations__ = {'_accessors': typing.Set[str], '_deprecations': typ...\n",
" | \n",
" | ----------------------------------------------------------------------\n",
" | Methods inherited from pandas.core.base.NoNewAttributesMixin:\n",
" | \n",
" | __setattr__(self, key, value)\n",
" | Implement setattr(self, name, value).\n",
"\n"
]
}
],
"source": [
"# List every property and method available through the .dt accessor\n",
"help(sales['DATE'].dt)"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 1\n",
"1 2\n",
"2 3\n",
"3 4\n",
"4 5\n",
" ..\n",
"335 12\n",
"336 1\n",
"337 2\n",
"338 3\n",
"339 4\n",
"Name: DATE, Length: 340, dtype: int64"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Month number of each date (January=1, December=12)\n",
"sales['DATE'].dt.month"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 True\n",
"1 True\n",
"2 True\n",
"3 True\n",
"4 True\n",
" ... \n",
"335 False\n",
"336 True\n",
"337 True\n",
"338 True\n",
"339 True\n",
"Name: DATE, Length: 340, dtype: bool"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# True where the date falls in a leap year\n",
"sales['DATE'].dt.is_leap_year"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 1
}