{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"___\n",
|
|
"\n",
|
|
"<a href='http://www.pieriandata.com'><img src='../Pierian_Data_Logo.png'/></a>\n",
|
|
"___\n",
|
|
"<center><em>Copyright by Pierian Data Inc.</em></center>\n",
|
|
"<center><em>For more information, visit us at <a href='http://www.pieriandata.com'>www.pieriandata.com</a></em></center>"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Inputs and Outputs"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<div class=\"alert alert-info\"><strong>NOTE:</strong> Typically we will just be either reading csv files directly or using pandas-datareader to pull data from the web. Consider this lecture just a quick overview of what is possible with pandas (we won't be working with SQL or Excel files in this course)</div>"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Data Input and Output\n",
|
|
"\n",
|
|
"This notebook is the reference code for data input and output. pandas can read a variety of file types using its pd.read_ methods. Let's take a look at the most common file types:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 52,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"import numpy as np\n",
|
|
"import pandas as pd"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Check out the references here! \n",
|
|
"\n",
|
|
"**This is the best online resource for how to read/write to a variety of data sources!**\n",
|
|
"\n",
|
|
"https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html\n",
|
|
"\n",
|
|
"----\n",
|
|
"----"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<table border=\"1\" class=\"colwidths-given docutils\">\n",
|
|
"<colgroup>\n",
|
|
"<col width=\"12%\" />\n",
|
|
"<col width=\"40%\" />\n",
|
|
"<col width=\"24%\" />\n",
|
|
"<col width=\"24%\" />\n",
|
|
"</colgroup>\n",
|
|
"<thead valign=\"bottom\">\n",
|
|
"<tr class=\"row-odd\"><th class=\"head\">Format Type</th>\n",
|
|
"<th class=\"head\">Data Description</th>\n",
|
|
"<th class=\"head\">Reader</th>\n",
|
|
"<th class=\"head\">Writer</th>\n",
|
|
"</tr>\n",
|
|
"</thead>\n",
|
|
"<tbody valign=\"top\">\n",
|
|
"<tr class=\"row-even\"><td>text</td>\n",
|
|
"<td><a class=\"reference external\" href=\"https://en.wikipedia.org/wiki/Comma-separated_values\">CSV</a></td>\n",
|
|
"<td><a class=\"reference internal\" href=\"#io-read-csv-table\"><span class=\"std std-ref\">read_csv</span></a></td>\n",
|
|
"<td><a class=\"reference internal\" href=\"#io-store-in-csv\"><span class=\"std std-ref\">to_csv</span></a></td>\n",
|
|
"</tr>\n",
|
|
"<tr class=\"row-odd\"><td>text</td>\n",
|
|
"<td><a class=\"reference external\" href=\"https://www.json.org/\">JSON</a></td>\n",
|
|
"<td><a class=\"reference internal\" href=\"#io-json-reader\"><span class=\"std std-ref\">read_json</span></a></td>\n",
|
|
"<td><a class=\"reference internal\" href=\"#io-json-writer\"><span class=\"std std-ref\">to_json</span></a></td>\n",
|
|
"</tr>\n",
|
|
"<tr class=\"row-even\"><td>text</td>\n",
|
|
"<td><a class=\"reference external\" href=\"https://en.wikipedia.org/wiki/HTML\">HTML</a></td>\n",
|
|
"<td><a class=\"reference internal\" href=\"#io-read-html\"><span class=\"std std-ref\">read_html</span></a></td>\n",
|
|
"<td><a class=\"reference internal\" href=\"#io-html\"><span class=\"std std-ref\">to_html</span></a></td>\n",
|
|
"</tr>\n",
|
|
"<tr class=\"row-odd\"><td>text</td>\n",
|
|
"<td>Local clipboard</td>\n",
|
|
"<td><a class=\"reference internal\" href=\"#io-clipboard\"><span class=\"std std-ref\">read_clipboard</span></a></td>\n",
|
|
"<td><a class=\"reference internal\" href=\"#io-clipboard\"><span class=\"std std-ref\">to_clipboard</span></a></td>\n",
|
|
"</tr>\n",
|
|
"<tr class=\"row-even\"><td>binary</td>\n",
|
|
"<td><a class=\"reference external\" href=\"https://en.wikipedia.org/wiki/Microsoft_Excel\">MS Excel</a></td>\n",
|
|
"<td><a class=\"reference internal\" href=\"#io-excel-reader\"><span class=\"std std-ref\">read_excel</span></a></td>\n",
|
|
"<td><a class=\"reference internal\" href=\"#io-excel-writer\"><span class=\"std std-ref\">to_excel</span></a></td>\n",
|
|
"</tr>\n",
|
|
"<tr class=\"row-odd\"><td>binary</td>\n",
|
|
"<td><a class=\"reference external\" href=\"http://www.opendocumentformat.org\">OpenDocument</a></td>\n",
|
|
"<td><a class=\"reference internal\" href=\"#io-ods\"><span class=\"std std-ref\">read_excel</span></a></td>\n",
|
|
"<td> </td>\n",
|
|
"</tr>\n",
|
|
"<tr class=\"row-even\"><td>binary</td>\n",
|
|
"<td><a class=\"reference external\" href=\"https://support.hdfgroup.org/HDF5/whatishdf5.html\">HDF5 Format</a></td>\n",
|
|
"<td><a class=\"reference internal\" href=\"#io-hdf5\"><span class=\"std std-ref\">read_hdf</span></a></td>\n",
|
|
"<td><a class=\"reference internal\" href=\"#io-hdf5\"><span class=\"std std-ref\">to_hdf</span></a></td>\n",
|
|
"</tr>\n",
|
|
"<tr class=\"row-odd\"><td>binary</td>\n",
|
|
"<td><a class=\"reference external\" href=\"https://github.com/wesm/feather\">Feather Format</a></td>\n",
|
|
"<td><a class=\"reference internal\" href=\"#io-feather\"><span class=\"std std-ref\">read_feather</span></a></td>\n",
|
|
"<td><a class=\"reference internal\" href=\"#io-feather\"><span class=\"std std-ref\">to_feather</span></a></td>\n",
|
|
"</tr>\n",
|
|
"<tr class=\"row-even\"><td>binary</td>\n",
|
|
"<td><a class=\"reference external\" href=\"https://parquet.apache.org/\">Parquet Format</a></td>\n",
|
|
"<td><a class=\"reference internal\" href=\"#io-parquet\"><span class=\"std std-ref\">read_parquet</span></a></td>\n",
|
|
"<td><a class=\"reference internal\" href=\"#io-parquet\"><span class=\"std std-ref\">to_parquet</span></a></td>\n",
|
|
"</tr>\n",
|
|
"<tr class=\"row-odd\"><td>binary</td>\n",
|
|
"<td><a class=\"reference external\" href=\"https://msgpack.org/index.html\">Msgpack</a></td>\n",
|
|
"<td><a class=\"reference internal\" href=\"#io-msgpack\"><span class=\"std std-ref\">read_msgpack</span></a></td>\n",
|
|
"<td><a class=\"reference internal\" href=\"#io-msgpack\"><span class=\"std std-ref\">to_msgpack</span></a></td>\n",
|
|
"</tr>\n",
|
|
"<tr class=\"row-even\"><td>binary</td>\n",
|
|
"<td><a class=\"reference external\" href=\"https://en.wikipedia.org/wiki/Stata\">Stata</a></td>\n",
|
|
"<td><a class=\"reference internal\" href=\"#io-stata-reader\"><span class=\"std std-ref\">read_stata</span></a></td>\n",
|
|
"<td><a class=\"reference internal\" href=\"#io-stata-writer\"><span class=\"std std-ref\">to_stata</span></a></td>\n",
|
|
"</tr>\n",
|
|
"<tr class=\"row-odd\"><td>binary</td>\n",
|
|
"<td><a class=\"reference external\" href=\"https://en.wikipedia.org/wiki/SAS_(software)\">SAS</a></td>\n",
|
|
"<td><a class=\"reference internal\" href=\"#io-sas-reader\"><span class=\"std std-ref\">read_sas</span></a></td>\n",
|
|
"<td> </td>\n",
|
|
"</tr>\n",
|
|
"<tr class=\"row-even\"><td>binary</td>\n",
|
|
"<td><a class=\"reference external\" href=\"https://docs.python.org/3/library/pickle.html\">Python Pickle Format</a></td>\n",
|
|
"<td><a class=\"reference internal\" href=\"#io-pickle\"><span class=\"std std-ref\">read_pickle</span></a></td>\n",
|
|
"<td><a class=\"reference internal\" href=\"#io-pickle\"><span class=\"std std-ref\">to_pickle</span></a></td>\n",
|
|
"</tr>\n",
|
|
"<tr class=\"row-odd\"><td>SQL</td>\n",
|
|
"<td><a class=\"reference external\" href=\"https://en.wikipedia.org/wiki/SQL\">SQL</a></td>\n",
|
|
"<td><a class=\"reference internal\" href=\"#io-sql\"><span class=\"std std-ref\">read_sql</span></a></td>\n",
|
|
"<td><a class=\"reference internal\" href=\"#io-sql\"><span class=\"std std-ref\">to_sql</span></a></td>\n",
|
|
"</tr>\n",
|
|
"<tr class=\"row-even\"><td>SQL</td>\n",
|
|
"<td><a class=\"reference external\" href=\"https://en.wikipedia.org/wiki/BigQuery\">Google Big Query</a></td>\n",
|
|
"<td><a class=\"reference internal\" href=\"#io-bigquery\"><span class=\"std std-ref\">read_gbq</span></a></td>\n",
|
|
"<td><a class=\"reference internal\" href=\"#io-bigquery\"><span class=\"std std-ref\">to_gbq</span></a></td>\n",
|
|
"</tr>\n",
|
|
"</tbody>\n",
|
|
"</table>"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"-----\n",
|
|
"----"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Reading in a CSV\n",
|
|
"Comma Separated Values files are text files that use commas as field delimiters.<br>\n",
|
|
"Unless you're running the virtual environment included with the course, you may need to install <tt>xlrd</tt> and <tt>openpyxl</tt>, which pandas uses for Excel files (they are not needed to read a CSV).<br>\n",
|
|
"In your terminal/command prompt run:\n",
|
|
"\n",
|
|
" conda install xlrd\n",
|
|
" conda install openpyxl\n",
|
|
"\n",
|
|
"Then restart Jupyter Notebook.\n",
|
|
"(or use pip install if you aren't using the Anaconda Distribution)\n",
|
|
"\n",
|
|
"## Understanding File Paths\n",
|
|
"\n",
|
|
"You have two options when reading a file with pandas:\n",
|
|
"\n",
|
|
"1. If your .py file or .ipynb notebook is located in the **exact** same folder location as the .csv file you want to read, simply pass in the file name as a string, for example:\n",
|
|
" \n",
|
|
" df = pd.read_csv('some_file.csv')\n",
|
|
" \n",
|
|
"2. Pass in the entire file path if you are located in a different directory. The file path must be 100% correct in order for this to work. For example:\n",
|
|
"\n",
|
|
" df = pd.read_csv(\"C:\\\\Users\\\\myself\\\\files\\\\some_file.csv\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"#### Print your current directory file path with pwd"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 53,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"'C:\\\\Users\\\\Marcial\\\\Pierian-Data-Courses\\\\Machine-Learning-MasterClass\\\\03-Pandas'"
|
|
]
|
|
},
|
|
"execution_count": 53,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"pwd"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"#### List the files in your current directory with ls"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 54,
|
|
"metadata": {
|
|
"scrolled": true
|
|
},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
" Volume in drive C has no label.\n",
|
|
" Volume Serial Number is 3652-BD2F\n",
|
|
"\n",
|
|
" Directory of C:\\Users\\Marcial\\Pierian-Data-Courses\\Machine-Learning-MasterClass\\03-Pandas\n",
|
|
"\n",
|
|
"07/04/2020 06:10 PM <DIR> .\n",
|
|
"07/04/2020 06:10 PM <DIR> ..\n",
|
|
"07/02/2020 05:40 PM <DIR> .ipynb_checkpoints\n",
|
|
"06/30/2020 04:51 PM 565,390 00-Series.ipynb\n",
|
|
"07/01/2020 12:48 PM 208,957 01-DataFrames.ipynb\n",
|
|
"07/01/2020 12:48 PM 194,591 02-Conditional-Filtering.ipynb\n",
|
|
"07/02/2020 07:02 PM 196,047 03-Useful-Methods.ipynb\n",
|
|
"07/01/2020 03:32 PM 64,227 04-Missing-Data.ipynb\n",
|
|
"07/04/2020 01:28 PM 219,627 05-Groupby-Operations-and-MultiIndex.ipynb\n",
|
|
"07/04/2020 03:19 PM 62,966 06-Combining-DataFrames.ipynb\n",
|
|
"07/02/2020 07:02 PM 29,356 07-Text-Methods.ipynb\n",
|
|
"07/02/2020 06:38 PM 35,705 08-Time-Methods.ipynb\n",
|
|
"07/04/2020 06:10 PM 53,097 09-Inputs-and-Outputs.ipynb\n",
|
|
"07/02/2020 05:34 PM 1,095 10-Pivot-Tables.ipynb\n",
|
|
"07/02/2020 05:34 PM 951 11-Pandas-Project-Exercise.ipynb\n",
|
|
"07/02/2020 05:34 PM 1,118 12-Pandas-Project-Exercise-Solution.ipynb\n",
|
|
"07/04/2020 05:39 PM 51 example.csv\n",
|
|
"07/04/2020 06:02 PM 5,022 example.xlsx\n",
|
|
"02/07/2020 12:26 PM 177 movie_scores.csv\n",
|
|
"07/01/2020 03:56 PM 17,727 mpg.csv\n",
|
|
"07/04/2020 05:58 PM 5,022 my_excel_file.xlsx\n",
|
|
"07/04/2020 05:56 PM 51 new_file.csv\n",
|
|
"07/02/2020 05:56 PM 5,459 RetailSales_BeerWineLiquor.csv\n",
|
|
"07/04/2020 05:56 PM 555 simple.html\n",
|
|
"01/27/2020 02:28 PM 18,752 tips.csv\n",
|
|
" 22 File(s) 1,685,943 bytes\n",
|
|
" 3 Dir(s) 82,818,367,488 bytes free\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"ls"
|
|
]
|
|
},
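{
"cell_type": "markdown",
"metadata": {},
"source": [
"The <tt>pwd</tt> and <tt>ls</tt> cells above are IPython shortcuts, so they only work inside a notebook. Here is a minimal sketch of the same checks in plain Python, assuming the standard-library <tt>pathlib</tt> module. The file name <tt>some_file.csv</tt> is only a placeholder.\n",
"\n",
"```python\n",
"from pathlib import Path\n",
"\n",
"# Current working directory, the same folder that pwd reports\n",
"current_dir = Path.cwd()\n",
"\n",
"# Names of the files in that folder, similar to ls\n",
"file_names = sorted(p.name for p in current_dir.iterdir() if p.is_file())\n",
"\n",
"# Full path to a file in that folder (placeholder file name)\n",
"csv_path = current_dir / 'some_file.csv'\n",
"\n",
"# Check that the file exists before calling pd.read_csv(csv_path)\n",
"print(csv_path.exists())\n",
"```\n",
"\n",
"pd.read_csv accepts a Path object as well as a string, so <tt>csv_path</tt> can be passed in directly."
]
},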
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"-----\n",
|
|
"#### NOTE! Common confusion point! All input methods are called directly from pandas with pd.read_, while all output methods are called off the DataFrame with df.to_\n",
|
|
"\n",
|
|
"-------"
|
|
]
|
|
},
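{
"cell_type": "markdown",
"metadata": {},
"source": [
"One quick way to see the split between pd.read_ and df.to_ is to list the names pandas exposes. This is only an illustrative sketch, and the variable names <tt>readers</tt> and <tt>writers</tt> are made up for the example.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Reader functions live on the pandas module itself\n",
"readers = [name for name in dir(pd) if name.startswith('read_')]\n",
"\n",
"# Writer methods live on the DataFrame class\n",
"writers = [name for name in dir(pd.DataFrame) if name.startswith('to_')]\n",
"\n",
"print(readers)\n",
"print(writers)\n",
"```\n",
"\n",
"The second list also contains conversions such as to_dict and to_numpy, which return Python objects rather than writing files."
]
},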
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### CSV Input"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 55,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"df = pd.read_csv('example.csv')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 56,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<style scoped>\n",
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
" vertical-align: middle;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe tbody tr th {\n",
|
|
" vertical-align: top;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe thead th {\n",
|
|
" text-align: right;\n",
|
|
" }\n",
|
|
"</style>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>a</th>\n",
|
|
" <th>b</th>\n",
|
|
" <th>c</th>\n",
|
|
" <th>d</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>0</th>\n",
|
|
" <td>0</td>\n",
|
|
" <td>1</td>\n",
|
|
" <td>2</td>\n",
|
|
" <td>3</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td>4</td>\n",
|
|
" <td>5</td>\n",
|
|
" <td>6</td>\n",
|
|
" <td>7</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>2</th>\n",
|
|
" <td>8</td>\n",
|
|
" <td>9</td>\n",
|
|
" <td>10</td>\n",
|
|
" <td>11</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>3</th>\n",
|
|
" <td>12</td>\n",
|
|
" <td>13</td>\n",
|
|
" <td>14</td>\n",
|
|
" <td>15</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" a b c d\n",
|
|
"0 0 1 2 3\n",
|
|
"1 4 5 6 7\n",
|
|
"2 8 9 10 11\n",
|
|
"3 12 13 14 15"
|
|
]
|
|
},
|
|
"execution_count": 56,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"df"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 57,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"df = pd.read_csv('example.csv',index_col=0)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 58,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<style scoped>\n",
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
" vertical-align: middle;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe tbody tr th {\n",
|
|
" vertical-align: top;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe thead th {\n",
|
|
" text-align: right;\n",
|
|
" }\n",
|
|
"</style>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>b</th>\n",
|
|
" <th>c</th>\n",
|
|
" <th>d</th>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>a</th>\n",
|
|
" <th></th>\n",
|
|
" <th></th>\n",
|
|
" <th></th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>0</th>\n",
|
|
" <td>1</td>\n",
|
|
" <td>2</td>\n",
|
|
" <td>3</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>4</th>\n",
|
|
" <td>5</td>\n",
|
|
" <td>6</td>\n",
|
|
" <td>7</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>8</th>\n",
|
|
" <td>9</td>\n",
|
|
" <td>10</td>\n",
|
|
" <td>11</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>12</th>\n",
|
|
" <td>13</td>\n",
|
|
" <td>14</td>\n",
|
|
" <td>15</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" b c d\n",
|
|
"a \n",
|
|
"0 1 2 3\n",
|
|
"4 5 6 7\n",
|
|
"8 9 10 11\n",
|
|
"12 13 14 15"
|
|
]
|
|
},
|
|
"execution_count": 58,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"df"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 59,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"df = pd.read_csv('example.csv')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 60,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<style scoped>\n",
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
" vertical-align: middle;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe tbody tr th {\n",
|
|
" vertical-align: top;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe thead th {\n",
|
|
" text-align: right;\n",
|
|
" }\n",
|
|
"</style>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>a</th>\n",
|
|
" <th>b</th>\n",
|
|
" <th>c</th>\n",
|
|
" <th>d</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>0</th>\n",
|
|
" <td>0</td>\n",
|
|
" <td>1</td>\n",
|
|
" <td>2</td>\n",
|
|
" <td>3</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td>4</td>\n",
|
|
" <td>5</td>\n",
|
|
" <td>6</td>\n",
|
|
" <td>7</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>2</th>\n",
|
|
" <td>8</td>\n",
|
|
" <td>9</td>\n",
|
|
" <td>10</td>\n",
|
|
" <td>11</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>3</th>\n",
|
|
" <td>12</td>\n",
|
|
" <td>13</td>\n",
|
|
" <td>14</td>\n",
|
|
" <td>15</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" a b c d\n",
|
|
"0 0 1 2 3\n",
|
|
"1 4 5 6 7\n",
|
|
"2 8 9 10 11\n",
|
|
"3 12 13 14 15"
|
|
]
|
|
},
|
|
"execution_count": 60,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"df"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### CSV Output\n",
|
|
"\n",
|
|
"Set index=False if you do not want to save the index. Otherwise the index is written as an extra first column in the .csv file. If the index has no name, that column's header is blank, and pandas labels it \"Unnamed: 0\" when the file is read back in. If you do want to save your index, leave index set to True (the default value)."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 61,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"df.to_csv('new_file.csv',index=False)"
|
|
]
|
|
},
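{
"cell_type": "markdown",
"metadata": {},
"source": [
"The effect of the index argument is easiest to see in a round trip. Here is a minimal sketch, assuming a small stand-in DataFrame and an in-memory text buffer (<tt>io.StringIO</tt>) in place of a real file so that nothing is written to disk:\n",
"\n",
"```python\n",
"import io\n",
"import pandas as pd\n",
"\n",
"# Small stand-in for the DataFrame used in this notebook\n",
"df = pd.DataFrame({'a': [0, 4], 'b': [1, 5]})\n",
"\n",
"# With no path, to_csv returns the CSV text instead of writing a file\n",
"text_with_index = df.to_csv()\n",
"text_without_index = df.to_csv(index=False)\n",
"\n",
"# Reading the first version back adds a column for the saved index\n",
"reloaded = pd.read_csv(io.StringIO(text_with_index))\n",
"print(list(reloaded.columns))\n",
"\n",
"# The index=False version round-trips without the extra column\n",
"clean = pd.read_csv(io.StringIO(text_without_index))\n",
"print(list(clean.columns))\n",
"\n",
"# index_col=0 turns the saved index column back into the index\n",
"restored = pd.read_csv(io.StringIO(text_with_index), index_col=0)\n",
"print(list(restored.columns))\n",
"```\n",
"\n",
"Either approach works: save with index=False, or keep the index and read it back with index_col=0."
]
},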
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## HTML\n",
|
|
"\n",
|
|
"Pandas can read table tags off of HTML. Reading from a URL only works if your firewall isn't blocking pandas from accessing the internet!\n",
|
|
"\n",
|
|
"Unless you're running the virtual environment included with the course, you may need to install <tt>lxml</tt>, <tt>html5lib</tt>, and <tt>BeautifulSoup4</tt>.<br>\n",
|
|
"In your terminal/command prompt run:\n",
|
|
"\n",
|
|
" conda install lxml\n",
|
|
" \n",
|
|
" or\n",
|
|
" \n",
|
|
" pip install lxml\n",
|
|
" \n",
|
|
"Then restart Jupyter Notebook (you may need to restart your computer)."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## read_html\n",
|
|
"\n",
|
|
"### HTML Input\n",
|
|
"\n",
|
|
"The pandas read_html function reads the tables on a webpage and returns a list of DataFrame objects. NOTE: This only works with well-defined `<table>` elements in the page's HTML. It cannot read in tables that are images on a page."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 62,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"tables = pd.read_html('https://en.wikipedia.org/wiki/World_population')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 63,
|
|
"metadata": {
|
|
"scrolled": true
|
|
},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"26"
|
|
]
|
|
},
|
|
"execution_count": 63,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"len(tables)  # number of tables found on the page"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"collapsed": true
|
|
},
|
|
"source": [
|
|
"### Not Useful Tables\n",
|
|
"Pandas found 26 tables on that page. Some are not useful:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 64,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<style scoped>\n",
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
" vertical-align: middle;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe tbody tr th {\n",
|
|
" vertical-align: top;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe thead th {\n",
|
|
" text-align: right;\n",
|
|
" }\n",
|
|
"</style>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>0</th>\n",
|
|
" <th>1</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>0</th>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>An editor has expressed concern that this arti...</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" 0 1\n",
|
|
"0 NaN An editor has expressed concern that this arti..."
|
|
]
|
|
},
|
|
"execution_count": 64,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"tables[0]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Tables that need formatting"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Some will be misaligned, meaning you need to do extra work to fix the columns and rows:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 65,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<style scoped>\n",
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
" vertical-align: middle;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe tbody tr th {\n",
|
|
" vertical-align: top;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe thead tr th {\n",
|
|
" text-align: left;\n",
|
|
" }\n",
|
|
"</style>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr>\n",
|
|
" <th></th>\n",
|
|
" <th colspan=\"5\" halign=\"left\">World population (millions, UN estimates)[14]</th>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th></th>\n",
|
|
" <th>#</th>\n",
|
|
" <th>Top ten most populous countries</th>\n",
|
|
" <th>2000</th>\n",
|
|
" <th>2015</th>\n",
|
|
" <th>2030[A]</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>0</th>\n",
|
|
" <td>1</td>\n",
|
|
" <td>China[B]</td>\n",
|
|
" <td>1270</td>\n",
|
|
" <td>1376</td>\n",
|
|
" <td>1416</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td>2</td>\n",
|
|
" <td>India</td>\n",
|
|
" <td>1053</td>\n",
|
|
" <td>1311</td>\n",
|
|
" <td>1528</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>2</th>\n",
|
|
" <td>3</td>\n",
|
|
" <td>United States</td>\n",
|
|
" <td>283</td>\n",
|
|
" <td>322</td>\n",
|
|
" <td>356</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>3</th>\n",
|
|
" <td>4</td>\n",
|
|
" <td>Indonesia</td>\n",
|
|
" <td>212</td>\n",
|
|
" <td>258</td>\n",
|
|
" <td>295</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>4</th>\n",
|
|
" <td>5</td>\n",
|
|
" <td>Pakistan</td>\n",
|
|
" <td>136</td>\n",
|
|
" <td>208</td>\n",
|
|
" <td>245</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>5</th>\n",
|
|
" <td>6</td>\n",
|
|
" <td>Brazil</td>\n",
|
|
" <td>176</td>\n",
|
|
" <td>206</td>\n",
|
|
" <td>228</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>6</th>\n",
|
|
" <td>7</td>\n",
|
|
" <td>Nigeria</td>\n",
|
|
" <td>123</td>\n",
|
|
" <td>182</td>\n",
|
|
" <td>263</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>7</th>\n",
|
|
" <td>8</td>\n",
|
|
" <td>Bangladesh</td>\n",
|
|
" <td>131</td>\n",
|
|
" <td>161</td>\n",
|
|
" <td>186</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>8</th>\n",
|
|
" <td>9</td>\n",
|
|
" <td>Russia</td>\n",
|
|
" <td>146</td>\n",
|
|
" <td>146</td>\n",
|
|
" <td>149</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>9</th>\n",
|
|
" <td>10</td>\n",
|
|
" <td>Mexico</td>\n",
|
|
" <td>103</td>\n",
|
|
" <td>127</td>\n",
|
|
" <td>148</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>10</th>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>World total</td>\n",
|
|
" <td>6127</td>\n",
|
|
" <td>7349</td>\n",
|
|
" <td>8501</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>11</th>\n",
|
|
" <td>Notes: ^ 2030 = Medium variant. ^ China exclud...</td>\n",
|
|
" <td>Notes: ^ 2030 = Medium variant. ^ China exclud...</td>\n",
|
|
" <td>Notes: ^ 2030 = Medium variant. ^ China exclud...</td>\n",
|
|
" <td>Notes: ^ 2030 = Medium variant. ^ China exclud...</td>\n",
|
|
" <td>Notes: ^ 2030 = Medium variant. ^ China exclud...</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" World population (millions, UN estimates)[14] \\\n",
|
|
" # \n",
|
|
"0 1 \n",
|
|
"1 2 \n",
|
|
"2 3 \n",
|
|
"3 4 \n",
|
|
"4 5 \n",
|
|
"5 6 \n",
|
|
"6 7 \n",
|
|
"7 8 \n",
|
|
"8 9 \n",
|
|
"9 10 \n",
|
|
"10 NaN \n",
|
|
"11 Notes: ^ 2030 = Medium variant. ^ China exclud... \n",
|
|
"\n",
|
|
" \\\n",
|
|
" Top ten most populous countries \n",
|
|
"0 China[B] \n",
|
|
"1 India \n",
|
|
"2 United States \n",
|
|
"3 Indonesia \n",
|
|
"4 Pakistan \n",
|
|
"5 Brazil \n",
|
|
"6 Nigeria \n",
|
|
"7 Bangladesh \n",
|
|
"8 Russia \n",
|
|
"9 Mexico \n",
|
|
"10 World total \n",
|
|
"11 Notes: ^ 2030 = Medium variant. ^ China exclud... \n",
|
|
"\n",
|
|
" \\\n",
|
|
" 2000 \n",
|
|
"0 1270 \n",
|
|
"1 1053 \n",
|
|
"2 283 \n",
|
|
"3 212 \n",
|
|
"4 136 \n",
|
|
"5 176 \n",
|
|
"6 123 \n",
|
|
"7 131 \n",
|
|
"8 146 \n",
|
|
"9 103 \n",
|
|
"10 6127 \n",
|
|
"11 Notes: ^ 2030 = Medium variant. ^ China exclud... \n",
|
|
"\n",
|
|
" \\\n",
|
|
" 2015 \n",
|
|
"0 1376 \n",
|
|
"1 1311 \n",
|
|
"2 322 \n",
|
|
"3 258 \n",
|
|
"4 208 \n",
|
|
"5 206 \n",
|
|
"6 182 \n",
|
|
"7 161 \n",
|
|
"8 146 \n",
|
|
"9 127 \n",
|
|
"10 7349 \n",
|
|
"11 Notes: ^ 2030 = Medium variant. ^ China exclud... \n",
|
|
"\n",
|
|
" \n",
|
|
" 2030[A] \n",
|
|
"0 1416 \n",
|
|
"1 1528 \n",
|
|
"2 356 \n",
|
|
"3 295 \n",
|
|
"4 245 \n",
|
|
"5 228 \n",
|
|
"6 263 \n",
|
|
"7 186 \n",
|
|
"8 149 \n",
|
|
"9 148 \n",
|
|
"10 8501 \n",
|
|
"11 Notes: ^ 2030 = Medium variant. ^ China exclud... "
|
|
]
|
|
},
|
|
"execution_count": 65,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"tables[1]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 66,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"world_pop = tables[1]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 67,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"MultiIndex([('World population (millions, UN estimates)[14]', ...),\n",
|
|
" ('World population (millions, UN estimates)[14]', ...),\n",
|
|
" ('World population (millions, UN estimates)[14]', ...),\n",
|
|
" ('World population (millions, UN estimates)[14]', ...),\n",
|
|
" ('World population (millions, UN estimates)[14]', ...)],\n",
|
|
" )"
|
|
]
|
|
},
|
|
"execution_count": 67,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"world_pop.columns"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 68,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"world_pop = world_pop['World population (millions, UN estimates)[14]'].drop('#',axis=1)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 69,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"Index(['Top ten most populous countries', '2000', '2015', '2030[A]'], dtype='object')"
|
|
]
|
|
},
|
|
"execution_count": 69,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"world_pop.columns"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 70,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"world_pop.columns = ['Countries', '2000', '2015', '2030 Est.']\n",
|
|
"world_pop = world_pop.drop(11,axis=0)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 71,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<style scoped>\n",
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
" vertical-align: middle;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe tbody tr th {\n",
|
|
" vertical-align: top;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe thead th {\n",
|
|
" text-align: right;\n",
|
|
" }\n",
|
|
"</style>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>Countries</th>\n",
|
|
" <th>2000</th>\n",
|
|
" <th>2015</th>\n",
|
|
" <th>2030 Est.</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>0</th>\n",
|
|
" <td>China[B]</td>\n",
|
|
" <td>1270</td>\n",
|
|
" <td>1376</td>\n",
|
|
" <td>1416</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td>India</td>\n",
|
|
" <td>1053</td>\n",
|
|
" <td>1311</td>\n",
|
|
" <td>1528</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>2</th>\n",
|
|
" <td>United States</td>\n",
|
|
" <td>283</td>\n",
|
|
" <td>322</td>\n",
|
|
" <td>356</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>3</th>\n",
|
|
" <td>Indonesia</td>\n",
|
|
" <td>212</td>\n",
|
|
" <td>258</td>\n",
|
|
" <td>295</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>4</th>\n",
|
|
" <td>Pakistan</td>\n",
|
|
" <td>136</td>\n",
|
|
" <td>208</td>\n",
|
|
" <td>245</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>5</th>\n",
|
|
" <td>Brazil</td>\n",
|
|
" <td>176</td>\n",
|
|
" <td>206</td>\n",
|
|
" <td>228</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>6</th>\n",
|
|
" <td>Nigeria</td>\n",
|
|
" <td>123</td>\n",
|
|
" <td>182</td>\n",
|
|
" <td>263</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>7</th>\n",
|
|
" <td>Bangladesh</td>\n",
|
|
" <td>131</td>\n",
|
|
" <td>161</td>\n",
|
|
" <td>186</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>8</th>\n",
|
|
" <td>Russia</td>\n",
|
|
" <td>146</td>\n",
|
|
" <td>146</td>\n",
|
|
" <td>149</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>9</th>\n",
|
|
" <td>Mexico</td>\n",
|
|
" <td>103</td>\n",
|
|
" <td>127</td>\n",
|
|
" <td>148</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>10</th>\n",
|
|
" <td>World total</td>\n",
|
|
" <td>6127</td>\n",
|
|
" <td>7349</td>\n",
|
|
" <td>8501</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" Countries 2000 2015 2030 Est.\n",
|
|
"0 China[B] 1270 1376 1416\n",
|
|
"1 India 1053 1311 1528\n",
|
|
"2 United States 283 322 356\n",
|
|
"3 Indonesia 212 258 295\n",
|
|
"4 Pakistan 136 208 245\n",
|
|
"5 Brazil 176 206 228\n",
|
|
"6 Nigeria 123 182 263\n",
|
|
"7 Bangladesh 131 161 186\n",
|
|
"8 Russia 146 146 149\n",
|
|
"9 Mexico 103 127 148\n",
|
|
"10 World total 6127 7349 8501"
|
|
]
|
|
},
|
|
"execution_count": 71,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"world_pop"
|
|
]
|
|
},
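{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cleanup steps above can be tried offline on a small stand-in table. This is a minimal sketch: the header text, the rows, and the names <tt>raw</tt> and <tt>flat</tt> are made up to mimic the two-level columns that read_html returned, and are not the scraped data.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical top-level header shared by every column\n",
"top = 'World population (millions)'\n",
"columns = pd.MultiIndex.from_tuples([(top, '#'), (top, 'Country'), (top, '2000')])\n",
"\n",
"# The last row imitates a footnote repeated across every column\n",
"raw = pd.DataFrame(\n",
"    [[1, 'China', 1270], [2, 'India', 1053], ['Notes', 'Notes', 'Notes']],\n",
"    columns=columns,\n",
")\n",
"\n",
"# Selecting the top-level key leaves a single level of column names\n",
"flat = raw[top]\n",
"\n",
"# Drop the rank column and the footnote row, then rename the columns\n",
"flat = flat.drop('#', axis=1)\n",
"flat = flat.drop(2, axis=0)\n",
"flat.columns = ['Countries', '2000']\n",
"\n",
"# The footnote forced the column to object dtype, so convert it back\n",
"flat['2000'] = pd.to_numeric(flat['2000'])\n",
"print(flat)\n",
"```\n",
"\n",
"The numeric conversion at the end is an extra step not taken above. It matters only if you plan to do arithmetic on the population columns."
]
},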
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Tables that are intact"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 72,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<style scoped>\n",
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
" vertical-align: middle;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe tbody tr th {\n",
|
|
" vertical-align: top;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe thead th {\n",
|
|
" text-align: right;\n",
|
|
" }\n",
|
|
"</style>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>Rank</th>\n",
|
|
" <th>Country</th>\n",
|
|
" <th>Population</th>\n",
|
|
" <th>Area (km2)</th>\n",
|
|
" <th>Density (Pop. per km2)</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>0</th>\n",
|
|
" <td>1</td>\n",
|
|
" <td>Singapore</td>\n",
|
|
" <td>5703600</td>\n",
|
|
" <td>710</td>\n",
|
|
" <td>8033</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td>2</td>\n",
|
|
" <td>Bangladesh</td>\n",
|
|
" <td>168870000</td>\n",
|
|
" <td>143998</td>\n",
|
|
" <td>1173</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>2</th>\n",
|
|
" <td>3</td>\n",
|
|
" <td>Lebanon</td>\n",
|
|
" <td>6855713</td>\n",
|
|
" <td>10452</td>\n",
|
|
" <td>656</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>3</th>\n",
|
|
" <td>4</td>\n",
|
|
" <td>Taiwan</td>\n",
|
|
" <td>23604265</td>\n",
|
|
" <td>36193</td>\n",
|
|
" <td>652</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>4</th>\n",
|
|
" <td>5</td>\n",
|
|
" <td>South Korea</td>\n",
|
|
" <td>51780579</td>\n",
|
|
" <td>99538</td>\n",
|
|
" <td>520</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>5</th>\n",
|
|
" <td>6</td>\n",
|
|
" <td>Rwanda</td>\n",
|
|
" <td>12374397</td>\n",
|
|
" <td>26338</td>\n",
|
|
" <td>470</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>6</th>\n",
|
|
" <td>7</td>\n",
|
|
" <td>Haiti</td>\n",
|
|
" <td>11577779</td>\n",
|
|
" <td>27065</td>\n",
|
|
" <td>428</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>7</th>\n",
|
|
" <td>8</td>\n",
|
|
" <td>Netherlands</td>\n",
|
|
" <td>17480000</td>\n",
|
|
" <td>41526</td>\n",
|
|
" <td>421</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>8</th>\n",
|
|
" <td>9</td>\n",
|
|
" <td>Israel</td>\n",
|
|
" <td>9220000</td>\n",
|
|
" <td>22072</td>\n",
|
|
" <td>418</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>9</th>\n",
|
|
" <td>10</td>\n",
|
|
" <td>India</td>\n",
|
|
" <td>1364080000</td>\n",
|
|
" <td>3287240</td>\n",
|
|
" <td>415</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" Rank Country Population Area (km2) Density (Pop. per km2)\n",
|
|
"0 1 Singapore 5703600 710 8033\n",
|
|
"1 2 Bangladesh 168870000 143998 1173\n",
|
|
"2 3 Lebanon 6855713 10452 656\n",
|
|
"3 4 Taiwan 23604265 36193 652\n",
|
|
"4 5 South Korea 51780579 99538 520\n",
|
|
"5 6 Rwanda 12374397 26338 470\n",
|
|
"6 7 Haiti 11577779 27065 428\n",
|
|
"7 8 Netherlands 17480000 41526 421\n",
|
|
"8 9 Israel 9220000 22072 418\n",
|
|
"9 10 India 1364080000 3287240 415"
|
|
]
|
|
},
|
|
"execution_count": 72,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"tables[6]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Write to html Output\n",
|
|
"\n",
|
|
"If you are working on a website and want to quickly output the .html file, you can use to_html"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 73,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"df.to_html('simple.html',index=False)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"**read_html** is not perfect, but its quite powerful for such a simple method call!"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Excel Files\n",
|
|
"\n",
|
|
"Pandas can read in basic excel files (it will get errors if there are macros or extensive formulas relying on outside excel files), in general, pandas can only grab the raw information from an .excel file.\n",
|
|
"\n",
|
|
"#### NOTE: Requires the openpyxl and xlrd library! Its provided for you in our environment, or simply install with:\n",
|
|
"\n",
|
|
" pip install openpyxl\n",
|
|
" pip install xlrd\n",
|
|
" \n",
|
|
"Heavy excel users may want to check out this website: https://www.python-excel.org/\n",
|
|
"\n",
|
|
"You can think of an excel file as a Workbook containin sheets, which for pandas means each sheet can be a DataFrame.\n",
|
|
"\n",
|
|
"## Excel file input with read_excel()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 74,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"df = pd.read_excel('my_excel_file.xlsx',sheet_name='First_Sheet')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 75,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<style scoped>\n",
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
" vertical-align: middle;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe tbody tr th {\n",
|
|
" vertical-align: top;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe thead th {\n",
|
|
" text-align: right;\n",
|
|
" }\n",
|
|
"</style>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>a</th>\n",
|
|
" <th>b</th>\n",
|
|
" <th>c</th>\n",
|
|
" <th>d</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>0</th>\n",
|
|
" <td>0</td>\n",
|
|
" <td>1</td>\n",
|
|
" <td>2</td>\n",
|
|
" <td>3</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td>4</td>\n",
|
|
" <td>5</td>\n",
|
|
" <td>6</td>\n",
|
|
" <td>7</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>2</th>\n",
|
|
" <td>8</td>\n",
|
|
" <td>9</td>\n",
|
|
" <td>10</td>\n",
|
|
" <td>11</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>3</th>\n",
|
|
" <td>12</td>\n",
|
|
" <td>13</td>\n",
|
|
" <td>14</td>\n",
|
|
" <td>15</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" a b c d\n",
|
|
"0 0 1 2 3\n",
|
|
"1 4 5 6 7\n",
|
|
"2 8 9 10 11\n",
|
|
"3 12 13 14 15"
|
|
]
|
|
},
|
|
"execution_count": 75,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"df"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### What if you don't know the sheet name? Or want to run a for loop for certain sheet names? Or want every sheet?\n",
|
|
"\n",
|
|
"Several ways to do this: https://stackoverflow.com/questions/17977540/pandas-looking-up-the-list-of-sheets-in-an-excel-file"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 76,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"['First_Sheet']"
|
|
]
|
|
},
|
|
"execution_count": 76,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"# Returns a list of sheet_names\n",
|
|
"pd.ExcelFile('my_excel_file.xlsx').sheet_names"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"#### Grab all sheets"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 77,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"excel_sheets = pd.read_excel('my_excel_file.xlsx',sheet_name=None)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 78,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"dict"
|
|
]
|
|
},
|
|
"execution_count": 78,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"type(excel_sheets)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 79,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"dict_keys(['First_Sheet'])"
|
|
]
|
|
},
|
|
"execution_count": 79,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"excel_sheets.keys()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 80,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<style scoped>\n",
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
" vertical-align: middle;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe tbody tr th {\n",
|
|
" vertical-align: top;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe thead th {\n",
|
|
" text-align: right;\n",
|
|
" }\n",
|
|
"</style>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>a</th>\n",
|
|
" <th>b</th>\n",
|
|
" <th>c</th>\n",
|
|
" <th>d</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>0</th>\n",
|
|
" <td>0</td>\n",
|
|
" <td>1</td>\n",
|
|
" <td>2</td>\n",
|
|
" <td>3</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td>4</td>\n",
|
|
" <td>5</td>\n",
|
|
" <td>6</td>\n",
|
|
" <td>7</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>2</th>\n",
|
|
" <td>8</td>\n",
|
|
" <td>9</td>\n",
|
|
" <td>10</td>\n",
|
|
" <td>11</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>3</th>\n",
|
|
" <td>12</td>\n",
|
|
" <td>13</td>\n",
|
|
" <td>14</td>\n",
|
|
" <td>15</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" a b c d\n",
|
|
"0 0 1 2 3\n",
|
|
"1 4 5 6 7\n",
|
|
"2 8 9 10 11\n",
|
|
"3 12 13 14 15"
|
|
]
|
|
},
|
|
"execution_count": 80,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"excel_sheets['First_Sheet']"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Write to Excel File"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 81,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"df.to_excel('example.xlsx',sheet_name='First_Sheet',index=False)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# SQL Connections\n",
|
|
"\n",
|
|
"#### NOTE: Highly recommend you explore specific libraries for your specific SQL Engine. Simple search for your database+python in Google and the top results should hopefully include an API.\n",
|
|
"\n",
|
|
"* [MySQL](https://www.google.com/search?q=mysql+python)\n",
|
|
"* [PostgreSQL](https://www.google.com/search?q=postgresql+python)\n",
|
|
"* [MS SQL Server](https://www.google.com/search?q=MSSQLserver+python)\n",
|
|
"* [Orcale](https://www.google.com/search?q=oracle+python)\n",
|
|
"* [MongoDB](https://www.google.com/search?q=mongodb+python)\n",
|
|
"\n",
|
|
"Let's review pandas capabilities by using SQLite, which comes built in with Python.\n",
|
|
"\n",
|
|
"## Example SQL Database (temporary in your RAM)\n",
|
|
"\n",
|
|
"You will need to install sqlalchemy with:\n",
|
|
"\n",
|
|
" pip install sqlalchemy\n",
|
|
" \n",
|
|
"to follow along. To understand how to make a connection to your own database, make sure to review: https://docs.sqlalchemy.org/en/13/core/connections.html"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 82,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from sqlalchemy import create_engine"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 83,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"temp_db = create_engine('sqlite:///:memory:')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Write to Database"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 85,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<style scoped>\n",
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
" vertical-align: middle;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe tbody tr th {\n",
|
|
" vertical-align: top;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe thead th {\n",
|
|
" text-align: right;\n",
|
|
" }\n",
|
|
"</style>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>Rank</th>\n",
|
|
" <th>Country</th>\n",
|
|
" <th>Population</th>\n",
|
|
" <th>Area (km2)</th>\n",
|
|
" <th>Density (Pop. per km2)</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>0</th>\n",
|
|
" <td>1</td>\n",
|
|
" <td>Singapore</td>\n",
|
|
" <td>5703600</td>\n",
|
|
" <td>710</td>\n",
|
|
" <td>8033</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td>2</td>\n",
|
|
" <td>Bangladesh</td>\n",
|
|
" <td>168870000</td>\n",
|
|
" <td>143998</td>\n",
|
|
" <td>1173</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>2</th>\n",
|
|
" <td>3</td>\n",
|
|
" <td>Lebanon</td>\n",
|
|
" <td>6855713</td>\n",
|
|
" <td>10452</td>\n",
|
|
" <td>656</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>3</th>\n",
|
|
" <td>4</td>\n",
|
|
" <td>Taiwan</td>\n",
|
|
" <td>23604265</td>\n",
|
|
" <td>36193</td>\n",
|
|
" <td>652</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>4</th>\n",
|
|
" <td>5</td>\n",
|
|
" <td>South Korea</td>\n",
|
|
" <td>51780579</td>\n",
|
|
" <td>99538</td>\n",
|
|
" <td>520</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>5</th>\n",
|
|
" <td>6</td>\n",
|
|
" <td>Rwanda</td>\n",
|
|
" <td>12374397</td>\n",
|
|
" <td>26338</td>\n",
|
|
" <td>470</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>6</th>\n",
|
|
" <td>7</td>\n",
|
|
" <td>Haiti</td>\n",
|
|
" <td>11577779</td>\n",
|
|
" <td>27065</td>\n",
|
|
" <td>428</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>7</th>\n",
|
|
" <td>8</td>\n",
|
|
" <td>Netherlands</td>\n",
|
|
" <td>17480000</td>\n",
|
|
" <td>41526</td>\n",
|
|
" <td>421</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>8</th>\n",
|
|
" <td>9</td>\n",
|
|
" <td>Israel</td>\n",
|
|
" <td>9220000</td>\n",
|
|
" <td>22072</td>\n",
|
|
" <td>418</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>9</th>\n",
|
|
" <td>10</td>\n",
|
|
" <td>India</td>\n",
|
|
" <td>1364080000</td>\n",
|
|
" <td>3287240</td>\n",
|
|
" <td>415</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" Rank Country Population Area (km2) Density (Pop. per km2)\n",
|
|
"0 1 Singapore 5703600 710 8033\n",
|
|
"1 2 Bangladesh 168870000 143998 1173\n",
|
|
"2 3 Lebanon 6855713 10452 656\n",
|
|
"3 4 Taiwan 23604265 36193 652\n",
|
|
"4 5 South Korea 51780579 99538 520\n",
|
|
"5 6 Rwanda 12374397 26338 470\n",
|
|
"6 7 Haiti 11577779 27065 428\n",
|
|
"7 8 Netherlands 17480000 41526 421\n",
|
|
"8 9 Israel 9220000 22072 418\n",
|
|
"9 10 India 1364080000 3287240 415"
|
|
]
|
|
},
|
|
"execution_count": 85,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"tables[6]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 86,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"pop = tables[6]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 87,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"pop.to_sql(name='populations',con=temp_db)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Read from SQL Database"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 89,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<style scoped>\n",
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
" vertical-align: middle;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe tbody tr th {\n",
|
|
" vertical-align: top;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe thead th {\n",
|
|
" text-align: right;\n",
|
|
" }\n",
|
|
"</style>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>index</th>\n",
|
|
" <th>Rank</th>\n",
|
|
" <th>Country</th>\n",
|
|
" <th>Population</th>\n",
|
|
" <th>Area (km2)</th>\n",
|
|
" <th>Density (Pop. per km2)</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>0</th>\n",
|
|
" <td>0</td>\n",
|
|
" <td>1</td>\n",
|
|
" <td>Singapore</td>\n",
|
|
" <td>5703600</td>\n",
|
|
" <td>710</td>\n",
|
|
" <td>8033</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td>1</td>\n",
|
|
" <td>2</td>\n",
|
|
" <td>Bangladesh</td>\n",
|
|
" <td>168870000</td>\n",
|
|
" <td>143998</td>\n",
|
|
" <td>1173</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>2</th>\n",
|
|
" <td>2</td>\n",
|
|
" <td>3</td>\n",
|
|
" <td>Lebanon</td>\n",
|
|
" <td>6855713</td>\n",
|
|
" <td>10452</td>\n",
|
|
" <td>656</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>3</th>\n",
|
|
" <td>3</td>\n",
|
|
" <td>4</td>\n",
|
|
" <td>Taiwan</td>\n",
|
|
" <td>23604265</td>\n",
|
|
" <td>36193</td>\n",
|
|
" <td>652</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>4</th>\n",
|
|
" <td>4</td>\n",
|
|
" <td>5</td>\n",
|
|
" <td>South Korea</td>\n",
|
|
" <td>51780579</td>\n",
|
|
" <td>99538</td>\n",
|
|
" <td>520</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>5</th>\n",
|
|
" <td>5</td>\n",
|
|
" <td>6</td>\n",
|
|
" <td>Rwanda</td>\n",
|
|
" <td>12374397</td>\n",
|
|
" <td>26338</td>\n",
|
|
" <td>470</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>6</th>\n",
|
|
" <td>6</td>\n",
|
|
" <td>7</td>\n",
|
|
" <td>Haiti</td>\n",
|
|
" <td>11577779</td>\n",
|
|
" <td>27065</td>\n",
|
|
" <td>428</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>7</th>\n",
|
|
" <td>7</td>\n",
|
|
" <td>8</td>\n",
|
|
" <td>Netherlands</td>\n",
|
|
" <td>17480000</td>\n",
|
|
" <td>41526</td>\n",
|
|
" <td>421</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>8</th>\n",
|
|
" <td>8</td>\n",
|
|
" <td>9</td>\n",
|
|
" <td>Israel</td>\n",
|
|
" <td>9220000</td>\n",
|
|
" <td>22072</td>\n",
|
|
" <td>418</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>9</th>\n",
|
|
" <td>9</td>\n",
|
|
" <td>10</td>\n",
|
|
" <td>India</td>\n",
|
|
" <td>1364080000</td>\n",
|
|
" <td>3287240</td>\n",
|
|
" <td>415</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" index Rank Country Population Area (km2) Density (Pop. per km2)\n",
|
|
"0 0 1 Singapore 5703600 710 8033\n",
|
|
"1 1 2 Bangladesh 168870000 143998 1173\n",
|
|
"2 2 3 Lebanon 6855713 10452 656\n",
|
|
"3 3 4 Taiwan 23604265 36193 652\n",
|
|
"4 4 5 South Korea 51780579 99538 520\n",
|
|
"5 5 6 Rwanda 12374397 26338 470\n",
|
|
"6 6 7 Haiti 11577779 27065 428\n",
|
|
"7 7 8 Netherlands 17480000 41526 421\n",
|
|
"8 8 9 Israel 9220000 22072 418\n",
|
|
"9 9 10 India 1364080000 3287240 415"
|
|
]
|
|
},
|
|
"execution_count": 89,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"# Read in an entire table\n",
|
|
"pd.read_sql(sql='populations',con=temp_db)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 92,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<style scoped>\n",
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
" vertical-align: middle;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe tbody tr th {\n",
|
|
" vertical-align: top;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe thead th {\n",
|
|
" text-align: right;\n",
|
|
" }\n",
|
|
"</style>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>Country</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>0</th>\n",
|
|
" <td>Singapore</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td>Bangladesh</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>2</th>\n",
|
|
" <td>Lebanon</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>3</th>\n",
|
|
" <td>Taiwan</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>4</th>\n",
|
|
" <td>South Korea</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>5</th>\n",
|
|
" <td>Rwanda</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>6</th>\n",
|
|
" <td>Haiti</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>7</th>\n",
|
|
" <td>Netherlands</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>8</th>\n",
|
|
" <td>Israel</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>9</th>\n",
|
|
" <td>India</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" Country\n",
|
|
"0 Singapore\n",
|
|
"1 Bangladesh\n",
|
|
"2 Lebanon\n",
|
|
"3 Taiwan\n",
|
|
"4 South Korea\n",
|
|
"5 Rwanda\n",
|
|
"6 Haiti\n",
|
|
"7 Netherlands\n",
|
|
"8 Israel\n",
|
|
"9 India"
|
|
]
|
|
},
|
|
"execution_count": 92,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"# Read in with a SQL Query\n",
|
|
"pd.read_sql_query(sql=\"SELECT Country FROM populations\",con=temp_db)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"It is difficult to generalize pandas and SQL, due to a wide array of issues, including permissions,security, online access, varying SQL engines, etc... Use these ideas as a starting off point, and you will most likely need to do your own research for your own situation."
|
|
]
|
|
}
|
|
],
|
|
"metadata": {
|
|
"anaconda-cloud": {},
|
|
"kernelspec": {
|
|
"display_name": "Python 3",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.7.6"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 1
|
|
}
|