{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "___\n",
    "\n",
    "<a href='http://www.pieriandata.com'><img src='../Pierian_Data_Logo.png'/></a>\n",
    "___\n",
    "<center><em>Copyright by Pierian Data Inc.</em></center>\n",
    "<center><em>For more information, visit us at <a href='http://www.pieriandata.com'>www.pieriandata.com</a></em></center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Inputs and Outputs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\"><strong>NOTE:</strong> Typically we will just be either reading csv files directly or using pandas-datareader to pull data from the web. Consider this lecture just a quick overview of what is possible with pandas (we won't be working with SQL or Excel files in this course)</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Data Input and Output\n",
    "\n",
    "This notebook is the reference code for getting input and output, pandas can read a variety of file types using its pd.read_ methods. Let's take a look at the most common data types:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Check out the references here! \n",
    "\n",
    "**This is the best online resource for how to read/write to a variety of data sources!**\n",
    "\n",
    "https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html\n",
    "\n",
    "----\n",
    "----"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<table border=\"1\" class=\"colwidths-given docutils\">\n",
    "<colgroup>\n",
    "<col width=\"12%\" />\n",
    "<col width=\"40%\" />\n",
    "<col width=\"24%\" />\n",
    "<col width=\"24%\" />\n",
    "</colgroup>\n",
    "<thead valign=\"bottom\">\n",
    "<tr class=\"row-odd\"><th class=\"head\">Format Type</th>\n",
    "<th class=\"head\">Data Description</th>\n",
    "<th class=\"head\">Reader</th>\n",
    "<th class=\"head\">Writer</th>\n",
    "</tr>\n",
    "</thead>\n",
    "<tbody valign=\"top\">\n",
    "<tr class=\"row-even\"><td>text</td>\n",
    "<td><a class=\"reference external\" href=\"https://en.wikipedia.org/wiki/Comma-separated_values\">CSV</a></td>\n",
    "<td><a class=\"reference internal\" href=\"#io-read-csv-table\"><span class=\"std std-ref\">read_csv</span></a></td>\n",
    "<td><a class=\"reference internal\" href=\"#io-store-in-csv\"><span class=\"std std-ref\">to_csv</span></a></td>\n",
    "</tr>\n",
    "<tr class=\"row-odd\"><td>text</td>\n",
    "<td><a class=\"reference external\" href=\"https://www.json.org/\">JSON</a></td>\n",
    "<td><a class=\"reference internal\" href=\"#io-json-reader\"><span class=\"std std-ref\">read_json</span></a></td>\n",
    "<td><a class=\"reference internal\" href=\"#io-json-writer\"><span class=\"std std-ref\">to_json</span></a></td>\n",
    "</tr>\n",
    "<tr class=\"row-even\"><td>text</td>\n",
    "<td><a class=\"reference external\" href=\"https://en.wikipedia.org/wiki/HTML\">HTML</a></td>\n",
    "<td><a class=\"reference internal\" href=\"#io-read-html\"><span class=\"std std-ref\">read_html</span></a></td>\n",
    "<td><a class=\"reference internal\" href=\"#io-html\"><span class=\"std std-ref\">to_html</span></a></td>\n",
    "</tr>\n",
    "<tr class=\"row-odd\"><td>text</td>\n",
    "<td>Local clipboard</td>\n",
    "<td><a class=\"reference internal\" href=\"#io-clipboard\"><span class=\"std std-ref\">read_clipboard</span></a></td>\n",
    "<td><a class=\"reference internal\" href=\"#io-clipboard\"><span class=\"std std-ref\">to_clipboard</span></a></td>\n",
    "</tr>\n",
    "<tr class=\"row-even\"><td>binary</td>\n",
    "<td><a class=\"reference external\" href=\"https://en.wikipedia.org/wiki/Microsoft_Excel\">MS Excel</a></td>\n",
    "<td><a class=\"reference internal\" href=\"#io-excel-reader\"><span class=\"std std-ref\">read_excel</span></a></td>\n",
    "<td><a class=\"reference internal\" href=\"#io-excel-writer\"><span class=\"std std-ref\">to_excel</span></a></td>\n",
    "</tr>\n",
    "<tr class=\"row-odd\"><td>binary</td>\n",
    "<td><a class=\"reference external\" href=\"http://www.opendocumentformat.org\">OpenDocument</a></td>\n",
    "<td><a class=\"reference internal\" href=\"#io-ods\"><span class=\"std std-ref\">read_excel</span></a></td>\n",
    "<td>&#160;</td>\n",
    "</tr>\n",
    "<tr class=\"row-even\"><td>binary</td>\n",
    "<td><a class=\"reference external\" href=\"https://support.hdfgroup.org/HDF5/whatishdf5.html\">HDF5 Format</a></td>\n",
    "<td><a class=\"reference internal\" href=\"#io-hdf5\"><span class=\"std std-ref\">read_hdf</span></a></td>\n",
    "<td><a class=\"reference internal\" href=\"#io-hdf5\"><span class=\"std std-ref\">to_hdf</span></a></td>\n",
    "</tr>\n",
    "<tr class=\"row-odd\"><td>binary</td>\n",
    "<td><a class=\"reference external\" href=\"https://github.com/wesm/feather\">Feather Format</a></td>\n",
    "<td><a class=\"reference internal\" href=\"#io-feather\"><span class=\"std std-ref\">read_feather</span></a></td>\n",
    "<td><a class=\"reference internal\" href=\"#io-feather\"><span class=\"std std-ref\">to_feather</span></a></td>\n",
    "</tr>\n",
    "<tr class=\"row-even\"><td>binary</td>\n",
    "<td><a class=\"reference external\" href=\"https://parquet.apache.org/\">Parquet Format</a></td>\n",
    "<td><a class=\"reference internal\" href=\"#io-parquet\"><span class=\"std std-ref\">read_parquet</span></a></td>\n",
    "<td><a class=\"reference internal\" href=\"#io-parquet\"><span class=\"std std-ref\">to_parquet</span></a></td>\n",
    "</tr>\n",
    "<tr class=\"row-odd\"><td>binary</td>\n",
    "<td><a class=\"reference external\" href=\"https://msgpack.org/index.html\">Msgpack</a></td>\n",
    "<td><a class=\"reference internal\" href=\"#io-msgpack\"><span class=\"std std-ref\">read_msgpack</span></a></td>\n",
    "<td><a class=\"reference internal\" href=\"#io-msgpack\"><span class=\"std std-ref\">to_msgpack</span></a></td>\n",
    "</tr>\n",
    "<tr class=\"row-even\"><td>binary</td>\n",
    "<td><a class=\"reference external\" href=\"https://en.wikipedia.org/wiki/Stata\">Stata</a></td>\n",
    "<td><a class=\"reference internal\" href=\"#io-stata-reader\"><span class=\"std std-ref\">read_stata</span></a></td>\n",
    "<td><a class=\"reference internal\" href=\"#io-stata-writer\"><span class=\"std std-ref\">to_stata</span></a></td>\n",
    "</tr>\n",
    "<tr class=\"row-odd\"><td>binary</td>\n",
    "<td><a class=\"reference external\" href=\"https://en.wikipedia.org/wiki/SAS_(software)\">SAS</a></td>\n",
    "<td><a class=\"reference internal\" href=\"#io-sas-reader\"><span class=\"std std-ref\">read_sas</span></a></td>\n",
    "<td>&#160;</td>\n",
    "</tr>\n",
    "<tr class=\"row-even\"><td>binary</td>\n",
    "<td><a class=\"reference external\" href=\"https://docs.python.org/3/library/pickle.html\">Python Pickle Format</a></td>\n",
    "<td><a class=\"reference internal\" href=\"#io-pickle\"><span class=\"std std-ref\">read_pickle</span></a></td>\n",
    "<td><a class=\"reference internal\" href=\"#io-pickle\"><span class=\"std std-ref\">to_pickle</span></a></td>\n",
    "</tr>\n",
    "<tr class=\"row-odd\"><td>SQL</td>\n",
    "<td><a class=\"reference external\" href=\"https://en.wikipedia.org/wiki/SQL\">SQL</a></td>\n",
    "<td><a class=\"reference internal\" href=\"#io-sql\"><span class=\"std std-ref\">read_sql</span></a></td>\n",
    "<td><a class=\"reference internal\" href=\"#io-sql\"><span class=\"std std-ref\">to_sql</span></a></td>\n",
    "</tr>\n",
    "<tr class=\"row-even\"><td>SQL</td>\n",
    "<td><a class=\"reference external\" href=\"https://en.wikipedia.org/wiki/BigQuery\">Google Big Query</a></td>\n",
    "<td><a class=\"reference internal\" href=\"#io-bigquery\"><span class=\"std std-ref\">read_gbq</span></a></td>\n",
    "<td><a class=\"reference internal\" href=\"#io-bigquery\"><span class=\"std std-ref\">to_gbq</span></a></td>\n",
    "</tr>\n",
    "</tbody>\n",
    "</table>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "-----\n",
    "----"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Reading in a  CSV\n",
    "Comma Separated Values files are text files that use commas as field delimeters.<br>\n",
    "Unless you're running the virtual environment included with the course, you may need to install <tt>xlrd</tt> and <tt>openpyxl</tt>.<br>\n",
    "In your terminal/command prompt run:\n",
    "\n",
    "    conda install xlrd\n",
    "    conda install openpyxl\n",
    "\n",
    "Then restart Jupyter Notebook.\n",
    "(or use pip install if you aren't using the Anaconda Distribution)\n",
    "\n",
    "## Understanding File Paths\n",
    "\n",
    "You have two options when reading a file with pandas:\n",
    "\n",
    "1. If your .py file or .ipynb notebook is located in the **exact** same folder location as the .csv file you want to read, simply pass in the file name as a string, for example:\n",
    "    \n",
    "        df = pd.read_csv('some_file.csv')\n",
    "        \n",
    "2. Pass in the entire file path if you are located in a different directory. The file path must be 100% correct in order for this to work. For example:\n",
    "\n",
    "        df = pd.read_csv(\"C:\\\\Users\\\\myself\\\\files\\\\some_file.csv\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Print your current directory file path with pwd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'C:\\\\Users\\\\Marcial\\\\Pierian-Data-Courses\\\\Machine-Learning-MasterClass\\\\03-Pandas'"
      ]
     },
     "execution_count": 53,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pwd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### List the files in your current directory with ls"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " Volume in drive C has no label.\n",
      " Volume Serial Number is 3652-BD2F\n",
      "\n",
      " Directory of C:\\Users\\Marcial\\Pierian-Data-Courses\\Machine-Learning-MasterClass\\03-Pandas\n",
      "\n",
      "07/04/2020  06:10 PM    <DIR>          .\n",
      "07/04/2020  06:10 PM    <DIR>          ..\n",
      "07/02/2020  05:40 PM    <DIR>          .ipynb_checkpoints\n",
      "06/30/2020  04:51 PM           565,390 00-Series.ipynb\n",
      "07/01/2020  12:48 PM           208,957 01-DataFrames.ipynb\n",
      "07/01/2020  12:48 PM           194,591 02-Conditional-Filtering.ipynb\n",
      "07/02/2020  07:02 PM           196,047 03-Useful-Methods.ipynb\n",
      "07/01/2020  03:32 PM            64,227 04-Missing-Data.ipynb\n",
      "07/04/2020  01:28 PM           219,627 05-Groupby-Operations-and-MultiIndex.ipynb\n",
      "07/04/2020  03:19 PM            62,966 06-Combining-DataFrames.ipynb\n",
      "07/02/2020  07:02 PM            29,356 07-Text-Methods.ipynb\n",
      "07/02/2020  06:38 PM            35,705 08-Time-Methods.ipynb\n",
      "07/04/2020  06:10 PM            53,097 09-Inputs-and-Outputs.ipynb\n",
      "07/02/2020  05:34 PM             1,095 10-Pivot-Tables.ipynb\n",
      "07/02/2020  05:34 PM               951 11-Pandas-Project-Exercise.ipynb\n",
      "07/02/2020  05:34 PM             1,118 12-Pandas-Project-Exercise-Solution.ipynb\n",
      "07/04/2020  05:39 PM                51 example.csv\n",
      "07/04/2020  06:02 PM             5,022 example.xlsx\n",
      "02/07/2020  12:26 PM               177 movie_scores.csv\n",
      "07/01/2020  03:56 PM            17,727 mpg.csv\n",
      "07/04/2020  05:58 PM             5,022 my_excel_file.xlsx\n",
      "07/04/2020  05:56 PM                51 new_file.csv\n",
      "07/02/2020  05:56 PM             5,459 RetailSales_BeerWineLiquor.csv\n",
      "07/04/2020  05:56 PM               555 simple.html\n",
      "01/27/2020  02:28 PM            18,752 tips.csv\n",
      "              22 File(s)      1,685,943 bytes\n",
      "               3 Dir(s)  82,818,367,488 bytes free\n"
     ]
    }
   ],
   "source": [
    "ls"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "-----\n",
    "#### NOTE! Common confusion point! Take note that all read input methods are called directly from pandas with pd.read_  , all output methods are called directly off the dataframe with df.to_\n",
    "\n",
    "-------"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### CSV Input"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.read_csv('example.csv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "      <th>d</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "      <td>6</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>8</td>\n",
       "      <td>9</td>\n",
       "      <td>10</td>\n",
       "      <td>11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>12</td>\n",
       "      <td>13</td>\n",
       "      <td>14</td>\n",
       "      <td>15</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    a   b   c   d\n",
       "0   0   1   2   3\n",
       "1   4   5   6   7\n",
       "2   8   9  10  11\n",
       "3  12  13  14  15"
      ]
     },
     "execution_count": 56,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.read_csv('example.csv',index_col=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "      <th>d</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>6</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>9</td>\n",
       "      <td>10</td>\n",
       "      <td>11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>13</td>\n",
       "      <td>14</td>\n",
       "      <td>15</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     b   c   d\n",
       "a             \n",
       "0    1   2   3\n",
       "4    5   6   7\n",
       "8    9  10  11\n",
       "12  13  14  15"
      ]
     },
     "execution_count": 58,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.read_csv('example.csv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "      <th>d</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "      <td>6</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>8</td>\n",
       "      <td>9</td>\n",
       "      <td>10</td>\n",
       "      <td>11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>12</td>\n",
       "      <td>13</td>\n",
       "      <td>14</td>\n",
       "      <td>15</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    a   b   c   d\n",
       "0   0   1   2   3\n",
       "1   4   5   6   7\n",
       "2   8   9  10  11\n",
       "3  12  13  14  15"
      ]
     },
     "execution_count": 60,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### CSV Output\n",
    "\n",
    "Set index=False if you do not want to save the index , otherwise it will add a new column to the .csv file that includes your index and call it \"Unnamed: 0\" if your index did not have a name. If you do want to save your index, simply set it to True (the default value)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.to_csv('new_file.csv',index=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## HTML\n",
    "\n",
    "Pandas can read table tabs off of HTML. This only works if your firewall isn't blocking pandas from accessing the internet!\n",
    "\n",
    "Unless you're running the virtual environment included with the course, you may need to install <tt>lxml</tt>, <tt>htmllib5</tt>, and <tt>BeautifulSoup4</tt>.<br>\n",
    "In your terminal/command prompt run:\n",
    "\n",
    "    conda install lxml\n",
    "    \n",
    "    or\n",
    "    \n",
    "    pip install lxml\n",
    "    \n",
    "Then restart Jupyter Notebook (you may need to restart your computer).\n",
    "(or use pip install if you aren't using the Anaconda Distribution)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## read_html\n",
    "\n",
    "### HTML Input\n",
    "\n",
    "Pandas read_html function will read tables off of a webpage and return a list of DataFrame objects. NOTE: This only works with well defined <table> objects in the html on the page, this can not magically read in tables that are images on a page."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {},
   "outputs": [],
   "source": [
    "tables = pd.read_html('https://en.wikipedia.org/wiki/World_population')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "26"
      ]
     },
     "execution_count": 63,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(tables) #tables"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "### Not Useful Tables\n",
    "Pandas found 26 tables on that page. Some are not useful:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>NaN</td>\n",
       "      <td>An editor has expressed concern that this arti...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    0                                                  1\n",
       "0 NaN  An editor has expressed concern that this arti..."
      ]
     },
     "execution_count": 64,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tables[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Tables that need formatting"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Some will be misaligned, meaning you need to do extra work to fix the columns and rows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th colspan=\"5\" halign=\"left\">World population (millions, UN estimates)[14]</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>#</th>\n",
       "      <th>Top ten most populous countries</th>\n",
       "      <th>2000</th>\n",
       "      <th>2015</th>\n",
       "      <th>2030[A]</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>China[B]</td>\n",
       "      <td>1270</td>\n",
       "      <td>1376</td>\n",
       "      <td>1416</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>India</td>\n",
       "      <td>1053</td>\n",
       "      <td>1311</td>\n",
       "      <td>1528</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>United States</td>\n",
       "      <td>283</td>\n",
       "      <td>322</td>\n",
       "      <td>356</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>Indonesia</td>\n",
       "      <td>212</td>\n",
       "      <td>258</td>\n",
       "      <td>295</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>Pakistan</td>\n",
       "      <td>136</td>\n",
       "      <td>208</td>\n",
       "      <td>245</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>6</td>\n",
       "      <td>Brazil</td>\n",
       "      <td>176</td>\n",
       "      <td>206</td>\n",
       "      <td>228</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>7</td>\n",
       "      <td>Nigeria</td>\n",
       "      <td>123</td>\n",
       "      <td>182</td>\n",
       "      <td>263</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>8</td>\n",
       "      <td>Bangladesh</td>\n",
       "      <td>131</td>\n",
       "      <td>161</td>\n",
       "      <td>186</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>9</td>\n",
       "      <td>Russia</td>\n",
       "      <td>146</td>\n",
       "      <td>146</td>\n",
       "      <td>149</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>10</td>\n",
       "      <td>Mexico</td>\n",
       "      <td>103</td>\n",
       "      <td>127</td>\n",
       "      <td>148</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>NaN</td>\n",
       "      <td>World total</td>\n",
       "      <td>6127</td>\n",
       "      <td>7349</td>\n",
       "      <td>8501</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>Notes: ^ 2030 = Medium variant. ^ China exclud...</td>\n",
       "      <td>Notes: ^ 2030 = Medium variant. ^ China exclud...</td>\n",
       "      <td>Notes: ^ 2030 = Medium variant. ^ China exclud...</td>\n",
       "      <td>Notes: ^ 2030 = Medium variant. ^ China exclud...</td>\n",
       "      <td>Notes: ^ 2030 = Medium variant. ^ China exclud...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        World population (millions, UN estimates)[14]  \\\n",
       "                                                    #   \n",
       "0                                                   1   \n",
       "1                                                   2   \n",
       "2                                                   3   \n",
       "3                                                   4   \n",
       "4                                                   5   \n",
       "5                                                   6   \n",
       "6                                                   7   \n",
       "7                                                   8   \n",
       "8                                                   9   \n",
       "9                                                  10   \n",
       "10                                                NaN   \n",
       "11  Notes: ^ 2030 = Medium variant. ^ China exclud...   \n",
       "\n",
       "                                                       \\\n",
       "                      Top ten most populous countries   \n",
       "0                                            China[B]   \n",
       "1                                               India   \n",
       "2                                       United States   \n",
       "3                                           Indonesia   \n",
       "4                                            Pakistan   \n",
       "5                                              Brazil   \n",
       "6                                             Nigeria   \n",
       "7                                          Bangladesh   \n",
       "8                                              Russia   \n",
       "9                                              Mexico   \n",
       "10                                        World total   \n",
       "11  Notes: ^ 2030 = Medium variant. ^ China exclud...   \n",
       "\n",
       "                                                       \\\n",
       "                                                 2000   \n",
       "0                                                1270   \n",
       "1                                                1053   \n",
       "2                                                 283   \n",
       "3                                                 212   \n",
       "4                                                 136   \n",
       "5                                                 176   \n",
       "6                                                 123   \n",
       "7                                                 131   \n",
       "8                                                 146   \n",
       "9                                                 103   \n",
       "10                                               6127   \n",
       "11  Notes: ^ 2030 = Medium variant. ^ China exclud...   \n",
       "\n",
       "                                                       \\\n",
       "                                                 2015   \n",
       "0                                                1376   \n",
       "1                                                1311   \n",
       "2                                                 322   \n",
       "3                                                 258   \n",
       "4                                                 208   \n",
       "5                                                 206   \n",
       "6                                                 182   \n",
       "7                                                 161   \n",
       "8                                                 146   \n",
       "9                                                 127   \n",
       "10                                               7349   \n",
       "11  Notes: ^ 2030 = Medium variant. ^ China exclud...   \n",
       "\n",
       "                                                       \n",
       "                                              2030[A]  \n",
       "0                                                1416  \n",
       "1                                                1528  \n",
       "2                                                 356  \n",
       "3                                                 295  \n",
       "4                                                 245  \n",
       "5                                                 228  \n",
       "6                                                 263  \n",
       "7                                                 186  \n",
       "8                                                 149  \n",
       "9                                                 148  \n",
       "10                                               8501  \n",
       "11  Notes: ^ 2030 = Medium variant. ^ China exclud...  "
      ]
     },
     "execution_count": 65,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tables[1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {},
   "outputs": [],
   "source": [
    "world_pop = tables[1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "MultiIndex([('World population (millions, UN estimates)[14]', ...),\n",
       "            ('World population (millions, UN estimates)[14]', ...),\n",
       "            ('World population (millions, UN estimates)[14]', ...),\n",
       "            ('World population (millions, UN estimates)[14]', ...),\n",
       "            ('World population (millions, UN estimates)[14]', ...)],\n",
       "           )"
      ]
     },
     "execution_count": 67,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "world_pop.columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {},
   "outputs": [],
   "source": [
    "world_pop = world_pop['World population (millions, UN estimates)[14]'].drop('#',axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['Top ten most populous countries', '2000', '2015', '2030[A]'], dtype='object')"
      ]
     },
     "execution_count": 69,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "world_pop.columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {},
   "outputs": [],
   "source": [
    "world_pop.columns = ['Countries', '2000', '2015', '2030 Est.']\n",
    "world_pop = world_pop.drop(11,axis=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Countries</th>\n",
       "      <th>2000</th>\n",
       "      <th>2015</th>\n",
       "      <th>2030 Est.</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>China[B]</td>\n",
       "      <td>1270</td>\n",
       "      <td>1376</td>\n",
       "      <td>1416</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>India</td>\n",
       "      <td>1053</td>\n",
       "      <td>1311</td>\n",
       "      <td>1528</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>United States</td>\n",
       "      <td>283</td>\n",
       "      <td>322</td>\n",
       "      <td>356</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Indonesia</td>\n",
       "      <td>212</td>\n",
       "      <td>258</td>\n",
       "      <td>295</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Pakistan</td>\n",
       "      <td>136</td>\n",
       "      <td>208</td>\n",
       "      <td>245</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Brazil</td>\n",
       "      <td>176</td>\n",
       "      <td>206</td>\n",
       "      <td>228</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Nigeria</td>\n",
       "      <td>123</td>\n",
       "      <td>182</td>\n",
       "      <td>263</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Bangladesh</td>\n",
       "      <td>131</td>\n",
       "      <td>161</td>\n",
       "      <td>186</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Russia</td>\n",
       "      <td>146</td>\n",
       "      <td>146</td>\n",
       "      <td>149</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Mexico</td>\n",
       "      <td>103</td>\n",
       "      <td>127</td>\n",
       "      <td>148</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>World total</td>\n",
       "      <td>6127</td>\n",
       "      <td>7349</td>\n",
       "      <td>8501</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        Countries  2000  2015 2030 Est.\n",
       "0        China[B]  1270  1376      1416\n",
       "1           India  1053  1311      1528\n",
       "2   United States   283   322       356\n",
       "3       Indonesia   212   258       295\n",
       "4        Pakistan   136   208       245\n",
       "5          Brazil   176   206       228\n",
       "6         Nigeria   123   182       263\n",
       "7      Bangladesh   131   161       186\n",
       "8          Russia   146   146       149\n",
       "9          Mexico   103   127       148\n",
       "10    World total  6127  7349      8501"
      ]
     },
     "execution_count": 71,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "world_pop"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Tables that are intact"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Rank</th>\n",
       "      <th>Country</th>\n",
       "      <th>Population</th>\n",
       "      <th>Area (km2)</th>\n",
       "      <th>Density (Pop. per km2)</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>Singapore</td>\n",
       "      <td>5703600</td>\n",
       "      <td>710</td>\n",
       "      <td>8033</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>Bangladesh</td>\n",
       "      <td>168870000</td>\n",
       "      <td>143998</td>\n",
       "      <td>1173</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>Lebanon</td>\n",
       "      <td>6855713</td>\n",
       "      <td>10452</td>\n",
       "      <td>656</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>Taiwan</td>\n",
       "      <td>23604265</td>\n",
       "      <td>36193</td>\n",
       "      <td>652</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>South Korea</td>\n",
       "      <td>51780579</td>\n",
       "      <td>99538</td>\n",
       "      <td>520</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>6</td>\n",
       "      <td>Rwanda</td>\n",
       "      <td>12374397</td>\n",
       "      <td>26338</td>\n",
       "      <td>470</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>7</td>\n",
       "      <td>Haiti</td>\n",
       "      <td>11577779</td>\n",
       "      <td>27065</td>\n",
       "      <td>428</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>8</td>\n",
       "      <td>Netherlands</td>\n",
       "      <td>17480000</td>\n",
       "      <td>41526</td>\n",
       "      <td>421</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>9</td>\n",
       "      <td>Israel</td>\n",
       "      <td>9220000</td>\n",
       "      <td>22072</td>\n",
       "      <td>418</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>10</td>\n",
       "      <td>India</td>\n",
       "      <td>1364080000</td>\n",
       "      <td>3287240</td>\n",
       "      <td>415</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Rank      Country  Population  Area (km2)  Density (Pop. per km2)\n",
       "0     1    Singapore     5703600         710                    8033\n",
       "1     2   Bangladesh   168870000      143998                    1173\n",
       "2     3      Lebanon     6855713       10452                     656\n",
       "3     4       Taiwan    23604265       36193                     652\n",
       "4     5  South Korea    51780579       99538                     520\n",
       "5     6       Rwanda    12374397       26338                     470\n",
       "6     7        Haiti    11577779       27065                     428\n",
       "7     8  Netherlands    17480000       41526                     421\n",
       "8     9       Israel     9220000       22072                     418\n",
       "9    10        India  1364080000     3287240                     415"
      ]
     },
     "execution_count": 72,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tables[6]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Write to html Output\n",
    "\n",
    "If you are working on a website and want to quickly output the .html file, you can use to_html"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.to_html('simple.html',index=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**read_html** is not perfect, but its quite powerful for such a simple method call!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Excel Files\n",
    "\n",
    "Pandas can read in basic excel files (it will get errors if there are macros or extensive formulas relying on outside excel files), in general, pandas can only grab the raw information from an .excel file.\n",
    "\n",
    "#### NOTE: Requires the openpyxl and xlrd library! Its provided for you in our environment, or simply install with:\n",
    "\n",
    "    pip install openpyxl\n",
    "    pip install xlrd\n",
    "    \n",
    "Heavy excel users may want to check out this website: https://www.python-excel.org/\n",
    "\n",
    "You can think of an excel file as a Workbook containin sheets, which for pandas means each sheet can be a DataFrame.\n",
    "\n",
    "## Excel file input with read_excel()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.read_excel('my_excel_file.xlsx',sheet_name='First_Sheet')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "      <th>d</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "      <td>6</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>8</td>\n",
       "      <td>9</td>\n",
       "      <td>10</td>\n",
       "      <td>11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>12</td>\n",
       "      <td>13</td>\n",
       "      <td>14</td>\n",
       "      <td>15</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    a   b   c   d\n",
       "0   0   1   2   3\n",
       "1   4   5   6   7\n",
       "2   8   9  10  11\n",
       "3  12  13  14  15"
      ]
     },
     "execution_count": 75,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### What if you don't know the sheet name? Or want to run a for loop for certain sheet names? Or want every sheet?\n",
    "\n",
    "Several ways to do this: https://stackoverflow.com/questions/17977540/pandas-looking-up-the-list-of-sheets-in-an-excel-file"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['First_Sheet']"
      ]
     },
     "execution_count": 76,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Returns a list of sheet_names\n",
    "pd.ExcelFile('my_excel_file.xlsx').sheet_names"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Grab all sheets"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "metadata": {},
   "outputs": [],
   "source": [
    "excel_sheets = pd.read_excel('my_excel_file.xlsx',sheet_name=None)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "dict"
      ]
     },
     "execution_count": 78,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(excel_sheets)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 79,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "dict_keys(['First_Sheet'])"
      ]
     },
     "execution_count": 79,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "excel_sheets.keys()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "      <th>d</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "      <td>6</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>8</td>\n",
       "      <td>9</td>\n",
       "      <td>10</td>\n",
       "      <td>11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>12</td>\n",
       "      <td>13</td>\n",
       "      <td>14</td>\n",
       "      <td>15</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    a   b   c   d\n",
       "0   0   1   2   3\n",
       "1   4   5   6   7\n",
       "2   8   9  10  11\n",
       "3  12  13  14  15"
      ]
     },
     "execution_count": 80,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "excel_sheets['First_Sheet']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Write to Excel File"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.to_excel('example.xlsx',sheet_name='First_Sheet',index=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# SQL Connections\n",
    "\n",
    "#### NOTE: Highly recommend you explore specific libraries for your specific SQL Engine. Simple search for your database+python in Google and the top results should hopefully include an API.\n",
    "\n",
    "* [MySQL](https://www.google.com/search?q=mysql+python)\n",
    "* [PostgreSQL](https://www.google.com/search?q=postgresql+python)\n",
    "* [MS SQL Server](https://www.google.com/search?q=MSSQLserver+python)\n",
    "* [Orcale](https://www.google.com/search?q=oracle+python)\n",
    "* [MongoDB](https://www.google.com/search?q=mongodb+python)\n",
    "\n",
    "Let's review pandas capabilities by using SQLite, which comes built in with Python.\n",
    "\n",
    "## Example SQL Database (temporary in your RAM)\n",
    "\n",
    "You will need to install sqlalchemy with:\n",
    "\n",
    "    pip install sqlalchemy\n",
    "    \n",
    "to follow along. To understand how to make a connection to your own database, make sure to review: https://docs.sqlalchemy.org/en/13/core/connections.html"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 82,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sqlalchemy import create_engine"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 83,
   "metadata": {},
   "outputs": [],
   "source": [
    "temp_db = create_engine('sqlite:///:memory:')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Write to Database"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 85,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Rank</th>\n",
       "      <th>Country</th>\n",
       "      <th>Population</th>\n",
       "      <th>Area (km2)</th>\n",
       "      <th>Density (Pop. per km2)</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>Singapore</td>\n",
       "      <td>5703600</td>\n",
       "      <td>710</td>\n",
       "      <td>8033</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>Bangladesh</td>\n",
       "      <td>168870000</td>\n",
       "      <td>143998</td>\n",
       "      <td>1173</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>Lebanon</td>\n",
       "      <td>6855713</td>\n",
       "      <td>10452</td>\n",
       "      <td>656</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>Taiwan</td>\n",
       "      <td>23604265</td>\n",
       "      <td>36193</td>\n",
       "      <td>652</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>South Korea</td>\n",
       "      <td>51780579</td>\n",
       "      <td>99538</td>\n",
       "      <td>520</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>6</td>\n",
       "      <td>Rwanda</td>\n",
       "      <td>12374397</td>\n",
       "      <td>26338</td>\n",
       "      <td>470</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>7</td>\n",
       "      <td>Haiti</td>\n",
       "      <td>11577779</td>\n",
       "      <td>27065</td>\n",
       "      <td>428</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>8</td>\n",
       "      <td>Netherlands</td>\n",
       "      <td>17480000</td>\n",
       "      <td>41526</td>\n",
       "      <td>421</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>9</td>\n",
       "      <td>Israel</td>\n",
       "      <td>9220000</td>\n",
       "      <td>22072</td>\n",
       "      <td>418</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>10</td>\n",
       "      <td>India</td>\n",
       "      <td>1364080000</td>\n",
       "      <td>3287240</td>\n",
       "      <td>415</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Rank      Country  Population  Area (km2)  Density (Pop. per km2)\n",
       "0     1    Singapore     5703600         710                    8033\n",
       "1     2   Bangladesh   168870000      143998                    1173\n",
       "2     3      Lebanon     6855713       10452                     656\n",
       "3     4       Taiwan    23604265       36193                     652\n",
       "4     5  South Korea    51780579       99538                     520\n",
       "5     6       Rwanda    12374397       26338                     470\n",
       "6     7        Haiti    11577779       27065                     428\n",
       "7     8  Netherlands    17480000       41526                     421\n",
       "8     9       Israel     9220000       22072                     418\n",
       "9    10        India  1364080000     3287240                     415"
      ]
     },
     "execution_count": 85,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tables[6]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 86,
   "metadata": {},
   "outputs": [],
   "source": [
    "pop = tables[6]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 87,
   "metadata": {},
   "outputs": [],
   "source": [
    "pop.to_sql(name='populations',con=temp_db)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Read from SQL Database"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 89,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>index</th>\n",
       "      <th>Rank</th>\n",
       "      <th>Country</th>\n",
       "      <th>Population</th>\n",
       "      <th>Area (km2)</th>\n",
       "      <th>Density (Pop. per km2)</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>Singapore</td>\n",
       "      <td>5703600</td>\n",
       "      <td>710</td>\n",
       "      <td>8033</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>Bangladesh</td>\n",
       "      <td>168870000</td>\n",
       "      <td>143998</td>\n",
       "      <td>1173</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>Lebanon</td>\n",
       "      <td>6855713</td>\n",
       "      <td>10452</td>\n",
       "      <td>656</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "      <td>Taiwan</td>\n",
       "      <td>23604265</td>\n",
       "      <td>36193</td>\n",
       "      <td>652</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "      <td>South Korea</td>\n",
       "      <td>51780579</td>\n",
       "      <td>99538</td>\n",
       "      <td>520</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>5</td>\n",
       "      <td>6</td>\n",
       "      <td>Rwanda</td>\n",
       "      <td>12374397</td>\n",
       "      <td>26338</td>\n",
       "      <td>470</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>6</td>\n",
       "      <td>7</td>\n",
       "      <td>Haiti</td>\n",
       "      <td>11577779</td>\n",
       "      <td>27065</td>\n",
       "      <td>428</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>7</td>\n",
       "      <td>8</td>\n",
       "      <td>Netherlands</td>\n",
       "      <td>17480000</td>\n",
       "      <td>41526</td>\n",
       "      <td>421</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>8</td>\n",
       "      <td>9</td>\n",
       "      <td>Israel</td>\n",
       "      <td>9220000</td>\n",
       "      <td>22072</td>\n",
       "      <td>418</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>9</td>\n",
       "      <td>10</td>\n",
       "      <td>India</td>\n",
       "      <td>1364080000</td>\n",
       "      <td>3287240</td>\n",
       "      <td>415</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   index  Rank      Country  Population  Area (km2)  Density (Pop. per km2)\n",
       "0      0     1    Singapore     5703600         710                    8033\n",
       "1      1     2   Bangladesh   168870000      143998                    1173\n",
       "2      2     3      Lebanon     6855713       10452                     656\n",
       "3      3     4       Taiwan    23604265       36193                     652\n",
       "4      4     5  South Korea    51780579       99538                     520\n",
       "5      5     6       Rwanda    12374397       26338                     470\n",
       "6      6     7        Haiti    11577779       27065                     428\n",
       "7      7     8  Netherlands    17480000       41526                     421\n",
       "8      8     9       Israel     9220000       22072                     418\n",
       "9      9    10        India  1364080000     3287240                     415"
      ]
     },
     "execution_count": 89,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Read in an entire table\n",
    "pd.read_sql(sql='populations',con=temp_db)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 92,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Country</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Singapore</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Bangladesh</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Lebanon</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Taiwan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>South Korea</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Rwanda</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Haiti</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Netherlands</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Israel</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>India</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       Country\n",
       "0    Singapore\n",
       "1   Bangladesh\n",
       "2      Lebanon\n",
       "3       Taiwan\n",
       "4  South Korea\n",
       "5       Rwanda\n",
       "6        Haiti\n",
       "7  Netherlands\n",
       "8       Israel\n",
       "9        India"
      ]
     },
     "execution_count": 92,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Read in with a SQL Query\n",
    "pd.read_sql_query(sql=\"SELECT Country FROM populations\",con=temp_db)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It is difficult to generalize pandas and SQL, due to a wide array of issues, including permissions,security, online access, varying SQL engines, etc... Use these ideas as a starting off point, and you will most likely need to do your own research for your own situation."
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}