{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"___\n",
"\n",
"<a href='http://www.pieriandata.com'><img src='../Pierian_Data_Logo.png'/></a>\n",
"___\n",
"<center><em>Copyright by Pierian Data Inc.</em></center>\n",
"<center><em>For more information, visit us at <a href='http://www.pieriandata.com'>www.pieriandata.com</a></em></center>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# CIA Country Analysis and Clustering\n",
"\n",
"\n",
"Source: All of these datasets are compiled from US government data (the CIA World Factbook).\n",
"https://www.cia.gov/library/publications/the-world-factbook/docs/faqs.html\n",
"\n",
"## Goal: \n",
"\n",
"### Gain insights into the similarity between countries and regions of the world by experimenting with different numbers of clusters. What do these clusters represent? *Note: There is no single right answer, so make sure to watch the video for thoughts.*\n",
"\n",
"----\n",
"\n",
"## Imports and Data\n",
"\n",
"**TASK: Run the following cells to import libraries and read in data.**"
]
},
{
"cell_type": "code",
"execution_count": 701,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns"
]
},
{
"cell_type": "code",
"execution_count": 702,
"metadata": {},
"outputs": [],
"source": [
"df = pd.read_csv('../DATA/CIA_Country_Facts.csv')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exploratory Data Analysis\n",
"\n",
"**TASK: Explore the rows and columns of the data as well as the data types of the columns.**"
]
},
{
"cell_type": "code",
"execution_count": 703,
"metadata": {},
"outputs": [],
"source": [
"# CODE HERE"
]
},
{
"cell_type": "code",
"execution_count": 704,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Country</th>\n",
" <th>Region</th>\n",
" <th>Population</th>\n",
" <th>Area (sq. mi.)</th>\n",
" <th>Pop. Density (per sq. mi.)</th>\n",
" <th>Coastline (coast/area ratio)</th>\n",
" <th>Net migration</th>\n",
" <th>Infant mortality (per 1000 births)</th>\n",
" <th>GDP ($ per capita)</th>\n",
" <th>Literacy (%)</th>\n",
" <th>Phones (per 1000)</th>\n",
" <th>Arable (%)</th>\n",
" <th>Crops (%)</th>\n",
" <th>Other (%)</th>\n",
" <th>Climate</th>\n",
" <th>Birthrate</th>\n",
" <th>Deathrate</th>\n",
" <th>Agriculture</th>\n",
" <th>Industry</th>\n",
" <th>Service</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Afghanistan</td>\n",
" <td>ASIA (EX. NEAR EAST)</td>\n",
" <td>31056997</td>\n",
" <td>647500</td>\n",
" <td>48.0</td>\n",
" <td>0.00</td>\n",
" <td>23.06</td>\n",
" <td>163.07</td>\n",
" <td>700.0</td>\n",
" <td>36.0</td>\n",
" <td>3.2</td>\n",
" <td>12.13</td>\n",
" <td>0.22</td>\n",
" <td>87.65</td>\n",
" <td>1.0</td>\n",
" <td>46.60</td>\n",
" <td>20.34</td>\n",
" <td>0.380</td>\n",
" <td>0.240</td>\n",
" <td>0.380</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Albania</td>\n",
" <td>EASTERN EUROPE</td>\n",
" <td>3581655</td>\n",
" <td>28748</td>\n",
" <td>124.6</td>\n",
" <td>1.26</td>\n",
" <td>-4.93</td>\n",
" <td>21.52</td>\n",
" <td>4500.0</td>\n",
" <td>86.5</td>\n",
" <td>71.2</td>\n",
" <td>21.09</td>\n",
" <td>4.42</td>\n",
" <td>74.49</td>\n",
" <td>3.0</td>\n",
" <td>15.11</td>\n",
" <td>5.22</td>\n",
" <td>0.232</td>\n",
" <td>0.188</td>\n",
" <td>0.579</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Algeria</td>\n",
" <td>NORTHERN AFRICA</td>\n",
" <td>32930091</td>\n",
" <td>2381740</td>\n",
" <td>13.8</td>\n",
" <td>0.04</td>\n",
" <td>-0.39</td>\n",
" <td>31.00</td>\n",
" <td>6000.0</td>\n",
" <td>70.0</td>\n",
" <td>78.1</td>\n",
" <td>3.22</td>\n",
" <td>0.25</td>\n",
" <td>96.53</td>\n",
" <td>1.0</td>\n",
" <td>17.14</td>\n",
" <td>4.61</td>\n",
" <td>0.101</td>\n",
" <td>0.600</td>\n",
" <td>0.298</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>American Samoa</td>\n",
" <td>OCEANIA</td>\n",
" <td>57794</td>\n",
" <td>199</td>\n",
" <td>290.4</td>\n",
" <td>58.29</td>\n",
" <td>-20.71</td>\n",
" <td>9.27</td>\n",
" <td>8000.0</td>\n",
" <td>97.0</td>\n",
" <td>259.5</td>\n",
" <td>10.00</td>\n",
" <td>15.00</td>\n",
" <td>75.00</td>\n",
" <td>2.0</td>\n",
" <td>22.46</td>\n",
" <td>3.27</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Andorra</td>\n",
" <td>WESTERN EUROPE</td>\n",
" <td>71201</td>\n",
" <td>468</td>\n",
" <td>152.1</td>\n",
" <td>0.00</td>\n",
" <td>6.60</td>\n",
" <td>4.05</td>\n",
" <td>19000.0</td>\n",
" <td>100.0</td>\n",
" <td>497.2</td>\n",
" <td>2.22</td>\n",
" <td>0.00</td>\n",
" <td>97.78</td>\n",
" <td>3.0</td>\n",
" <td>8.71</td>\n",
" <td>6.25</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Country Region Population \\\n",
"0 Afghanistan ASIA (EX. NEAR EAST) 31056997 \n",
"1 Albania EASTERN EUROPE 3581655 \n",
"2 Algeria NORTHERN AFRICA 32930091 \n",
"3 American Samoa OCEANIA 57794 \n",
"4 Andorra WESTERN EUROPE 71201 \n",
"\n",
" Area (sq. mi.) Pop. Density (per sq. mi.) Coastline (coast/area ratio) \\\n",
"0 647500 48.0 0.00 \n",
"1 28748 124.6 1.26 \n",
"2 2381740 13.8 0.04 \n",
"3 199 290.4 58.29 \n",
"4 468 152.1 0.00 \n",
"\n",
" Net migration Infant mortality (per 1000 births) GDP ($ per capita) \\\n",
"0 23.06 163.07 700.0 \n",
"1 -4.93 21.52 4500.0 \n",
"2 -0.39 31.00 6000.0 \n",
"3 -20.71 9.27 8000.0 \n",
"4 6.60 4.05 19000.0 \n",
"\n",
" Literacy (%) Phones (per 1000) Arable (%) Crops (%) Other (%) Climate \\\n",
"0 36.0 3.2 12.13 0.22 87.65 1.0 \n",
"1 86.5 71.2 21.09 4.42 74.49 3.0 \n",
"2 70.0 78.1 3.22 0.25 96.53 1.0 \n",
"3 97.0 259.5 10.00 15.00 75.00 2.0 \n",
"4 100.0 497.2 2.22 0.00 97.78 3.0 \n",
"\n",
" Birthrate Deathrate Agriculture Industry Service \n",
"0 46.60 20.34 0.380 0.240 0.380 \n",
"1 15.11 5.22 0.232 0.188 0.579 \n",
"2 17.14 4.61 0.101 0.600 0.298 \n",
"3 22.46 3.27 NaN NaN NaN \n",
"4 8.71 6.25 NaN NaN NaN "
]
},
"execution_count": 704,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 705,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 227 entries, 0 to 226\n",
"Data columns (total 20 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 Country 227 non-null object \n",
" 1 Region 227 non-null object \n",
" 2 Population 227 non-null int64 \n",
" 3 Area (sq. mi.) 227 non-null int64 \n",
" 4 Pop. Density (per sq. mi.) 227 non-null float64\n",
" 5 Coastline (coast/area ratio) 227 non-null float64\n",
" 6 Net migration 224 non-null float64\n",
" 7 Infant mortality (per 1000 births) 224 non-null float64\n",
" 8 GDP ($ per capita) 226 non-null float64\n",
" 9 Literacy (%) 209 non-null float64\n",
" 10 Phones (per 1000) 223 non-null float64\n",
" 11 Arable (%) 225 non-null float64\n",
" 12 Crops (%) 225 non-null float64\n",
" 13 Other (%) 225 non-null float64\n",
" 14 Climate 205 non-null float64\n",
" 15 Birthrate 224 non-null float64\n",
" 16 Deathrate 223 non-null float64\n",
" 17 Agriculture 212 non-null float64\n",
" 18 Industry 211 non-null float64\n",
" 19 Service 212 non-null float64\n",
"dtypes: float64(16), int64(2), object(2)\n",
"memory usage: 35.6+ KB\n"
]
}
],
"source": [
"df.info()"
]
},
{
"cell_type": "code",
"execution_count": 706,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>count</th>\n",
" <th>mean</th>\n",
" <th>std</th>\n",
" <th>min</th>\n",
" <th>25%</th>\n",
" <th>50%</th>\n",
" <th>75%</th>\n",
" <th>max</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Population</th>\n",
" <td>227.0</td>\n",
" <td>2.874028e+07</td>\n",
" <td>1.178913e+08</td>\n",
" <td>7026.000</td>\n",
" <td>437624.00000</td>\n",
" <td>4786994.000</td>\n",
" <td>1.749777e+07</td>\n",
" <td>1.313974e+09</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Area (sq. mi.)</th>\n",
" <td>227.0</td>\n",
" <td>5.982270e+05</td>\n",
" <td>1.790282e+06</td>\n",
" <td>2.000</td>\n",
" <td>4647.50000</td>\n",
" <td>86600.000</td>\n",
" <td>4.418110e+05</td>\n",
" <td>1.707520e+07</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Pop. Density (per sq. mi.)</th>\n",
" <td>227.0</td>\n",
" <td>3.790471e+02</td>\n",
" <td>1.660186e+03</td>\n",
" <td>0.000</td>\n",
" <td>29.15000</td>\n",
" <td>78.800</td>\n",
" <td>1.901500e+02</td>\n",
" <td>1.627150e+04</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Coastline (coast/area ratio)</th>\n",
" <td>227.0</td>\n",
" <td>2.116533e+01</td>\n",
" <td>7.228686e+01</td>\n",
" <td>0.000</td>\n",
" <td>0.10000</td>\n",
" <td>0.730</td>\n",
" <td>1.034500e+01</td>\n",
" <td>8.706600e+02</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Net migration</th>\n",
" <td>224.0</td>\n",
" <td>3.812500e-02</td>\n",
" <td>4.889269e+00</td>\n",
" <td>-20.990</td>\n",
" <td>-0.92750</td>\n",
" <td>0.000</td>\n",
" <td>9.975000e-01</td>\n",
" <td>2.306000e+01</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Infant mortality (per 1000 births)</th>\n",
" <td>224.0</td>\n",
" <td>3.550696e+01</td>\n",
" <td>3.538990e+01</td>\n",
" <td>2.290</td>\n",
" <td>8.15000</td>\n",
" <td>21.000</td>\n",
" <td>5.570500e+01</td>\n",
" <td>1.911900e+02</td>\n",
" </tr>\n",
" <tr>\n",
" <th>GDP ($ per capita)</th>\n",
" <td>226.0</td>\n",
" <td>9.689823e+03</td>\n",
" <td>1.004914e+04</td>\n",
" <td>500.000</td>\n",
" <td>1900.00000</td>\n",
" <td>5550.000</td>\n",
" <td>1.570000e+04</td>\n",
" <td>5.510000e+04</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Literacy (%)</th>\n",
" <td>209.0</td>\n",
" <td>8.283828e+01</td>\n",
" <td>1.972217e+01</td>\n",
" <td>17.600</td>\n",
" <td>70.60000</td>\n",
" <td>92.500</td>\n",
" <td>9.800000e+01</td>\n",
" <td>1.000000e+02</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Phones (per 1000)</th>\n",
" <td>223.0</td>\n",
" <td>2.360614e+02</td>\n",
" <td>2.279918e+02</td>\n",
" <td>0.200</td>\n",
" <td>37.80000</td>\n",
" <td>176.200</td>\n",
" <td>3.896500e+02</td>\n",
" <td>1.035600e+03</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Arable (%)</th>\n",
" <td>225.0</td>\n",
" <td>1.379711e+01</td>\n",
" <td>1.304040e+01</td>\n",
" <td>0.000</td>\n",
" <td>3.22000</td>\n",
" <td>10.420</td>\n",
" <td>2.000000e+01</td>\n",
" <td>6.211000e+01</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Crops (%)</th>\n",
" <td>225.0</td>\n",
" <td>4.564222e+00</td>\n",
" <td>8.361470e+00</td>\n",
" <td>0.000</td>\n",
" <td>0.19000</td>\n",
" <td>1.030</td>\n",
" <td>4.440000e+00</td>\n",
" <td>5.068000e+01</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Other (%)</th>\n",
" <td>225.0</td>\n",
" <td>8.163831e+01</td>\n",
" <td>1.614083e+01</td>\n",
" <td>33.330</td>\n",
" <td>71.65000</td>\n",
" <td>85.700</td>\n",
" <td>9.544000e+01</td>\n",
" <td>1.000000e+02</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Climate</th>\n",
" <td>205.0</td>\n",
" <td>2.139024e+00</td>\n",
" <td>6.993968e-01</td>\n",
" <td>1.000</td>\n",
" <td>2.00000</td>\n",
" <td>2.000</td>\n",
" <td>3.000000e+00</td>\n",
" <td>4.000000e+00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Birthrate</th>\n",
" <td>224.0</td>\n",
" <td>2.211473e+01</td>\n",
" <td>1.117672e+01</td>\n",
" <td>7.290</td>\n",
" <td>12.67250</td>\n",
" <td>18.790</td>\n",
" <td>2.982000e+01</td>\n",
" <td>5.073000e+01</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Deathrate</th>\n",
" <td>223.0</td>\n",
" <td>9.241345e+00</td>\n",
" <td>4.990026e+00</td>\n",
" <td>2.290</td>\n",
" <td>5.91000</td>\n",
" <td>7.840</td>\n",
" <td>1.060500e+01</td>\n",
" <td>2.974000e+01</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Agriculture</th>\n",
" <td>212.0</td>\n",
" <td>1.508443e-01</td>\n",
" <td>1.467980e-01</td>\n",
" <td>0.000</td>\n",
" <td>0.03775</td>\n",
" <td>0.099</td>\n",
" <td>2.210000e-01</td>\n",
" <td>7.690000e-01</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Industry</th>\n",
" <td>211.0</td>\n",
" <td>2.827109e-01</td>\n",
" <td>1.382722e-01</td>\n",
" <td>0.020</td>\n",
" <td>0.19300</td>\n",
" <td>0.272</td>\n",
" <td>3.410000e-01</td>\n",
" <td>9.060000e-01</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Service</th>\n",
" <td>212.0</td>\n",
" <td>5.652830e-01</td>\n",
" <td>1.658410e-01</td>\n",
" <td>0.062</td>\n",
" <td>0.42925</td>\n",
" <td>0.571</td>\n",
" <td>6.785000e-01</td>\n",
" <td>9.540000e-01</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" count mean std \\\n",
"Population 227.0 2.874028e+07 1.178913e+08 \n",
"Area (sq. mi.) 227.0 5.982270e+05 1.790282e+06 \n",
"Pop. Density (per sq. mi.) 227.0 3.790471e+02 1.660186e+03 \n",
"Coastline (coast/area ratio) 227.0 2.116533e+01 7.228686e+01 \n",
"Net migration 224.0 3.812500e-02 4.889269e+00 \n",
"Infant mortality (per 1000 births) 224.0 3.550696e+01 3.538990e+01 \n",
"GDP ($ per capita) 226.0 9.689823e+03 1.004914e+04 \n",
"Literacy (%) 209.0 8.283828e+01 1.972217e+01 \n",
"Phones (per 1000) 223.0 2.360614e+02 2.279918e+02 \n",
"Arable (%) 225.0 1.379711e+01 1.304040e+01 \n",
"Crops (%) 225.0 4.564222e+00 8.361470e+00 \n",
"Other (%) 225.0 8.163831e+01 1.614083e+01 \n",
"Climate 205.0 2.139024e+00 6.993968e-01 \n",
"Birthrate 224.0 2.211473e+01 1.117672e+01 \n",
"Deathrate 223.0 9.241345e+00 4.990026e+00 \n",
"Agriculture 212.0 1.508443e-01 1.467980e-01 \n",
"Industry 211.0 2.827109e-01 1.382722e-01 \n",
"Service 212.0 5.652830e-01 1.658410e-01 \n",
"\n",
" min 25% 50% \\\n",
"Population 7026.000 437624.00000 4786994.000 \n",
"Area (sq. mi.) 2.000 4647.50000 86600.000 \n",
"Pop. Density (per sq. mi.) 0.000 29.15000 78.800 \n",
"Coastline (coast/area ratio) 0.000 0.10000 0.730 \n",
"Net migration -20.990 -0.92750 0.000 \n",
"Infant mortality (per 1000 births) 2.290 8.15000 21.000 \n",
"GDP ($ per capita) 500.000 1900.00000 5550.000 \n",
"Literacy (%) 17.600 70.60000 92.500 \n",
"Phones (per 1000) 0.200 37.80000 176.200 \n",
"Arable (%) 0.000 3.22000 10.420 \n",
"Crops (%) 0.000 0.19000 1.030 \n",
"Other (%) 33.330 71.65000 85.700 \n",
"Climate 1.000 2.00000 2.000 \n",
"Birthrate 7.290 12.67250 18.790 \n",
"Deathrate 2.290 5.91000 7.840 \n",
"Agriculture 0.000 0.03775 0.099 \n",
"Industry 0.020 0.19300 0.272 \n",
"Service 0.062 0.42925 0.571 \n",
"\n",
" 75% max \n",
"Population 1.749777e+07 1.313974e+09 \n",
"Area (sq. mi.) 4.418110e+05 1.707520e+07 \n",
"Pop. Density (per sq. mi.) 1.901500e+02 1.627150e+04 \n",
"Coastline (coast/area ratio) 1.034500e+01 8.706600e+02 \n",
"Net migration 9.975000e-01 2.306000e+01 \n",
"Infant mortality (per 1000 births) 5.570500e+01 1.911900e+02 \n",
"GDP ($ per capita) 1.570000e+04 5.510000e+04 \n",
"Literacy (%) 9.800000e+01 1.000000e+02 \n",
"Phones (per 1000) 3.896500e+02 1.035600e+03 \n",
"Arable (%) 2.000000e+01 6.211000e+01 \n",
"Crops (%) 4.440000e+00 5.068000e+01 \n",
"Other (%) 9.544000e+01 1.000000e+02 \n",
"Climate 3.000000e+00 4.000000e+00 \n",
"Birthrate 2.982000e+01 5.073000e+01 \n",
"Deathrate 1.060500e+01 2.974000e+01 \n",
"Agriculture 2.210000e-01 7.690000e-01 \n",
"Industry 3.410000e-01 9.060000e-01 \n",
"Service 6.785000e-01 9.540000e-01 "
]
},
"execution_count": 706,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.describe().transpose()"
]
},
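{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `df.info()` output above shows that several columns have fewer than 227 non-null entries (for example, `Climate` has 205 and `Literacy (%)` has 209). As a minimal sketch that is not part of the original exercise, the missing values per column could be tallied as follows, assuming `df` is the DataFrame loaded earlier in this notebook:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Count missing (NaN) values in each column and keep only the columns\n",
"# that actually have gaps, largest counts first.\n",
"missing_counts = df.isnull().sum()\n",
"missing_counts[missing_counts > 0].sort_values(ascending=False)"
]
},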
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exploratory Data Analysis: Visualizations\n",
"\n",
"Let's create some visualizations. Please feel free to expand on these with your own analysis and charts!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**TASK: Create a histogram of the Population column.**"
]
},
{
"cell_type": "code",
"execution_count": 707,
"metadata": {},
"outputs": [],
"source": [
"# CODE HERE"
]
},
{
"cell_type": "code",
"execution_count": 708,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<AxesSubplot:xlabel='Population', ylabel='Count'>"
]
},
"execution_count": 708,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "<base64-encoded PNG omitted: histogram of Population (x: Population, y: Count)>",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"sns.histplot(data=df,x='Population')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**TASK: You should notice the histogram is skewed due to a few large countries. Reset the X axis to only show countries with fewer than 0.5 billion people.**"
]
},
{
"cell_type": "code",
"execution_count": 709,
"metadata": {},
"outputs": [],
"source": [
"# CODE HERE"
]
},
{
"cell_type": "code",
"execution_count": 710,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<AxesSubplot:xlabel='Population', ylabel='Count'>"
]
},
"execution_count": 710,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "<base64-encoded PNG omitted: histogram of Population for countries under 0.5 billion people (x: Population, y: Count)>",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"sns.histplot(data=df[df['Population']<500000000],x='Population')"
]
},
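{
"cell_type": "markdown",
"metadata": {},
"source": [
"Filtering out the largest countries is one way to deal with the skew. As an alternative sketch, not part of the original task, the x axis can be put on a logarithmic scale so that every country stays in the plot. This assumes a seaborn version that provides `histplot` with its `log_scale` parameter (0.11 or later):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Same histogram as above, but with log-spaced bins on the x axis\n",
"# instead of dropping the most populous countries.\n",
"sns.histplot(data=df,x='Population',log_scale=True)"
]
},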
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**TASK: Now let's explore GDP and Regions. Create a bar chart showing the mean GDP per Capita per region (recall that by default the black bar represents a bootstrapped 95% confidence interval for the mean, not the standard deviation).**"
]
},
{
"cell_type": "code",
"execution_count": 711,
"metadata": {},
"outputs": [],
"source": [
"# CODE HERE"
]
},
{
"cell_type": "code",
"execution_count": 712,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "<base64-encoded PNG omitted: bar chart of mean GDP ($ per capita) by Region>",
"text/plain": [
"<Figure size 2000x1200 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.figure(figsize=(10,6),dpi=200)\n",
"sns.barplot(data=df,y='GDP ($ per capita)',x='Region',estimator=np.mean)\n",
"plt.xticks(rotation=90);"
]
},
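{
"cell_type": "markdown",
"metadata": {},
"source": [
"The values behind the bar chart can also be tabulated directly. The following is a minimal sketch, not part of the original exercise, that computes the per-region mean alongside the standard deviation and the number of countries with a reported GDP. Note that the standard deviation computed here is not necessarily what the error bar shows, since seaborn's default error bar for `barplot` is a bootstrapped confidence interval:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Mean, standard deviation, and number of non-missing GDP values per region,\n",
"# sorted from the richest region to the poorest.\n",
"gdp_by_region = df.groupby('Region')['GDP ($ per capita)'].agg(['mean','std','count'])\n",
"gdp_by_region.sort_values('mean',ascending=False)"
]
},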
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**TASK: Create a scatterplot showing the relationship between Phones per 1000 people and the GDP per Capita. Color these points by Region.**"
]
},
{
"cell_type": "code",
"execution_count": 713,
"metadata": {},
"outputs": [],
"source": [
"# CODE HERE"
]
},
{
"cell_type": "code",
"execution_count": 714,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.legend.Legend at 0x194e5896160>"
]
},
"execution_count": 714,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 2000x1200 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.figure(figsize=(10,6),dpi=200)\n",
"sns.scatterplot(data=df,x='GDP ($ per capita)',y='Phones (per 1000)',hue='Region')\n",
"plt.legend(loc=(1.05,0.5))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**TASK: Create a scatterplot showing the relationship between GDP per Capita and Literacy (color the points by Region). What conclusions do you draw from this plot?**"
]
},
{
"cell_type": "code",
"execution_count": 715,
"metadata": {},
"outputs": [],
"source": [
"#CODE HERE"
]
},
{
"cell_type": "code",
"execution_count": 716,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<AxesSubplot:xlabel='GDP ($ per capita)', ylabel='Literacy (%)'>"
]
},
"execution_count": 716,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 2000x1200 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.figure(figsize=(10,6),dpi=200)\n",
"sns.scatterplot(data=df,x='GDP ($ per capita)',y='Literacy (%)',hue='Region')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**TASK: Create a Heatmap of the Correlation between columns in the DataFrame.**"
]
},
{
"cell_type": "code",
"execution_count": 717,
"metadata": {},
"outputs": [],
"source": [
"#CODE HERE"
]
},
{
"cell_type": "code",
"execution_count": 718,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<AxesSubplot:>"
]
},
"execution_count": 718,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 432x288 with 2 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"sns.heatmap(df.select_dtypes(include='number').corr())"
]
},
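{
"cell_type": "markdown",
"metadata": {},
"source": [
"A hedged aside: depending on the installed pandas version, calling `corr()` on a frame that still contains text columns may raise an error instead of silently skipping them. The sketch below uses a made-up `toy` frame (its column names and values are illustrative assumptions, not the CIA data) to show one way of restricting the correlation to numeric columns first.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data: one text column and two numeric columns.\n",
"toy = pd.DataFrame({\n",
"    'Country': ['X', 'Y', 'Z'],\n",
"    'GDP': [1.0, 2.0, 3.0],\n",
"    'Phones': [10.0, 20.0, 30.0],\n",
"})\n",
"\n",
"# Keep only numeric columns, then compute pairwise Pearson correlations.\n",
"numeric = toy.select_dtypes(include='number')\n",
"corr = numeric.corr()\n",
"```\n",
"\n",
"The resulting `corr` frame can be passed to `sns.heatmap` in the same way as the full DataFrame correlation.\n"
]
},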
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**TASK: Seaborn can automatically perform hierarchical clustering through the clustermap() function. Create a clustermap of the correlations between each pair of columns with this function.**"
]
},
{
"cell_type": "code",
"execution_count": 719,
"metadata": {},
"outputs": [],
"source": [
"# CODE HERE"
]
},
{
"cell_type": "code",
"execution_count": 720,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<seaborn.matrix.ClusterGrid at 0x194e3dc53d0>"
]
},
"execution_count": 720,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 720x720 with 4 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"sns.clustermap(df.select_dtypes(include='number').corr())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"-----"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data Preparation and Model Discovery\n",
"\n",
"Let's now prepare our data for K-Means clustering!\n",
"\n",
"### Missing Data\n",
"\n",
"**TASK: Report the number of missing elements per column.**"
]
},
{
"cell_type": "code",
"execution_count": 721,
"metadata": {},
"outputs": [],
"source": [
"#CODE HERE"
]
},
{
"cell_type": "code",
"execution_count": 722,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Country 0\n",
"Region 0\n",
"Population 0\n",
"Area (sq. mi.) 0\n",
"Pop. Density (per sq. mi.) 0\n",
"Coastline (coast/area ratio) 0\n",
"Net migration 3\n",
"Infant mortality (per 1000 births) 3\n",
"GDP ($ per capita) 1\n",
"Literacy (%) 18\n",
"Phones (per 1000) 4\n",
"Arable (%) 2\n",
"Crops (%) 2\n",
"Other (%) 2\n",
"Climate 22\n",
"Birthrate 3\n",
"Deathrate 4\n",
"Agriculture 15\n",
"Industry 16\n",
"Service 15\n",
"dtype: int64"
]
},
"execution_count": 722,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.isnull().sum()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**TASK: Which countries have NaN for Agriculture? What do these countries have in common?**"
]
},
{
"cell_type": "code",
"execution_count": 723,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"3 American Samoa\n",
"4 Andorra\n",
"78 Gibraltar\n",
"80 Greenland\n",
"83 Guam\n",
"134 Mayotte\n",
"140 Montserrat\n",
"144 Nauru\n",
"153 N. Mariana Islands\n",
"171 Saint Helena\n",
"174 St Pierre & Miquelon\n",
"177 San Marino\n",
"208 Turks & Caicos Is\n",
"221 Wallis and Futuna\n",
"223 Western Sahara\n",
"Name: Country, dtype: object"
]
},
"execution_count": 723,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[df['Agriculture'].isnull()]['Country']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**TASK: You should have noticed that most of these countries are tiny islands, with the exception of Greenland and Western Sahara. Fill the missing values for these countries with 0, since they are so small or their values are essentially nonexistent. There should be 15 countries in total that you do this for. For a hint on how to do this, recall you can do the following:**\n",
"\n",
" df[df['feature'].isnull()]\n",
" "
]
},
{
"cell_type": "code",
"execution_count": 724,
"metadata": {},
"outputs": [],
"source": [
"# Fill missing values with 0 for the 15 tiny islands/territories that lack Agriculture data\n",
"df[df['Agriculture'].isnull()] = df[df['Agriculture'].isnull()].fillna(0)"
]
},
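{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of the same idea, the made-up `toy` frame below (its names and values are illustrative assumptions, not the CIA data) fills every missing value with 0, but only in the rows where Agriculture is missing. It uses a boolean mask with `.loc`, which is one possible alternative to assigning through `df[...]`; rows outside the mask keep their NaN values.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data: the first row lacks Agriculture, the second does not.\n",
"toy = pd.DataFrame({\n",
"    'Country': ['Tiny Island', 'Big Country'],\n",
"    'Agriculture': [np.nan, 0.2],\n",
"    'Industry': [np.nan, np.nan],\n",
"})\n",
"\n",
"# Rows where Agriculture is missing.\n",
"mask = toy['Agriculture'].isnull()\n",
"\n",
"# Fill all NaN values with 0, but only inside the masked rows.\n",
"toy.loc[mask] = toy.loc[mask].fillna(0)\n",
"```\n",
"\n",
"After this runs, the masked row has no missing values left, while the other row still has its missing Industry value for later handling.\n"
]
},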
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**TASK: Now check what is still missing by counting the number of missing elements per feature again:**"
]
},
{
"cell_type": "code",
"execution_count": 725,
"metadata": {},
"outputs": [],
"source": [
"#CODE HERE"
]
},
{
"cell_type": "code",
"execution_count": 726,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Country 0\n",
"Region 0\n",
"Population 0\n",
"Area (sq. mi.) 0\n",
"Pop. Density (per sq. mi.) 0\n",
"Coastline (coast/area ratio) 0\n",
"Net migration 1\n",
"Infant mortality (per 1000 births) 1\n",
"GDP ($ per capita) 0\n",
"Literacy (%) 13\n",
"Phones (per 1000) 2\n",
"Arable (%) 1\n",
"Crops (%) 1\n",
"Other (%) 1\n",
"Climate 18\n",
"Birthrate 1\n",
"Deathrate 2\n",
"Agriculture 0\n",
"Industry 1\n",
"Service 1\n",
"dtype: int64"
]
},
"execution_count": 726,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.isnull().sum()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**TASK: Notice that Climate is missing for a few countries, but Region is not! Let's use this to our advantage. Fill in the missing Climate values with the mean Climate value of each country's Region.**\n",
"\n",
"Hints on how to do this: https://stackoverflow.com/questions/19966018/pandas-filling-missing-values-by-mean-in-each-group\n"
]
},
{
"cell_type": "code",
"execution_count": 727,
"metadata": {},
"outputs": [],
"source": [
"# CODE HERE"
]
},
{
"cell_type": "code",
"execution_count": 728,
"metadata": {},
"outputs": [],
"source": [
"# https://stackoverflow.com/questions/19966018/pandas-filling-missing-values-by-mean-in-each-group\n",
"df['Climate'] = df['Climate'].fillna(df.groupby('Region')['Climate'].transform('mean'))"
]
},
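{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see why the `groupby(...).transform('mean')` pattern works, here is a minimal sketch on a made-up `toy` frame (its regions and values are illustrative assumptions, not the CIA data). `transform` returns a Series aligned to the original index, so `fillna` can pick up the region mean for exactly the rows that are missing.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data: two regions, one missing Climate value in each.\n",
"toy = pd.DataFrame({\n",
"    'Region': ['A', 'A', 'A', 'B', 'B'],\n",
"    'Climate': [1.0, 3.0, np.nan, 2.0, np.nan],\n",
"})\n",
"\n",
"# For every row, the mean Climate of that row's region (NaN values are skipped).\n",
"region_mean = toy.groupby('Region')['Climate'].transform('mean')\n",
"\n",
"# Only the missing entries are replaced; existing values stay untouched.\n",
"toy['Climate'] = toy['Climate'].fillna(region_mean)\n",
"```\n",
"\n",
"Note that a group mean can be fractional even when the original column holds category-like codes, which is worth keeping in mind when interpreting the filled Climate values.\n"
]
},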
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**TASK: Check again how many elements are missing:**"
]
},
{
"cell_type": "code",
"execution_count": 729,
"metadata": {},
"outputs": [],
"source": [
"#CODE HERE"
]
},
{
"cell_type": "code",
"execution_count": 730,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Country 0\n",
"Region 0\n",
"Population 0\n",
"Area (sq. mi.) 0\n",
"Pop. Density (per sq. mi.) 0\n",
"Coastline (coast/area ratio) 0\n",
"Net migration 1\n",
"Infant mortality (per 1000 births) 1\n",
"GDP ($ per capita) 0\n",
"Literacy (%) 13\n",
"Phones (per 1000) 2\n",
"Arable (%) 1\n",
"Crops (%) 1\n",
"Other (%) 1\n",
"Climate 0\n",
"Birthrate 1\n",
"Deathrate 2\n",
"Agriculture 0\n",
"Industry 1\n",
"Service 1\n",
"dtype: int64"
]
},
"execution_count": 730,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.isnull().sum()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**TASK: Literacy (%) is still missing for several countries. Use the same tactic as we did for the missing Climate values and fill in any missing Literacy (%) values with the mean Literacy (%) of the Region.**"
]
},
{
"cell_type": "code",
"execution_count": 731,
"metadata": {},
"outputs": [],
"source": [
"#CODE HERE"
]
},
{
"cell_type": "code",
"execution_count": 732,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Country</th>\n",
" <th>Region</th>\n",
" <th>Population</th>\n",
" <th>Area (sq. mi.)</th>\n",
" <th>Pop. Density (per sq. mi.)</th>\n",
" <th>Coastline (coast/area ratio)</th>\n",
" <th>Net migration</th>\n",
" <th>Infant mortality (per 1000 births)</th>\n",
" <th>GDP ($ per capita)</th>\n",
" <th>Literacy (%)</th>\n",
" <th>Phones (per 1000)</th>\n",
" <th>Arable (%)</th>\n",
" <th>Crops (%)</th>\n",
" <th>Other (%)</th>\n",
" <th>Climate</th>\n",
" <th>Birthrate</th>\n",
" <th>Deathrate</th>\n",
" <th>Agriculture</th>\n",
" <th>Industry</th>\n",
" <th>Service</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>25</th>\n",
" <td>Bosnia &amp; Herzegovina</td>\n",
" <td>EASTERN EUROPE</td>\n",
" <td>4498976</td>\n",
" <td>51129</td>\n",
" <td>88.0</td>\n",
" <td>0.04</td>\n",
" <td>0.31</td>\n",
" <td>21.05</td>\n",
" <td>6100.0</td>\n",
" <td>NaN</td>\n",
" <td>215.4</td>\n",
" <td>13.60</td>\n",
" <td>2.96</td>\n",
" <td>83.44</td>\n",
" <td>4.000000</td>\n",
" <td>8.77</td>\n",
" <td>8.27</td>\n",
" <td>0.142</td>\n",
" <td>0.308</td>\n",
" <td>0.550</td>\n",
" </tr>\n",
" <tr>\n",
" <th>66</th>\n",
" <td>Faroe Islands</td>\n",
" <td>WESTERN EUROPE</td>\n",
" <td>47246</td>\n",
" <td>1399</td>\n",
" <td>33.8</td>\n",
" <td>79.84</td>\n",
" <td>1.41</td>\n",
" <td>6.24</td>\n",
" <td>22000.0</td>\n",
" <td>NaN</td>\n",
" <td>503.8</td>\n",
" <td>2.14</td>\n",
" <td>0.00</td>\n",
" <td>97.86</td>\n",
" <td>2.826087</td>\n",
" <td>14.05</td>\n",
" <td>8.70</td>\n",
" <td>0.270</td>\n",
" <td>0.110</td>\n",
" <td>0.620</td>\n",
" </tr>\n",
" <tr>\n",
" <th>74</th>\n",
" <td>Gaza Strip</td>\n",
" <td>NEAR EAST</td>\n",
" <td>1428757</td>\n",
" <td>360</td>\n",
" <td>3968.8</td>\n",
" <td>11.11</td>\n",
" <td>1.60</td>\n",
" <td>22.93</td>\n",
" <td>600.0</td>\n",
" <td>NaN</td>\n",
" <td>244.3</td>\n",
" <td>28.95</td>\n",
" <td>21.05</td>\n",
" <td>50.00</td>\n",
" <td>3.000000</td>\n",
" <td>39.45</td>\n",
" <td>3.80</td>\n",
" <td>0.030</td>\n",
" <td>0.283</td>\n",
" <td>0.687</td>\n",
" </tr>\n",
" <tr>\n",
" <th>85</th>\n",
" <td>Guernsey</td>\n",
" <td>WESTERN EUROPE</td>\n",
" <td>65409</td>\n",
" <td>78</td>\n",
" <td>838.6</td>\n",
" <td>64.10</td>\n",
" <td>3.84</td>\n",
" <td>4.71</td>\n",
" <td>20000.0</td>\n",
" <td>NaN</td>\n",
" <td>842.4</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>3.000000</td>\n",
" <td>8.81</td>\n",
" <td>10.01</td>\n",
" <td>0.030</td>\n",
" <td>0.100</td>\n",
" <td>0.870</td>\n",
" </tr>\n",
" <tr>\n",
" <th>99</th>\n",
" <td>Isle of Man</td>\n",
" <td>WESTERN EUROPE</td>\n",
" <td>75441</td>\n",
" <td>572</td>\n",
" <td>131.9</td>\n",
" <td>27.97</td>\n",
" <td>5.36</td>\n",
" <td>5.93</td>\n",
" <td>21000.0</td>\n",
" <td>NaN</td>\n",
" <td>676.0</td>\n",
" <td>9.00</td>\n",
" <td>0.00</td>\n",
" <td>91.00</td>\n",
" <td>3.000000</td>\n",
" <td>11.05</td>\n",
" <td>11.19</td>\n",
" <td>0.010</td>\n",
" <td>0.130</td>\n",
" <td>0.860</td>\n",
" </tr>\n",
" <tr>\n",
" <th>104</th>\n",
" <td>Jersey</td>\n",
" <td>WESTERN EUROPE</td>\n",
" <td>91084</td>\n",
" <td>116</td>\n",
" <td>785.2</td>\n",
" <td>60.34</td>\n",
" <td>2.76</td>\n",
" <td>5.24</td>\n",
" <td>24800.0</td>\n",
" <td>NaN</td>\n",
" <td>811.3</td>\n",
" <td>0.00</td>\n",
" <td>0.00</td>\n",
" <td>100.00</td>\n",
" <td>3.000000</td>\n",
" <td>9.30</td>\n",
" <td>9.28</td>\n",
" <td>0.050</td>\n",
" <td>0.020</td>\n",
" <td>0.930</td>\n",
" </tr>\n",
" <tr>\n",
" <th>108</th>\n",
" <td>Kiribati</td>\n",
" <td>OCEANIA</td>\n",
" <td>105432</td>\n",
" <td>811</td>\n",
" <td>130.0</td>\n",
" <td>140.94</td>\n",
" <td>0.00</td>\n",
" <td>48.52</td>\n",
" <td>800.0</td>\n",
" <td>NaN</td>\n",
" <td>42.7</td>\n",
" <td>2.74</td>\n",
" <td>50.68</td>\n",
" <td>46.58</td>\n",
" <td>2.000000</td>\n",
" <td>30.65</td>\n",
" <td>8.26</td>\n",
" <td>0.089</td>\n",
" <td>0.242</td>\n",
" <td>0.668</td>\n",
" </tr>\n",
" <tr>\n",
" <th>123</th>\n",
" <td>Macedonia</td>\n",
" <td>EASTERN EUROPE</td>\n",
" <td>2050554</td>\n",
" <td>25333</td>\n",
" <td>80.9</td>\n",
" <td>0.00</td>\n",
" <td>-1.45</td>\n",
" <td>10.09</td>\n",
" <td>6700.0</td>\n",
" <td>NaN</td>\n",
" <td>260.0</td>\n",
" <td>22.26</td>\n",
" <td>1.81</td>\n",
" <td>75.93</td>\n",
" <td>3.000000</td>\n",
" <td>12.02</td>\n",
" <td>8.77</td>\n",
" <td>0.118</td>\n",
" <td>0.319</td>\n",
" <td>0.563</td>\n",
" </tr>\n",
" <tr>\n",
" <th>185</th>\n",
" <td>Slovakia</td>\n",
" <td>EASTERN EUROPE</td>\n",
" <td>5439448</td>\n",
" <td>48845</td>\n",
" <td>111.4</td>\n",
" <td>0.00</td>\n",
" <td>0.30</td>\n",
" <td>7.41</td>\n",
" <td>13300.0</td>\n",
" <td>NaN</td>\n",
" <td>220.1</td>\n",
" <td>30.16</td>\n",
" <td>2.62</td>\n",
" <td>67.22</td>\n",
" <td>3.000000</td>\n",
" <td>10.65</td>\n",
" <td>9.45</td>\n",
" <td>0.035</td>\n",
" <td>0.294</td>\n",
" <td>0.672</td>\n",
" </tr>\n",
" <tr>\n",
" <th>187</th>\n",
" <td>Solomon Islands</td>\n",
" <td>OCEANIA</td>\n",
" <td>552438</td>\n",
" <td>28450</td>\n",
" <td>19.4</td>\n",
" <td>18.67</td>\n",
" <td>0.00</td>\n",
" <td>21.29</td>\n",
" <td>1700.0</td>\n",
" <td>NaN</td>\n",
" <td>13.4</td>\n",
" <td>0.64</td>\n",
" <td>2.00</td>\n",
" <td>97.36</td>\n",
" <td>2.000000</td>\n",
" <td>30.01</td>\n",
" <td>3.92</td>\n",
" <td>0.420</td>\n",
" <td>0.110</td>\n",
" <td>0.470</td>\n",
" </tr>\n",
" <tr>\n",
" <th>209</th>\n",
" <td>Tuvalu</td>\n",
" <td>OCEANIA</td>\n",
" <td>11810</td>\n",
" <td>26</td>\n",
" <td>454.2</td>\n",
" <td>92.31</td>\n",
" <td>0.00</td>\n",
" <td>20.03</td>\n",
" <td>1100.0</td>\n",
" <td>NaN</td>\n",
" <td>59.3</td>\n",
" <td>0.00</td>\n",
" <td>0.00</td>\n",
" <td>100.00</td>\n",
" <td>2.000000</td>\n",
" <td>22.18</td>\n",
" <td>7.11</td>\n",
" <td>0.166</td>\n",
" <td>0.272</td>\n",
" <td>0.562</td>\n",
" </tr>\n",
" <tr>\n",
" <th>220</th>\n",
" <td>Virgin Islands</td>\n",
" <td>LATIN AMER. &amp; CARIB</td>\n",
" <td>108605</td>\n",
" <td>1910</td>\n",
" <td>56.9</td>\n",
" <td>9.84</td>\n",
" <td>-8.94</td>\n",
" <td>8.03</td>\n",
" <td>17200.0</td>\n",
" <td>NaN</td>\n",
" <td>652.8</td>\n",
" <td>11.76</td>\n",
" <td>2.94</td>\n",
" <td>85.30</td>\n",
" <td>2.000000</td>\n",
" <td>13.96</td>\n",
" <td>6.43</td>\n",
" <td>0.010</td>\n",
" <td>0.190</td>\n",
" <td>0.800</td>\n",
" </tr>\n",
" <tr>\n",
" <th>222</th>\n",
" <td>West Bank</td>\n",
" <td>NEAR EAST</td>\n",
" <td>2460492</td>\n",
" <td>5860</td>\n",
" <td>419.9</td>\n",
" <td>0.00</td>\n",
" <td>2.98</td>\n",
" <td>19.62</td>\n",
" <td>800.0</td>\n",
" <td>NaN</td>\n",
" <td>145.2</td>\n",
" <td>16.90</td>\n",
" <td>18.97</td>\n",
" <td>64.13</td>\n",
" <td>3.000000</td>\n",
" <td>31.67</td>\n",
" <td>3.92</td>\n",
" <td>0.090</td>\n",
" <td>0.280</td>\n",
" <td>0.630</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Country Region Population \\\n",
"25 Bosnia & Herzegovina EASTERN EUROPE 4498976 \n",
"66 Faroe Islands WESTERN EUROPE 47246 \n",
"74 Gaza Strip NEAR EAST 1428757 \n",
"85 Guernsey WESTERN EUROPE 65409 \n",
"99 Isle of Man WESTERN EUROPE 75441 \n",
"104 Jersey WESTERN EUROPE 91084 \n",
"108 Kiribati OCEANIA 105432 \n",
"123 Macedonia EASTERN EUROPE 2050554 \n",
"185 Slovakia EASTERN EUROPE 5439448 \n",
"187 Solomon Islands OCEANIA 552438 \n",
"209 Tuvalu OCEANIA 11810 \n",
"220 Virgin Islands LATIN AMER. & CARIB 108605 \n",
"222 West Bank NEAR EAST 2460492 \n",
"\n",
" Area (sq. mi.) Pop. Density (per sq. mi.) Coastline (coast/area ratio) \\\n",
"25 51129 88.0 0.04 \n",
"66 1399 33.8 79.84 \n",
"74 360 3968.8 11.11 \n",
"85 78 838.6 64.10 \n",
"99 572 131.9 27.97 \n",
"104 116 785.2 60.34 \n",
"108 811 130.0 140.94 \n",
"123 25333 80.9 0.00 \n",
"185 48845 111.4 0.00 \n",
"187 28450 19.4 18.67 \n",
"209 26 454.2 92.31 \n",
"220 1910 56.9 9.84 \n",
"222 5860 419.9 0.00 \n",
"\n",
" Net migration Infant mortality (per 1000 births) GDP ($ per capita) \\\n",
"25 0.31 21.05 6100.0 \n",
"66 1.41 6.24 22000.0 \n",
"74 1.60 22.93 600.0 \n",
"85 3.84 4.71 20000.0 \n",
"99 5.36 5.93 21000.0 \n",
"104 2.76 5.24 24800.0 \n",
"108 0.00 48.52 800.0 \n",
"123 -1.45 10.09 6700.0 \n",
"185 0.30 7.41 13300.0 \n",
"187 0.00 21.29 1700.0 \n",
"209 0.00 20.03 1100.0 \n",
"220 -8.94 8.03 17200.0 \n",
"222 2.98 19.62 800.0 \n",
"\n",
" Literacy (%) Phones (per 1000) Arable (%) Crops (%) Other (%) \\\n",
"25 NaN 215.4 13.60 2.96 83.44 \n",
"66 NaN 503.8 2.14 0.00 97.86 \n",
"74 NaN 244.3 28.95 21.05 50.00 \n",
"85 NaN 842.4 NaN NaN NaN \n",
"99 NaN 676.0 9.00 0.00 91.00 \n",
"104 NaN 811.3 0.00 0.00 100.00 \n",
"108 NaN 42.7 2.74 50.68 46.58 \n",
"123 NaN 260.0 22.26 1.81 75.93 \n",
"185 NaN 220.1 30.16 2.62 67.22 \n",
"187 NaN 13.4 0.64 2.00 97.36 \n",
"209 NaN 59.3 0.00 0.00 100.00 \n",
"220 NaN 652.8 11.76 2.94 85.30 \n",
"222 NaN 145.2 16.90 18.97 64.13 \n",
"\n",
" Climate Birthrate Deathrate Agriculture Industry Service \n",
"25 4.000000 8.77 8.27 0.142 0.308 0.550 \n",
"66 2.826087 14.05 8.70 0.270 0.110 0.620 \n",
"74 3.000000 39.45 3.80 0.030 0.283 0.687 \n",
"85 3.000000 8.81 10.01 0.030 0.100 0.870 \n",
"99 3.000000 11.05 11.19 0.010 0.130 0.860 \n",
"104 3.000000 9.30 9.28 0.050 0.020 0.930 \n",
"108 2.000000 30.65 8.26 0.089 0.242 0.668 \n",
"123 3.000000 12.02 8.77 0.118 0.319 0.563 \n",
"185 3.000000 10.65 9.45 0.035 0.294 0.672 \n",
"187 2.000000 30.01 3.92 0.420 0.110 0.470 \n",
"209 2.000000 22.18 7.11 0.166 0.272 0.562 \n",
"220 2.000000 13.96 6.43 0.010 0.190 0.800 \n",
"222 3.000000 31.67 3.92 0.090 0.280 0.630 "
]
},
"execution_count": 732,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[df['Literacy (%)'].isnull()]"
]
},
{
"cell_type": "code",
"execution_count": 733,
"metadata": {},
"outputs": [],
"source": [
"# https://stackoverflow.com/questions/19966018/pandas-filling-missing-values-by-mean-in-each-group\n",
"df['Literacy (%)'] = df['Literacy (%)'].fillna(df.groupby('Region')['Literacy (%)'].transform('mean'))"
]
},
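{
"cell_type": "markdown",
"metadata": {},
"source": [
"The region-wise fill above can be sanity-checked on a tiny made-up frame. The sketch below is only an illustration: `toy_lit` and its values are hypothetical and not part of the CIA data. It shows that `groupby(...).transform('mean')` returns a Series aligned with the original rows, so `fillna` replaces each missing value with the mean of its own region (here 80.0 for region A and 50.0 for region B)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data, for illustration only\n",
"toy_lit = pd.DataFrame({\n",
"    'Region': ['A', 'A', 'A', 'B', 'B'],\n",
"    'Literacy (%)': [90.0, np.nan, 70.0, 50.0, np.nan],\n",
"})\n",
"\n",
"# transform('mean') broadcasts each region's mean back to every row of that region\n",
"toy_region_mean = toy_lit.groupby('Region')['Literacy (%)'].transform('mean')\n",
"\n",
"# Only the NaN entries are replaced; existing values are left untouched\n",
"toy_lit['Literacy (%)'] = toy_lit['Literacy (%)'].fillna(toy_region_mean)\n",
"toy_lit"
]
},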
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**TASK: Check again on the remaining missing values:**"
]
},
{
"cell_type": "code",
"execution_count": 734,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Country 0\n",
"Region 0\n",
"Population 0\n",
"Area (sq. mi.) 0\n",
"Pop. Density (per sq. mi.) 0\n",
"Coastline (coast/area ratio) 0\n",
"Net migration 1\n",
"Infant mortality (per 1000 births) 1\n",
"GDP ($ per capita) 0\n",
"Literacy (%) 0\n",
"Phones (per 1000) 2\n",
"Arable (%) 1\n",
"Crops (%) 1\n",
"Other (%) 1\n",
"Climate 0\n",
"Birthrate 1\n",
"Deathrate 2\n",
"Agriculture 0\n",
"Industry 1\n",
"Service 1\n",
"dtype: int64"
]
},
"execution_count": 734,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.isnull().sum()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**TASK: Optional: We are now missing values for only a few countries. Go ahead and drop these countries OR feel free to fill in these last few remaining values with any preferred methodology. For simplicity, we will drop these.**"
]
},
{
"cell_type": "code",
"execution_count": 735,
"metadata": {},
"outputs": [],
"source": [
"# CODE HERE"
]
},
{
"cell_type": "code",
"execution_count": 736,
"metadata": {},
"outputs": [],
"source": [
"df = df.dropna()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data Feature Preparation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**TASK: It is now time to prepare the data for clustering. The Country column is still a unique identifier string, so it won't be useful for clustering, since it's unique for each point. Go ahead and drop this Country column.**"
]
},
{
"cell_type": "code",
"execution_count": 737,
"metadata": {},
"outputs": [],
"source": [
"#CODE HERE"
]
},
{
"cell_type": "code",
"execution_count": 738,
"metadata": {},
"outputs": [],
"source": [
"X = df.drop(\"Country\",axis=1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**TASK: Now let's create the X array of features. The Region column still contains categorical strings, so use pandas to create dummy variables from this column, producing a finalized X matrix of continuous features along with the dummy variables for the Regions.**"
]
},
{
"cell_type": "code",
"execution_count": 739,
"metadata": {},
"outputs": [],
"source": [
"#CODE HERE"
]
},
{
"cell_type": "code",
"execution_count": 740,
"metadata": {},
"outputs": [],
"source": [
"X = pd.get_dummies(X)"
]
},
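{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see what `pd.get_dummies` does to a mixed frame, here is a minimal sketch on hypothetical data (`toy_regions` is made up for illustration). Numeric columns pass through unchanged, while each string column is expanded into one indicator column per category. Passing `dtype=int` is an optional choice here: it keeps the 0/1 integer display on pandas versions where the indicator columns default to booleans."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical toy data, for illustration only\n",
"toy_regions = pd.DataFrame({\n",
"    'Region': ['OCEANIA', 'NEAR EAST', 'OCEANIA'],\n",
"    'Population': [100, 200, 300],\n",
"})\n",
"\n",
"# 'Population' is kept as-is; 'Region' becomes Region_NEAR EAST and Region_OCEANIA\n",
"toy_dummies = pd.get_dummies(toy_regions, dtype=int)\n",
"toy_dummies"
]
},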
{
"cell_type": "code",
"execution_count": 741,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Population</th>\n",
" <th>Area (sq. mi.)</th>\n",
" <th>Pop. Density (per sq. mi.)</th>\n",
" <th>Coastline (coast/area ratio)</th>\n",
" <th>Net migration</th>\n",
" <th>Infant mortality (per 1000 births)</th>\n",
" <th>GDP ($ per capita)</th>\n",
" <th>Literacy (%)</th>\n",
" <th>Phones (per 1000)</th>\n",
" <th>Arable (%)</th>\n",
" <th>...</th>\n",
" <th>Region_BALTICS</th>\n",
" <th>Region_C.W. OF IND. STATES</th>\n",
" <th>Region_EASTERN EUROPE</th>\n",
" <th>Region_LATIN AMER. &amp; CARIB</th>\n",
" <th>Region_NEAR EAST</th>\n",
" <th>Region_NORTHERN AFRICA</th>\n",
" <th>Region_NORTHERN AMERICA</th>\n",
" <th>Region_OCEANIA</th>\n",
" <th>Region_SUB-SAHARAN AFRICA</th>\n",
" <th>Region_WESTERN EUROPE</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>31056997</td>\n",
" <td>647500</td>\n",
" <td>48.0</td>\n",
" <td>0.00</td>\n",
" <td>23.06</td>\n",
" <td>163.07</td>\n",
" <td>700.0</td>\n",
" <td>36.0</td>\n",
" <td>3.2</td>\n",
" <td>12.13</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>3581655</td>\n",
" <td>28748</td>\n",
" <td>124.6</td>\n",
" <td>1.26</td>\n",
" <td>-4.93</td>\n",
" <td>21.52</td>\n",
" <td>4500.0</td>\n",
" <td>86.5</td>\n",
" <td>71.2</td>\n",
" <td>21.09</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>32930091</td>\n",
" <td>2381740</td>\n",
" <td>13.8</td>\n",
" <td>0.04</td>\n",
" <td>-0.39</td>\n",
" <td>31.00</td>\n",
" <td>6000.0</td>\n",
" <td>70.0</td>\n",
" <td>78.1</td>\n",
" <td>3.22</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>57794</td>\n",
" <td>199</td>\n",
" <td>290.4</td>\n",
" <td>58.29</td>\n",
" <td>-20.71</td>\n",
" <td>9.27</td>\n",
" <td>8000.0</td>\n",
" <td>97.0</td>\n",
" <td>259.5</td>\n",
" <td>10.00</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>71201</td>\n",
" <td>468</td>\n",
" <td>152.1</td>\n",
" <td>0.00</td>\n",
" <td>6.60</td>\n",
" <td>4.05</td>\n",
" <td>19000.0</td>\n",
" <td>100.0</td>\n",
" <td>497.2</td>\n",
" <td>2.22</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 29 columns</p>\n",
"</div>"
],
"text/plain": [
" Population Area (sq. mi.) Pop. Density (per sq. mi.) \\\n",
"0 31056997 647500 48.0 \n",
"1 3581655 28748 124.6 \n",
"2 32930091 2381740 13.8 \n",
"3 57794 199 290.4 \n",
"4 71201 468 152.1 \n",
"\n",
" Coastline (coast/area ratio) Net migration \\\n",
"0 0.00 23.06 \n",
"1 1.26 -4.93 \n",
"2 0.04 -0.39 \n",
"3 58.29 -20.71 \n",
"4 0.00 6.60 \n",
"\n",
" Infant mortality (per 1000 births) GDP ($ per capita) Literacy (%) \\\n",
"0 163.07 700.0 36.0 \n",
"1 21.52 4500.0 86.5 \n",
"2 31.00 6000.0 70.0 \n",
"3 9.27 8000.0 97.0 \n",
"4 4.05 19000.0 100.0 \n",
"\n",
" Phones (per 1000) Arable (%) ... \\\n",
"0 3.2 12.13 ... \n",
"1 71.2 21.09 ... \n",
"2 78.1 3.22 ... \n",
"3 259.5 10.00 ... \n",
"4 497.2 2.22 ... \n",
"\n",
" Region_BALTICS Region_C.W. OF IND. STATES \\\n",
"0 0 0 \n",
"1 0 0 \n",
"2 0 0 \n",
"3 0 0 \n",
"4 0 0 \n",
"\n",
" Region_EASTERN EUROPE Region_LATIN AMER. & CARIB \\\n",
"0 0 0 \n",
"1 1 0 \n",
"2 0 0 \n",
"3 0 0 \n",
"4 0 0 \n",
"\n",
" Region_NEAR EAST \\\n",
"0 0 \n",
"1 0 \n",
"2 0 \n",
"3 0 \n",
"4 0 \n",
"\n",
" Region_NORTHERN AFRICA \\\n",
"0 0 \n",
"1 0 \n",
"2 1 \n",
"3 0 \n",
"4 0 \n",
"\n",
" Region_NORTHERN AMERICA \\\n",
"0 0 \n",
"1 0 \n",
"2 0 \n",
"3 0 \n",
"4 0 \n",
"\n",
" Region_OCEANIA \\\n",
"0 0 \n",
"1 0 \n",
"2 0 \n",
"3 1 \n",
"4 0 \n",
"\n",
" Region_SUB-SAHARAN AFRICA \\\n",
"0 0 \n",
"1 0 \n",
"2 0 \n",
"3 0 \n",
"4 0 \n",
"\n",
" Region_WESTERN EUROPE \n",
"0 0 \n",
"1 0 \n",
"2 0 \n",
"3 0 \n",
"4 1 \n",
"\n",
"[5 rows x 29 columns]"
]
},
"execution_count": 741,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Scaling"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**TASK: Because some measurements are percentages while others are total counts (such as population), we should scale this data first. Use Sklearn to scale the X feature matrix.**"
]
},
{
"cell_type": "code",
"execution_count": 742,
"metadata": {},
"outputs": [],
"source": [
"#CODE HERE"
]
},
{
"cell_type": "code",
"execution_count": 743,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.preprocessing import StandardScaler"
]
},
{
"cell_type": "code",
"execution_count": 744,
"metadata": {},
"outputs": [],
"source": [
"scaler = StandardScaler()\n",
"scaled_X = scaler.fit_transform(X)"
]
},
{
"cell_type": "code",
"execution_count": 745,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 0.0133285 , 0.01855412, -0.20308668, ..., -0.31544015,\n",
" -0.54772256, -0.36514837],\n",
" [-0.21730118, -0.32370888, -0.14378531, ..., -0.31544015,\n",
" -0.54772256, -0.36514837],\n",
" [ 0.02905136, 0.97784988, -0.22956327, ..., -0.31544015,\n",
" -0.54772256, -0.36514837],\n",
" ...,\n",
" [-0.06726127, -0.04756396, -0.20881553, ..., -0.31544015,\n",
" -0.54772256, -0.36514837],\n",
" [-0.15081724, 0.07669798, -0.22840201, ..., -0.31544015,\n",
" 1.82574186, -0.36514837],\n",
" [-0.14464933, -0.12356132, -0.2160153 , ..., -0.31544015,\n",
" 1.82574186, -0.36514837]])"
]
},
"execution_count": 745,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"scaled_X"
]
},
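{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick way to convince yourself that the scaling worked is to check the column means and standard deviations afterwards. The sketch below does this on a small hypothetical array (`toy_features` is made up for illustration); after `StandardScaler`, each column should have a mean of approximately 0 and a standard deviation of approximately 1, regardless of its original units."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.preprocessing import StandardScaler\n",
"\n",
"# Hypothetical features on very different scales, for illustration only\n",
"toy_features = np.array([[1.0, 1000.0],\n",
"                         [2.0, 2000.0],\n",
"                         [3.0, 3000.0]])\n",
"\n",
"toy_scaled = StandardScaler().fit_transform(toy_features)\n",
"\n",
"# Per-column mean and (population) standard deviation after scaling\n",
"print(toy_scaled.mean(axis=0))\n",
"print(toy_scaled.std(axis=0))"
]
},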
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Creating and Fitting Kmeans Model\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**TASK: Use a for loop to create and fit multiple KMeans models, testing K values from 2 up to 30 clusters (note that `range(2,30)` in Python covers K=2 through K=29). Keep track of the Sum of Squared Distances (SSD) for each K value, then plot this out to create an \"elbow\" plot of K versus SSD. Optional: You may also want to create a bar plot showing the SSD difference from the previous K value.**"
]
},
{
"cell_type": "code",
"execution_count": 746,
"metadata": {},
"outputs": [],
"source": [
"#CODE HERE"
]
},
{
"cell_type": "code",
"execution_count": 747,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.cluster import KMeans"
]
},
{
"cell_type": "code",
"execution_count": 748,
"metadata": {},
"outputs": [],
"source": [
"ssd = []\n",
"\n",
"for k in range(2,30):\n",
"    model = KMeans(n_clusters=k)\n",
"    model.fit(scaled_X)\n",
"\n",
"    # Sum of squared distances of samples to their closest cluster center.\n",
"    ssd.append(model.inertia_)"
]
},
{
"cell_type": "code",
"execution_count": 749,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Text(0, 0.5, ' Sum of Squared Distances')"
]
},
"execution_count": 749,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAY0AAAEGCAYAAACZ0MnKAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAoW0lEQVR4nO3deXzU1b3/8dcnIZCwhk0LAQSEolgENG5F61ZFrQvS60IX0dKq96etve21Yu2t1tYWL22t2mqvrbWoV611QVyulIpLVVADYREQBWQx7EtYQ8jy+f0x34EQMsk3kMl3ZvJ+Ph7zyMyZ78x8vo7kk3PO93yOuTsiIiJhZEUdgIiIpA8lDRERCU1JQ0REQlPSEBGR0JQ0REQktFZRB5AM3bp18759+0YdhohIWpk1a9ZGd+9e3zEZmTT69u1LUVFR1GGIiKQVM1vR0DEanhIRkdCUNEREJDQlDRERCU1JQ0REQlPSEBGR0DLy6qmDNbm4hIlTF7O6tIye+XncPHIQo4YXRB2WiEjKUNIITC4u4dbn5lNWUQVASWkZtz43H0CJQ0QkoOGpwMSpi/cmjLiyiiomTl0cUUQiIqlHSSOwurSsUe0iIi2RkkagZ35eo9pFRFoiJY3AzSMHkZeTvV9bXk42N48cFFFEIiKpRxPhgfhk98SpiykpLSPL4GcXH6NJcBGRGtTTqGHU8ALeGX8Wf7/+FKoddu2pjDokEZGUop5GHU7o24V7rhjKOYM/F3UoIiIpRUkjgUuH94o6BBGRlKPhqXq8+fEGrn7kfSqqqqMORUQkJShp1KOispo3Fm9gypzVUYciIpISlDTqcfbRh3F0j4784Y0lVFV71OGIiEROSaMeZsaNZw5g2Yad/N+Ha6IOR0QkckoaDTjvC5/jyO7t+P30JVSrtyEiLZyunmpAdpbx4wuOZtvuCpQyRKSlU9II4eyjD486BBGRlKDhqZB2V1TxxzeXMmPppqhDERGJjHoaIZnBpHeX07tLW0458pSowxERiYR6GiG1aZXNtV/qz/ufbua9ZeptiEjLpKTRCFee0Idu7Vvz+9eXRB2KiEgklDQaIa91Nt8+rT//+mQjc1eVRh2OiEiz05xGI33j5COYvWIL2VkWdSgiIs1OSaOR2rdpxUNXFUYdhohIJJI6PGVmy81svpnNMbOioK2LmU0zs0+Cn52DdjOz+8xsiZnNM7PjarzP2OD4T8xsbDJjDuuv737KsDv/Qb/xLzNiwnQmF5dEHZKISNI1x5zGme4+zN3jf56PB15z94HAa8FjgPOBgcHtWuBBiCUZ4HbgJOBE4PZ4oonK5OISfvHSIkp3xVaJl5SWcetz85U4RCTjRTERfgkwKbg/CRhVo/1Rj5kJ5JtZD2AkMM3dN7v7FmAacF4zx7yfiVMXU1mrDlVZRRUTpy6OKCIRkeaR7KThwD/MbJaZXRu0He7u8ZKxa4F4jY4CYFWN134WtCVq34+ZXWtmRWZWtGHDhqY8hwOsLi1rVLuISKZIdtI41d2PIzb0dIOZfanmk+7u0DR1AN39IXcvdPfC7t27N8VbJtQzP69R7SIimSKpScPdS4Kf64Hnic1JrAuGnQh+rg8OLwF613h5r6AtUXtkbh45iLyc7P3acnOyuHnkoIgiEhFpHklLGmbWzsw6xO8D5wIfAlOA+BVQY4EXgvtTgKuCq6hOBrYGw1hTgXPNrHMwAX5u0BaZUcML+NXoIRTk52FAQX4eE0Yfy4XH9ogyLBGRpEvmOo3DgefNLP45T7j7q2b2AfC0mY0DVgCXB8e/AlwALAF2AdcAuPtmM/s58EFw3J3uvjmJcYcyangBo4bvm1rZsL2cUQ+8w7VfOpKLh/aMMDIRkeRJWtJw92XA0DraNwFn19HuwA0J3usvwF+aOsamlN82h9bZWdz23HyG986nd5e2UYckItLkVHuqieRkZ3HvlcMB+N5TxVRUVUcckYhI01PSaEK9u7Tll6OHUL
yylN/98+OowxERaXJKGk3soqE9uaKwN68tWk95ZVXU4YiINKkG5zTM7EjgM3cvN7MzgGOJrdwuTW5o6euOi4/BLLZxk4hIJgnT03gWqDKzAcBDxNZMPJHUqNJcXutscnOy2VFeycNvf0psjl9EJP2FSRrV7l4JXArc7+43A1qQEMIr89bw85cWMund5VGHIiLSJMJcclthZmOILcS7KGjLSV5ImeOywl5MXbCWn7+0kAfeWMqG7eX0zM/j5pGD9lvjISKSLsL0NK4BTgHucvdPzawf8Fhyw8oMZsaZRx1GlcP67eUqoy4iaa/BpOHuC4FbgNnB40/d/e5kB5YpHnxj6QFtKqMuIumqwaRhZhcBc4BXg8fDzGxKkuPKGCqjLiKZJMzw1B3EqtOWArj7HKB/0iLKMCqjLiKZJEzSqHD3rbXaVCMjpLrKqJvBf3x5YEQRiYgcvDBJY4GZfQ3INrOBZnY/8G6S48oYtcuod2mXgzt8vH5H1KGJiDRamEtuvwvcBpQTW9Q3FfhFMoPKNLXLqP/X5A956K1lnHJkV84cdFiEkYmINE6Yq6d2uftt7n5CcPuJu+9ujuAy1W1fOZqjPteBHz49l3Xb9J9SRNJHmKunpplZfo3Hnc0s0p3z0l1uTja//9pwOua2Yu1WJQ0RSR9hhqe61SxO6O5bzExjKodowGEdeO2HZ5CdZVGHIiISWqjaU2bWJ/7AzI4AVIGvCWRnGZVV1fx66mLe/zTyHWxFRBoUJmncBrxtZo+Z2ePAW8CtyQ2r5dhdWc2L81Zz01PFlO7aE3U4IiL1CjMR/ipwHPA34CngeHfXnEYTad+mFfePGc7GHeXc/Mw8lVEXkZQWdue+NsBmYBsw2My+lLyQWp5je+Vzy3lHMW3hOh6dsSLqcEREEgqzc9/dwBXAAvatBHdiw1TSRMad2o8ZSzfxy5cX8sc3l7J2626VUReRlBPm6qlRwCB3L09yLC1avIz620s2sia4DDdeRh1Q4hCRlBBmeGoZ2nSpWTz4xlLKK/cv66Uy6iKSSsL0NHYBc8zsNWKlRABw9+8lLaoWSmXURSTVhUkaU4KbJFnP/DxK6kgQKqMuIqmiwaTh7pOaIxCJlVG/9bn5lFVU7df+1eM1nyEiqSFM7amBZvaMmS00s2XxW3ME19LULqPeo1Mu3dq35tEZK9i6qyLq8EREQg1PPQLcDtwDnAlcQ/j1HdJItcuor9y0i+JVW+jUVtciiEj0wvzyz3P31wBz9xXufgfwleSGJXF9urblkmGxJDJ3VSm79lRGHJGItGRhkka5mWUBn5jZjWZ2KdA+yXFJLRt3lHPlQzP53pPFVFZpt10RiUaYpHET0Bb4HnA88A3gqmQGJQfq1r4NP77gKP65aD13vLhANapEJBJh5jT6uvsHwA5i8xmY2WXAe8kMTA70zVP68llpGf/z5jIK8tvy72ccGXVIItLChOlp1FUGXaXRI3LLyKO4aGhP7n71I2Yu2xR1OCLSwiTsaZjZ+cAFQIGZ3VfjqY6AZmMjkpVl/PqyYzmhb2dO7NuFycUlTJy6mNWlZSpwKCJJV9/w1GqgCLgYmFWjfTvwH8kMSurXplU2V53Sl8nFJdzy7Ly99apU4FBEki1h0nD3ucBcM3vC3SsAzKwz0NvdtzRXgJLYxKkfJSxwqKQhIskQZk5jmpl1NLMuwGzgT2Z2T5LjkhBWl+5O0K4ChyKSHGGSRid33waMBh5195OAs8N+gJllm1mxmb0UPO5nZu+Z2RIz+5uZtQ7a2wSPlwTP963xHrcG7YvNbGSjzjCDJSpkqAKHIpIsYZJGKzPrAVwOvHQQn3ETsKjG47uBe9x9ALAFGBe0jwO2BO33BMdhZoOBK4FjgPOAB8ws+yDiyDg3jxxEXs7+/ymyDP7znM9HFJGIZLowSeNOYCqwxN0/MLP+wCdh3tzMehErOfLn4LEBZwHPBIdMIrYzIMAlwWOC588Ojr8EeMrdy939U2AJcGKYz8
90tQsc9szP5RejvsClx/eidNcedpTrIjcRaVphSqP/Hfh7jcfLgK+GfP/fAT8COgSPuwKl7h7/bfYZEJ+xLQBWBZ9
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.plot(range(2,30),ssd,'o--')\n",
"plt.xlabel(\"K Value\")\n",
"plt.ylabel(\" Sum of Squared Distances\")"
]
},
{
"cell_type": "code",
"execution_count": 750,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<AxesSubplot:>"
]
},
"execution_count": 750,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAX8AAAD/CAYAAAAZg9YLAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAATi0lEQVR4nO3dfbBcdX3H8feXRKiAQniQh0Ag1SADohSvgVo70IoQxDE+zqB/8NCZpk55cKwzAsUWKiKpWh2tQBsltFAxUh0k1UhIqvZRIAEhIQT0lgdJijaKhTp0UPDbP84vZWc5e3Pv3b1P+b1fM2fuOb/z3XN+u3f3s2d/u3s2MhNJUl12meoOSJImn+EvSRUy/CWpQoa/JFXI8JekChn+klShKQv/iFgUEQ9GxHBEXDRV/ZCkGsVUfM4/ImYB3wfeBGwB1gHvycz7J70zklShqTryXwgMZ+ZDmfkLYAWweIr6IknVmarwnws81rG8pbRJkibB7KnuQC8RsQRYArDHHnu89sgjjxzXdjZufbK1/Zi5e1lvvfU7Uf106st0qb/rrrt+kpn7t9VN1Zj/bwKXZeapZfligMy8sq1+aGgo169fP659HX7RN1rbH1l6uvXWW78T1U+nvkyX+oi4KzOH2uqmathnHbAgIuZHxK7AGcDKKeqLJFVnSoZ9MvPZiDgPWA3MApZn5qap6Isk1WjKxvwzcxWwaqr2L0k18xu+klQhw1+SKmT4S1KFDH9JqpDhL0kVMvwlqUKGvyRVaNqe20eSZrJep2WYLjzyl6QKGf6SVCHDX5Iq5Ji/pEkzncbBp1NfpoJH/pJUIcNfkipk+EtShRzzlzRutY+bz2SGvyRNA5P9RGr4S5q2fGXRW7+3jeEvSaOwsz0R+YavJFXII39pBtnZjj4HydtmbDzyl6QKeeQvjaC2o8narm/NPPKXpAoZ/pJUIcNfkipk+EtShQx/SaqQ4S9JFTL8JalChr8kVcjwl6QKGf6SVKG+wj8i3h0RmyLiVxEx1LXu4ogYjogHI+LUjvZFpW04Ii7qZ/+SpPHp99w+9wHvAP66szEijgLOAI4GDgbWRsQRZfVVwJuALcC6iFiZmff32Q/tJGo7t0xt11fTR1/hn5mbASKie9ViYEVmPgM8HBHDwMKybjgzHyqXW1FqDX9JmkQTNeY/F3isY3lLaevVLkmaRDs88o+ItcCBLasuycxbBt+l/9/vEmAJwLx58yZqN5JUpR2Gf2aePI7tbgUO7Vg+pLQxQnv3fpcBywCGhoZyHH2QJPUwUcM+K4EzImK3iJgPLADuBNYBCyJifkTsSvOm8MoJ6oMkqYe+3vCNiLcDfwnsD3wjIu7JzFMzc1NE3ETzRu6zwLmZ+Vy5zHnAamAWsDwzN/V1DSRJY9bvp31uBm7use4K4IqW9lXAqn72K0nqj9/wlaQKGf6SVCHDX5IqZPhLUoX6PbePpGnMcwepF8NfGiDDVjOFwz6SVCHDX5IqZPhLUoUMf0mqkOEvSRUy/CWpQoa/JFXI8JekChn+klQhw1+SKmT4S1KFDH9JqpDhL0kVMvwlqUKGvyRVyPCXpAoZ/pJUIcNfkipk+EtShQx/SaqQ4S9JFTL8JalChr8kVcjwl6QKGf6SVCHDX5IqNHuqOyBNpkeWnj7VXZCmhb6O/CPiExHxQERsiIibI2LvjnUXR8RwRDwYEad2tC8qbcMRcVE/+5ckjU+/wz5rgFdl5quB7wMXA0TEUcAZwNHAIuDqiJgVEbOAq4DTgKOA95RaSdIk6iv8M/O2zHy2LN4OHFLmFwMrMvOZzHwYGAYWlmk4Mx/KzF8AK0qtJGkSDXLM//eAL5f5uTRPBtttKW0Aj3W1Hz/APux0HKOWNBF2GP4RsRY4sGXVJZl5S6m5BHgW+OKgOhYRS4AlAPPmzRvUZiVJjCL8M/PkkdZHxNnAW4A3ZmaW5q3AoR1lh5Q2Rmjv3u8yYBnA0NBQttVIksan30
/7LAI+BLw1M5/uWLUSOCMidouI+cAC4E5gHbAgIuZHxK40bwqv7KcPkqSx63fM/3PAbsCaiAC4PTPfl5mbIuIm4H6a4aBzM/M5gIg4D1gNzAKWZ+amPvsgSRqjvsI/M18xwrorgCta2lcBq/rZrySpP57eQZIqZPhLUoUMf0mqkOEvSRXyrJ6a0fwGtDQ+HvlLUoUMf0mqkOEvSRUy/CWpQoa/JFXI8JekChn+klQhw1+SKuSXvDQmfqlK2jl45C9JFTL8JalChr8kVcjwl6QKGf6SVCHDX5IqZPhLUoUMf0mqkOEvSRUy/CWpQoa/JFXIc/tMsok+N47n3pE0Gh75S1KFDH9JqpDhL0kVMvwlqUK+4Vs53yCW6uSRvyRVyPCXpAr1Ff4RcXlEbIiIeyLitog4uLRHRHw2IobL+uM6LnNWRPygTGf1ewUkSWPX75H/JzLz1Zl5LPB14E9L+2nAgjItAa4BiIh9gEuB44GFwKURMafPPkiSxqivN3wz86mOxT2ALPOLgeszM4HbI2LviDgIOAlYk5lPAETEGmAR8KV++qHpyzeUpemp70/7RMQVwJnAk8DvlOa5wGMdZVtKW692SdIk2uGwT0SsjYj7WqbFAJl5SWYeCnwROG9QHYuIJRGxPiLWb9u2bVCblSQxiiP/zDx5lNv6IrCKZkx/K3Box7pDSttWmqGfzvbv9NjvMmAZwNDQULbVSJLGp99P+yzoWFwMPFDmVwJnlk/9nAA8mZmPA6uBUyJiTnmj95TSJkmaRP2O+S+NiFcCvwIeBd5X2lcBbwaGgaeBcwAy84mIuBxYV+o+sv3NX0nS5On30z7v7NGewLk91i0HlvezX0lSf/yGryRVyPCXpAoZ/pJUIcNfkipk+EtShfwxlz557hpJM5Hh38Uwl1QDh30kqUKGvyRVyPCXpAoZ/pJUIcNfkipk+EtShQx/SaqQ4S9JFTL8JalChr8kVcjwl6QKGf6SVCHDX5IqZPhLUoUMf0mqkOEvSRUy/CWpQoa/JFXI8JekChn+klQhw1+SKmT4S1KFDH9JqpDhL0kVMvwlqUKGvyRVaCDhHxEfjIiMiP3KckTEZyNiOCI2RMRxHbVnRcQPynTWIPYvSRqb2f1uICIOBU4BftjRfBqwoEzHA9cAx0fEPsClwBCQwF0RsTIzf9ZvPyRJozeII/9PAx+iCfPtFgPXZ+N2YO+IOAg4FViTmU+UwF8DLBpAHyRJY9BX+EfEYmBrZt7btWou8FjH8pbS1qtdkjSJdjjsExFrgQNbVl0C/DHNkM/ARcQSYAnAvHnzJmIXklStHYZ/Zp7c1h4RxwDzgXsjAuAQ4O6IWAhsBQ7tKD+ktG0FTupq/06P/S4DlgEMDQ1lW40kaXzGPeyTmRsz82WZeXhmHk4zhHNcZv4IWAmcWT71cwLwZGY+DqwGTomIORExh+ZVw+r+r4YkaSz6/rRPD6uANwPDwNPAOQCZ+UREXA6sK3UfycwnJqgPkqQeBhb+5eh/+3wC5/aoWw4sH9R+JUlj5zd8JalChr8kVcjwl6QKGf6SVCHDX5IqZPhLUoUMf0mqkOEvSRUy/CWpQoa/JFXI8JekChn+klQhw1+SKmT4S1KFDH9JqpDhL0kVMvwlqUKGvyRVyPCXpAoZ/pJUIcNfkipk+EtShQx/SaqQ4S9JFTL8JalChr8kVcjwl6QKGf6SVCHDX5IqZPhLUoUMf0mqkOEvSRUy/CWpQn2Ff0RcFhFbI+KeMr25Y93FETEcEQ9GxKkd7YtK23BEXNTP/iVJ4zN7ANv4dGZ+srMhIo4CzgCOBg4G1kbEEWX1VcCbgC3AuohYmZn3D6AfrR5ZevpEbVqSZqxBhH+bxcCKzHwGeDgihoGFZd1wZj4EEBErSu2Ehb8k6YUGMeZ/XkRsiIjlETGntM0FHuuo2VLaerVLkibRDsM/ItZGxH0t02LgGuDlwLHA48BfDKpjEbEkItZHxPpt27YNarOSJEYx7JOZJ4
9mQxHxeeDrZXErcGjH6kNKGyO0d+93GbAMYGhoKEfTB0nS6PT7aZ+DOhbfDtxX5lcCZ0TEbhExH1gA3AmsAxZExPy
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"pd.Series(ssd).diff().plot(kind='bar')"
]
},
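{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that the bar plot above labels its bars by position (0, 1, 2, ...) rather than by K, so the bar at position i corresponds to K = i + 2. A minimal sketch of one way to label the bars by K directly is shown below. The SSD numbers in `toy_ssd` are hypothetical and chosen only for illustration; the same idea would apply to the real `ssd` list by passing the matching range of K values as the index."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical SSD values for K = 2..6, for illustration only\n",
"toy_k_values = range(2, 7)\n",
"toy_ssd = pd.Series([500.0, 380.0, 330.0, 300.0, 285.0], index=toy_k_values)\n",
"\n",
"# diff() gives SSD(K) - SSD(K-1); the first entry is NaN because K=2 has no predecessor\n",
"toy_ssd_diff = toy_ssd.diff()\n",
"toy_ssd_diff.plot(kind='bar')"
]
},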
{
"cell_type": "markdown",
"metadata": {},
"source": [
"-----"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Model Interpretation\n",
"\n",
"\n",
"**TASK: What K value do you think is a good choice? Are there multiple reasonable choices? What features are helping define these cluster choices? As this is unsupervised learning, there is no 100% correct answer here. Please feel free to jump to the solutions for a full discussion on this!**"
]
},
{
"cell_type": "code",
"execution_count": 751,
"metadata": {},
"outputs": [],
"source": [
"# Nothing to really code here, but choose a K value and see what features \n",
"# are most correlated to belonging to a particular cluster!\n",
"\n",
"# Remember, there is no 100% correct answer here!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"-----\n",
"\n",
"\n",
"#### Example Interpretation: Choosing K=3\n",
"\n",
"**One could say that there is a significant drop off in SSD difference at K=3 (although we can see it continues to drop off past this). What would an analysis look like for K=3? Let's explore which features are important in the decision of 3 clusters!**"
]
},
{
"cell_type": "code",
"execution_count": 753,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"KMeans(n_clusters=3)"
]
},
"execution_count": 753,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model = KMeans(n_clusters=3)\n",
"model.fit(scaled_X)"
]
},
{
"cell_type": "code",
"execution_count": 754,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([2, 0, 0, 0, 1, 2, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 2, 1, 1, 1, 0, 2,\n",
" 1, 2, 0, 1, 2, 0, 1, 0, 1, 2, 2, 2, 2, 2, 1, 0, 1, 2, 2, 0, 0, 0,\n",
" 2, 2, 2, 0, 2, 1, 0, 1, 1, 2, 0, 0, 0, 0, 0, 2, 2, 1, 2, 1, 0, 1,\n",
" 1, 0, 0, 2, 2, 0, 0, 1, 2, 1, 1, 0, 0, 0, 0, 0, 2, 2, 0, 2, 0, 1,\n",
" 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 2, 0, 0, 1, 0, 0, 2,\n",
" 1, 0, 2, 2, 0, 1, 1, 1, 1, 1, 2, 2, 0, 0, 2, 1, 0, 0, 2, 0, 2, 0,\n",
" 0, 0, 0, 0, 0, 2, 2, 0, 2, 1, 0, 0, 1, 0, 2, 2, 0, 1, 0, 2, 0, 0,\n",
" 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 2, 0, 0, 0, 0, 0, 0, 1, 0, 0, 2,\n",
" 0, 2, 1, 1, 1, 0, 2, 2, 1, 0, 2, 0, 2, 1, 1, 0, 1, 0, 2, 0, 2, 0,\n",
" 0, 0, 0, 0, 0, 0, 2, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2,\n",
" 2])"
]
},
"execution_count": 754,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model.labels_"
]
},
{
"cell_type": "code",
"execution_count": 756,
"metadata": {},
"outputs": [],
"source": [
"X['K=3 Clusters'] = model.labels_"
]
},
{
"cell_type": "code",
"execution_count": 757,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Literacy (%) -0.419453\n",
"Region_LATIN AMER. & CARIB -0.377533\n",
"Region_OCEANIA -0.248224\n",
"Crops (%) -0.245934\n",
"Phones (per 1000) -0.198737\n",
"Region_C.W. OF IND. STATES -0.193384\n",
"Region_NEAR EAST -0.179732\n",
"Coastline (coast/area ratio) -0.158318\n",
"Region_NORTHERN AFRICA -0.151646\n",
"Service -0.117898\n",
"Population -0.062404\n",
"GDP ($ per capita) -0.060568\n",
"Industry -0.048420\n",
"Area (sq. mi.) -0.039735\n",
"Region_NORTHERN AMERICA -0.027789\n",
"Pop. Density (per sq. mi.) 0.013816\n",
"Other (%) 0.016429\n",
"Climate 0.024573\n",
"Region_ASIA (EX. NEAR EAST) 0.028712\n",
"Region_BALTICS 0.035283\n",
"Region_EASTERN EUROPE 0.043691\n",
"Arable (%) 0.084553\n",
"Region_WESTERN EUROPE 0.109824\n",
"Net migration 0.208539\n",
"Agriculture 0.440815\n",
"Birthrate 0.494413\n",
"Infant mortality (per 1000 births) 0.614130\n",
"Region_SUB-SAHARAN AFRICA 0.670927\n",
"Deathrate 0.727801\n",
"K=3 Clusters 1.000000\n",
"Name: K=3 Clusters, dtype: float64"
]
},
"execution_count": 757,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X.corr()['K=3 Clusters'].sort_values()"
]
},
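{
"cell_type": "markdown",
"metadata": {},
"source": [
"KMeans label numbers are arbitrary identifiers: cluster 2 is not \"more\" than cluster 0, and the numbering can change between runs when no `random_state` is set. Correlating features with the label column is therefore only a rough guide. A complementary view, sketched below under stated assumptions, is to compare the average of each feature within each cluster. The data in `toy_countries` is hypothetical, and fixing `random_state=0` and `n_init=10` is a choice made here for reproducibility, not something the exercise requires."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from sklearn.cluster import KMeans\n",
"from sklearn.preprocessing import StandardScaler\n",
"\n",
"# Hypothetical toy data with two obvious groups, for illustration only\n",
"toy_countries = pd.DataFrame({\n",
"    'Birthrate': [10.0, 11.0, 12.0, 38.0, 40.0, 42.0],\n",
"    'Literacy (%)': [99.0, 98.0, 97.0, 50.0, 45.0, 55.0],\n",
"})\n",
"\n",
"toy_model = KMeans(n_clusters=2, n_init=10, random_state=0)\n",
"toy_labels = toy_model.fit_predict(StandardScaler().fit_transform(toy_countries))\n",
"\n",
"# Average of each (unscaled) feature within each cluster\n",
"toy_countries.assign(cluster=toy_labels).groupby('cluster').mean()"
]
},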
{
"cell_type": "markdown",
"metadata": {},
"source": [
"------------\n",
"-------------\n",
"\n",
"# BONUS CHALLGENGE:\n",
"## Geographical Model Interpretation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The best way to interpret this model is through visualizing the clusters of countries on a map! **NOTE: THIS IS A BONUS SECTION. YOU MAY WANT TO JUMP TO THE SOLUTIONS LECTURE FOR A FULL GUIDE, SINCE WE WILL COVER TOPICS NOT PREVIOUSLY DISCUSSED AND HAVE A NUANCED DISCUSSION ON PERFORMANCE!**\n",
"\n",
"----\n",
"----\n",
"\n",
"**IF YOU GET STUCK, PLEASE CHECK OUT THE SOLUTIONS LECTURE, AS THIS IS OPTIONAL AND COVERS MANY TOPICS NOT SHOWN IN ANY PREVIOUS LECTURE.**\n",
"\n",
"----\n",
"----"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**TASK: Create cluster labels for a chosen K value. Based on the solutions, we believe either K=3 or K=15 are reasonable choices. But feel free to choose differently and explore.**"
]
},
{
"cell_type": "code",
"execution_count": 765,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"KMeans(n_clusters=15)"
]
},
"execution_count": 765,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model = KMeans(n_clusters=15)\n",
"model.fit(scaled_X)"
]
},
{
"cell_type": "code",
"execution_count": 766,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"KMeans(n_clusters=3)"
]
},
"execution_count": 766,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model = KMeans(n_clusters=3)\n",
"model.fit(scaled_X)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**TASK: Let's put you in the real world! Your boss just asked you to plot out these clusters on a country-level choropleth map. Can you figure out how to do this? We won't guide you step by step on this; we'll just show you an example result. You'll need to do the following:**\n",
"\n",
"1. Figure out how to install the plotly library: https://plotly.com/python/getting-started/\n",
"\n",
"2. Figure out how to create a geographical choropleth map using plotly: https://plotly.com/python/choropleth-maps/#using-builtin-country-and-state-geometries\n",
"\n",
"3. You will need ISO Codes for this. Either use the Wikipedia page, or use our provided file for this: **\"../DATA/country_iso_codes.csv\"**\n",
"\n",
"4. Combine the cluster labels, ISO Codes, and Country Names to create a world map plot with plotly given what you learned in Step 1 and Step 2.\n",
"\n",
"\n",
"**Note: This is meant to be a more realistic project, where you have a clear objective of what you need to create and accomplish and the necessary online documentation. It's up to you to piece everything together to figure it out! If you get stuck, no worries! Check out the solution lecture.**\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 767,
"metadata": {},
"outputs": [],
"source": [
"iso_codes = pd.read_csv(\"../DATA/country_iso_codes.csv\")"
]
},
{
"cell_type": "code",
"execution_count": 768,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Country</th>\n",
" <th>ISO Code</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Afghanistan</td>\n",
" <td>AFG</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Akrotiri and Dhekelia See United Kingdom, The</td>\n",
" <td>Akrotiri and Dhekelia See United Kingdom, The</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Åland Islands</td>\n",
" <td>ALA</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Albania</td>\n",
" <td>ALB</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Algeria</td>\n",
" <td>DZA</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>296</th>\n",
" <td>Congo, Dem. Rep.</td>\n",
" <td>COD</td>\n",
" </tr>\n",
" <tr>\n",
" <th>297</th>\n",
" <td>Congo, Repub. of the</td>\n",
" <td>COG</td>\n",
" </tr>\n",
" <tr>\n",
" <th>298</th>\n",
" <td>Tanzania</td>\n",
" <td>TZA</td>\n",
" </tr>\n",
" <tr>\n",
" <th>299</th>\n",
" <td>Central African Rep.</td>\n",
" <td>CAF</td>\n",
" </tr>\n",
" <tr>\n",
" <th>300</th>\n",
" <td>Cote d'Ivoire</td>\n",
" <td>CIV</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>301 rows × 2 columns</p>\n",
"</div>"
],
"text/plain": [
" Country \\\n",
"0 Afghanistan \n",
"1 Akrotiri and Dhekelia See United Kingdom, The \n",
"2 Åland Islands \n",
"3 Albania \n",
"4 Algeria \n",
".. ... \n",
"296 Congo, Dem. Rep. \n",
"297 Congo, Repub. of the \n",
"298 Tanzania \n",
"299 Central African Rep. \n",
"300 Cote d'Ivoire \n",
"\n",
" ISO Code \n",
"0 AFG \n",
"1 Akrotiri and Dhekelia See United Kingdom, The \n",
"2 ALA \n",
"3 ALB \n",
"4 DZA \n",
".. ... \n",
"296 COD \n",
"297 COG \n",
"298 TZA \n",
"299 CAF \n",
"300 CIV \n",
"\n",
"[301 rows x 2 columns]"
]
},
"execution_count": 768,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"iso_codes"
]
},
{
"cell_type": "code",
"execution_count": 769,
"metadata": {},
"outputs": [],
"source": [
"iso_mapping = iso_codes.set_index('Country')['ISO Code'].to_dict()"
]
},
{
"cell_type": "code",
"execution_count": 770,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'Afghanistan': 'AFG',\n",
" 'Akrotiri and Dhekelia See United Kingdom, The': 'Akrotiri and Dhekelia See United Kingdom, The',\n",
" 'Åland Islands': 'ALA',\n",
" 'Albania': 'ALB',\n",
" 'Algeria': 'DZA',\n",
" 'American Samoa': 'ASM',\n",
" 'Andorra': 'AND',\n",
" 'Angola': 'AGO',\n",
" 'Anguilla': 'AIA',\n",
" 'Antarctica\\u200a[a]': 'ATA',\n",
" 'Antigua and Barbuda': 'ATG',\n",
" 'Argentina': 'ARG',\n",
" 'Armenia': 'ARM',\n",
" 'Aruba': 'ABW',\n",
" 'Ashmore and Cartier Islands See Australia.': 'Ashmore and Cartier Islands See Australia.',\n",
" 'Australia\\u200a[b]': 'AUS',\n",
" 'Austria': 'AUT',\n",
" 'Azerbaijan': 'AZE',\n",
" 'Bahamas (the)': 'BHS',\n",
" 'Bahrain': 'BHR',\n",
" 'Bangladesh': 'BGD',\n",
" 'Barbados': 'BRB',\n",
" 'Belarus': 'BLR',\n",
" 'Belgium': 'BEL',\n",
" 'Belize': 'BLZ',\n",
" 'Benin': 'BEN',\n",
" 'Bermuda': 'BMU',\n",
" 'Bhutan': 'BTN',\n",
" 'Bolivia (Plurinational State of)': 'BOL',\n",
" 'Bonaire\\xa0Sint Eustatius\\xa0Saba': 'BES',\n",
" 'Bosnia and Herzegovina': 'BIH',\n",
" 'Botswana': 'BWA',\n",
" 'Bouvet Island': 'BVT',\n",
" 'Brazil': 'BRA',\n",
" 'British Indian Ocean Territory (the)': 'IOT',\n",
" 'British Virgin Islands See Virgin Islands (British).': 'British Virgin Islands See Virgin Islands (British).',\n",
" 'Brunei Darussalam\\u200a[e]': 'BRN',\n",
" 'Bulgaria': 'BGR',\n",
" 'Burkina Faso': 'BFA',\n",
" 'Burma See Myanmar.': 'Burma See Myanmar.',\n",
" 'Burundi': 'BDI',\n",
" 'Cabo Verde\\u200a[f]': 'CPV',\n",
" 'Cambodia': 'KHM',\n",
" 'Cameroon': 'CMR',\n",
" 'Canada': 'CAN',\n",
" 'Cape Verde See Cabo Verde.': 'Cape Verde See Cabo Verde.',\n",
" 'Caribbean Netherlands See Bonaire, Sint Eustatius and Saba.': 'Caribbean Netherlands See Bonaire, Sint Eustatius and Saba.',\n",
" 'Cayman Islands (the)': 'CYM',\n",
" 'Central African Republic (the)': 'CAF',\n",
" 'Chad': 'TCD',\n",
" 'Chile': 'CHL',\n",
" 'China': 'CHN',\n",
" 'China, The Republic of See Taiwan (Province of China).': 'China, The Republic of See Taiwan (Province of China).',\n",
" 'Christmas Island': 'CXR',\n",
" 'Clipperton Island See France.': 'Clipperton Island See France.',\n",
" 'Cocos (Keeling) Islands (the)': 'CCK',\n",
" 'Colombia': 'COL',\n",
" 'Comoros (the)': 'COM',\n",
" 'Congo (the Democratic Republic of the)': 'COD',\n",
" 'Congo (the)\\u200a[g]': 'COG',\n",
" 'Cook Islands (the)': 'COK',\n",
" 'Coral Sea Islands See Australia.': 'Coral Sea Islands See Australia.',\n",
" 'Costa Rica': 'CRI',\n",
" \"Côte d'Ivoire\\u200a[h]\": 'CIV',\n",
" 'Croatia': 'HRV',\n",
" 'Cuba': 'CUB',\n",
" 'Curaçao': 'CUW',\n",
" 'Cyprus': 'CYP',\n",
" 'Czechia\\u200a[i]': 'CZE',\n",
" \"Democratic People's Republic of Korea See Korea, The Democratic People's Republic of.\": \"Democratic People's Republic of Korea See Korea, The Democratic People's Republic of.\",\n",
" 'Democratic Republic of the Congo See Congo, The Democratic Republic of the.': 'Democratic Republic of the Congo See Congo, The Democratic Republic of the.',\n",
" 'Denmark': 'DNK',\n",
" 'Djibouti': 'DJI',\n",
" 'Dominica': 'DMA',\n",
" 'Dominican Republic (the)': 'DOM',\n",
" 'East Timor See Timor-Leste.': 'East Timor See Timor-Leste.',\n",
" 'Ecuador': 'ECU',\n",
" 'Egypt': 'EGY',\n",
" 'El Salvador': 'SLV',\n",
" 'England See United Kingdom, The.': 'England See United Kingdom, The.',\n",
" 'Equatorial Guinea': 'GNQ',\n",
" 'Eritrea': 'ERI',\n",
" 'Estonia': 'EST',\n",
" 'Eswatini\\u200a[j]': 'SWZ',\n",
" 'Ethiopia': 'ETH',\n",
" 'Falkland Islands (the) [Malvinas]\\u200a[k]': 'FLK',\n",
" 'Faroe Islands (the)': 'FRO',\n",
" 'Fiji': 'FJI',\n",
" 'Finland': 'FIN',\n",
" 'France\\u200a[l]': 'FRA',\n",
" 'French Guiana': 'GUF',\n",
" 'French Polynesia': 'PYF',\n",
" 'French Southern Territories (the)\\u200a[m]': 'ATF',\n",
" 'Gabon': 'GAB',\n",
" 'Gambia (the)': 'GMB',\n",
" 'Georgia': 'GEO',\n",
" 'Germany': 'DEU',\n",
" 'Ghana': 'GHA',\n",
" 'Gibraltar': 'GIB',\n",
" 'Great Britain See United Kingdom, The.': 'Great Britain See United Kingdom, The.',\n",
" 'Greece': 'GRC',\n",
" 'Greenland': 'GRL',\n",
" 'Grenada': 'GRD',\n",
" 'Guadeloupe': 'GLP',\n",
" 'Guam': 'GUM',\n",
" 'Guatemala': 'GTM',\n",
" 'Guernsey': 'GGY',\n",
" 'Guinea': 'GIN',\n",
" 'Guinea-Bissau': 'GNB',\n",
" 'Guyana': 'GUY',\n",
" 'Haiti': 'HTI',\n",
" 'Hawaiian Islands See United States of America, The.': 'Hawaiian Islands See United States of America, The.',\n",
" 'Heard Island and McDonald Islands': 'HMD',\n",
" 'Holy See (the)\\u200a[n]': 'VAT',\n",
" 'Honduras': 'HND',\n",
" 'Hong Kong': 'HKG',\n",
" 'Hungary': 'HUN',\n",
" 'Iceland': 'ISL',\n",
" 'India': 'IND',\n",
" 'Indonesia': 'IDN',\n",
" 'Iran (Islamic Republic of)': 'IRN',\n",
" 'Iraq': 'IRQ',\n",
" 'Ireland': 'IRL',\n",
" 'Isle of Man': 'IMN',\n",
" 'Israel': 'ISR',\n",
" 'Italy': 'ITA',\n",
" \"Ivory Coast See Côte d'Ivoire.\": \"Ivory Coast See Côte d'Ivoire.\",\n",
" 'Jamaica': 'JAM',\n",
" 'Jan Mayen See Svalbard and Jan Mayen.': 'Jan Mayen See Svalbard and Jan Mayen.',\n",
" 'Japan': 'JPN',\n",
" 'Jersey': 'JEY',\n",
" 'Jordan': 'JOR',\n",
" 'Kazakhstan': 'KAZ',\n",
" 'Kenya': 'KEN',\n",
" 'Kiribati': 'KIR',\n",
" \"Korea (the Democratic People's Republic of)\\u200a[o]\": 'PRK',\n",
" 'Korea (the Republic of)\\u200a[p]': 'KOR',\n",
" 'Kuwait': 'KWT',\n",
" 'Kyrgyzstan': 'KGZ',\n",
" \"Lao People's Democratic Republic (the)\\u200a[q]\": 'LAO',\n",
" 'Latvia': 'LVA',\n",
" 'Lebanon': 'LBN',\n",
" 'Lesotho': 'LSO',\n",
" 'Liberia': 'LBR',\n",
" 'Libya': 'LBY',\n",
" 'Liechtenstein': 'LIE',\n",
" 'Lithuania': 'LTU',\n",
" 'Luxembourg': 'LUX',\n",
" 'Macao\\u200a[r]': 'MAC',\n",
" 'North Macedonia\\u200a[s]': 'MKD',\n",
" 'Madagascar': 'MDG',\n",
" 'Malawi': 'MWI',\n",
" 'Malaysia': 'MYS',\n",
" 'Maldives': 'MDV',\n",
" 'Mali': 'MLI',\n",
" 'Malta': 'MLT',\n",
" 'Marshall Islands (the)': 'MHL',\n",
" 'Martinique': 'MTQ',\n",
" 'Mauritania': 'MRT',\n",
" 'Mauritius': 'MUS',\n",
" 'Mayotte': 'MYT',\n",
" 'Mexico': 'MEX',\n",
" 'Micronesia (Federated States of)': 'FSM',\n",
" 'Moldova (the Republic of)': 'MDA',\n",
" 'Monaco': 'MCO',\n",
" 'Mongolia': 'MNG',\n",
" 'Montenegro': 'MNE',\n",
" 'Montserrat': 'MSR',\n",
" 'Morocco': 'MAR',\n",
" 'Mozambique': 'MOZ',\n",
" 'Myanmar\\u200a[t]': 'MMR',\n",
" 'Namibia': 'NAM',\n",
" 'Nauru': 'NRU',\n",
" 'Nepal': 'NPL',\n",
" 'Netherlands (the)': 'NLD',\n",
" 'New Caledonia': 'NCL',\n",
" 'New Zealand': 'NZL',\n",
" 'Nicaragua': 'NIC',\n",
" 'Niger (the)': 'NER',\n",
" 'Nigeria': 'NGA',\n",
" 'Niue': 'NIU',\n",
" 'Norfolk Island': 'NFK',\n",
" \"North Korea See Korea, The Democratic People's Republic of.\": \"North Korea See Korea, The Democratic People's Republic of.\",\n",
" 'Northern Ireland See United Kingdom, The.': 'Northern Ireland See United Kingdom, The.',\n",
" 'Northern Mariana Islands (the)': 'MNP',\n",
" 'Norway': 'NOR',\n",
" 'Oman': 'OMN',\n",
" 'Pakistan': 'PAK',\n",
" 'Palau': 'PLW',\n",
" 'Palestine, State of': 'PSE',\n",
" 'Panama': 'PAN',\n",
" 'Papua New Guinea': 'PNG',\n",
" 'Paraguay': 'PRY',\n",
" \"People's Republic of China See China.\": \"People's Republic of China See China.\",\n",
" 'Peru': 'PER',\n",
" 'Philippines (the)': 'PHL',\n",
" 'Pitcairn\\u200a[u]': 'PCN',\n",
" 'Poland': 'POL',\n",
" 'Portugal': 'PRT',\n",
" 'Puerto Rico': 'PRI',\n",
" 'Qatar': 'QAT',\n",
" 'Republic of China See Taiwan (Province of China).': 'Republic of China See Taiwan (Province of China).',\n",
" 'Republic of Korea See Korea, The Republic of.': 'Republic of Korea See Korea, The Republic of.',\n",
" 'Republic of the Congo See Congo, The.': 'Republic of the Congo See Congo, The.',\n",
" 'Réunion': 'REU',\n",
" 'Romania': 'ROU',\n",
" 'Russian Federation (the)\\u200a[v]': 'RUS',\n",
" 'Rwanda': 'RWA',\n",
" 'Saba See Bonaire, Sint Eustatius and Saba.': 'Saba See Bonaire, Sint Eustatius and Saba.',\n",
" 'Sahrawi Arab Democratic Republic See Western Sahara.': 'Sahrawi Arab Democratic Republic See Western Sahara.',\n",
" 'Saint Barthélemy': 'BLM',\n",
" 'Saint Helena\\xa0Ascension Island\\xa0Tristan da Cunha': 'SHN',\n",
" 'Saint Kitts and Nevis': 'KNA',\n",
" 'Saint Lucia': 'LCA',\n",
" 'Saint Martin (French part)': 'MAF',\n",
" 'Saint Pierre and Miquelon': 'SPM',\n",
" 'Saint Vincent and the Grenadines': 'VCT',\n",
" 'Samoa': 'WSM',\n",
" 'San Marino': 'SMR',\n",
" 'Sao Tome and Principe': 'STP',\n",
" 'Saudi Arabia': 'SAU',\n",
" 'Scotland See United Kingdom, The.': 'Scotland See United Kingdom, The.',\n",
" 'Senegal': 'SEN',\n",
" 'Serbia': 'SRB',\n",
" 'Seychelles': 'SYC',\n",
" 'Sierra Leone': 'SLE',\n",
" 'Singapore': 'SGP',\n",
" 'Sint Eustatius See Bonaire, Sint Eustatius and Saba.': 'Sint Eustatius See Bonaire, Sint Eustatius and Saba.',\n",
" 'Sint Maarten (Dutch part)': 'SXM',\n",
" 'Slovakia': 'SVK',\n",
" 'Slovenia': 'SVN',\n",
" 'Solomon Islands': 'SLB',\n",
" 'Somalia': 'SOM',\n",
" 'South Africa': 'ZAF',\n",
" 'South Georgia and the South Sandwich Islands': 'SGS',\n",
" 'South Korea See Korea, The Republic of.': 'South Korea See Korea, The Republic of.',\n",
" 'South Sudan': 'SSD',\n",
" 'Spain': 'ESP',\n",
" 'Sri Lanka': 'LKA',\n",
" 'Sudan (the)': 'SDN',\n",
" 'Suriname': 'SUR',\n",
" 'Svalbard\\xa0Jan Mayen': 'SJM',\n",
" 'Sweden': 'SWE',\n",
" 'Switzerland': 'CHE',\n",
" 'Syrian Arab Republic (the)\\u200a[x]': 'SYR',\n",
" 'Taiwan (Province of China)\\u200a[y]': 'TWN',\n",
" 'Tajikistan': 'TJK',\n",
" 'Tanzania, the United Republic of': 'TZA',\n",
" 'Thailand': 'THA',\n",
" 'Timor-Leste\\u200a[aa]': 'TLS',\n",
" 'Togo': 'TGO',\n",
" 'Tokelau': 'TKL',\n",
" 'Tonga': 'TON',\n",
" 'Trinidad and Tobago': 'TTO',\n",
" 'Tunisia': 'TUN',\n",
" 'Turkey': 'TUR',\n",
" 'Turkmenistan': 'TKM',\n",
" 'Turks and Caicos Islands (the)': 'TCA',\n",
" 'Tuvalu': 'TUV',\n",
" 'Uganda': 'UGA',\n",
" 'Ukraine': 'UKR',\n",
" 'United Arab Emirates (the)': 'ARE',\n",
" 'United Kingdom of Great Britain and Northern Ireland (the)': 'GBR',\n",
" 'United States Minor Outlying Islands (the)\\u200a[ac]': 'UMI',\n",
" 'United States of America (the)': 'USA',\n",
" 'United States Virgin Islands See Virgin Islands (U.S.).': 'United States Virgin Islands See Virgin Islands (U.S.).',\n",
" 'Uruguay': 'URY',\n",
" 'Uzbekistan': 'UZB',\n",
" 'Vanuatu': 'VUT',\n",
" 'Vatican City See Holy See, The.': 'Vatican City See Holy See, The.',\n",
" 'Venezuela (Bolivarian Republic of)': 'VEN',\n",
" 'Viet Nam\\u200a[ae]': 'VNM',\n",
" 'Virgin Islands (British)\\u200a[af]': 'VGB',\n",
" 'Virgin Islands (U.S.)\\u200a[ag]': 'VIR',\n",
" 'Wales See United Kingdom, The.': 'Wales See United Kingdom, The.',\n",
" 'Wallis and Futuna': 'WLF',\n",
" 'Western Sahara\\u200a[ah]': 'ESH',\n",
" 'Yemen': 'YEM',\n",
" 'Zambia': 'ZMB',\n",
" 'Zimbabwe': 'ZWE',\n",
" 'United States': 'USA',\n",
" 'United Kingdom': 'GBR',\n",
" 'Venezuela': 'VEN',\n",
" 'Australia': 'AUS',\n",
" 'Iran': 'IRN',\n",
" 'France': 'FRA',\n",
" 'Russia': 'RUS',\n",
" 'Korea, North': 'PRK',\n",
" 'Korea, South': 'KOR',\n",
" 'Myanmar': 'MMR',\n",
" 'Burma': 'MMR',\n",
" 'Vietnam': 'VNM',\n",
" 'Laos': 'LAO',\n",
" 'Bolivia': 'BOL',\n",
" 'Niger': 'NER',\n",
" 'Sudan': 'SDN',\n",
" 'Congo, Dem. Rep.': 'COD',\n",
" 'Congo, Repub. of the': 'COG',\n",
" 'Tanzania': 'TZA',\n",
" 'Central African Rep.': 'CAF',\n",
" \"Cote d'Ivoire\": 'CIV'}"
]
},
"execution_count": 770,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"iso_mapping"
]
},
{
"cell_type": "code",
"execution_count": 771,
"metadata": {},
"outputs": [],
"source": [
"# Look up each country's ISO alpha-3 code; names missing from iso_mapping become NaN\n",
"df['ISO Code'] = df['Country'].map(iso_mapping)"
]
},
{
"cell_type": "code",
"execution_count": 772,
"metadata": {},
"outputs": [],
"source": [
"# Store the cluster label the fitted model assigned to each row\n",
"df['Cluster'] = model.labels_"
]
},
{
"cell_type": "code",
"execution_count": 773,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.plotly.v1+json": {
"config": {
"plotlyServerURL": "https://plot.ly"
},
"data": [
{
"coloraxis": "coloraxis",
"geo": "geo",
"hovertemplate": "<b>%{hovertext}</b><br><br>ISO Code=%{location}<br>Cluster=%{z}<extra></extra>",
"hovertext": [
"Afghanistan",
"Albania",
"Algeria",
"American Samoa",
"Andorra",
"Angola",
"Anguilla",
"Antigua & Barbuda",
"Argentina",
"Armenia",
"Aruba",
"Australia",
"Austria",
"Azerbaijan",
"Bahamas, The",
"Bahrain",
"Bangladesh",
"Barbados",
"Belarus",
"Belgium",
"Belize",
"Benin",
"Bermuda",
"Bhutan",
"Bolivia",
"Bosnia & Herzegovina",
"Botswana",
"Brazil",
"British Virgin Is.",
"Brunei",
"Bulgaria",
"Burkina Faso",
"Burma",
"Burundi",
"Cambodia",
"Cameroon",
"Canada",
"Cape Verde",
"Cayman Islands",
"Central African Rep.",
"Chad",
"Chile",
"China",
"Colombia",
"Comoros",
"Congo, Dem. Rep.",
"Congo, Repub. of the",
"Costa Rica",
"Cote d'Ivoire",
"Croatia",
"Cuba",
"Czech Republic",
"Denmark",
"Djibouti",
"Dominica",
"Dominican Republic",
"Ecuador",
"Egypt",
"El Salvador",
"Equatorial Guinea",
"Eritrea",
"Estonia",
"Ethiopia",
"Faroe Islands",
"Fiji",
"Finland",
"France",
"French Guiana",
"French Polynesia",
"Gabon",
"Gambia, The",
"Gaza Strip",
"Georgia",
"Germany",
"Ghana",
"Gibraltar",
"Greece",
"Greenland",
"Grenada",
"Guadeloupe",
"Guam",
"Guatemala",
"Guinea",
"Guinea-Bissau",
"Guyana",
"Haiti",
"Honduras",
"Hong Kong",
"Hungary",
"Iceland",
"India",
"Indonesia",
"Iran",
"Iraq",
"Ireland",
"Isle of Man",
"Israel",
"Italy",
"Jamaica",
"Japan",
"Jersey",
"Jordan",
"Kazakhstan",
"Kenya",
"Kiribati",
"Korea, North",
"Korea, South",
"Kuwait",
"Kyrgyzstan",
"Laos",
"Latvia",
"Lebanon",
"Lesotho",
"Liberia",
"Libya",
"Liechtenstein",
"Lithuania",
"Luxembourg",
"Macau",
"Macedonia",
"Madagascar",
"Malawi",
"Malaysia",
"Maldives",
"Mali",
"Malta",
"Marshall Islands",
"Martinique",
"Mauritania",
"Mauritius",
"Mayotte",
"Mexico",
"Micronesia, Fed. St.",
"Moldova",
"Mongolia",
"Montserrat",
"Morocco",
"Mozambique",
"Namibia",
"Nauru",
"Nepal",
"Netherlands",
"Netherlands Antilles",
"New Caledonia",
"New Zealand",
"Nicaragua",
"Niger",
"Nigeria",
"N. Mariana Islands",
"Norway",
"Oman",
"Pakistan",
"Palau",
"Panama",
"Papua New Guinea",
"Paraguay",
"Peru",
"Philippines",
"Poland",
"Portugal",
"Puerto Rico",
"Qatar",
"Reunion",
"Romania",
"Russia",
"Rwanda",
"Saint Helena",
"Saint Kitts & Nevis",
"Saint Lucia",
"St Pierre & Miquelon",
"Saint Vincent and the Grenadines",
"Samoa",
"San Marino",
"Sao Tome & Principe",
"Saudi Arabia",
"Senegal",
"Seychelles",
"Sierra Leone",
"Singapore",
"Slovakia",
"Slovenia",
"Solomon Islands",
"Somalia",
"South Africa",
"Spain",
"Sri Lanka",
"Sudan",
"Suriname",
"Swaziland",
"Sweden",
"Switzerland",
"Syria",
"Taiwan",
"Tajikistan",
"Tanzania",
"Thailand",
"Togo",
"Tonga",
"Trinidad & Tobago",
"Tunisia",
"Turkey",
"Turkmenistan",
"Turks & Caicos Is",
"Tuvalu",
"Uganda",
"Ukraine",
"United Arab Emirates",
"United Kingdom",
"United States",
"Uruguay",
"Uzbekistan",
"Vanuatu",
"Venezuela",
"Vietnam",
"Virgin Islands",
"Wallis and Futuna",
"West Bank",
"Western Sahara",
"Yemen",
"Zambia",
"Zimbabwe"
],
"locations": [
"AFG",
"ALB",
"DZA",
"ASM",
"AND",
"AGO",
"AIA",
null,
"ARG",
"ARM",
"ABW",
"AUS",
"AUT",
"AZE",
null,
"BHR",
"BGD",
"BRB",
"BLR",
"BEL",
"BLZ",
"BEN",
"BMU",
"BTN",
"BOL",
null,
"BWA",
"BRA",
null,
null,
"BGR",
"BFA",
"MMR",
"BDI",
"KHM",
"CMR",
"CAN",
null,
null,
"CAF",
"TCD",
"CHL",
"CHN",
"COL",
null,
"COD",
"COG",
"CRI",
"CIV",
"HRV",
"CUB",
null,
"DNK",
"DJI",
"DMA",
null,
"ECU",
"EGY",
"SLV",
"GNQ",
"ERI",
"EST",
"ETH",
null,
"FJI",
"FIN",
"FRA",
"GUF",
"PYF",
"GAB",
null,
null,
"GEO",
"DEU",
"GHA",
"GIB",
"GRC",
"GRL",
"GRD",
"GLP",
"GUM",
"GTM",
"GIN",
"GNB",
"GUY",
"HTI",
"HND",
"HKG",
"HUN",
"ISL",
"IND",
"IDN",
"IRN",
"IRQ",
"IRL",
"IMN",
"ISR",
"ITA",
"JAM",
"JPN",
"JEY",
"JOR",
"KAZ",
"KEN",
"KIR",
"PRK",
"KOR",
"KWT",
"KGZ",
"LAO",
"LVA",
"LBN",
"LSO",
"LBR",
"LBY",
"LIE",
"LTU",
"LUX",
null,
null,
"MDG",
"MWI",
"MYS",
"MDV",
"MLI",
"MLT",
null,
"MTQ",
"MRT",
"MUS",
"MYT",
"MEX",
null,
null,
"MNG",
"MSR",
"MAR",
"MOZ",
"NAM",
"NRU",
"NPL",
null,
null,
"NCL",
"NZL",
"NIC",
"NER",
"NGA",
null,
"NOR",
"OMN",
"PAK",
"PLW",
"PAN",
"PNG",
"PRY",
"PER",
null,
"POL",
"PRT",
"PRI",
"QAT",
null,
"ROU",
"RUS",
"RWA",
null,
null,
"LCA",
null,
"VCT",
"WSM",
"SMR",
null,
"SAU",
"SEN",
"SYC",
"SLE",
"SGP",
"SVK",
"SVN",
"SLB",
"SOM",
"ZAF",
"ESP",
"LKA",
"SDN",
"SUR",
null,
"SWE",
"CHE",
null,
null,
"TJK",
"TZA",
"THA",
"TGO",
"TON",
null,
"TUN",
"TUR",
"TKM",
null,
"TUV",
"UGA",
"UKR",
null,
"GBR",
"USA",
"URY",
"UZB",
"VUT",
"VEN",
"VNM",
null,
"WLF",
null,
null,
"YEM",
"ZMB",
"ZWE"
],
"name": "",
"type": "choropleth",
"z": [
2,
0,
0,
0,
1,
2,
0,
0,
0,
0,
1,
1,
1,
0,
0,
0,
0,
1,
1,
1,
0,
2,
1,
2,
0,
1,
2,
0,
1,
0,
1,
2,
2,
2,
2,
2,
1,
0,
1,
2,
2,
0,
0,
0,
2,
2,
2,
0,
2,
1,
0,
1,
1,
2,
0,
0,
0,
0,
0,
2,
2,
1,
2,
1,
0,
1,
1,
0,
0,
2,
2,
0,
0,
1,
2,
1,
1,
0,
0,
0,
0,
0,
2,
2,
0,
0,
0,
1,
1,
1,
0,
0,
0,
0,
1,
1,
1,
1,
0,
1,
1,
0,
0,
2,
0,
0,
1,
0,
0,
2,
1,
0,
2,
2,
0,
1,
1,
1,
1,
1,
2,
2,
0,
0,
2,
1,
0,
0,
2,
0,
2,
0,
0,
0,
0,
0,
0,
2,
2,
0,
2,
1,
0,
0,
1,
0,
2,
2,
0,
1,
0,
0,
0,
0,
0,
0,
0,
0,
1,
1,
0,
0,
0,
1,
0,
2,
0,
0,
0,
0,
0,
0,
1,
0,
0,
2,
0,
2,
1,
1,
1,
0,
2,
2,
1,
0,
2,
0,
2,
1,
1,
0,
1,
0,
2,
0,
2,
0,
0,
0,
0,
0,
0,
0,
2,
0,
0,
1,
1,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
2,
2
]
}
],
"layout": {
"coloraxis": {
"colorbar": {
"title": {
"text": "Cluster"
}
},
"colorscale": [
[
0,
"#30123b"
],
[
0.07142857142857142,
"#4145ab"
],
[
0.14285714285714285,
"#4675ed"
],
[
0.21428571428571427,
"#39a2fc"
],
[
0.2857142857142857,
"#1bcfd4"
],
[
0.35714285714285715,
"#24eca6"
],
[
0.42857142857142855,
"#61fc6c"
],
[
0.5,
"#a4fc3b"
],
[
0.5714285714285714,
"#d1e834"
],
[
0.6428571428571429,
"#f3c63a"
],
[
0.7142857142857143,
"#fe9b2d"
],
[
0.7857142857142857,
"#f36315"
],
[
0.8571428571428571,
"#d93806"
],
[
0.9285714285714286,
"#b11901"
],
[
1,
"#7a0402"
]
]
},
"geo": {
"center": {},
"domain": {
"x": [
0,
1
],
"y": [
0,
1
]
}
},
"legend": {
"tracegroupgap": 0
},
"margin": {
"t": 60
},
"template": {
"data": {
"bar": [
{
"error_x": {
"color": "#2a3f5f"
},
"error_y": {
"color": "#2a3f5f"
},
"marker": {
"line": {
"color": "#E5ECF6",
"width": 0.5
}
},
"type": "bar"
}
],
"barpolar": [
{
"marker": {
"line": {
"color": "#E5ECF6",
"width": 0.5
}
},
"type": "barpolar"
}
],
"carpet": [
{
"aaxis": {
"endlinecolor": "#2a3f5f",
"gridcolor": "white",
"linecolor": "white",
"minorgridcolor": "white",
"startlinecolor": "#2a3f5f"
},
"baxis": {
"endlinecolor": "#2a3f5f",
"gridcolor": "white",
"linecolor": "white",
"minorgridcolor": "white",
"startlinecolor": "#2a3f5f"
},
"type": "carpet"
}
],
"choropleth": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"type": "choropleth"
}
],
"contour": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"colorscale": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"type": "contour"
}
],
"contourcarpet": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"type": "contourcarpet"
}
],
"heatmap": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"colorscale": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"type": "heatmap"
}
],
"heatmapgl": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"colorscale": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"type": "heatmapgl"
}
],
"histogram": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "histogram"
}
],
"histogram2d": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"colorscale": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"type": "histogram2d"
}
],
"histogram2dcontour": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"colorscale": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"type": "histogram2dcontour"
}
],
"mesh3d": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"type": "mesh3d"
}
],
"parcoords": [
{
"line": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "parcoords"
}
],
"pie": [
{
"automargin": true,
"type": "pie"
}
],
"scatter": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scatter"
}
],
"scatter3d": [
{
"line": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scatter3d"
}
],
"scattercarpet": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scattercarpet"
}
],
"scattergeo": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scattergeo"
}
],
"scattergl": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scattergl"
}
],
"scattermapbox": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scattermapbox"
}
],
"scatterpolar": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scatterpolar"
}
],
"scatterpolargl": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scatterpolargl"
}
],
"scatterternary": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scatterternary"
}
],
"surface": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"colorscale": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"type": "surface"
}
],
"table": [
{
"cells": {
"fill": {
"color": "#EBF0F8"
},
"line": {
"color": "white"
}
},
"header": {
"fill": {
"color": "#C8D4E3"
},
"line": {
"color": "white"
}
},
"type": "table"
}
]
},
"layout": {
"annotationdefaults": {
"arrowcolor": "#2a3f5f",
"arrowhead": 0,
"arrowwidth": 1
},
"autotypenumbers": "strict",
"coloraxis": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"colorscale": {
"diverging": [
[
0,
"#8e0152"
],
[
0.1,
"#c51b7d"
],
[
0.2,
"#de77ae"
],
[
0.3,
"#f1b6da"
],
[
0.4,
"#fde0ef"
],
[
0.5,
"#f7f7f7"
],
[
0.6,
"#e6f5d0"
],
[
0.7,
"#b8e186"
],
[
0.8,
"#7fbc41"
],
[
0.9,
"#4d9221"
],
[
1,
"#276419"
]
],
"sequential": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"sequentialminus": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
]
},
"colorway": [
"#636efa",
"#EF553B",
"#00cc96",
"#ab63fa",
"#FFA15A",
"#19d3f3",
"#FF6692",
"#B6E880",
"#FF97FF",
"#FECB52"
],
"font": {
"color": "#2a3f5f"
},
"geo": {
"bgcolor": "white",
"lakecolor": "white",
"landcolor": "#E5ECF6",
"showlakes": true,
"showland": true,
"subunitcolor": "white"
},
"hoverlabel": {
"align": "left"
},
"hovermode": "closest",
"mapbox": {
"style": "light"
},
"paper_bgcolor": "white",
"plot_bgcolor": "#E5ECF6",
"polar": {
"angularaxis": {
"gridcolor": "white",
"linecolor": "white",
"ticks": ""
},
"bgcolor": "#E5ECF6",
"radialaxis": {
"gridcolor": "white",
"linecolor": "white",
"ticks": ""
}
},
"scene": {
"xaxis": {
"backgroundcolor": "#E5ECF6",
"gridcolor": "white",
"gridwidth": 2,
"linecolor": "white",
"showbackground": true,
"ticks": "",
"zerolinecolor": "white"
},
"yaxis": {
"backgroundcolor": "#E5ECF6",
"gridcolor": "white",
"gridwidth": 2,
"linecolor": "white",
"showbackground": true,
"ticks": "",
"zerolinecolor": "white"
},
"zaxis": {
"backgroundcolor": "#E5ECF6",
"gridcolor": "white",
"gridwidth": 2,
"linecolor": "white",
"showbackground": true,
"ticks": "",
"zerolinecolor": "white"
}
},
"shapedefaults": {
"line": {
"color": "#2a3f5f"
}
},
"ternary": {
"aaxis": {
"gridcolor": "white",
"linecolor": "white",
"ticks": ""
},
"baxis": {
"gridcolor": "white",
"linecolor": "white",
"ticks": ""
},
"bgcolor": "#E5ECF6",
"caxis": {
"gridcolor": "white",
"linecolor": "white",
"ticks": ""
}
},
"title": {
"x": 0.05
},
"xaxis": {
"automargin": true,
"gridcolor": "white",
"linecolor": "white",
"ticks": "",
"title": {
"standoff": 15
},
"zerolinecolor": "white",
"zerolinewidth": 2
},
"yaxis": {
"automargin": true,
"gridcolor": "white",
"linecolor": "white",
"ticks": "",
"title": {
"standoff": 15
},
"zerolinecolor": "white",
"zerolinewidth": 2
}
}
}
}
}
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import plotly.express as px\n",
"\n",
"fig = px.choropleth(df, locations=\"ISO Code\",\n",
" color=\"Cluster\", # cluster label taken from model.labels_\n",
" hover_name=\"Country\", # column to add to hover information\n",
" color_continuous_scale='Turbo'\n",
" )\n",
"fig.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
}
},
"nbformat": 4,
"nbformat_minor": 1
}