{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"___\n",
"\n",
"<a href='http://www.pieriandata.com'><img src='../Pierian_Data_Logo.png'/></a>\n",
"___\n",
"<center><em>Copyright by Pierian Data Inc.</em></center>\n",
"<center><em>For more information, visit us at <a href='http://www.pieriandata.com'>www.pieriandata.com</a></em></center>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Seaborn Exercises - Solutions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Imports\n",
"\n",
"Run the cell below to import the libraries"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import seaborn as sns\n",
"\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The Data\n",
"\n",
"DATA SOURCE: https://www.kaggle.com/rikdifos/credit-card-approval-prediction\n",
"\n",
"Data Information:\n",
"\n",
"Credit score cards are a common risk control method in the financial industry. It uses personal information and data submitted by credit card applicants to predict the probability of future defaults and credit card borrowings. The bank is able to decide whether to issue a credit card to the applicant. Credit scores can objectively quantify the magnitude of risk."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Feature Information:\n",
"\n",
"<table>\n",
"<thead>\n",
"<tr>\n",
"<th>application_record.csv</th>\n",
"<th></th>\n",
"<th></th>\n",
"</tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr>\n",
"<td>Feature name</td>\n",
"<td>Explanation</td>\n",
"<td>Remarks</td>\n",
"</tr>\n",
"<tr>\n",
"<td><code>ID</code></td>\n",
"<td>Client number</td>\n",
"<td></td>\n",
"</tr>\n",
"<tr>\n",
"<td><code>CODE_GENDER</code></td>\n",
"<td>Gender</td>\n",
"<td></td>\n",
"</tr>\n",
"<tr>\n",
"<td><code>FLAG_OWN_CAR</code></td>\n",
"<td>Is there a car</td>\n",
"<td></td>\n",
"</tr>\n",
"<tr>\n",
"<td><code>FLAG_OWN_REALTY</code></td>\n",
"<td>Is there a property</td>\n",
"<td></td>\n",
"</tr>\n",
"<tr>\n",
"<td><code>CNT_CHILDREN</code></td>\n",
"<td>Number of children</td>\n",
"<td></td>\n",
"</tr>\n",
"<tr>\n",
"<td><code>AMT_INCOME_TOTAL</code></td>\n",
"<td>Annual income</td>\n",
"<td></td>\n",
"</tr>\n",
"<tr>\n",
"<td><code>NAME_INCOME_TYPE</code></td>\n",
"<td>Income category</td>\n",
"<td></td>\n",
"</tr>\n",
"<tr>\n",
"<td><code>NAME_EDUCATION_TYPE</code></td>\n",
"<td>Education level</td>\n",
"<td></td>\n",
"</tr>\n",
"<tr>\n",
"<td><code>NAME_FAMILY_STATUS</code></td>\n",
"<td>Marital status</td>\n",
"<td></td>\n",
"</tr>\n",
"<tr>\n",
"<td><code>NAME_HOUSING_TYPE</code></td>\n",
"<td>Way of living</td>\n",
"<td></td>\n",
"</tr>\n",
"<tr>\n",
"<td><code>DAYS_BIRTH</code></td>\n",
"<td>Birthday</td>\n",
"<td>Count backwards from current day (0), -1 means yesterday</td>\n",
"</tr>\n",
"<tr>\n",
"<td><code>DAYS_EMPLOYED</code></td>\n",
"<td>Start date of employment</td>\n",
"<td>Count backwards from current day(0). If positive, it means the person currently unemployed.</td>\n",
"</tr>\n",
"<tr>\n",
"<td><code>FLAG_MOBIL</code></td>\n",
"<td>Is there a mobile phone</td>\n",
"<td></td>\n",
"</tr>\n",
"<tr>\n",
"<td><code>FLAG_WORK_PHONE</code></td>\n",
"<td>Is there a work phone</td>\n",
"<td></td>\n",
"</tr>\n",
"<tr>\n",
"<td><code>FLAG_PHONE</code></td>\n",
"<td>Is there a phone</td>\n",
"<td></td>\n",
"</tr>\n",
"<tr>\n",
"<td><code>FLAG_EMAIL</code></td>\n",
"<td>Is there an email</td>\n",
"<td></td>\n",
"</tr>\n",
"<tr>\n",
"<td><code>OCCUPATION_TYPE</code></td>\n",
"<td>Occupation</td>\n",
"<td></td>\n",
"</tr>\n",
"<tr>\n",
"<td><code>CNT_FAM_MEMBERS</code></td>\n",
"<td>Family size</td>\n",
"<td></td>\n",
"</tr>\n",
"</tbody>\n",
"</table>"
]
},
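{
"cell_type": "markdown",
"metadata": {},
"source": [
"The day-offset columns described above can be illustrated with a small sketch. The cell below builds a tiny made-up frame and converts the offsets into positive, readable quantities. The name `sample`, the helper columns `AGE_YEARS` and `UNEMPLOYED`, and the positive `DAYS_EMPLOYED` value are illustrative assumptions rather than part of the dataset, and a year is approximated as 365 days."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical sample with made-up values; the last row uses an arbitrary\n",
"# positive DAYS_EMPLOYED to stand in for an unemployed person\n",
"sample = pd.DataFrame({\n",
"    'DAYS_BIRTH': [-12005, -21474, -19110],\n",
"    'DAYS_EMPLOYED': [-4542, -1134, 1000],\n",
"})\n",
"\n",
"# Negating a backwards day count gives days elapsed; dividing by 365 approximates years\n",
"sample['AGE_YEARS'] = -sample['DAYS_BIRTH'] / 365\n",
"\n",
"# Per the remarks column, a positive DAYS_EMPLOYED marks a currently unemployed person\n",
"sample['UNEMPLOYED'] = sample['DAYS_EMPLOYED'] > 0\n",
"\n",
"sample"
]
},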
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"df = pd.read_csv('application_record.csv')"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>ID</th>\n",
" <th>CODE_GENDER</th>\n",
" <th>FLAG_OWN_CAR</th>\n",
" <th>FLAG_OWN_REALTY</th>\n",
" <th>CNT_CHILDREN</th>\n",
" <th>AMT_INCOME_TOTAL</th>\n",
" <th>NAME_INCOME_TYPE</th>\n",
" <th>NAME_EDUCATION_TYPE</th>\n",
" <th>NAME_FAMILY_STATUS</th>\n",
" <th>NAME_HOUSING_TYPE</th>\n",
" <th>DAYS_BIRTH</th>\n",
" <th>DAYS_EMPLOYED</th>\n",
" <th>FLAG_MOBIL</th>\n",
" <th>FLAG_WORK_PHONE</th>\n",
" <th>FLAG_PHONE</th>\n",
" <th>FLAG_EMAIL</th>\n",
" <th>OCCUPATION_TYPE</th>\n",
" <th>CNT_FAM_MEMBERS</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>5008804</td>\n",
" <td>M</td>\n",
" <td>Y</td>\n",
" <td>Y</td>\n",
" <td>0</td>\n",
" <td>427500.0</td>\n",
" <td>Working</td>\n",
" <td>Higher education</td>\n",
" <td>Civil marriage</td>\n",
" <td>Rented apartment</td>\n",
" <td>-12005</td>\n",
" <td>-4542</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>2.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>5008805</td>\n",
" <td>M</td>\n",
" <td>Y</td>\n",
" <td>Y</td>\n",
" <td>0</td>\n",
" <td>427500.0</td>\n",
" <td>Working</td>\n",
" <td>Higher education</td>\n",
" <td>Civil marriage</td>\n",
" <td>Rented apartment</td>\n",
" <td>-12005</td>\n",
" <td>-4542</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>2.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>5008806</td>\n",
" <td>M</td>\n",
" <td>Y</td>\n",
" <td>Y</td>\n",
" <td>0</td>\n",
" <td>112500.0</td>\n",
" <td>Working</td>\n",
" <td>Secondary / secondary special</td>\n",
" <td>Married</td>\n",
" <td>House / apartment</td>\n",
" <td>-21474</td>\n",
" <td>-1134</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>Security staff</td>\n",
" <td>2.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>5008808</td>\n",
" <td>F</td>\n",
" <td>N</td>\n",
" <td>Y</td>\n",
" <td>0</td>\n",
" <td>270000.0</td>\n",
" <td>Commercial associate</td>\n",
" <td>Secondary / secondary special</td>\n",
" <td>Single / not married</td>\n",
" <td>House / apartment</td>\n",
" <td>-19110</td>\n",
" <td>-3051</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Sales staff</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5008809</td>\n",
" <td>F</td>\n",
" <td>N</td>\n",
" <td>Y</td>\n",
" <td>0</td>\n",
" <td>270000.0</td>\n",
" <td>Commercial associate</td>\n",
" <td>Secondary / secondary special</td>\n",
" <td>Single / not married</td>\n",
" <td>House / apartment</td>\n",
" <td>-19110</td>\n",
" <td>-3051</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Sales staff</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" ID CODE_GENDER FLAG_OWN_CAR FLAG_OWN_REALTY CNT_CHILDREN \\\n",
"0 5008804 M Y Y 0 \n",
"1 5008805 M Y Y 0 \n",
"2 5008806 M Y Y 0 \n",
"3 5008808 F N Y 0 \n",
"4 5008809 F N Y 0 \n",
"\n",
" AMT_INCOME_TOTAL NAME_INCOME_TYPE NAME_EDUCATION_TYPE \\\n",
"0 427500.0 Working Higher education \n",
"1 427500.0 Working Higher education \n",
"2 112500.0 Working Secondary / secondary special \n",
"3 270000.0 Commercial associate Secondary / secondary special \n",
"4 270000.0 Commercial associate Secondary / secondary special \n",
"\n",
" NAME_FAMILY_STATUS NAME_HOUSING_TYPE DAYS_BIRTH DAYS_EMPLOYED \\\n",
"0 Civil marriage Rented apartment -12005 -4542 \n",
"1 Civil marriage Rented apartment -12005 -4542 \n",
"2 Married House / apartment -21474 -1134 \n",
"3 Single / not married House / apartment -19110 -3051 \n",
"4 Single / not married House / apartment -19110 -3051 \n",
"\n",
" FLAG_MOBIL FLAG_WORK_PHONE FLAG_PHONE FLAG_EMAIL OCCUPATION_TYPE \\\n",
"0 1 1 0 0 NaN \n",
"1 1 1 0 0 NaN \n",
"2 1 0 0 0 Security staff \n",
"3 1 0 1 1 Sales staff \n",
"4 1 0 1 1 Sales staff \n",
"\n",
" CNT_FAM_MEMBERS \n",
"0 2.0 \n",
"1 2.0 \n",
"2 2.0 \n",
"3 1.0 \n",
"4 1.0 "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 438557 entries, 0 to 438556\n",
"Data columns (total 18 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 ID 438557 non-null int64 \n",
" 1 CODE_GENDER 438557 non-null object \n",
" 2 FLAG_OWN_CAR 438557 non-null object \n",
" 3 FLAG_OWN_REALTY 438557 non-null object \n",
" 4 CNT_CHILDREN 438557 non-null int64 \n",
" 5 AMT_INCOME_TOTAL 438557 non-null float64\n",
" 6 NAME_INCOME_TYPE 438557 non-null object \n",
" 7 NAME_EDUCATION_TYPE 438557 non-null object \n",
" 8 NAME_FAMILY_STATUS 438557 non-null object \n",
" 9 NAME_HOUSING_TYPE 438557 non-null object \n",
" 10 DAYS_BIRTH 438557 non-null int64 \n",
" 11 DAYS_EMPLOYED 438557 non-null int64 \n",
" 12 FLAG_MOBIL 438557 non-null int64 \n",
" 13 FLAG_WORK_PHONE 438557 non-null int64 \n",
" 14 FLAG_PHONE 438557 non-null int64 \n",
" 15 FLAG_EMAIL 438557 non-null int64 \n",
" 16 OCCUPATION_TYPE 304354 non-null object \n",
" 17 CNT_FAM_MEMBERS 438557 non-null float64\n",
"dtypes: float64(2), int64(8), object(8)\n",
"memory usage: 60.2+ MB\n"
]
}
],
"source": [
"df.info()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## TASKS \n",
"\n",
"### Recreate the plots shown in the markdown image cells. Each plot also contains a brief description of what it is trying to convey. Note, these are meant to be quite challenging. Start by first replicating the most basic form of the plot, then attempt to adjust its styling and parameters to match the given image.\n",
"\n",
"In general do not worry about coloring,styling, or sizing matching up exactly. Instead focus on the content of the plot itself. Our goal is not to test you on recognizing figsize=(10,8) , its to test your understanding of being able to see a requested plot, and reproducing it.\n",
"\n",
"**NOTE: You may need to perform extra calculations on the pandas dataframe before calling seaborn to create the plot.**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"----\n",
"----\n",
"### TASK: Recreate the Scatter Plot shown below\n",
"\n",
"**The scatterplot attempts to show the relationship between the days employed versus the age of the person (DAYS_BIRTH) for people who were not unemployed. Note, to reproduce this chart you must remove unemployed people from the dataset first. Also note the sign of the axis, they are both transformed to be positive. Finally, feel free to adjust the *alpha* and *linewidth* parameters in the scatterplot since there are so many points stacked on top of each other.** \n",
"\n",
"<img src=\"task_one.jpg\">\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"# CODE HERE TO RECREATE THE PLOT SHOWN ABOVE"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"import warnings"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"warnings.simplefilter('ignore')"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 864x576 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.figure(figsize=(12,8))\n",
"\n",
"# REMOVE UNEMPLOYED PEOPLE\n",
"employed = df[df['DAYS_EMPLOYED']<0]\n",
"\n",
"# MAKE BOTH POSITIVE\n",
"employed['DAYS_EMPLOYED'] = -1*employed['DAYS_EMPLOYED']\n",
"employed['DAYS_BIRTH'] = -1*employed['DAYS_BIRTH']\n",
"\n",
"# With so many points, alpha is tiny, might be an indicated that a \n",
"# scatterplot may not be the right choice!\n",
"sns.scatterplot(y='DAYS_EMPLOYED',x='DAYS_BIRTH',data=employed,\n",
" alpha=0.01,linewidth=0)\n",
"\n",
"plt.savefig('task_one.jpg')"
]
},
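{
"cell_type": "markdown",
"metadata": {},
"source": [
"As the comment in the solution hints, heavy overplotting may call for a different chart. One possible alternative, shown here only as a sketch and not as part of the original exercise, is a two-dimensional histogram that colors each bin by its count. The cell below uses synthetic data so that it runs on its own; the frame `demo`, the random generator settings, and the value ranges are all made-up assumptions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import seaborn as sns\n",
"import matplotlib.pyplot as plt\n",
"\n",
"# Synthetic stand-in for the employed subset (values are invented for illustration)\n",
"rng = np.random.default_rng(0)\n",
"n = 50_000\n",
"age_days = rng.integers(7500, 25000, size=n)\n",
"# Days employed drawn as a random fraction of the days lived beyond 6500\n",
"employed_days = (rng.random(n) * (age_days - 6500)).astype(int)\n",
"demo = pd.DataFrame({'DAYS_BIRTH': age_days, 'DAYS_EMPLOYED': employed_days})\n",
"\n",
"plt.figure(figsize=(12,8))\n",
"\n",
"# Passing both x and y to histplot bins the points in 2D instead of drawing each one\n",
"sns.histplot(data=demo,x='DAYS_BIRTH',y='DAYS_EMPLOYED',bins=50,cbar=True)"
]
},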
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### TASK: Recreate the Distribution Plot shown below:\n",
"\n",
"<img src=\"DistPlot_solution.png\">\n",
"\n",
"**Note, you will need to figure out how to calculate \"Age in Years\" from one of the columns in the DF. Think carefully about this. Don't worry too much if you are unable to replicate the styling exactly.**"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"# CODE HERE TO RECREATE THE PLOT SHOWN ABOVE"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 576x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.figure(figsize=(8,4))\n",
"\n",
"df['YEARS'] = -1*df['DAYS_BIRTH']/365\n",
"sns.histplot(data=df,x='YEARS',linewidth=2,edgecolor='black',\n",
" color='red',bins=45,alpha=0.4)\n",
"plt.xlabel(\"Age in Years\")\n",
"plt.savefig('DistPlot_solution.png')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### TASK: Recreate the Categorical Plot shown below:\n",
"\n",
"<img src='catplot_solution.png'>\n",
"\n",
"**This plot shows information only for the *bottom half* of income earners in the data set. It shows the boxplots for each category of NAME_FAMILY_STATUS column for displaying their distribution of their total income. The hue is the \"FLAG_OWN_REALTY\" column. Note: You will need to adjust or only take part of the dataframe *before* recreating this plot.**"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"# CODE HERE"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Text(0.5, 1.0, 'Income Totals per Family Status for Bottom Half of Earners')"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 864x360 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.figure(figsize=(12,5))\n",
"\n",
"bottom_half_income = df.nsmallest(n=int(0.5*len(df)),columns='AMT_INCOME_TOTAL')\n",
"sns.boxplot(x='NAME_FAMILY_STATUS',y='AMT_INCOME_TOTAL',data=bottom_half_income,hue='FLAG_OWN_REALTY')\n",
"plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,title='FLAG_OWN_REALTY')\n",
"plt.title('Income Totals per Family Status for Bottom Half of Earners')\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### TASK: Recreate the Heat Map shown below:\n",
"\n",
"<img src='heatmap_solution.png'>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**This heatmap shows the correlation between the columns in the dataframe. You can get correlation with .corr() , also note that the FLAG_MOBIL column has NaN correlation with every other column, so you should drop it before calling .corr().**"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>ID</th>\n",
" <th>CNT_CHILDREN</th>\n",
" <th>AMT_INCOME_TOTAL</th>\n",
" <th>DAYS_BIRTH</th>\n",
" <th>DAYS_EMPLOYED</th>\n",
" <th>FLAG_MOBIL</th>\n",
" <th>FLAG_WORK_PHONE</th>\n",
" <th>FLAG_PHONE</th>\n",
" <th>FLAG_EMAIL</th>\n",
" <th>CNT_FAM_MEMBERS</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>ID</th>\n",
" <td>1.000000</td>\n",
" <td>-0.005178</td>\n",
" <td>0.011179</td>\n",
" <td>-0.004994</td>\n",
" <td>-0.002467</td>\n",
" <td>NaN</td>\n",
" <td>-0.023319</td>\n",
" <td>-0.018992</td>\n",
" <td>0.032875</td>\n",
" <td>-0.001862</td>\n",
" </tr>\n",
" <tr>\n",
" <th>CNT_CHILDREN</th>\n",
" <td>-0.005178</td>\n",
" <td>1.000000</td>\n",
" <td>0.019177</td>\n",
" <td>0.349088</td>\n",
" <td>-0.241535</td>\n",
" <td>NaN</td>\n",
" <td>0.038418</td>\n",
" <td>-0.038266</td>\n",
" <td>0.028457</td>\n",
" <td>0.884781</td>\n",
" </tr>\n",
" <tr>\n",
" <th>AMT_INCOME_TOTAL</th>\n",
" <td>0.011179</td>\n",
" <td>0.019177</td>\n",
" <td>1.000000</td>\n",
" <td>0.053775</td>\n",
" <td>-0.141291</td>\n",
" <td>NaN</td>\n",
" <td>-0.033635</td>\n",
" <td>0.004444</td>\n",
" <td>0.112139</td>\n",
" <td>0.011454</td>\n",
" </tr>\n",
" <tr>\n",
" <th>DAYS_BIRTH</th>\n",
" <td>-0.004994</td>\n",
" <td>0.349088</td>\n",
" <td>0.053775</td>\n",
" <td>1.000000</td>\n",
" <td>-0.617908</td>\n",
" <td>NaN</td>\n",
" <td>0.171829</td>\n",
" <td>-0.037984</td>\n",
" <td>0.096752</td>\n",
" <td>0.306179</td>\n",
" </tr>\n",
" <tr>\n",
" <th>DAYS_EMPLOYED</th>\n",
" <td>-0.002467</td>\n",
" <td>-0.241535</td>\n",
" <td>-0.141291</td>\n",
" <td>-0.617908</td>\n",
" <td>1.000000</td>\n",
" <td>NaN</td>\n",
" <td>-0.232208</td>\n",
" <td>0.004868</td>\n",
" <td>-0.074372</td>\n",
" <td>-0.234373</td>\n",
" </tr>\n",
" <tr>\n",
" <th>FLAG_MOBIL</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>FLAG_WORK_PHONE</th>\n",
" <td>-0.023319</td>\n",
" <td>0.038418</td>\n",
" <td>-0.033635</td>\n",
" <td>0.171829</td>\n",
" <td>-0.232208</td>\n",
" <td>NaN</td>\n",
" <td>1.000000</td>\n",
" <td>0.290066</td>\n",
" <td>-0.060915</td>\n",
" <td>0.049777</td>\n",
" </tr>\n",
" <tr>\n",
" <th>FLAG_PHONE</th>\n",
" <td>-0.018992</td>\n",
" <td>-0.038266</td>\n",
" <td>0.004444</td>\n",
" <td>-0.037984</td>\n",
" <td>0.004868</td>\n",
" <td>NaN</td>\n",
" <td>0.290066</td>\n",
" <td>1.000000</td>\n",
" <td>-0.001170</td>\n",
" <td>-0.024213</td>\n",
" </tr>\n",
" <tr>\n",
" <th>FLAG_EMAIL</th>\n",
" <td>0.032875</td>\n",
" <td>0.028457</td>\n",
" <td>0.112139</td>\n",
" <td>0.096752</td>\n",
" <td>-0.074372</td>\n",
" <td>NaN</td>\n",
" <td>-0.060915</td>\n",
" <td>-0.001170</td>\n",
" <td>1.000000</td>\n",
" <td>0.022054</td>\n",
" </tr>\n",
" <tr>\n",
" <th>CNT_FAM_MEMBERS</th>\n",
" <td>-0.001862</td>\n",
" <td>0.884781</td>\n",
" <td>0.011454</td>\n",
" <td>0.306179</td>\n",
" <td>-0.234373</td>\n",
" <td>NaN</td>\n",
" <td>0.049777</td>\n",
" <td>-0.024213</td>\n",
" <td>0.022054</td>\n",
" <td>1.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" ID CNT_CHILDREN AMT_INCOME_TOTAL DAYS_BIRTH \\\n",
"ID 1.000000 -0.005178 0.011179 -0.004994 \n",
"CNT_CHILDREN -0.005178 1.000000 0.019177 0.349088 \n",
"AMT_INCOME_TOTAL 0.011179 0.019177 1.000000 0.053775 \n",
"DAYS_BIRTH -0.004994 0.349088 0.053775 1.000000 \n",
"DAYS_EMPLOYED -0.002467 -0.241535 -0.141291 -0.617908 \n",
"FLAG_MOBIL NaN NaN NaN NaN \n",
"FLAG_WORK_PHONE -0.023319 0.038418 -0.033635 0.171829 \n",
"FLAG_PHONE -0.018992 -0.038266 0.004444 -0.037984 \n",
"FLAG_EMAIL 0.032875 0.028457 0.112139 0.096752 \n",
"CNT_FAM_MEMBERS -0.001862 0.884781 0.011454 0.306179 \n",
"\n",
" DAYS_EMPLOYED FLAG_MOBIL FLAG_WORK_PHONE FLAG_PHONE \\\n",
"ID -0.002467 NaN -0.023319 -0.018992 \n",
"CNT_CHILDREN -0.241535 NaN 0.038418 -0.038266 \n",
"AMT_INCOME_TOTAL -0.141291 NaN -0.033635 0.004444 \n",
"DAYS_BIRTH -0.617908 NaN 0.171829 -0.037984 \n",
"DAYS_EMPLOYED 1.000000 NaN -0.232208 0.004868 \n",
"FLAG_MOBIL NaN NaN NaN NaN \n",
"FLAG_WORK_PHONE -0.232208 NaN 1.000000 0.290066 \n",
"FLAG_PHONE 0.004868 NaN 0.290066 1.000000 \n",
"FLAG_EMAIL -0.074372 NaN -0.060915 -0.001170 \n",
"CNT_FAM_MEMBERS -0.234373 NaN 0.049777 -0.024213 \n",
"\n",
" FLAG_EMAIL CNT_FAM_MEMBERS \n",
"ID 0.032875 -0.001862 \n",
"CNT_CHILDREN 0.028457 0.884781 \n",
"AMT_INCOME_TOTAL 0.112139 0.011454 \n",
"DAYS_BIRTH 0.096752 0.306179 \n",
"DAYS_EMPLOYED -0.074372 -0.234373 \n",
"FLAG_MOBIL NaN NaN \n",
"FLAG_WORK_PHONE -0.060915 0.049777 \n",
"FLAG_PHONE -0.001170 -0.024213 \n",
"FLAG_EMAIL 1.000000 0.022054 \n",
"CNT_FAM_MEMBERS 0.022054 1.000000 "
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Pairwise Pearson correlations (the pandas default) between the numeric columns\n",
"df.corr()"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<AxesSubplot:>"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 432x288 with 2 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# FLAG_MOBIL shows NaN correlations with every column above, so drop it before plotting\n",
"sns.heatmap(df.drop('FLAG_MOBIL', axis=1).corr(), cmap=\"viridis\")"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 1
}