{
|
||
|
"cells": [
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"___\n",
|
||
|
"\n",
|
||
|
"<a href='http://www.pieriandata.com'><img src='../Pierian_Data_Logo.png'/></a>\n",
|
||
|
"___\n",
|
||
|
"<center><em>Copyright by Pierian Data Inc.</em></center>\n",
|
||
|
"<center><em>For more information, visit us at <a href='http://www.pieriandata.com'>www.pieriandata.com</a></em></center>"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"# Logistic Regression Project Exercise - Solutions\n",
|
||
|
"\n",
|
||
|
"**GOAL: Create a classification model that predicts whether or not a person has heart disease based on that person's physical features (age, sex, cholesterol, etc.).**\n",
|
||
|
"\n",
|
||
|
"**Complete the TASKs written in bold below.**"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"## Imports\n",
|
||
|
"\n",
|
||
|
"**TASK: Run the cell below to import the necessary libraries.**"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 1,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"import numpy as np\n",
|
||
|
"import pandas as pd\n",
|
||
|
"import seaborn as sns\n",
|
||
|
"import matplotlib.pyplot as plt"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"## Data\n",
|
||
|
"\n",
|
||
|
"This dataset contains 14 attributes based on physical testing of a patient. Blood samples are taken and the patient also performs a brief exercise test. The \"goal\" field (the `target` column in this dataset) refers to the presence of heart disease in the patient. It is an integer (0 for no presence, 1 for presence). Confirming with certainty whether a patient has heart disease can be quite an invasive process, so a model that accurately predicts the likelihood of heart disease can help avoid expensive and invasive procedures.\n",
|
||
|
"\n",
|
||
|
"Content\n",
|
||
|
"\n",
|
||
|
"Attribute Information:\n",
|
||
|
"\n",
|
||
|
"* age\n",
"* sex\n",
"* chest pain type (4 values)\n",
"* resting blood pressure\n",
"* serum cholesterol in mg/dl\n",
"* fasting blood sugar > 120 mg/dl\n",
"* resting electrocardiographic results (values 0, 1, 2)\n",
"* maximum heart rate achieved\n",
"* exercise-induced angina\n",
"* oldpeak = ST depression induced by exercise relative to rest\n",
"* the slope of the peak exercise ST segment\n",
"* number of major vessels (0-3) colored by fluoroscopy\n",
"* thal: 3 = normal; 6 = fixed defect; 7 = reversible defect\n",
"* target: 0 for no presence of heart disease, 1 for presence of heart disease\n",
|
||
|
"\n",
|
||
|
"Original Source: https://archive.ics.uci.edu/ml/datasets/Heart+Disease\n",
|
||
|
"\n",
|
||
|
"Creators:\n",
|
||
|
"\n",
|
||
|
"Hungarian Institute of Cardiology. Budapest: Andras Janosi, M.D.\n",
|
||
|
"University Hospital, Zurich, Switzerland: William Steinbrunn, M.D.\n",
|
||
|
"University Hospital, Basel, Switzerland: Matthias Pfisterer, M.D.\n",
|
||
|
"V.A. Medical Center, Long Beach and Cleveland Clinic Foundation: Robert Detrano, M.D., Ph.D."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"----\n",
|
||
|
"\n",
|
||
|
"**TASK: Run the cell below to read in the data.**"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 2,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"df = pd.read_csv('../DATA/heart.csv')"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 3,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/html": [
|
||
|
"<div>\n",
|
||
|
"<style scoped>\n",
|
||
|
" .dataframe tbody tr th:only-of-type {\n",
|
||
|
" vertical-align: middle;\n",
|
||
|
" }\n",
|
||
|
"\n",
|
||
|
" .dataframe tbody tr th {\n",
|
||
|
" vertical-align: top;\n",
|
||
|
" }\n",
|
||
|
"\n",
|
||
|
" .dataframe thead th {\n",
|
||
|
" text-align: right;\n",
|
||
|
" }\n",
|
||
|
"</style>\n",
|
||
|
"<table border=\"1\" class=\"dataframe\">\n",
|
||
|
" <thead>\n",
|
||
|
" <tr style=\"text-align: right;\">\n",
|
||
|
" <th></th>\n",
|
||
|
" <th>age</th>\n",
|
||
|
" <th>sex</th>\n",
|
||
|
" <th>cp</th>\n",
|
||
|
" <th>trestbps</th>\n",
|
||
|
" <th>chol</th>\n",
|
||
|
" <th>fbs</th>\n",
|
||
|
" <th>restecg</th>\n",
|
||
|
" <th>thalach</th>\n",
|
||
|
" <th>exang</th>\n",
|
||
|
" <th>oldpeak</th>\n",
|
||
|
" <th>slope</th>\n",
|
||
|
" <th>ca</th>\n",
|
||
|
" <th>thal</th>\n",
|
||
|
" <th>target</th>\n",
|
||
|
" </tr>\n",
|
||
|
" </thead>\n",
|
||
|
" <tbody>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>0</th>\n",
|
||
|
" <td>63</td>\n",
|
||
|
" <td>1</td>\n",
|
||
|
" <td>3</td>\n",
|
||
|
" <td>145</td>\n",
|
||
|
" <td>233</td>\n",
|
||
|
" <td>1</td>\n",
|
||
|
" <td>0</td>\n",
|
||
|
" <td>150</td>\n",
|
||
|
" <td>0</td>\n",
|
||
|
" <td>2.3</td>\n",
|
||
|
" <td>0</td>\n",
|
||
|
" <td>0</td>\n",
|
||
|
" <td>1</td>\n",
|
||
|
" <td>1</td>\n",
|
||
|
" </tr>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>1</th>\n",
|
||
|
" <td>37</td>\n",
|
||
|
" <td>1</td>\n",
|
||
|
" <td>2</td>\n",
|
||
|
" <td>130</td>\n",
|
||
|
" <td>250</td>\n",
|
||
|
" <td>0</td>\n",
|
||
|
" <td>1</td>\n",
|
||
|
" <td>187</td>\n",
|
||
|
" <td>0</td>\n",
|
||
|
" <td>3.5</td>\n",
|
||
|
" <td>0</td>\n",
|
||
|
" <td>0</td>\n",
|
||
|
" <td>2</td>\n",
|
||
|
" <td>1</td>\n",
|
||
|
" </tr>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>2</th>\n",
|
||
|
" <td>41</td>\n",
|
||
|
" <td>0</td>\n",
|
||
|
" <td>1</td>\n",
|
||
|
" <td>130</td>\n",
|
||
|
" <td>204</td>\n",
|
||
|
" <td>0</td>\n",
|
||
|
" <td>0</td>\n",
|
||
|
" <td>172</td>\n",
|
||
|
" <td>0</td>\n",
|
||
|
" <td>1.4</td>\n",
|
||
|
" <td>2</td>\n",
|
||
|
" <td>0</td>\n",
|
||
|
" <td>2</td>\n",
|
||
|
" <td>1</td>\n",
|
||
|
" </tr>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>3</th>\n",
|
||
|
" <td>56</td>\n",
|
||
|
" <td>1</td>\n",
|
||
|
" <td>1</td>\n",
|
||
|
" <td>120</td>\n",
|
||
|
" <td>236</td>\n",
|
||
|
" <td>0</td>\n",
|
||
|
" <td>1</td>\n",
|
||
|
" <td>178</td>\n",
|
||
|
" <td>0</td>\n",
|
||
|
" <td>0.8</td>\n",
|
||
|
" <td>2</td>\n",
|
||
|
" <td>0</td>\n",
|
||
|
" <td>2</td>\n",
|
||
|
" <td>1</td>\n",
|
||
|
" </tr>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>4</th>\n",
|
||
|
" <td>57</td>\n",
|
||
|
" <td>0</td>\n",
|
||
|
" <td>0</td>\n",
|
||
|
" <td>120</td>\n",
|
||
|
" <td>354</td>\n",
|
||
|
" <td>0</td>\n",
|
||
|
" <td>1</td>\n",
|
||
|
" <td>163</td>\n",
|
||
|
" <td>1</td>\n",
|
||
|
" <td>0.6</td>\n",
|
||
|
" <td>2</td>\n",
|
||
|
" <td>0</td>\n",
|
||
|
" <td>2</td>\n",
|
||
|
" <td>1</td>\n",
|
||
|
" </tr>\n",
|
||
|
" </tbody>\n",
|
||
|
"</table>\n",
|
||
|
"</div>"
|
||
|
],
|
||
|
"text/plain": [
|
||
|
" age sex cp trestbps chol fbs restecg thalach exang oldpeak slope \\\n",
|
||
|
"0 63 1 3 145 233 1 0 150 0 2.3 0 \n",
|
||
|
"1 37 1 2 130 250 0 1 187 0 3.5 0 \n",
|
||
|
"2 41 0 1 130 204 0 0 172 0 1.4 2 \n",
|
||
|
"3 56 1 1 120 236 0 1 178 0 0.8 2 \n",
|
||
|
"4 57 0 0 120 354 0 1 163 1 0.6 2 \n",
|
||
|
"\n",
|
||
|
" ca thal target \n",
|
||
|
"0 0 1 1 \n",
|
||
|
"1 0 2 1 \n",
|
||
|
"2 0 2 1 \n",
|
||
|
"3 0 2 1 \n",
|
||
|
"4 0 2 1 "
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 3,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"df.head()"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 4,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"array([1, 0], dtype=int64)"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 4,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"df['target'].unique()"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"### Exploratory Data Analysis and Visualization\n",
|
||
|
"\n",
|
||
|
"Feel free to explore the data further on your own.\n",
|
||
|
"\n",
|
||
|
"**TASK: Check whether the dataset has any missing data points and create a statistical summary of the numerical features, as shown below.**"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 5,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"# CODE HERE"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 6,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"name": "stdout",
|
||
|
"output_type": "stream",
|
||
|
"text": [
|
||
|
"<class 'pandas.core.frame.DataFrame'>\n",
|
||
|
"RangeIndex: 303 entries, 0 to 302\n",
|
||
|
"Data columns (total 14 columns):\n",
|
||
|
" # Column Non-Null Count Dtype \n",
|
||
|
"--- ------ -------------- ----- \n",
|
||
|
" 0 age 303 non-null int64 \n",
|
||
|
" 1 sex 303 non-null int64 \n",
|
||
|
" 2 cp 303 non-null int64 \n",
|
||
|
" 3 trestbps 303 non-null int64 \n",
|
||
|
" 4 chol 303 non-null int64 \n",
|
||
|
" 5 fbs 303 non-null int64 \n",
|
||
|
" 6 restecg 303 non-null int64 \n",
|
||
|
" 7 thalach 303 non-null int64 \n",
|
||
|
" 8 exang 303 non-null int64 \n",
|
||
|
" 9 oldpeak 303 non-null float64\n",
|
||
|
" 10 slope 303 non-null int64 \n",
|
||
|
" 11 ca 303 non-null int64 \n",
|
||
|
" 12 thal 303 non-null int64 \n",
|
||
|
" 13 target 303 non-null int64 \n",
|
||
|
"dtypes: float64(1), int64(13)\n",
|
||
|
"memory usage: 33.3 KB\n"
|
||
|
]
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"df.info()"
|
||
|
]
|
||
|
},
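The `Non-Null Count` column printed by `df.info()` above is one way to confirm that no values are missing. As a minimal sketch of the underlying idea, the snippet below counts missing entries per column using only the standard library. The first two rows come from the `df.head()` output; the third row is hypothetical and added only so that the counts are not all zero. The helper names `is_missing` and `missing_counts` are illustrative and not part of pandas.

```python
import math

# First two rows from df.head(), restricted to a few columns.
rows = [
    {"age": 63, "chol": 233, "oldpeak": 2.3, "target": 1},
    {"age": 37, "chol": 250, "oldpeak": 3.5, "target": 1},
    # Hypothetical row with missing values, for illustration only.
    {"age": 50, "chol": None, "oldpeak": float("nan"), "target": 0},
]


def is_missing(value):
    """Treat None and NaN as missing, similar in spirit to pandas' isnull()."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_counts(records):
    """Return a dict mapping each column name to its number of missing entries."""
    counts = {key: 0 for key in records[0]}
    for record in records:
        for key, value in record.items():
            if is_missing(value):
                counts[key] += 1
    return counts


print(missing_counts(rows))
```

With the real DataFrame, `df.isnull().sum()` should give the same kind of per-column summary in one call.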
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 7,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"# CODE HERE"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 8,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/html": [
|
||
|
"<div>\n",
|
||
|
"<style scoped>\n",
|
||
|
" .dataframe tbody tr th:only-of-type {\n",
|
||
|
" vertical-align: middle;\n",
|
||
|
" }\n",
|
||
|
"\n",
|
||
|
" .dataframe tbody tr th {\n",
|
||
|
" vertical-align: top;\n",
|
||
|
" }\n",
|
||
|
"\n",
|
||
|
" .dataframe thead th {\n",
|
||
|
" text-align: right;\n",
|
||
|
" }\n",
|
||
|
"</style>\n",
|
||
|
"<table border=\"1\" class=\"dataframe\">\n",
|
||
|
" <thead>\n",
|
||
|
" <tr style=\"text-align: right;\">\n",
|
||
|
" <th></th>\n",
|
||
|
" <th>count</th>\n",
|
||
|
" <th>mean</th>\n",
|
||
|
" <th>std</th>\n",
|
||
|
" <th>min</th>\n",
|
||
|
" <th>25%</th>\n",
|
||
|
" <th>50%</th>\n",
|
||
|
" <th>75%</th>\n",
|
||
|
" <th>max</th>\n",
|
||
|
" </tr>\n",
|
||
|
" </thead>\n",
|
||
|
" <tbody>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>age</th>\n",
|
||
|
" <td>303.0</td>\n",
|
||
|
" <td>54.366337</td>\n",
|
||
|
" <td>9.082101</td>\n",
|
||
|
" <td>29.0</td>\n",
|
||
|
" <td>47.5</td>\n",
|
||
|
" <td>55.0</td>\n",
|
||
|
" <td>61.0</td>\n",
|
||
|
" <td>77.0</td>\n",
|
||
|
" </tr>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>sex</th>\n",
|
||
|
" <td>303.0</td>\n",
|
||
|
" <td>0.683168</td>\n",
|
||
|
" <td>0.466011</td>\n",
|
||
|
" <td>0.0</td>\n",
|
||
|
" <td>0.0</td>\n",
|
||
|
" <td>1.0</td>\n",
|
||
|
" <td>1.0</td>\n",
|
||
|
" <td>1.0</td>\n",
|
||
|
" </tr>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>cp</th>\n",
|
||
|
" <td>303.0</td>\n",
|
||
|
" <td>0.966997</td>\n",
|
||
|
" <td>1.032052</td>\n",
|
||
|
" <td>0.0</td>\n",
|
||
|
" <td>0.0</td>\n",
|
||
|
" <td>1.0</td>\n",
|
||
|
" <td>2.0</td>\n",
|
||
|
" <td>3.0</td>\n",
|
||
|
" </tr>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>trestbps</th>\n",
|
||
|
" <td>303.0</td>\n",
|
||
|
" <td>131.623762</td>\n",
|
||
|
" <td>17.538143</td>\n",
|
||
|
" <td>94.0</td>\n",
|
||
|
" <td>120.0</td>\n",
|
||
|
" <td>130.0</td>\n",
|
||
|
" <td>140.0</td>\n",
|
||
|
" <td>200.0</td>\n",
|
||
|
" </tr>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>chol</th>\n",
|
||
|
" <td>303.0</td>\n",
|
||
|
" <td>246.264026</td>\n",
|
||
|
" <td>51.830751</td>\n",
|
||
|
" <td>126.0</td>\n",
|
||
|
" <td>211.0</td>\n",
|
||
|
" <td>240.0</td>\n",
|
||
|
" <td>274.5</td>\n",
|
||
|
" <td>564.0</td>\n",
|
||
|
" </tr>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>fbs</th>\n",
|
||
|
" <td>303.0</td>\n",
|
||
|
" <td>0.148515</td>\n",
|
||
|
" <td>0.356198</td>\n",
|
||
|
" <td>0.0</td>\n",
|
||
|
" <td>0.0</td>\n",
|
||
|
" <td>0.0</td>\n",
|
||
|
" <td>0.0</td>\n",
|
||
|
" <td>1.0</td>\n",
|
||
|
" </tr>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>restecg</th>\n",
|
||
|
" <td>303.0</td>\n",
|
||
|
" <td>0.528053</td>\n",
|
||
|
" <td>0.525860</td>\n",
|
||
|
" <td>0.0</td>\n",
|
||
|
" <td>0.0</td>\n",
|
||
|
" <td>1.0</td>\n",
|
||
|
" <td>1.0</td>\n",
|
||
|
" <td>2.0</td>\n",
|
||
|
" </tr>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>thalach</th>\n",
|
||
|
" <td>303.0</td>\n",
|
||
|
" <td>149.646865</td>\n",
|
||
|
" <td>22.905161</td>\n",
|
||
|
" <td>71.0</td>\n",
|
||
|
" <td>133.5</td>\n",
|
||
|
" <td>153.0</td>\n",
|
||
|
" <td>166.0</td>\n",
|
||
|
" <td>202.0</td>\n",
|
||
|
" </tr>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>exang</th>\n",
|
||
|
" <td>303.0</td>\n",
|
||
|
" <td>0.326733</td>\n",
|
||
|
" <td>0.469794</td>\n",
|
||
|
" <td>0.0</td>\n",
|
||
|
" <td>0.0</td>\n",
|
||
|
" <td>0.0</td>\n",
|
||
|
" <td>1.0</td>\n",
|
||
|
" <td>1.0</td>\n",
|
||
|
" </tr>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>oldpeak</th>\n",
|
||
|
" <td>303.0</td>\n",
|
||
|
" <td>1.039604</td>\n",
|
||
|
" <td>1.161075</td>\n",
|
||
|
" <td>0.0</td>\n",
|
||
|
" <td>0.0</td>\n",
|
||
|
" <td>0.8</td>\n",
|
||
|
" <td>1.6</td>\n",
|
||
|
" <td>6.2</td>\n",
|
||
|
" </tr>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>slope</th>\n",
|
||
|
" <td>303.0</td>\n",
|
||
|
" <td>1.399340</td>\n",
|
||
|
" <td>0.616226</td>\n",
|
||
|
" <td>0.0</td>\n",
|
||
|
" <td>1.0</td>\n",
|
||
|
" <td>1.0</td>\n",
|
||
|
" <td>2.0</td>\n",
|
||
|
" <td>2.0</td>\n",
|
||
|
" </tr>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>ca</th>\n",
|
||
|
" <td>303.0</td>\n",
|
||
|
" <td>0.729373</td>\n",
|
||
|
" <td>1.022606</td>\n",
|
||
|
" <td>0.0</td>\n",
|
||
|
" <td>0.0</td>\n",
|
||
|
" <td>0.0</td>\n",
|
||
|
" <td>1.0</td>\n",
|
||
|
" <td>4.0</td>\n",
|
||
|
" </tr>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>thal</th>\n",
|
||
|
" <td>303.0</td>\n",
|
||
|
" <td>2.313531</td>\n",
|
||
|
" <td>0.612277</td>\n",
|
||
|
" <td>0.0</td>\n",
|
||
|
" <td>2.0</td>\n",
|
||
|
" <td>2.0</td>\n",
|
||
|
" <td>3.0</td>\n",
|
||
|
" <td>3.0</td>\n",
|
||
|
" </tr>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>target</th>\n",
|
||
|
" <td>303.0</td>\n",
|
||
|
" <td>0.544554</td>\n",
|
||
|
" <td>0.498835</td>\n",
|
||
|
" <td>0.0</td>\n",
|
||
|
" <td>0.0</td>\n",
|
||
|
" <td>1.0</td>\n",
|
||
|
" <td>1.0</td>\n",
|
||
|
" <td>1.0</td>\n",
|
||
|
" </tr>\n",
|
||
|
" </tbody>\n",
|
||
|
"</table>\n",
|
||
|
"</div>"
|
||
|
],
|
||
|
"text/plain": [
|
||
|
" count mean std min 25% 50% 75% max\n",
|
||
|
"age 303.0 54.366337 9.082101 29.0 47.5 55.0 61.0 77.0\n",
|
||
|
"sex 303.0 0.683168 0.466011 0.0 0.0 1.0 1.0 1.0\n",
|
||
|
"cp 303.0 0.966997 1.032052 0.0 0.0 1.0 2.0 3.0\n",
|
||
|
"trestbps 303.0 131.623762 17.538143 94.0 120.0 130.0 140.0 200.0\n",
|
||
|
"chol 303.0 246.264026 51.830751 126.0 211.0 240.0 274.5 564.0\n",
|
||
|
"fbs 303.0 0.148515 0.356198 0.0 0.0 0.0 0.0 1.0\n",
|
||
|
"restecg 303.0 0.528053 0.525860 0.0 0.0 1.0 1.0 2.0\n",
|
||
|
"thalach 303.0 149.646865 22.905161 71.0 133.5 153.0 166.0 202.0\n",
|
||
|
"exang 303.0 0.326733 0.469794 0.0 0.0 0.0 1.0 1.0\n",
|
||
|
"oldpeak 303.0 1.039604 1.161075 0.0 0.0 0.8 1.6 6.2\n",
|
||
|
"slope 303.0 1.399340 0.616226 0.0 1.0 1.0 2.0 2.0\n",
|
||
|
"ca 303.0 0.729373 1.022606 0.0 0.0 0.0 1.0 4.0\n",
|
||
|
"thal 303.0 2.313531 0.612277 0.0 2.0 2.0 3.0 3.0\n",
|
||
|
"target 303.0 0.544554 0.498835 0.0 0.0 1.0 1.0 1.0"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 8,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"df.describe().transpose()"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"### Visualization Tasks\n",
|
||
|
"\n",
|
||
|
"**TASK: Create a bar plot that shows the total counts per target value.**"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 9,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"# CODE HERE!"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 10,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"<AxesSubplot:xlabel='target', ylabel='count'>"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 10,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
},
|
||
|
{
|
||
|
"data": {
|
||
|
"image/png": "(base64-encoded PNG data omitted)",
|
||
|
"text/plain": [
|
||
|
"<Figure size 432x288 with 1 Axes>"
|
||
|
]
|
||
|
},
|
||
|
"metadata": {
|
||
|
"needs_background": "light"
|
||
|
},
|
||
|
"output_type": "display_data"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"sns.countplot(x='target', data=df)"
|
||
|
]
|
||
|
},
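The count plot is a picture of a simple tally, and the tally itself is useful for judging class balance. The sketch below reconstructs the label counts from the summary statistics shown earlier (303 rows with a `target` mean of 0.544554 correspond to 165 positive cases) rather than from the CSV file, so the list `targets` is a stand-in for the real column and its ordering is arbitrary.

```python
from collections import Counter

# Labels reconstructed from the describe() output: 303 rows, mean 0.544554,
# which corresponds to 165 ones and 138 zeros. Only the counts matter here.
targets = [1] * 165 + [0] * 138

counts = Counter(targets)
total = sum(counts.values())

for label in sorted(counts):
    share = counts[label] / total
    print(f"target={label}: {counts[label]} rows ({share:.1%})")
```

With the real DataFrame, `df['target'].value_counts()` should report the same two numbers. The classes are roughly balanced, which makes plain accuracy a more meaningful metric than it would be on a heavily skewed dataset.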
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"**TASK: Create a pairplot that displays the relationships between the following columns:**\n",
|
||
|
"\n",
|
||
|
" ['age','trestbps', 'chol','thalach','target']\n",
|
||
|
" \n",
|
||
|
"*Note: Running a pairplot on everything can take a very long time due to the number of features*"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 11,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"# CODE HERE"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 12,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"Index(['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach',\n",
|
||
|
" 'exang', 'oldpeak', 'slope', 'ca', 'thal', 'target'],\n",
|
||
|
" dtype='object')"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 12,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"df.columns"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 13,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"<seaborn.axisgrid.PairGrid at 0x2573c4e2148>"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 13,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
},
|
||
|
{
|
||
|
"data": {
|
||
|
"image/png": "(base64-encoded PNG data omitted)",
|
||
|
"text/plain": [
|
||
|
"<Figure size 762.375x720 with 20 Axes>"
|
||
|
]
|
||
|
},
|
||
|
"metadata": {
|
||
|
"needs_background": "light"
|
||
|
},
|
||
|
"output_type": "display_data"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"# Running pairplot on everything will take a very long time to render!\n",
|
||
|
"sns.pairplot(df[['age','trestbps', 'chol','thalach','target']],hue='target')"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"**TASK: Create a heatmap that displays the correlation between all the columns.**"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 14,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"# CODE HERE"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 15,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"<AxesSubplot:>"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 15,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
},
|
||
|
{
|
||
|
"data": {
|
||
|
"image/png": "(base64-encoded PNG data omitted)",
|
||
|
"text/plain": [
|
||
|
"<Figure size 864x576 with 2 Axes>"
|
||
|
]
|
||
|
},
|
||
|
"metadata": {
|
||
|
"needs_background": "light"
|
||
|
},
|
||
|
"output_type": "display_data"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"plt.figure(figsize=(12,8))\n",
|
||
|
"sns.heatmap(df.corr(),cmap='viridis',annot=True)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"----\n",
|
||
|
"----\n",
|
||
|
"\n",
|
||
|
"# Machine Learning\n",
|
||
|
"\n",
|
||
|
"## Train | Test Split and Scaling\n",
|
||
|
"\n",
|
||
|
"**TASK: Separate the features from the labels into 2 objects, X and y.**"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 16,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"# CODE HERE"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 17,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"X = df.drop('target',axis=1)\n",
|
||
|
"y = df['target']"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"**TASK: Perform a train test split on the data, with a test size of 10% and a random_state of 101.**"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 18,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"# CODE HERE"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 19,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"from sklearn.model_selection import train_test_split\n",
|
||
|
"from sklearn.preprocessing import StandardScaler"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 20,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=101)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"**TASK: Create a StandardScaler object and use it to scale the X train and test set feature data. Make sure you fit only on the training data to avoid data leakage (knowledge of the test set leaking into training).**"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 21,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"# CODE HERE"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 22,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"scaler = StandardScaler()"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 23,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"scaled_X_train = scaler.fit_transform(X_train)\n",
|
||
|
"scaled_X_test = scaler.transform(X_test)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"## Logistic Regression Model\n",
|
||
|
"\n",
|
||
|
"**TASK: Create a Logistic Regression model and use cross-validation to search for a well-performing value of the C hyperparameter. You have two options here: use *LogisticRegressionCV*, or use a combination of *LogisticRegression* and *GridSearchCV*. The choice is up to you; the solutions use the simpler *LogisticRegressionCV* approach.**"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 24,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"# CODE HERE"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 25,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"from sklearn.linear_model import LogisticRegressionCV "
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 26,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"# help(LogisticRegressionCV)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 27,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"log_model = LogisticRegressionCV()"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 28,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"LogisticRegressionCV()"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 28,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"log_model.fit(scaled_X_train,y_train)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"**TASK: Report back your search's optimal parameters, specifically the C value.** \n",
|
||
|
"\n",
|
||
|
"*Note: You may get a different value than what is shown here depending on how you conducted your search.*"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 29,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"# CODE HERE"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 30,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"array([0.04641589])"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 30,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"log_model.C_"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 31,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"{'Cs': 10,\n",
|
||
|
" 'class_weight': None,\n",
|
||
|
" 'cv': None,\n",
|
||
|
" 'dual': False,\n",
|
||
|
" 'fit_intercept': True,\n",
|
||
|
" 'intercept_scaling': 1.0,\n",
|
||
|
" 'l1_ratios': None,\n",
|
||
|
" 'max_iter': 100,\n",
|
||
|
" 'multi_class': 'auto',\n",
|
||
|
" 'n_jobs': None,\n",
|
||
|
" 'penalty': 'l2',\n",
|
||
|
" 'random_state': None,\n",
|
||
|
" 'refit': True,\n",
|
||
|
" 'scoring': None,\n",
|
||
|
" 'solver': 'lbfgs',\n",
|
||
|
" 'tol': 0.0001,\n",
|
||
|
" 'verbose': 0}"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 31,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"log_model.get_params()"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"### Coefficients\n",
|
||
|
"\n",
|
||
|
"**TASK: Report back the model's coefficients.**"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 32,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"array([[-0.09621199, -0.39460154, 0.53534731, -0.13850191, -0.08830462,\n",
|
||
|
" 0.02487341, 0.08083826, 0.29914053, -0.33438151, -0.352386 ,\n",
|
||
|
" 0.25101033, -0.49735752, -0.37448551]])"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 32,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"log_model.coef_"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"**BONUS TASK: We didn't show this in the lecture notebooks, but you have the skills to do this! Create a visualization of the coefficients by using a barplot of their values. Even more bonus points if you can figure out how to sort the plot! If you get stuck, feel free to view the solutions notebook for hints. There are many ways to do this; the solutions use a combination of pandas and seaborn.**"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 33,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"#CODE HERE"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 34,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"coefs = pd.Series(index=X.columns,data=log_model.coef_[0])"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 35,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"coefs = coefs.sort_values()"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 36,
|
||
|
"metadata": {
|
||
|
"scrolled": true
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAlsAAAFlCAYAAADcXS0xAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy86wFpkAAAACXBIWXMAAAsTAAALEwEAmpwYAAAbBElEQVR4nO3de5xudV0v8M9XNl5RkdjiFsRNhiWVWW5Qk2OQl5diHuCoCYnBKeNgGZl1DNM8njqapXk65QXBeIH3VEQRSDAEb3jhonKREA5qIqRoVmJ2FP2dP9Ya9+Mwe2b2nuc3z57N+/167desZ63frN931u35PL+1Zna11gIAQB93mHUBAAA7MmELAKAjYQsAoCNhCwCgI2ELAKAjYQsAoKN1sy5gMbvvvnvbuHHjrMsAAFjSpZde+rXW2vr587frsLVx48Zccsklsy4DAGBJVfXFhea7jQgA0JGwBQDQkbAFANCRsAUA0JGwBQDQkbAFANCRsAUA0JGwBQDQkbAFANCRsAUA0JGwBQDQkbAFANDRdv0fUQMATPrKX31k1iUkSfY4/sBltzWyBQDQkbAFANCRsAUA0JGwBQDQkbAFANCRsAUA0JGwBQDQkbAFANCRsAUA0JGwBQDQkbAFANCRsAUA0JGwBQDQkbAFANCRsAUA0JGwBQDQkbAFANCRsAUA0JGwBQDQkbAFANCRsAUA0JGwBQDQkbAFANCRsAUA0JGwBQDQkbAFANCRsAUA0JGwBQDQkbAFANCRsAUA0JGwBQDQkbAFANCRsAUA0JGwBQDQkbAFANDRVMJWVT2+qq6pquuq6oRF2u1fVd+rqqdMo18AgO3disNWVe2U5NVJnpBkvyRHVtV+W2j3Z0nOXWmfAABrxTRGtg5Icl1r7frW2neSvC3JoQu0++0kpyf56hT6BABYE6YRtvZM8qWJ1zeM836gqvZMcniSE5daWVUdW1WXVNUlN9988xTKAwCYnWmErVpgXpv3+i+T/EFr7XtLray1dlJrbVNrbdP69eunUB4AwOysm8I6bkhyv4nXeyW5cV6bTUneVlVJsnuSQ6rq1tbau6fQPwDAdmsaYeviJPtW1T5JvpzkiCS/MtmgtbbP3HRVnZrkLEELALg9WHHYaq3dWlXPzvBbhjslOaW1dlVVHTcuX/I5LQCAHdU0RrbSWjsnyTnz5i0Yslprx0yjTwCAtcBfkAcA6EjYAgDoSNgCAOhI2AIA6EjYAgDoSNgCAOhI2AIA6EjYAgDoSNgCAOhI2AIA6EjYAgDoSNgCAOhI2AIA6EjYAgDoSNgCAOhI2AIA6EjYAgDoSNgCAOhI2AIA6EjYAgDoSNgCAOhI2AIA6EjYAgDoaN2sCwAAZuumP79p1iUkSTY8b8OsS+jCyBYAQEfCFgBAR8IWAEBHwhYAQEfCFgBAR8IWAEBHwhYAQEfCFgBAR8IWAEBHwhYAQEfCFgBAR8IWAEBHwhYAQEfrZl0AAOyILnzTzbMuIUly0FHrZ13C7Z6RLQCAjoQtAICOhC0AgI6ELQCAjoQtAICOhC0AgI6ELQCAjoQtAICOhC0AgI6ELQCAjqYStqrq8VV1TVVdV1UnLLD86VV1+fjvoqr6mWn0CwCwvVtx2KqqnZK8OskTkuyX5Miq2m9es88n+YXW2oOT/EmSk1baLwDAWjCNka0DklzXWru+tfadJG9Lcuhkg9baRa21b4wvP55kryn0CwCw3ZtG2NozyZcmXt8wztuSX0/yd1PoFwBgu7duCuuoBea1BRtWHZwhbB24xZVVHZvk2CTZe++9p1AeAMDsTGNk64Yk95t4vVeSG+c3qqoHJ3l9kkNba1/f0spaaye11ja11jatX79+CuUBAMzONMLWxUn2rap9quqOSY5IcuZkg6raO8m7kjyjtfa5KfQJALAmrPg2Ymvt1qp6dpJzk+yU5JTW2lVVddy4/MQkL0ryI0leU1VJcmtrbdNK+wYA2N5N45mttNbOSX
LOvHknTkw/M8kzp9EXAMBa4i/IAwB0JGwBAHQkbAEAdCRsAQB0JGwBAHQkbAEAdCRsAQB0JGwBAHQkbAEAdCRsAQB0JGwBAHQkbAEAdCRsAQB0JGwBAHQkbAEAdCRsAQB0JGwBAHQkbAEAdCRsAQB0JGwBAHQkbAEAdCRsAQB0JGwBAHQkbAEAdCRsAQB0JGwBAHQkbAEAdCRsAQB0JGwBAHS0btYFAMByvfKMf5p1CT/w3MPvM+sSWCOMbAEAdCRsAQB0JGwBAHQkbAEAdCRsAQB0JGwBAHQkbAEAdCRsAQB0JGwBAHQkbAEAdCRsAQB0JGwBAHQkbAEAdCRsAQB0JGwBAHS0btYFADBbTz79k7Mu4QdOf/IBsy4Bps7IFgBAR8IWAEBHwhYAQEdTCVtV9fiquqaqrquqExZYXlX1V+Pyy6vq56bRLwDA9m7FYauqdkry6iRPSLJfkiOrar95zZ6QZN/x37FJXrvSfgEA1oJpjGwdkOS61tr1rbXvJHlbkkPntTk0yRva4ONJdq2qDVPoGwBguzaNP/2wZ5IvTby+IcnDltFmzyQ3zV9ZVR2bYfQre++99w8tu/m1b1p5tVOw/llHLdnmhlf92ipUsrS9nn3Kkm0ueP0TV6GSpR38zLOXbHPqaY9bhUqWdszR5y3Z5gXvePwqVLK0lzz1fYsuP+Tdf7hKlSztnMNeuujyJ55+8ipVsrSzn/wbiy5/0jvPWKVKlvbepxy+6PK19OcWnnv4fWZdwrIddNT6WZewbBuet3bGP/Y4/sBZl7DVpjGyVQvMa9vQZpjZ2kmttU2ttU3r16+dAxUAYCHTCFs3JLnfxOu9kty4DW0AAHY40whbFyfZt6r2qao7JjkiyZnz2pyZ5FfH30p8eJJ/ba3d5hYiAMCOZsXPbLXWbq2qZyc5N8lOSU5prV1VVceNy09Mck6SQ5Jcl+Tfk/zXlfYLALAWTOX/RmytnZMhUE3OO3FiuiX5rWn0BQCwlvgL8gAAHQlbAAAdCVsAAB0JWwAAHQlbAAAdCVsAAB0JWwAAHQlbAAAdCVsAAB0JWwAAHQlbAAAdCVsAAB0JWwAAHQlbAAAdCVsAAB0JWwAAHQlbAAAdCVsAAB0JWwAAHQlbAAAdrZt1AQA7ovc+5fBZlwBsJ4xsAQB0ZGQLWDPOfvJvzLoEgK1mZAsAoCNhCwCgI2ELAKAjYQsAoCNhCwCgI2ELAKAjYQsAoCNhCwCgI2ELAKAjYQsAoCNhCwCgI/83ItzOnXPYS2ddAsAOzcgWAEBHwhYAQEfCFgBAR8IWAEBHwhYAQEfCFgBAR8IWAEBHwhYAQEfCFgBAR8IWAEBHwhYAQEfCFgBAR8IWAEBHKwpbVbVbVb2/qq4dv95rgTb3q6oLqurqqrqqqn5nJX0CAKwlKx3ZOiHJ+a21fZOcP76e79Ykv9dae1CShyf5rarab4X9AgCsCSsNW4cmOW2cPi3JYfMbtNZuaq1dNk5/M8nVSfZcYb8AAGvCSsPWHq21m5IhVCW592KNq2pjkp9N8okV9gsAsCasW6pBVf19kvsssOgFW9NRVe2S5PQkz2mt/dsi7Y5NcmyS7L333lvTBQDAdmfJsNVae8yWllXVV6pqQ2vtpqrakOSrW2i3c4ag9ebW2ruW6O+kJCclyaZNm9pS9QEAbM9WehvxzCRHj9NHJ3nP/AZVVUn+JsnVrbVXrrA/AIA1ZaVh62VJHltV1yZ57Pg6VXXfqjpnbPPIJM9I8otV9enx3yEr7BcAYE1Y8jbiYlprX0/y6AXm35jkkHH6I0lqJf0AAKxV/oI8AEBHwhYAQEfCFgBAR8IWAEBHwhYAQEfCFgBAR8IWAEBHwhYAQEfCFgBAR8IWAEBHwhYAQEfCFgBAR8IWAEBHwhYAQEfCFgBAR8IWAEBHwhYAQEfCFgBAR8IWAEBHwhYAQEfCFgBAR8IWAEBHwhYAQEfCFgBAR8IWAEBHwhYAQEfCFgBAR8IWAEBHwhYAQE
fCFgBAR8IWAEBHwhYAQEfCFgBAR+tmXQDsqF7y1PfNugQAtgNGtgAAOhK2AAA6ErYAADoStgAAOvKAPGvKMUefN+s
|
||
|
"text/plain": [
|
||
|
"<Figure size 720x432 with 1 Axes>"
|
||
|
]
|
||
|
},
|
||
|
"metadata": {
|
||
|
"needs_background": "light"
|
||
|
},
|
||
|
"output_type": "display_data"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"plt.figure(figsize=(10,6))\n",
|
||
|
"sns.barplot(x=coefs.index,y=coefs.values);"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"---------\n",
|
||
|
"\n",
|
||
|
"## Model Performance Evaluation"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"**TASK: Let's now evaluate your model on the remaining 10% of the data, the test set.**\n",
|
||
|
"\n",
|
||
|
"**TASK: Create the following evaluations:**\n",
|
||
|
"* Confusion Matrix Array\n",
|
||
|
"* Confusion Matrix Plot\n",
|
||
|
"* Classification Report"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 53,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"# CODE HERE"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 54,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"from sklearn.metrics import confusion_matrix,classification_report,plot_confusion_matrix"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 55,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"y_pred = log_model.predict(scaled_X_test)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 56,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"array([[12, 3],\n",
|
||
|
" [ 2, 14]], dtype=int64)"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 56,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"confusion_matrix(y_test,y_pred)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 57,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"# CODE HERE"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 58,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"<sklearn.metrics._plot.confusion_matrix.ConfusionMatrixDisplay at 0x2573dba6e08>"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 58,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
},
|
||
|
{
|
||
|
"data": {
|
||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAATIAAAEKCAYAAACR79kFAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy86wFpkAAAACXBIWXMAAAsTAAALEwEAmpwYAAAVz0lEQVR4nO3deZxeVX3H8c83M0mGEAKEBCQJS6ABGkAojpGl7ChBUdBiBUHRYhGK4AulFqsCxVprldYVZFgKVogSDKKiLGULWBSSsCUBSkQkYUtCQEgIJjPz6x/3TplMJvPc++RZ7p35vnndV+7yPOf+JvPKj3POPedcRQRmZmU2rNkBmJltLCcyMys9JzIzKz0nMjMrPScyMys9JzIzKz0nMjNrGklXSloqaX4/186RFJLGVSrHiczMmukqYHrfk5K2A94JPJOlECcyM2uaiJgNrOjn0n8AnwMyjdhvrWVQG2v0liNi7MS2ZodhOfzxyU2aHYLlsLrzVdZ0rdbGlHHkoZvGSyu6Mn127iN/WgC80etUR0R0DPQdSe8Dno2Ih6VsoRYqkY2d2MZnZ05rdhiWw83T92x2CJbD/7xw7UaX8dKKLu6/ZftMn23Z9sk3IqI9a9mSRgFfAN6VJ6ZCJTIzK74AuumuV/E7A5OBntrYJGCepGkR8cKGvuREZma5BMHayNa0zF12xKPA1j3Hkp4G2iNi+UDfc2e/meXWnfG/SiTNAO4DdpW0RNIp1cTjGpmZ5RIEXTVa/isiTqhwfccs5TiRmVlu3dlGRTSME5mZ5RJAlxOZmZWda2RmVmoBrC3YEvlOZGaWSxBuWppZyQV0FSuPOZGZWT7JyP5icSIzs5xEFxs177zmnMjMLJeks9+JzMxKLBlH5kRmZiXX7RqZmZWZa2RmVnqB6CrYwjlOZGaWm5uWZlZqgVgTLc0OYx1OZGaWSzIg1k1LMys5d/abWalFiK5wjczMSq7bNTIzK7Oks79YqaNY0ZhZ4bmz38wGhS6PIzOzMvPIfjMbFLr91NLMyiyZNO5EZmYlFoi1nqJkZmUWQeEGxBYrGjMrAdGdcatYknSlpKWS5vc693VJj0t6RNINkraoVI4TmZnlEiQ1sixbBlcB0/ucuw3YIyLeCvwv8PlKhTiRmVluXQzLtFUSEbOBFX3O3RoRnenhb4BJlcpxH5mZ5RKokQsr/g3w40ofciIzs1yS18FlTh3jJM3pddwRER1ZvijpC0AncE2lzzqRmVlOuV7Quzwi2nPfQToZOBo4PCKi0uedyMwsl6C+I/slTQf+ATg4Il7P8h0nMjPLrVYrxEqaARxC0gRdApxP8pRyJHCbJIDfRMRpA5XjRGZmuUSoZjWyiDihn9NX5C3HiczMckk6+z1FycxKzWv2m1nJJZ39XljRzErOy/iYWak1eGR/Jk5kZpabXz5iZqUWAWu7ncjMrMSSpqUTmZmVXK1G9teKE1mNPfzFUbx493BGjg0OvvFVABZ+YxNevGs4w4YHo7brZu9/fp3hYyrOg7UGGz6ii699/z6Gj+impSX49R3bcs1luzQ7rMIp4vCLutYPJU2X9ISkRZLOree9imLSsWt4x6Ur1zk3fr+1HPzTVzn4htcYvUM3iy5ra1J0NpC1a4bxj2fsy5knHcSZJx3I2/Zdxq57vNzssAooaVpm2RqlbneS1AJ8DzgKmAqcIGlqve5XFFu1dzJ883VrW+MP6GRYWvfdYq9OVr9YrP+bWQ/xxurkF9XaGrS0difVD1tPrdbsr5V6Ni2nAYsi4ikAST8CjgEW1vGehbd41ggmHLW22WHYBgwbFnzr6nvZdtIqbrp+B55YsGWzQyqc5KllseZa1rPuNxFY3Ot4SXpuHZJOlTRH0pyVKwb3P/AnL21DrTDx6DXNDsU2oLtbnPmRAzn5vYezy+6vsMNOrzU7pMLpGRCbZWuUeiay/n6K9SrqEdEREe0R0T567PA6htNci386ghfvHs4+X1
uF3LIsvFUrh/PI3K14235Lmx1KIRWtaVnPRLYE2K7X8STguTrer7CW3tPK765o4+3fXUnLJs2OxjZkzBZ/YtPRSatgxMgu9p62nMVPj25yVMXT89SySDWyevaRPQBMkTQZeBY4HvhwHe9XCPPO2ZSXHmhlzSvivw/bnF3OWM2iy9roXit++4nkH8UWe3Xx1vMzreBrDTR23J/4zHkPM2xYoGHBvbdP4IFfb9PssAppyAyIjYhOSZ8CbgFagCsjYkG97lcU+3xj1Xrntv8r94mVwdOLxnDWRw9sdhiFFyE6h0oiA4iIXwK/rOc9zKzxijYg1iP7zSyXIo7sdyIzs9ycyMys1LywopkNCo0cI5aFE5mZ5RIBnV5Y0czKzk1LMys195GZ2aAQTmRmVnZF6+wvVo+dmRVeRO0mjUu6UtJSSfN7nRsr6TZJT6Z/VlwUzonMzHISXd3DMm0ZXAVM73PuXOD2iJgC3J4eD8iJzMxyi1CmrXI5MRtY0ef0McDV6f7VwLGVynEfmZnlknOu5ThJc3odd0RER4XvbBMRzwNExPOStq50EycyM8snkn6yjJZHRHsdowHctDSzKtR5qesXJW0LkP5Zcb1xJzIzyyVq29nfn58BJ6f7JwM3VvqCE5mZ5RaRbatE0gzgPmBXSUsknQL8K/BOSU8C70yPB+Q+MjPLrVYj+yPihA1cOjxPOU5kZpZLUtsq1sh+JzIzy82Txs2s9HIMv2gIJzIzyyUQ3V5Y0czKrmAVMicyM8vJnf1mNigUrErmRGZmuZWmRibpOwyQdyPirLpEZGaFFkB3d0kSGTBngGtmNlQFUJYaWURc3ftY0qYRsar+IZlZ0RVtHFnFwSCS9pO0EHgsPd5L0sV1j8zMiisybg2SZVTbN4EjgZcAIuJh4KA6xmRmhZZtmetGPhDI9NQyIhZL6wTVVZ9wzKwUCta0zJLIFkvaHwhJI4CzSJuZZjYEBUTBnlpmaVqeBpwBTASeBfZOj81syFLGrTEq1sgiYjlwYgNiMbOyKFjTMstTy50k/VzSsvSNwDdK2qkRwZlZQZXwqeW1wHXAtsAEYCYwo55BmVmB9QyIzbI1SJZEpoj4r4joTLcfUriKpZk1Uq1ePlIrA821HJvu3inpXOBHJAnsQ8BNDYjNzIqqYE8tB+rsn0uSuHoi/mSvawF8uV5BmVmxqWBtsoHmWk5uZCBmVhIN7sjPItPIfkl7AFOBtp5zEfGDegVlZkXW2I78LComMknnA4eQJLJfAkcB9wJOZGZDVcFqZFmeWh5H8tbfFyLi48BewMi6RmVmxdadcWuQLE3L1RHRLalT0hhgKeABsWZDVQEXVsxSI5sjaQvgMpInmfOA++sZlJkVmyLbVrEc6WxJCyTNlzRDUlvlb60vy1zLv0t3vy/pZmBMRDxSzc3MbJCoQR+ZpIkkq+lMjYjVkq4DjgeuylvWQANi9xnoWkTMy3szM7M+WoFNJK0FRgHPVVvIhlw0wLUADqvmhgN5ZUErv9h9y1oXa3V0y3O/aHYIlsO0I/9Yk3JyDIgdJ6n3i4w6IqIDICKelfQN4BlgNXBrRNxaTTwDDYg9tJoCzWyQC/JMUVoeEe39XZC0JXAMMBl4BZgp6aR0PncuWTr7zczWVZtlfI4Afh8RyyJiLTAL2L+acPymcTPLrUZzLZ8B9pU0iqRpeThVvk/XNTIzy68GNbKI+C1wPcmQrkdJ8lFHNeFkmaIkkqWud4qICyVtD7wlIjyWzGyoqtEUpYg4Hzh/Y8vJUiO7GNgPOCE9fg343sbe2MzKKetg2EYu9ZOlj+wdEbGPpAcBIuLl9LVwZjZUlWhhxR5rJbWQViYljaeh00HNrGiKtrBilqblt4EbgK0lfYVkCZ9/qWtUZlZsBXuLUpa5ltdImkvyaFTAsRHhN42bDVUN7v/KIstTy+2B14Gf9z4XEc/UMzAzK7CyJTKSNyb1vISkjWQ6wRPA7nWMy8wKTAXrJc/StNyz93G6Ks
YnN/BxM7OGyz1FKSLmSXp7PYIxs5IoW9NS0md6HQ4D9gGW1S0iMyu2Mnb2A5v12u8k6TP7SX3CMbNSKFMiSwfCjo6
|
||
|
"text/plain": [
|
||
|
"<Figure size 432x288 with 2 Axes>"
|
||
|
]
|
||
|
},
|
||
|
"metadata": {
|
||
|
"needs_background": "light"
|
||
|
},
|
||
|
"output_type": "display_data"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"plot_confusion_matrix(log_model,scaled_X_test,y_test)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 59,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"# CODE HERE"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 60,
|
||
|
"metadata": {
|
||
|
"scrolled": true
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"name": "stdout",
|
||
|
"output_type": "stream",
|
||
|
"text": [
|
||
|
" precision recall f1-score support\n",
|
||
|
"\n",
|
||
|
" 0 0.86 0.80 0.83 15\n",
|
||
|
" 1 0.82 0.88 0.85 16\n",
|
||
|
"\n",
|
||
|
" accuracy 0.84 31\n",
|
||
|
" macro avg 0.84 0.84 0.84 31\n",
|
||
|
"weighted avg 0.84 0.84 0.84 31\n",
|
||
|
"\n"
|
||
|
]
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"print(classification_report(y_test,y_pred))"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"### Performance Curves\n",
|
||
|
"\n",
|
||
|
"**TASK: Create both the precision recall curve and the ROC Curve.**"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 63,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"from sklearn.metrics import plot_precision_recall_curve,plot_roc_curve"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 64,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"# CODE HERE"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 65,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"<sklearn.metrics._plot.precision_recall_curve.PrecisionRecallDisplay at 0x2573dc46cc8>"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 65,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
},
|
||
|
{
|
||
|
"data": {
|
||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYgAAAEGCAYAAAB/+QKOAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy86wFpkAAAACXBIWXMAAAsTAAALEwEAmpwYAAAe8UlEQVR4nO3de3QV9bn/8fdDAoLcZBVEJBpAPCLXiFGhyrUVwUsp2iqIBVGq9GDLUnGBLitydP2kR9tSF5yiFapWJWgrFy2iFVGsRUPQcAtSI0QMF4nckSoGnt8fexM3yYRsYE92Lp/XWntlz8x3Js83l/3Z35k9M+buiIiIlFYn2QWIiEjVpIAQEZFACggREQmkgBARkUAKCBERCZSa7AISqXnz5t6mTZtklyEiUm2sWLHiS3dvEbSsRgVEmzZtyMnJSXYZIiLVhpl9Vt4y7WISEZFACggREQmkgBARkUAKCBERCaSAEBGRQKEFhJnNMrPtZramnOVmZo+bWb6ZrTKz7jHLBprZ+uiyiWHVKCIi5QtzBPE0MPAYywcB50YftwF/BDCzFGB6dHlHYJiZdQyxThERCRDaeRDuvtTM2hyjyWDgWY9cb/x9MzvNzFoBbYB8d98AYGZZ0bZ5YdU6+ZW15G3ZG9bmReQ4Dc5ozY2XnJ3sMmq9ZB6DaA18HjNdGJ1X3vxAZnabmeWYWU5RUVEohYpI5cnbupf5uZuTXYaQ3DOpLWCeH2N+IHd/EngSIDMz84TufjTpmk4nspqIhOCGJ5YluwSJSmZAFAJnxUynAVuAeuXMFxGRSpTMXUwLgBHRTzP1APa4+1ZgOXCumbU1s3rA0GhbERGpRKGNIMxsNtAXaG5mhcAkoC6Au88AFgJXAvnAAWBUdFmxmd0BvA6kALPcfW1YdYqISLAwP8U0rILlDowtZ9lCIgEiIiJJojOpRUQkkAJCREQCKSBERCSQAkJERAIpIEREJJACQkREAikgREQkkAJCREQCKSBERCSQAkJERAIpIEREJJACQkREAikgREQkkAJCREQCKSBERCSQAkJERAIpIEREJJACQkREAikgREQkkAJCREQCKSBERCSQAkJERAIpIEREJJACQkREAikgREQkkAJCREQCKSBERCSQAkJERAIpIEREJJACQkREAikgREQkkAJCREQCKSBERCSQAkJERAIpIEREJJACQkREAoUaEGY20MzWm1m+mU0MWN7MzOaa2SozyzazzjHLCsxstZnlmllOmHWKiEhZqWFt2MxSgOnA5UAhsNzMFrh7Xkyz+4Bcdx9iZh2i7X8Qs7yfu38ZVo0iIlK+MEcQFwP57r7B3Q8CWcDgUm06AosB3P1joI2ZtQyxJhERiVOYAdEa+DxmujA6L9ZK4FoAM7sYSAfSossceMPMVpjZbeV9EzO7zcxyzCynqKgoYcWLiNR2YQaEBczzUtNTgGZmlgv8EvgIKI4uu9TduwODgLFm1jvom7j7k+6e6e6ZLVq0SEzlIiIS3jEIIiOGs2Km04AtsQ3cfS8wCsDMDNgYfeDuW6Jft5vZXCK7rJaGWK+IiMQIcwSxHDjXzNqaWT1gKLAgtoGZnRZdBjAaWOrue82soZk1jrZpCAwA1oRYq4iIlBLaCMLdi83sDuB1IAWY5e5rzWxMdPkM4HzgWTM7BOQBt0ZXbwnMjQwqSAVecPdFYdUqIiJlhbmLCXdfCCwsNW9GzPNlwLkB620AuoVZm4iIHJvOpBYRkUAKCBERCaSAEBGRQAoIEREJpIAQEZFACggREQmkgBARkUAKCBERCRTqiXIiIsn0wgebmJ+7OWHbG5zRmhsvOTth26vqNIIQkRprfu5m8rbuTci28rbuTWjYVAcaQYhIjdaxVRPm3N7zpLdzwxPLElBN9aIRhIiIBNIIQkSqlA827gQS8449b+teOrZqctLbqa00ghCRGqtjqyYMzih9p2OJl0YQIlIlJeK4gZwcjSBERCSQAkJERA
IpIEREJJCOQYhIlXJZ++bJLkGiFBAiUqU8N/qSZJcgUdrFJCIigRQQIiISSAEhIiKBFBAiIhJIASEiIoEUECIiEkgBISIigRQQIiISSAEhIiKB4jqT2swuBR4E0qPrGODu3i680kREJJnivdTGTOBOYAVwKLxyRESkqog3IPa4+2uhViIiIlVKvAGxxMweBV4Gvjky090/DKUqERFJungD4sjlFTNj5jnQP7HliIhIVRFXQLh7v7ALERGRqiWuj7maWVMz+52Z5UQfvzWzpnGsN9DM1ptZvplNDFjezMzmmtkqM8s2s87xrisiIuGK9zyIWcA+4ProYy/w52OtYGYpwHRgENARGGZmHUs1uw/IdfeuwAjgD8exroiIhCjegDjH3Se5+4boYzJQ0TkQFwP50fYHgSxgcKk2HYHFAO7+MdDGzFrGua6IiIQo3oD4j5lddmQieuLcfypYpzXwecx0YXRerJXAtdFtXkzkRLy0ONc9UsttR3Z9FRUVxdEVERGJR7yfYvoF8Ez0uIMBO4GbK1jHAuZ5qekpwB/MLBdYDXwEFMe5bmSm+5PAkwCZmZmBbURE5PjF+ymmXKCbmTWJTu+NY7VC4KyY6TRgS6nt7gVGAZiZARujj1MrWldERMJ1zIAws5vc/Tkzu6vUfADc/XfHWH05cK6ZtQU2A0OBG0tt5zTgQPQ4w2hgqbvvNbMK1xURkXBVNIJoGP3a+Hg37O7FZnYH8DqQAsxy97VmNia6fAZwPvCsmR0C8oBbj7Xu8dYgIiIn7pgB4e5PRL9OPpGNu/tCYGGpeTNini8Dzo13XRERqTzxnij3v2bWxMzqmtliM/vSzG4KuzgREUmeeD/mOiB6QPlqIgef/wu4J7SqREQk6eINiLrRr1cCs919Z0j1iIhIFRHveRCvmNnHRE6O+28zawF8HV5ZIiKSbHGNINx9ItATyHT3b4Gv0KUvRERqtIrOg+jv7m+Z2bUx82KbvBxWYSIiklwV7WLqA7wFXBOwzFFAiIjUWBWdBzEp+nVU5ZQjIiJVRbznQfy/6GUxjkw3M7OHQ6tKRESSLt6PuQ5y991HJtx9F5GPvIqISA0Vb0CkmNkpRybMrAFwyjHai4hINRfveRDPAYvN7M9EDk7fAjwTWlUiIpJ08d4P4n/NbBXwQyI383nI3V8PtTIREUmqeEcQAOuAYnd/08xONbPG7r4vrMJERCS54v0U08+BvwJPRGe1BuaFVJOIiFQB8R6kHgtcCuwFcPdPgNPDKkpERJIv3oD4JnpbUADMLJXIwWoREamh4g2Id8zsPqCBmV0OvAS8El5ZIiKSbPEGxASgCFgN3E7kVqD3h1WUiIgkX4WfYjKzOsAqd+8M/Cn8kkREpCqocATh7oeBlWZ2diXUIyIiVUS850G0AtaaWTaRmwUB4O4/CqUqERFJungDYnKoVYiISJVT0R3l6gNjgPZEDlDPdPfiyihMRESSq6JjEM8AmUTCYRDw29ArEhGRKqGiXUwd3b0LgJnNBLLDL0lERKqCikYQ3x55ol1LIiK1S0UjiG5mtjf63IicSb03+tzdvUmo1YmISNIcMyDcPaWyChERkaol3kttiIhILaOAEBGRQAoIEREJpIAQEZFACggREQmkgBARkUAKCBERCRRqQJjZQDNbb2b5ZjYxYHlTM3vFzFaa2VozGxWzrMDMVptZrpnlhFmniIiUFe/lvo+bmaUA04HLgUJguZktcPe8mGZjgTx3v8bMWgDrzex5dz8YXd7P3b8Mq0YRESlfmCOIi4F8d98QfcHPAgaXauNAYzMzoBGwE9A1n0REqoAwA6I18HnMdGF0XqxpwPnAFiKXFB8XvcUpRMLjDTNbYWa3lfdNzOw2M8sxs5yioqLEVS8iUsuFGRAWMM9LTV8B5AJnAhnANDM7cgHAS929O5H7UIw1s95B38Tdn3T3THfPbNGiRUIKFxGRcAOiEDgrZjqNyEgh1ijgZY/IBzYCHQDcfUv063ZgLp
FdViIiUknCDIjlwLlm1tbM6gFDgQWl2mwCfgBgZi2B84ANZtbQzBpH5zcEBgBrQqxVRERKCe1TTO5ebGZ3AK8DKcA
|
||
|
"text/plain": [
|
||
|
"<Figure size 432x288 with 1 Axes>"
|
||
|
]
|
||
|
},
|
||
|
"metadata": {
|
||
|
"needs_background": "light"
|
||
|
},
|
||
|
"output_type": "display_data"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"plot_precision_recall_curve(log_model,scaled_X_test,y_test)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 66,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"# CODE HERE"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 67,
|
||
|
"metadata": {
|
||
|
"scrolled": true
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"<sklearn.metrics._plot.roc_curve.RocCurveDisplay at 0x2573dc484c8>"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 67,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
},
|
||
|
{
|
||
|
"data": {
|
||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYIAAAEGCAYAAABo25JHAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy86wFpkAAAACXBIWXMAAAsTAAALEwEAmpwYAAAjOklEQVR4nO3deXxV5b3v8c+vDEKR6chQJDJoAWUKQlRQEbA9FJRTZxCsWpUX0tbhtJdesFrH9liv1FKOWC5VrlrL4IiUKmoriBUVgsQAQRQVMQgVUYZIUYbf/WOt7LMJO8kOydqbZH3fr9d+Za+1nrXW79lJ9m89a3gec3dERCS+vpHtAEREJLuUCEREYk6JQEQk5pQIRERiTolARCTm6mc7gKpq1aqVd+rUKdthiIjUKitWrPjM3VunWlbrEkGnTp3Iz8/PdhgiIrWKmX1U3jKdGhIRiTklAhGRmFMiEBGJOSUCEZGYUyIQEYm5yBKBmc00s0/NbHU5y83MpprZejMrNLO+UcUiIiLli7JF8DAwrILlw4Eu4Wsc8IcIYxERkXJE9hyBuy8xs04VFDkPeNSDfrDfMLMWZtbO3TdHFZNU36w3N/JswaZshyESS92PbcZt/9GjxrebzWsE7YGPk6aLw3mHMLNxZpZvZvlbt27NSHCS2rMFmyjavDPbYYhIDcrmk8WWYl7KUXLcfQYwAyAvL08j6WRZ93bNmHvtgGyHISI1JJstgmLguKTpHOCTLMUiIhJb2UwE84ErwruH+gM7dH1ARCTzIjs1ZGazgcFAKzMrBm4DGgC4+3TgOeAcYD2wG7gqqlhERKR8Ud41NLqS5Q78JKr9i4hIevRksYhIzCkRiIjEnBKBiEjMKRGIiMScEoGISMwpEYiIxJwSgYhIzCkRiIjEnBKBiEjMKRGIiMScEoGISMwpEYiIxJwSgYhIzCkRiIjEnBKBiEjMKRGIiMScEoGISMwpEYiIxFxkQ1XGwaw3N/JswaZsh5FRRZt30r1ds2yHISI1SC2Cani2YBNFm3dmO4yM6t6uGef1aZ/tMESkBqlFUE3d2zVj7rUDsh2GiMhhU4tARCTmlAhERGJOiUBEJOaUCEREYk6JQEQk5pQIRERiTolARCTmlAhERGJOiUBEJOaUCEREYi7SRGBmw8xsnZmtN7NJKZY3N7O/mNnbZrbGzK6KMh4RETlUZInAzOoB04DhQHdgtJl1L1PsJ0CRu+cCg4HfmlnDqGISEZFDRdkiOBVY7+4fuPvXwBzgvDJlHGhqZgYcDXwO7IswJhERKSPKRNAe+Dhpujicl+x+4CTgE2AVcKO7Hyi7ITMbZ2b5Zpa/devWqOIVEYmlKBOBpZjnZaa/BxQAxwJ9gPvN7JBRT9x9hrvnuXte69atazpOEZFYizIRFAPHJU3nEBz5J7sKeNoD64EPgRMjjElERMqIMhEsB7qYWefwAvClwPwyZTYC3wEws7ZAN+CDCGMSEZEyIhuhzN33mdl1wAtAPWCmu68xs/Hh8unAXcDDZraK4FTSRHf/LKqYRETkUJEOVenuzwHPlZk3Pen9J8DQKGMQEZGK6cliEZGYUyIQEYk5JQIRkZiL9BrBkWTWmxt5tmBTjW6zaPNOurc75LEHEZFaJTYtgmcLNlG0eWeNbrN7u2ac16fsw9IiIrVLbFoEEHxxz712QLbDEBE5osSmRSAiIqkpEYiIxJwSgYhIzCkRiIjEXNqJwMyaRBmIiIhkR6WJwMxON7MiYG04nWtmD0QemYiIZEQ6LYLfEQwgsw3A3d8GzooyKBERyZy0Tg25+8dlZu2PIBYREcmCdB4o+9jMTgc8HGDmBsLTRCIiUvul0yIYD/yEYOD5YoKxhX8cYUwiIpJB6bQIurn7ZckzzOwM4LVoQhIRkUxKp0Xw32nOExGRWqjcFoGZDQBOB1qb2c+SFjUjGINYRETqgIpODTUEjg7LNE2avxO4OMqgREQkc8pNBO7+CvCKmT
3s7h9lMCYREcmgdC4W7zaze4EeQKPSme5+dmRRiYhIxqRzsfjPwDtAZ+AOYAOwPMKYREQkg9JJBMe4+0PAXnd/xd2vBvpHHJeIiGRIOqeG9oY/N5vZucAnQE50IYmISCalkwh+ZWbNgf9F8PxAM+A/owxKREQyp9JE4O4Lwrc7gCGQeLJYRETqgIoeKKsHjCToY2ihu682sxHAL4DGwMmZCVFERKJUUYvgIeA4YBkw1cw+AgYAk9x9XgZiExGRDKgoEeQBvd39gJk1Aj4Dvu3uWzITmoiIZEJFt49+7e4HANx9D/BuVZOAmQ0zs3Vmtt7MJpVTZrCZFZjZGjN7pSrbFxGR6quoRXCimRWG7w04IZw2wN29d0UbDq8xTAP+nWAcg+VmNt/di5LKtAAeAIa5+0Yza3P4VRERkcNRUSI4qZrbPhVY7+4fAJjZHOA8oCipzBjgaXffCODun1ZznyIiUkUVdTpX3Y7m2gPJYx0XA6eVKdMVaGBmiwl6OP29uz9adkNmNg4YB9ChQ4dqhiUiIsnSGrz+MFmKeV5muj7QDzgX+B7wSzPreshK7jPcPc/d81q3bl3zkYqIxFg6TxYfrmKC209L5RB0T1G2zGfu/iXwpZktAXKBdyOMS0REkqTVIjCzxmbWrYrbXg50MbPOZtYQuBSYX6bMs8BAM6tvZt8kOHW0tor7ERGRaqg0EZjZfwAFwMJwuo+Zlf1CP4S77wOuA14g+HJ/3N3XmNl4MxsfllkbbreQ4MG1B9199WHWRUREDkM6p4ZuJ7gDaDGAuxeYWad0Nu7uzwHPlZk3vcz0vcC96WxPRERqXjqnhva5+47IIxERkaxIp0Ww2szGAPXMrAtwA7A02rBERCRT0mkRXE8wXvFXwCyC7qj/M8KYREQkg9JpEXRz95uBm6MORkREMi+dFsF9ZvaOmd1lZj0ij0hERDKq0kTg7kOAwcBWYIaZrTKzW6IOTEREMiOtB8rcfYu7TwXGEzxTcGuUQYmISOak80DZSWZ2u5mtBu4nuGMoJ/LIREQkI9K5WPz/gNnAUHcv21eQiIjUcpUmAnfvn4lAREQkO8pNBGb2uLuPNLNVHNx9dFojlImISO1QUYvgxvDniEwEIiIi2VHuxWJ33xy+/bG7f5T8An6cmfBERCRq6dw++u8p5g2v6UBERCQ7KrpG8COCI//jzawwaVFT4LWoAxMRkcyo6BrBLOB54G5gUtL8Xe7+eaRRiYhIxlSUCNzdN5jZT8ouMLN/UzIQEakbKmsRjABWENw+aknLHDg+wrhERCRDyk0E7j4i/Nk5c+GIiEimpdPX0Blm1iR8/wMzu8/MOkQfmoiIZEI6t4/+AdhtZrnA/wY+Av4UaVQiIpIx6Q5e78B5wO/d/fcEt5CKiEgdkE7vo7vM7CbgcmCgmdUDGkQbloiIZEo6LYJRBAPXX+3uW4D2wL2RRiUiIhmTzlCVW4A/A83NbASwx90fjTwyERHJiHTuGhoJLAMuAUYCb5rZxVEHJiIimZHONYKbgVPc/VMAM2sN/A14MsrAREQkM9K5RvCN0iQQ2pbmeiIiUguk0yJYaGYvEIxbDMHF4+eiC0lERDIpnTGLf25mFwJnEvQ3NMPdn4k8MhERyYiKxiPoAkwGTgBWARPcfVOmAhMRkcyo6Fz/TGABcBFBD6T/XdWNm9kwM1tnZuvNbFIF5U4xs/26G0lEJPMqOjXU1N3/GL5fZ2ZvVWXD4RPI0wiGuiwGlpvZfHcvSlHuHuCFqmxfRERqRkWJoJGZncz/jEPQOHna3StLDKcC6939AwAzm0PQX1FRmXLXA08Bp1QxdhERqQEVJYLNwH1J01uSph04u5Jttwc+TpouBk5LLmBm7YELwm2VmwjMbBwwDqBDB/WALSJSkyoamGZINbdtKeZ5mekpwER332+WqngilhnADIC8vLyy2xARkWpI5zmCw1UMHJc0nQN8UqZMHjAnTAKtgHPMbJ+7z4swLhERSRJlIlgOdD
GzzsAm4FJgTHKB5GEwzexhYIGSgIhIZkWWCNx9n5ldR3A3UD1gpruvMbPx4fLpUe1bRETSV2kisOC8zWXA8e5+Zzh
|
||
|
"text/plain": [
|
||
|
"<Figure size 432x288 with 1 Axes>"
|
||
|
]
|
||
|
},
|
||
|
"metadata": {
|
||
|
"needs_background": "light"
|
||
|
},
|
||
|
"output_type": "display_data"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"plot_roc_curve(log_model,scaled_X_test,y_test)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"**Final Task: A patient with the following features has come into the medical office:**\n",
|
||
|
"\n",
|
||
|
" age 54.0\n",
" sex 1.0\n",
" cp 0.0\n",
" trestbps 122.0\n",
" chol 286.0\n",
" fbs 0.0\n",
" restecg 0.0\n",
" thalach 116.0\n",
" exang 1.0\n",
" oldpeak 3.2\n",
" slope 1.0\n",
" ca 2.0\n",
" thal 2.0"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"**TASK: What does your model predict for this patient? Do they have heart disease? How \"sure\" is your model of this prediction?**\n",
|
||
|
"\n",
|
||
|
"*For convenience, we created an array of the features for the patient above.*"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 68,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"patient = [[ 54. , 1. , 0. , 122. , 286. , 0. , 0. , 116. , 1. ,\n",
|
||
|
" 3.2, 1. , 2. , 2. ]]"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 69,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"age 54.0\n",
|
||
|
"sex 1.0\n",
|
||
|
"cp 0.0\n",
|
||
|
"trestbps 122.0\n",
|
||
|
"chol 286.0\n",
|
||
|
"fbs 0.0\n",
|
||
|
"restecg 0.0\n",
|
||
|
"thalach 116.0\n",
|
||
|
"exang 1.0\n",
|
||
|
"oldpeak 3.2\n",
|
||
|
"slope 1.0\n",
|
||
|
"ca 2.0\n",
|
||
|
"thal 2.0\n",
|
||
|
"Name: 268, dtype: float64"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 69,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"X_test.iloc[-1]"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 70,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"0"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 70,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"y_test.iloc[-1]"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 71,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"array([0], dtype=int64)"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 71,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"log_model.predict(patient)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 72,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"array([[9.99999862e-01, 1.38455917e-07]])"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 72,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"log_model.predict_proba(patient)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"----\n",
|
||
|
"\n",
|
||
|
"## Great Job!"
|
||
|
]
|
||
|
}
|
||
|
],
|
||
|
"metadata": {
|
||
|
"anaconda-cloud": {},
|
||
|
"kernelspec": {
|
||
|
"display_name": "Python 3",
|
||
|
"language": "python",
|
||
|
"name": "python3"
|
||
|
},
|
||
|
"language_info": {
|
||
|
"codemirror_mode": {
|
||
|
"name": "ipython",
|
||
|
"version": 3
|
||
|
},
|
||
|
"file_extension": ".py",
|
||
|
"mimetype": "text/x-python",
|
||
|
"name": "python",
|
||
|
"nbconvert_exporter": "python",
|
||
|
"pygments_lexer": "ipython3",
|
||
|
"version": "3.7.6"
|
||
|
}
|
||
|
},
|
||
|
"nbformat": 4,
|
||
|
"nbformat_minor": 1
|
||
|
}
|