You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

793 lines
74 KiB

2 years ago
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"___\n",
"\n",
"<a href='https://www.udemy.com/user/joseportilla/'><img src='../Pierian_Data_Logo.png'/></a>\n",
"___\n",
"<center><em>Copyright by Pierian Data Inc.</em></center>\n",
"<center><em>For more information, visit us at <a href='http://www.pieriandata.com'>www.pieriandata.com</a></em></center>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# KNN Project Exercise - Solutions\n",
"\n",
"Due to the simplicity of KNN for Classification, let's focus on using a PipeLine and a GridSearchCV tool, since these skills can be generalized for any model."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## The Sonar Data \n",
"\n",
"### Detecting a Rock or a Mine\n",
"\n",
"Sonar (sound navigation ranging) is a technique that uses sound propagation (usually underwater, as in submarine navigation) to navigate, communicate with or detect objects on or under the surface of the water, such as other vessels.\n",
"\n",
"<img src=\"sonar.jpg\" style=\"max-height: 500px; max-width: 500px;\">\n",
"\n",
"The data set contains the response metrics for 60 separate sonar frequencies sent out against a known mine field (and known rocks). These frequencies are then labeled with the known object they were beaming the sound at (either a rock or a mine). \n",
"\n",
"<img src=\"mine.jpg\" style=\"max-height: 500px; max-width: 500px;\">\n",
"\n",
"Our main goal is to create a machine learning model capable of detecting the difference between a rock or a mine based on the response of the 60 separate sonar frequencies.\n",
"\n",
"\n",
"Data Source: https://archive.ics.uci.edu/ml/datasets/Connectionist+Bench+(Sonar,+Mines+vs.+Rocks)\n",
"\n",
"### Complete the Tasks in bold\n",
"\n",
"**TASK: Run the cells below to load the data.**"
]
},
{
"cell_type": "code",
"execution_count": 95,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import seaborn as sns\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "code",
"execution_count": 96,
"metadata": {},
"outputs": [],
"source": [
"df = pd.read_csv('../DATA/sonar.all-data.csv')"
]
},
{
"cell_type": "code",
"execution_count": 97,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Freq_1</th>\n",
" <th>Freq_2</th>\n",
" <th>Freq_3</th>\n",
" <th>Freq_4</th>\n",
" <th>Freq_5</th>\n",
" <th>Freq_6</th>\n",
" <th>Freq_7</th>\n",
" <th>Freq_8</th>\n",
" <th>Freq_9</th>\n",
" <th>Freq_10</th>\n",
" <th>...</th>\n",
" <th>Freq_52</th>\n",
" <th>Freq_53</th>\n",
" <th>Freq_54</th>\n",
" <th>Freq_55</th>\n",
" <th>Freq_56</th>\n",
" <th>Freq_57</th>\n",
" <th>Freq_58</th>\n",
" <th>Freq_59</th>\n",
" <th>Freq_60</th>\n",
" <th>Label</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0.0200</td>\n",
" <td>0.0371</td>\n",
" <td>0.0428</td>\n",
" <td>0.0207</td>\n",
" <td>0.0954</td>\n",
" <td>0.0986</td>\n",
" <td>0.1539</td>\n",
" <td>0.1601</td>\n",
" <td>0.3109</td>\n",
" <td>0.2111</td>\n",
" <td>...</td>\n",
" <td>0.0027</td>\n",
" <td>0.0065</td>\n",
" <td>0.0159</td>\n",
" <td>0.0072</td>\n",
" <td>0.0167</td>\n",
" <td>0.0180</td>\n",
" <td>0.0084</td>\n",
" <td>0.0090</td>\n",
" <td>0.0032</td>\n",
" <td>R</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.0453</td>\n",
" <td>0.0523</td>\n",
" <td>0.0843</td>\n",
" <td>0.0689</td>\n",
" <td>0.1183</td>\n",
" <td>0.2583</td>\n",
" <td>0.2156</td>\n",
" <td>0.3481</td>\n",
" <td>0.3337</td>\n",
" <td>0.2872</td>\n",
" <td>...</td>\n",
" <td>0.0084</td>\n",
" <td>0.0089</td>\n",
" <td>0.0048</td>\n",
" <td>0.0094</td>\n",
" <td>0.0191</td>\n",
" <td>0.0140</td>\n",
" <td>0.0049</td>\n",
" <td>0.0052</td>\n",
" <td>0.0044</td>\n",
" <td>R</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0.0262</td>\n",
" <td>0.0582</td>\n",
" <td>0.1099</td>\n",
" <td>0.1083</td>\n",
" <td>0.0974</td>\n",
" <td>0.2280</td>\n",
" <td>0.2431</td>\n",
" <td>0.3771</td>\n",
" <td>0.5598</td>\n",
" <td>0.6194</td>\n",
" <td>...</td>\n",
" <td>0.0232</td>\n",
" <td>0.0166</td>\n",
" <td>0.0095</td>\n",
" <td>0.0180</td>\n",
" <td>0.0244</td>\n",
" <td>0.0316</td>\n",
" <td>0.0164</td>\n",
" <td>0.0095</td>\n",
" <td>0.0078</td>\n",
" <td>R</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>0.0100</td>\n",
" <td>0.0171</td>\n",
" <td>0.0623</td>\n",
" <td>0.0205</td>\n",
" <td>0.0205</td>\n",
" <td>0.0368</td>\n",
" <td>0.1098</td>\n",
" <td>0.1276</td>\n",
" <td>0.0598</td>\n",
" <td>0.1264</td>\n",
" <td>...</td>\n",
" <td>0.0121</td>\n",
" <td>0.0036</td>\n",
" <td>0.0150</td>\n",
" <td>0.0085</td>\n",
" <td>0.0073</td>\n",
" <td>0.0050</td>\n",
" <td>0.0044</td>\n",
" <td>0.0040</td>\n",
" <td>0.0117</td>\n",
" <td>R</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0.0762</td>\n",
" <td>0.0666</td>\n",
" <td>0.0481</td>\n",
" <td>0.0394</td>\n",
" <td>0.0590</td>\n",
" <td>0.0649</td>\n",
" <td>0.1209</td>\n",
" <td>0.2467</td>\n",
" <td>0.3564</td>\n",
" <td>0.4459</td>\n",
" <td>...</td>\n",
" <td>0.0031</td>\n",
" <td>0.0054</td>\n",
" <td>0.0105</td>\n",
" <td>0.0110</td>\n",
" <td>0.0015</td>\n",
" <td>0.0072</td>\n",
" <td>0.0048</td>\n",
" <td>0.0107</td>\n",
" <td>0.0094</td>\n",
" <td>R</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 61 columns</p>\n",
"</div>"
],
"text/plain": [
" Freq_1 Freq_2 Freq_3 Freq_4 Freq_5 Freq_6 Freq_7 Freq_8 Freq_9 \\\n",
"0 0.0200 0.0371 0.0428 0.0207 0.0954 0.0986 0.1539 0.1601 0.3109 \n",
"1 0.0453 0.0523 0.0843 0.0689 0.1183 0.2583 0.2156 0.3481 0.3337 \n",
"2 0.0262 0.0582 0.1099 0.1083 0.0974 0.2280 0.2431 0.3771 0.5598 \n",
"3 0.0100 0.0171 0.0623 0.0205 0.0205 0.0368 0.1098 0.1276 0.0598 \n",
"4 0.0762 0.0666 0.0481 0.0394 0.0590 0.0649 0.1209 0.2467 0.3564 \n",
"\n",
" Freq_10 ... Freq_52 Freq_53 Freq_54 Freq_55 Freq_56 Freq_57 \\\n",
"0 0.2111 ... 0.0027 0.0065 0.0159 0.0072 0.0167 0.0180 \n",
"1 0.2872 ... 0.0084 0.0089 0.0048 0.0094 0.0191 0.0140 \n",
"2 0.6194 ... 0.0232 0.0166 0.0095 0.0180 0.0244 0.0316 \n",
"3 0.1264 ... 0.0121 0.0036 0.0150 0.0085 0.0073 0.0050 \n",
"4 0.4459 ... 0.0031 0.0054 0.0105 0.0110 0.0015 0.0072 \n",
"\n",
" Freq_58 Freq_59 Freq_60 Label \n",
"0 0.0084 0.0090 0.0032 R \n",
"1 0.0049 0.0052 0.0044 R \n",
"2 0.0164 0.0095 0.0078 R \n",
"3 0.0044 0.0040 0.0117 R \n",
"4 0.0048 0.0107 0.0094 R \n",
"\n",
"[5 rows x 61 columns]"
]
},
"execution_count": 97,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data Exploration\n",
"\n",
"**TASK: Create a heatmap of the correlation between the difference frequency responses.**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# CODE HERE"
]
},
{
"cell_type": "code",
"execution_count": 98,
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"text/plain": [
"<AxesSubplot:>"
]
},
"execution_count": 98,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAecAAAGGCAYAAABBpUmrAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy86wFpkAAAACXBIWXMAAAsTAAALEwEAmpwYAABq1ElEQVR4nO29eZxsdXXu/ayah+7qPiMc4MA5RDgKgsioUTHRyFUTBc11jiZ6r4TXkGu80RsN8Ur0NfHGzIlGUREkCPqCKCpEEcUhXJQZGWSSIxw4cMaea671/rF3H4quZ3V3dVV376pa389nf06fVat++7d/e6q997OfJaoKx3Ecx3GiQ2y1O+A4juM4zjPxk7PjOI7jRAw/OTuO4zhOxPCTs+M4juNEDD85O47jOE7E8JOz4ziO40QMPzk7juM4joGIXCgiu0TkbuNzEZF/FpGHROQuETmxG/P1k7PjOI7j2FwE4JXzfP4qAEeF09kA/q0bM/WTs+M4juMYqOqPAOybJ+VMAF/SgJsAjIrIpk7n6ydnx3Ecx1k6hwJ4rOn/O8JYRyQ6+bKI1AH8vCl0lqpu76hHfD5vAHA+gOcAOFVVb5kv/9vJbdST9JT3n0bzh497Dp/v2g00PnPINhofyx9C44lGpSVWmH6K5qoIjdcSGRqP11vbBoA9+SNo3CKtRRrPVKdofDo1SuP7amt5f4p5Gk/EuH3s+uwEjZfqKRp/cNcIje/c3aDxqakaje/by8dhZrJM4xaHHDHaEksl+W/hwzbx3TCV5GOTiLfVFYxk+bIW0nyZasr7mY5XaXxNkq8riz3lNTSeivN+jiR4+wq+rxQbfF/ZoE/y+VZnaHwiy/f/qvJtMKOt7cSVL1ND+EocmdpJ45XUEI2L1mkcxnFkLH0QjY+W+fEoVTLGPsa32UaML9fa41/CO9QFrOP9Yvmd2gN/iOB29CwXqOoFbTTBlq1jX+yOTs4Aiqp6AvtARASAqCo/OrbH3QBeD+CzXWjLcRzH6RMk2dl5X6t6AYB2TsZz2QFgc9P/DwPwREedQpdva4vIFhG5T0Q+DeA2AJtF5AMicnOoYvvLptzzROR+EfmeiFwmIu+32lXV+1T1/gXmfbaI3CIit/xHY6xry+Q4juM483A1gHeEqu0XABhXVX4LpA06vXLOisgd4d+PAHgfgG0A3qmq7xGRMxAo2E5FcOl/tYicDmAawJsBPD/sw20Abu2kI+FtiAuAzm9zOI7jOL1BLLFsd8wBACJyGYDfALBeRHYA+AiAJACo6mcAXAPg1QAeAjAD4J3dmG9Xb2uLyBYAvwoVawBwRjjdHv5/CMHJehjAVarBQxoRubrDfjwD69nyzX/7Uxo//t3TNL7upGNpPF3gz1VHjOctpdRwS2w6tw6ZyiTNb5DnOeVEjuYm4vz5Vx28L1nlyxpv8OeJYpQUZc/RASAe48/AYsb+U2vwD6arWRqvNvhy1YxHb7Ua738y1d5No0qFPztMZ5I03qi3zrdUr9NHgQ9vr+OQTenWPhoHnRppGwAmp3l+OsGXdU2WD1pa+LaQjvN1HgN/clVVPjYW+Th/9ptp8G12SrjOoNbghzUxHgOy/Q0AEsY+kdISjVv6jKk0P14Mlfa2tj25m+Y2Rvh2n6zwsbG0K+lk67EIACoJvr8l4lyDYRFrGDviMiKGlqNbqOpbFvhcAfxRt+fb6cmZ0by1CIC/VtVnPCsWkT9BFx6Y9zLtnJid/sA4XtITs9MftHNidpbGcl85rxbL/SrVdwC8S0SGAEBEDhWRjQB+BOB1IpIVkWEAr1nmfjiO4zh9iCSloymqLOtlmqp+V0SeA+D/BuJtTAH4PVW9TUS+AuAOAL8C8OP52hGR1wH4FwAbAHxbRO5Q1f+ynH13HMdxnNWio5Ozqg7N+f92AM+dE/snAP9EvvtxAB8HABE5f4H5XAXgqk766jiO4/Qf/Xpb2x9wOo7jOD1LlG9Nd0IkTs6qej4AiMinALxozsf/pKpfbKc9y/HLUmXf9TlabATHG+1vKHCVqAxxR6Hx2LqWWDnL1deWo1BFuOtRFlxYFoehmjYcheINPt+aoQavxXg8C65kHU1z5a6lvhbhesG68h0xn+GK4U0beft1wxqnVuXrRQy5eSrF269UWse5XDYU32neRtwQB6oxBhNTfKFixlsEiTh3bcsk+DaST/H+lGJc0FZrcEmLtc4tqnHe/niZu2aVanxb2284fjXivJ+WujsX48eRcrpV8fx4iTtybc7wdZjJcKGYGuvQEpDO5NbTuKVAHzKcyappru7OTHC3tXqG5y8nfuVM6LZ9p6pSObqIfAyBuXgDwC4Af6CqHTuwOI7jOL2NxPvz5NypWruoqic0TdtnPwjdUrqlBv+kqh4fvlP9LQD/u0vtOo7jOE7k6BX7zmb39TzIO9LN9p1f+P7PurlYjuM4TkSJxaWjKar0jH2niHwcwDsAjAP4zbmfN9t3Fr/81wNtcOI4jjMoWHqQXqdn7DtV9TwA54nIhwCci8DflGKVerTsOC3hlyUUO2UNFz0kjz6ZxnMgtn7GzwerDF4cXExkUTHK2ln3SiyhiGUDaAnLEob143CSWxta4iaLdJwL4yo1QzBjtD82aSxXm/eSyiW+XnK51vHPZpKYnmot01gq8bHcu5/3MWEIYCwHsnibJSYt0V3REFrV2hy0St2wuW1w4ZcVn6rw+GSZ9zMeK9C4tc+Va/zwOEqEXwAQk1ZBnrV9zygX42nc6HuVCy1rKS5gVOOJolmSNsX7I23acUqNW7wuJ2II+nqd5VgqZt85+0z6War6hfCzpV7dfhnA73bUQ8dZBdiJ2XEch9ET9p0iclTTf18L4BfL1WHHcRynd/BnzkugW/adAD4hItsQvEr1KwDnLF+vHcdxnF7BnzkTVtC+029jO47jOC1E+eq3EyLhEOY4juM4S6FfTUgicXLutn3nzCHbaDxd4LVVLTtOS5V989/8Xxr/zaO20vjwya9oie3OHk5za2oUileun1NDuZtSrposVLk9YKY0TuPVJFemppWrry1qCa6yTlW5FWIpxZW18cQaGo8NcevKbJKrUNMJrmbPGDaja9cYNolFvl6mp1tV3OUyVykXi1zhnjSKyMcMdXS1yvsyNW0dvPgyWeJXKz6U5WM/kuXbYDzG84s1rr5Oxgwr2hhf3myyPYWxRcOQrBbrfBthdp/GboupGt+vxocPpfFMle9v+9KbaLxhyIksS9KZnLG/GW+JZFP82FhJ8OVaeVPP3qdX7DvPB/BuALvD0J+r6jVLnY/jOI7TH0i77z/2CF19z7kZCRRgoqpGiYG2+QdV/dsuteU4juP0Af0qCOsJ+85FzvuAfefFVyzoaeI4juP0Af4qFWfF7DsBnCsi7wBwC4A/VdX9zR8223fuu+vHbt/pOI4zAPTrlXOv2Hf+G4CPIXAV+xiAvwPwLit5LH8IjY8YNVGtOsyWHacl/PrBu79M4y/48OMtsQ0A8i96cUt86qCjWmIAUDWEFqnaDI1btnupmf00btWLFaPOsxp1ZIuZUR5PcknIVJLn141Ns6pGXWhDpGNZUaYT/GlLKsFvJqXaLOg+lG8VnBWGDbGfcXDZuNay6eS/PXft430ZLfB2cun2njiVKnxsZspGbeIE36byaS6As+o8z1QN21CjXnTCEJxZTBt2n+a2Y1xtsUefUxXe9vrsBI03wMdgZ3ILjcNY1EqDzzcV42PfMCxMMzHualdP8m25rFzU57TPcqi1mX3nZ5sTRORP0IZ9p6o+1fTdzyEoG9nTsBOz4ziO0x79KgjrFfvO5vcFXgeAV6RwHMdxBgqJSUdTVOkV+86/EZETEFxtbwfwh8vVZ8dxHKd3iLKoqxN6xb7z7Z3003Ecx3F6iUg4hDmO4zjOUojyrelOiIRD2EL2nQBeieAVLQAYBTBmmZ8AQKLBbQNLhuXceGwdjefALfOYHSfAVdkAcNPHftgSe/FfcVXj0Ik0jHp2iMYbCd7O+OgRNB6rc7VmosyXNV7havBamvennODF36vgdpmWnaBlP9hQHp+s8nGwlLgW9Qbf0SuGNebklGFdWWh
"text/plain": [
"<Figure size 576x432 with 2 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.figure(figsize=(8,6))\n",
"sns.heatmap(df.corr(),cmap='coolwarm')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**TASK: What are the top 5 correlated frequencies with the target\\label?**\n",
"\n",
"*Note: You many need to map the label to 0s and 1s.*\n",
"\n",
"*Additional Note: We're looking for **absolute** correlation values.*"
]
},
{
"cell_type": "code",
"execution_count": 99,
"metadata": {},
"outputs": [],
"source": [
"#CODE HERE"
]
},
{
"cell_type": "code",
"execution_count": 100,
"metadata": {},
"outputs": [],
"source": [
"df['Target'] = df['Label'].map({'R':0,'M':1})"
]
},
{
"cell_type": "code",
"execution_count": 101,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Freq_45 0.339406\n",
"Freq_10 0.341142\n",
"Freq_49 0.351312\n",
"Freq_12 0.392245\n",
"Freq_11 0.432855\n",
"Target 1.000000\n",
"Name: Target, dtype: float64"
]
},
"execution_count": 101,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.abs(df.corr()['Target']).sort_values().tail(6)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train | Test Split\n",
"\n",
"Our approach here will be one of using Cross Validation on 90% of the dataset, and then judging our results on a final test set of 10% to evaluate our model.\n",
"\n",
"**TASK: Split the data into features and labels, and then split into a training set and test set, with 90% for Cross-Validation training, and 10% for a final test set.**"
]
},
{
"cell_type": "code",
"execution_count": 102,
"metadata": {},
"outputs": [],
"source": [
"# CODE HERE"
]
},
{
"cell_type": "code",
"execution_count": 103,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.model_selection import train_test_split "
]
},
{
"cell_type": "code",
"execution_count": 104,
"metadata": {},
"outputs": [],
"source": [
"X = df.drop(['Target','Label'],axis=1)\n",
"y = df['Label']"
]
},
{
"cell_type": "code",
"execution_count": 105,
"metadata": {},
"outputs": [],
"source": [
"X_cv, X_test, y_cv, y_test = train_test_split(X, y, test_size=0.1, random_state=42)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**TASK: Create a PipeLine that contains both a StandardScaler and a KNN model**"
]
},
{
"cell_type": "code",
"execution_count": 106,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.preprocessing import StandardScaler\n",
"from sklearn.neighbors import KNeighborsClassifier"
]
},
{
"cell_type": "code",
"execution_count": 107,
"metadata": {},
"outputs": [],
"source": [
"scaler = StandardScaler()"
]
},
{
"cell_type": "code",
"execution_count": 108,
"metadata": {},
"outputs": [],
"source": [
"knn = KNeighborsClassifier()"
]
},
{
"cell_type": "code",
"execution_count": 109,
"metadata": {},
"outputs": [],
"source": [
"operations = [('scaler',scaler),('knn',knn)]"
]
},
{
"cell_type": "code",
"execution_count": 110,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.pipeline import Pipeline"
]
},
{
"cell_type": "code",
"execution_count": 111,
"metadata": {},
"outputs": [],
"source": [
"pipe = Pipeline(operations)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**TASK: Perform a grid-search with the pipeline to test various values of k and report back the best performing parameters.**"
]
},
{
"cell_type": "code",
"execution_count": 112,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.model_selection import GridSearchCV"
]
},
{
"cell_type": "code",
"execution_count": 79,
"metadata": {},
"outputs": [],
"source": [
"k_values = list(range(1,30))"
]
},
{
"cell_type": "code",
"execution_count": 80,
"metadata": {},
"outputs": [],
"source": [
"param_grid = {'knn__n_neighbors': k_values}"
]
},
{
"cell_type": "code",
"execution_count": 81,
"metadata": {},
"outputs": [],
"source": [
"full_cv_classifier = GridSearchCV(pipe,param_grid,cv=5,scoring='accuracy')"
]
},
{
"cell_type": "code",
"execution_count": 82,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"GridSearchCV(cv=5,\n",
" estimator=Pipeline(steps=[('scaler', StandardScaler()),\n",
" ('knn', KNeighborsClassifier())]),\n",
" param_grid={'knn__n_neighbors': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,\n",
" 12, 13, 14, 15, 16, 17, 18, 19,\n",
" 20, 21, 22, 23, 24, 25, 26, 27,\n",
" 28, 29]},\n",
" scoring='accuracy')"
]
},
"execution_count": 82,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"full_cv_classifier.fit(X_cv,y_cv)"
]
},
{
"cell_type": "code",
"execution_count": 83,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'memory': None,\n",
" 'steps': [('scaler', StandardScaler()),\n",
" ('knn', KNeighborsClassifier(n_neighbors=1))],\n",
" 'verbose': False,\n",
" 'scaler': StandardScaler(),\n",
" 'knn': KNeighborsClassifier(n_neighbors=1),\n",
" 'scaler__copy': True,\n",
" 'scaler__with_mean': True,\n",
" 'scaler__with_std': True,\n",
" 'knn__algorithm': 'auto',\n",
" 'knn__leaf_size': 30,\n",
" 'knn__metric': 'minkowski',\n",
" 'knn__metric_params': None,\n",
" 'knn__n_jobs': None,\n",
" 'knn__n_neighbors': 1,\n",
" 'knn__p': 2,\n",
" 'knn__weights': 'uniform'}"
]
},
"execution_count": 83,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"full_cv_classifier.best_estimator_.get_params()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**(HARD) TASK: Using the .cv_results_ dictionary, see if you can create a plot of the mean test scores per K value.**"
]
},
{
"cell_type": "code",
"execution_count": 113,
"metadata": {},
"outputs": [],
"source": [
"#CODE HERE"
]
},
{
"cell_type": "code",
"execution_count": 114,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0.84537696, 0.78065434, 0.77524893, 0.75917496, 0.75931721,\n",
" 0.74822191, 0.75945946, 0.71664296, 0.7113798 , 0.68421053,\n",
" 0.70042674, 0.68435277, 0.68449502, 0.67908962, 0.69530583,\n",
" 0.68990043, 0.7113798 , 0.70042674, 0.72204836, 0.67908962,\n",
" 0.70071124, 0.69530583, 0.69530583, 0.68463727, 0.68477952,\n",
" 0.67923186, 0.67411095, 0.65775249, 0.6685633 ])"
]
},
"execution_count": 114,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"full_cv_classifier.cv_results_['mean_test_score']"
]
},
{
"cell_type": "code",
"execution_count": 115,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Text(0, 0.5, 'Accuracy')"
]
},
"execution_count": 115,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAZAAAAEGCAYAAABLgMOSAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy86wFpkAAAACXBIWXMAAAsTAAALEwEAmpwYAAA0YElEQVR4nO3de3xU1bn4/88zkwQCgYRAwiVcEgmiKAIaQi1VUaqARwVprdJWrbW1tHpe9pxTj2jP6bH1e35QqdqLWoqK2mr1aAWkLRUtongFwkXC3XBPghAIIRAScnt+f8wOTiYzyUySyWQyz/v1yiuz1157z9ozmoe911rPElXFGGOMCZUr0g0wxhgTnSyAGGOMaRULIMYYY1rFAogxxphWsQBijDGmVeIi3YCO0K9fP83MzIx0M4wxJqqsX7/+qKqmBdofEwEkMzOTvLy8SDfDGGOiiojsb25/WB9hichUEdkpIgUiMsfP/mQR+auIfCoiW0XkDq99+0QkX0Q2iUieV3mqiLwtIp85v/uE8xqMMcb4F7YAIiJu4ElgGjAKmCUio3yq3Q1sU9UxwCTgURFJ8Np/paqOVdUcr7I5wEpVHQGsdLaNMcZ0sHDegeQCBaq6R1WrgVeA6T51FOglIgIkAaVAbQvnnQ684Lx+AZjRbi02xhgTtHAGkAzgoNd2oVPm7QngfKAYyAfuVdV6Z58Cb4nIehG5y+uY/qp6CMD5ne7vzUXkLhHJE5G8kpKStl+NMcaYRsIZQMRPmW/irSnAJmAQMBZ4QkR6O/smqurFeB6B3S0il4fy5qq6UFVzVDUnLS3gIAJjjDGtFM4AUggM8doejOdOw9sdwGL1KAD2AucBqGqx8/sIsATPIzGAwyIyEMD5fSQcjV+6sYiJ894ha87fmTjvHZZuLArH2xhjTNQKZwBZB4wQkSynY/wWYJlPnQPAZAAR6Q+MBPaISE8R6eWU9wSuAbY4xywDbnde3w680d4NX7qxiAcW51NUVokCRWWVPLA434KIMcZ4CVsAUdVa4B5gBbAdeFVVt4rIbBGZ7VR7GPiyiOTjGVF1v6oeBfoDH4jIp8Ba4O+q+qZzzDzgahH5DLja2W5X81fspLKmrlFZZU0d81fsbO+3MsaYqBXWiYSquhxY7lO2wOt1MZ67C9/j9gBjApzzGM5dS7gUl1WGVG6MMbHIcmH5MSglMaRyY4yJRRZA/LhvykgS492NyhLj3dw3ZWSEWmSMMZ1PTOTCCtWMcZ7pKr98cweHTlSRnBjHz2+48Gy5McYYuwMJaMa4DD5+YDKpPROYduFACx7GGOPDAkgLstOS2F1yKtLNMMaYTscCSAuGpydRcMQCiDHG+LIA0oLs9CSOn67h2KkzkW6KMcZ0KhZAWjA8rSeA3YUYY4wPCyAtyE5PAqDA+kGMMaYRCyAtGJScSGK8m91HKiLdFGOM6VQsgLTA5RKGp/e0OxBjjPFhASQI2WlJ7LY+EGOMacQCSBCGpyVRVFZJxZmWVts1xpjYYQEkCA0d6XtKrB/EGGMaWAAJwhcjsU5GuCXGGNN5WAAJwrC+PXG7xEZiGWOMFwsgQUiIczGsbw+bTGiMMV4sgAQpOy3JhvIaY4yXsAYQEZkqIjtFpEBE5vjZnywifxWRT0Vkq4jc4ZQPEZFVIrLdKb/X65iHRKRIRDY5P9eG8xoaDE9PYt/RCmrq6jvi7YwxptMLWwARETfwJDANGAXMEpFRPtXuBrap6hhgEvCoiCQAtcB/qOr5wJeAu32OfVxVxzo/y+kA2WlJ1NYr+4+d7oi3M8aYTi+cdyC5QIGq7lHVauAVYLpPHQV6iYgASUApUKuqh1R1A4CqngS2AxFd0alhJJatDWKMMR7hDCAZwEGv7UKaBoEngPOBYiAfuFdVGz0jEpFMYBywxqv4HhHZLCKLRKSPvzcXkbtEJE9E8kpKStp2JXgeYYFl5TXGmAbhDCDip0x9tqcAm4BBwFjgCRHpffYEIknA68CPVbXcKf49MNypfwh41N+bq+pCVc1R1Zy0tLTWX4UjqVscA5O7W0oTY4xxhDOAFAJDvLYH47nT8HYHsFg9CoC9wHkAIhKPJ3i8pKqLGw5Q1cOqWufcqTyN51FZhxhuI7GMMeascAaQdcAIEclyOsZvAZb51DkATAYQkf7ASGCP0yfyLLBdVR/zPkBEBnpt3ghsCVP7m8hO9yRVVPW9kTLGmNgTF64Tq2qtiNwDrADcwCJV3Sois539C4CHgedFJB/PI6/7VfWoiHwFuBXIF5FNzikfdEZcPSIiY/E8DtsH/CBc1+BreHoSFdV1HDpRxaCUxI56W2OM6ZTCFkAAnD/4y33KFni9Lgau8XPcB/jvQ0FVb23nZgYtO+2LkVgWQIwxsc5moocg20ZiGWPMWRZAQtAvKYHe3eMsgBhjDBZAQiIiZKcnWQAxxhgsgIQsOz3JZqMbYwwWQEKWnZ7E0VPVlJ2ujnRTjDEmoiyAhMhyYhljjIcFkBANT7ORWMYYAxZAQja4Tw8S4lwWQIwxMc8CSIjcLuGcfj0tgBhjYp4FkFbwjMSqiHQzjDEmoiyAtEJ2ehIHj5+mqqYu0k0xxpiIsQDSCtnpSajCHrsLMcbEMAsgrXB2JJYN5TXGxDALIK2Q1a8nLrGhvMaY2GYBpBW6x7sZktrDlrc1xsQ0CyCtlJ1mObGMMbHNAkgrZacnsedoBXX1trytMSY2hTWAiMhUEdkpIgUiMsfP/mQR+auIfCoiW0XkjpaOFZFUEXlbRD5zfvcJ5zUEMjwtieraeg6Wno7E2xtjTMSFLYCIiBt4EpgGjAJmicgon2p3A9tUdQwwCXhURBJaOHYOsFJVRwArne0ON9xWJzTGxLhw3oHkAgWqukdVq4FXgOk+dRToJSICJAGlQG0Lx04HXnBevwDMCOM1BHR2eVvrBzHGxKhwBpAM4KDXdqFT5u0J4HygGMgH7lXV+haO7a+qhwCc3+n+3lxE7hKRPBHJKykpaeu1NJGcGE9ar242EssYE7PCGUDET5lvj/MUYBMwCBgLPCEivYM8tlmqulBVc1Q1Jy0tLZRDg5adlmR3IMaYmBXOAFIIDPHaHoznTsPbHcBi9SgA9gLntXDsYREZCOD8PhKGtgdleLonK6+qjcQyxsSecAaQdcAIEckSkQTgFmCZT50DwGQAEekPjAT2tHDsMuB25/XtwBthvIZmZaclcbKqlpKTZyLVBGOMiZi4cJ1YVWtF5B5gBeAGFqnqVhGZ7exfADwMPC8i+XgeW92vqkcB/B3rnHoe8KqI3IknAN0UrmtoSXZ6L8AzEiu9d/dINcMYYyIibAEEQFWXA8t9yhZ4vS4Grgn2WKf8GM5dS6R5r4/+5ex+EW6NMcZ0LJuJ3gb9e3cjqVuczQUxxsQkCyBtICIMT+tpI7GMMTHJAkgbDU9PsjsQY0xMsgDSRtnpSRwuP0N5VU2km2KMMR3KAkgbZTurE9qMdGNMrLEA0kZfjMSy9dGNMbHFAkgbDU3tQbxbrB/EGBNzLIC0UZzbRWbfnhZAjDExxwJIO8hOt+VtjTGxxwJIO8hOT2L/sQrO1NZFuinGGNNhLIC0g+z0JOoV9h+z5W2NMbHDAkg7OOCsi37N46uZOO8dlm4sinCLjDEm/CyAtNHSjUU8uarg7HZRWSUPLM63IGKM6fIsgLTR/BU7qaqpb1RWWVPH/BU7I9QiY4zpGBZA2qi4rDKkcmOM6SosgLTRoJREv+WJCW4LIsaYLs0CSBvdN2UkifHuRmVul1BVU8ekX73Lw3/bxrFTtuStMabrCWsAEZGpIrJTRApEZI6f/feJyCbnZ4uI1IlIqoiM9CrfJCLlIvJj55iHRKTIa9+14byGlswYl8HcmaPJSElEgIyURB69aQyr//NKpo8ZxHMf7uXyR1bx2Fs7Ka+qYenGIibOe4esOX+3EVvGmKgmqhqeE4u4gV3A1UAhsA6YparbAtS/Hvg3Vb3Kz3mKgAmqul9EHgJOqeqvgm1LTk6O5uXlte5C2qjgyCkef3sXf88/RGK8i5o6pbb+i888Md7N3JmjmTEuIyLtM8aYQERkvarmBNo
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"scores = full_cv_classifier.cv_results_['mean_test_score']\n",
"plt.plot(k_values,scores,'o-')\n",
"plt.xlabel(\"K\")\n",
"plt.ylabel(\"Accuracy\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Final Model Evaluation\n",
"\n",
"**TASK: Using the grid classifier object from the previous step, get a final performance classification report and confusion matrix.**"
]
},
{
"cell_type": "code",
"execution_count": 117,
"metadata": {},
"outputs": [],
"source": [
"#Code Here"
]
},
{
"cell_type": "code",
"execution_count": 119,
"metadata": {},
"outputs": [],
"source": [
"pred = full_cv_classifier.predict(X_test)"
]
},
{
"cell_type": "code",
"execution_count": 120,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.metrics import classification_report,confusion_matrix,accuracy_score"
]
},
{
"cell_type": "code",
"execution_count": 121,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[12, 1],\n",
" [ 1, 7]], dtype=int64)"
]
},
"execution_count": 121,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"confusion_matrix(y_test,pred)"
]
},
{
"cell_type": "code",
"execution_count": 122,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" precision recall f1-score support\n",
"\n",
" M 0.92 0.92 0.92 13\n",
" R 0.88 0.88 0.88 8\n",
"\n",
" accuracy 0.90 21\n",
" macro avg 0.90 0.90 0.90 21\n",
"weighted avg 0.90 0.90 0.90 21\n",
"\n"
]
}
],
"source": [
"print(classification_report(y_test,pred))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Great Job!"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}