{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"___\n",
"\n",
"<a href='https://www.udemy.com/user/joseportilla/'><img src='../Pierian_Data_Logo.png'/></a>\n",
"___\n",
"<center><em>Copyright by Pierian Data Inc.</em></center>\n",
"<center><em>For more information, visit us at <a href='http://www.pieriandata.com'>www.pieriandata.com</a></em></center>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# AdaBoost"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The Data\n",
"\n",
"<img src=\"mushroom.jpg\" width=\"400\" height=\"400\">\n",
"\n",
"### Mushroom Hunting: Edible or Poisonous?\n",
"\n",
"Data Source: https://archive.ics.uci.edu/ml/datasets/Mushroom\n",
"\n",
"\n",
"This data set includes descriptions of hypothetical samples corresponding to 23 species of gilled mushrooms in the Agaricus and Lepiota family, drawn from The Audubon Society Field Guide to North American Mushrooms (pp. 500-525). Each species is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended; this last class was combined with the poisonous one. The Guide clearly states that there is no simple rule for determining the edibility of a mushroom; no rule like 'leaflets three, let it be' for Poisonous Oak and Ivy.\n",
"\n",
"\n",
"Attribute Information:\n",
"\n",
"1. cap-shape: bell=b,conical=c,convex=x,flat=f, knobbed=k,sunken=s\n",
"2. cap-surface: fibrous=f,grooves=g,scaly=y,smooth=s\n",
"3. cap-color: brown=n,buff=b,cinnamon=c,gray=g,green=r, pink=p,purple=u,red=e,white=w,yellow=y\n",
"4. bruises?: bruises=t,no=f\n",
"5. odor: almond=a,anise=l,creosote=c,fishy=y,foul=f, musty=m,none=n,pungent=p,spicy=s\n",
"6. gill-attachment: attached=a,descending=d,free=f,notched=n\n",
"7. gill-spacing: close=c,crowded=w,distant=d\n",
"8. gill-size: broad=b,narrow=n\n",
"9. gill-color: black=k,brown=n,buff=b,chocolate=h,gray=g, green=r,orange=o,pink=p,purple=u,red=e, white=w,yellow=y\n",
"10. stalk-shape: enlarging=e,tapering=t\n",
"11. stalk-root: bulbous=b,club=c,cup=u,equal=e, rhizomorphs=z,rooted=r,missing=?\n",
"12. stalk-surface-above-ring: fibrous=f,scaly=y,silky=k,smooth=s\n",
"13. stalk-surface-below-ring: fibrous=f,scaly=y,silky=k,smooth=s\n",
"14. stalk-color-above-ring: brown=n,buff=b,cinnamon=c,gray=g,orange=o, pink=p,red=e,white=w,yellow=y\n",
"15. stalk-color-below-ring: brown=n,buff=b,cinnamon=c,gray=g,orange=o, pink=p,red=e,white=w,yellow=y\n",
"16. veil-type: partial=p,universal=u\n",
"17. veil-color: brown=n,orange=o,white=w,yellow=y\n",
"18. ring-number: none=n,one=o,two=t\n",
"19. ring-type: cobwebby=c,evanescent=e,flaring=f,large=l, none=n,pendant=p,sheathing=s,zone=z\n",
"20. spore-print-color: black=k,brown=n,buff=b,chocolate=h,green=r, orange=o,purple=u,white=w,yellow=y\n",
"21. population: abundant=a,clustered=c,numerous=n, scattered=s,several=v,solitary=y\n",
"22. habitat: grasses=g,leaves=l,meadows=m,paths=p, urban=u,waste=w,woods=d"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Goal\n",
"\n",
"**THIS IS IMPORTANT, THIS IS NOT OUR TYPICAL PREDICTIVE MODEL!**\n",
"\n",
"Our general goal here is to see if we can harness the power of machine learning and boosting to help create not just a predictive model, but a general guideline for features people should look out for when picking mushrooms."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Imports"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"df = pd.read_csv(\"../DATA/mushrooms.csv\")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>class</th>\n",
" <th>cap-shape</th>\n",
" <th>cap-surface</th>\n",
" <th>cap-color</th>\n",
" <th>bruises</th>\n",
" <th>odor</th>\n",
" <th>gill-attachment</th>\n",
" <th>gill-spacing</th>\n",
" <th>gill-size</th>\n",
" <th>gill-color</th>\n",
" <th>...</th>\n",
" <th>stalk-surface-below-ring</th>\n",
" <th>stalk-color-above-ring</th>\n",
" <th>stalk-color-below-ring</th>\n",
" <th>veil-type</th>\n",
" <th>veil-color</th>\n",
" <th>ring-number</th>\n",
" <th>ring-type</th>\n",
" <th>spore-print-color</th>\n",
" <th>population</th>\n",
" <th>habitat</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>p</td>\n",
" <td>x</td>\n",
" <td>s</td>\n",
" <td>n</td>\n",
" <td>t</td>\n",
" <td>p</td>\n",
" <td>f</td>\n",
" <td>c</td>\n",
" <td>n</td>\n",
" <td>k</td>\n",
" <td>...</td>\n",
" <td>s</td>\n",
" <td>w</td>\n",
" <td>w</td>\n",
" <td>p</td>\n",
" <td>w</td>\n",
" <td>o</td>\n",
" <td>p</td>\n",
" <td>k</td>\n",
" <td>s</td>\n",
" <td>u</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>e</td>\n",
" <td>x</td>\n",
" <td>s</td>\n",
" <td>y</td>\n",
" <td>t</td>\n",
" <td>a</td>\n",
" <td>f</td>\n",
" <td>c</td>\n",
" <td>b</td>\n",
" <td>k</td>\n",
" <td>...</td>\n",
" <td>s</td>\n",
" <td>w</td>\n",
" <td>w</td>\n",
" <td>p</td>\n",
" <td>w</td>\n",
" <td>o</td>\n",
" <td>p</td>\n",
" <td>n</td>\n",
" <td>n</td>\n",
" <td>g</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>e</td>\n",
" <td>b</td>\n",
" <td>s</td>\n",
" <td>w</td>\n",
" <td>t</td>\n",
" <td>l</td>\n",
" <td>f</td>\n",
" <td>c</td>\n",
" <td>b</td>\n",
" <td>n</td>\n",
" <td>...</td>\n",
" <td>s</td>\n",
" <td>w</td>\n",
" <td>w</td>\n",
" <td>p</td>\n",
" <td>w</td>\n",
" <td>o</td>\n",
" <td>p</td>\n",
" <td>n</td>\n",
" <td>n</td>\n",
" <td>m</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>p</td>\n",
" <td>x</td>\n",
" <td>y</td>\n",
" <td>w</td>\n",
" <td>t</td>\n",
" <td>p</td>\n",
" <td>f</td>\n",
" <td>c</td>\n",
" <td>n</td>\n",
" <td>n</td>\n",
" <td>...</td>\n",
" <td>s</td>\n",
" <td>w</td>\n",
" <td>w</td>\n",
" <td>p</td>\n",
" <td>w</td>\n",
" <td>o</td>\n",
" <td>p</td>\n",
" <td>k</td>\n",
" <td>s</td>\n",
" <td>u</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>e</td>\n",
" <td>x</td>\n",
" <td>s</td>\n",
" <td>g</td>\n",
" <td>f</td>\n",
" <td>n</td>\n",
" <td>f</td>\n",
" <td>w</td>\n",
" <td>b</td>\n",
" <td>k</td>\n",
" <td>...</td>\n",
" <td>s</td>\n",
" <td>w</td>\n",
" <td>w</td>\n",
" <td>p</td>\n",
" <td>w</td>\n",
" <td>o</td>\n",
" <td>e</td>\n",
" <td>n</td>\n",
" <td>a</td>\n",
" <td>g</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 23 columns</p>\n",
"</div>"
],
"text/plain": [
" class cap-shape cap-surface cap-color bruises odor gill-attachment \\\n",
"0 p x s n t p f \n",
"1 e x s y t a f \n",
"2 e b s w t l f \n",
"3 p x y w t p f \n",
"4 e x s g f n f \n",
"\n",
" gill-spacing gill-size gill-color ... stalk-surface-below-ring \\\n",
"0 c n k ... s \n",
"1 c b k ... s \n",
"2 c b n ... s \n",
"3 c n n ... s \n",
"4 w b k ... s \n",
"\n",
" stalk-color-above-ring stalk-color-below-ring veil-type veil-color \\\n",
"0 w w p w \n",
"1 w w p w \n",
"2 w w p w \n",
"3 w w p w \n",
"4 w w p w \n",
"\n",
" ring-number ring-type spore-print-color population habitat \n",
"0 o p k s u \n",
"1 o p n n g \n",
"2 o p n n m \n",
"3 o p k s u \n",
"4 o e n a g \n",
"\n",
"[5 rows x 23 columns]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# EDA"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<AxesSubplot:xlabel='class', ylabel='count'>"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYsAAAEGCAYAAACUzrmNAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAUiUlEQVR4nO3df5Bd9Xnf8ffH4lfS2AHMmhJJiYijNgEnkZ0tkDh/uDABQVuLZIwLtY1KmciZQmvPpKnB0wYbW51kYoca12aqFBnhOiHEjovK0BIVSFxnwo9VLAsEZtiCXaQRsEaATR3Tij79436VXotdncXsuXfFvl8zd/ac5/s95z53RqPPnB/3nlQVkiQdymvG3YAkafEzLCRJnQwLSVInw0KS1MmwkCR1OmLcDfThhBNOqFWrVo27DUk6rGzfvv2bVTUx29irMixWrVrF1NTUuNuQpMNKkm/MNeZpKElSJ8NCktTJsJAkdTIsJEmdDAtJUifDQpLUybCQJHUyLCRJnQwLSVKnV+U3uKVXu/959U+PuwUtQj/6m/f3tm+PLCRJnQwLSVInw0KS1MmwkCR1MiwkSZ0MC0lSp97DIsmyJF9JcmtbPznJPUmmk/xhkqNa/ei2Pt3GVw3t48pWfzjJOX33LEn6XqM4sngf8NDQ+m8D11TVTwDPAJe2+qXAM61+TZtHklOAC4FTgbXAp5MsG0HfkqSm17BIsgL4e8B/aOsBzgQ+36ZsAc5vy+vaOm38rDZ/HXBTVb1QVY8B08BpffYtSfpefR9Z/FvgXwL/t62/Hni2qva39d3A8ra8HHgcoI0/1+b/dX2Wbf5akg1JppJMzczMLPDHkKSlrbewSPL3gaeqantf7zGsqjZV1WRVTU5MTIziLSVpyejzt6HeCrw9yXnAMcDrgE8AxyY5oh09rAD2tPl7gJXA7iRHAD8MPD1UP2B4G0nSCPR2ZFFVV1bViqpaxeAC9Z1V9S7gLuAdbdp64Ja2vLWt08bvrKpq9Qvb3VInA6uBe/vqW5L0UuP41dkPADcl+SjwFeD6Vr8e+GySaWAfg4ChqnYluRl4ENgPXFZVL/bd5M/9xo19v4UOQ9t/5+JxtyCNxUjCoqr+FPjTtvwos9zNVFXfBS6YY/uNwMb+OpQkHYrf4JYkdTIsJEmdDAtJUifDQpLUybCQJHUyLCRJnQwLSVInw0KS1MmwkCR1MiwkSZ0MC0lSJ8NCktTJsJAkdTIsJEmdDAtJUqc+n8F9TJJ7k3w1ya4kH271G5I8lmRHe61p9SS5Nsl0kp1J3jK0r/VJHmmv9XO8pSSpJ30+/OgF4Myqej7JkcCXk/yXNvYbVfX5g+afy+CRqauB04HrgNOTHA9cBUwCBWxPsrWqnumxd0nSkD6fwV1V9XxbPbK96hCbrANubNvdDRyb5CTgHGBbVe1rAbENWNtX35Kkl+r1mkWSZUl2AE8x+A//nja0sZ1quibJ0a22HHh8aPPdrTZXXZI0Ir2GRVW9WFVrgBXAaUneBFwJ/CTwd4DjgQ8sxHsl2ZBkKsnUzMzMQuxSktSM5G6oqnoWuAtYW1V726mmF4DPAKe1aXuAlUObrWi1ueoHv8emqpqsqsmJiYkePoUkLV193g01keTYtvwDwC8BX2vXIUgS4HzggbbJVuDidlfUGcBzVbUXuB04O8lxSY4Dzm41SdKI9Hk31EnAliTLGITSzVV1a5I7k0wAAXYAv9bm3wacB0wD3wEuAaiqfUk+AtzX5l1dVft67FuSdJDewqKqdgJvnqV+5hzzC7hsjrHNwOYFbVCSNG9+g1uS1MmwkCR1MiwkSZ0MC0lSJ8NCktTJsJAkdTIsJEmdDAtJUifDQpLUybCQJHUyLCRJnQwLSVInw0KS1MmwkCR1MiwkSZ0MC0lSpz4fq3pMknuTfDXJriQfbvWTk9yTZDrJHyY5qtWPbuvTbXzV0L6ubPWHk5zTV8+SpNn1eWTxAnBmVf0ssAZY25
6t/dvANVX1E8AzwKVt/qXAM61+TZtHklOAC4FTgbXAp9ujWiVJI9JbWNTA8231yPYq4Ezg862+BTi/La9r67Txs5Kk1W+qqheq6jEGz+g+ra++JUkv1es1iyTLkuwAngK2Af8DeLaq9rcpu4HlbXk58DhAG38OeP1wfZZtht9rQ5KpJFMzMzM9fBpJWrp6DYuqerGq1gArGBwN/GSP77WpqiaranJiYqKvt5GkJWkkd0NV1bPAXcDPA8cmOaINrQD2tOU9wEqANv7DwNPD9Vm2kSSNQJ93Q00kObYt/wDwS8BDDELjHW3aeuCWtry1rdPG76yqavUL291SJwOrgXv76luS9FJHdE/5vp0EbGl3Lr0GuLmqbk3yIHBTko8CXwGub/OvBz6bZBrYx+AOKKpqV5KbgQeB/cBlVfVij31Lkg7SW1hU1U7gzbPUH2WWu5mq6rvABXPsayOwcaF7lCTNj9/gliR1MiwkSZ0MC0lSJ8NCktTJsJAkdTIsJEmdDAtJUifDQpLUybCQJHUyLCRJnQwLSVInw0KS1MmwkCR1MiwkSZ0MC0lSJ8NCktSpz8eqrkxyV5IHk+xK8r5W/1CSPUl2tNd5Q9tcmWQ6ycNJzhmqr2216SRX9NWzJGl2fT5WdT/w61X1l0leC2xPsq2NXVNVHxuenOQUBo9SPRX4EeC/JflbbfhTDJ7hvRu4L8nWqnqwx94lSUP6fKzqXmBvW/52koeA5YfYZB1wU1W9ADzWnsV94PGr0+1xrCS5qc01LCRpREZyzSLJKgbP476nlS5PsjPJ5iTHtdpy4PGhzXa32lz1g99jQ5KpJFMzMzML/REkaUnrPSyS/BDwBeD9VfUt4DrgjcAaBkceH1+I96mqTVU1WVWTExMTC7FLSVLT5zULkhzJICg+V1V/DFBVTw6N/x5wa1vdA6wc2nxFq3GIuiRpBOZ1ZJHkjvnUDhoPcD3wUFX97lD9pKFpvww80Ja3AhcmOTrJycBq4F7gPmB1kpOTHMXgIvjW+fQtSVoYhzyySHIM8IPACe3aQtrQ6zj0xWqAtwLvAe5PsqPVPghclGQNUMDXgfcCVNWuJDczuHC9H7isql5sfVwO3A4sAzZX1a75f0RJ0ivVdRrqvcD7GdzKup3/HxbfAv7doTasqi8PzR922yG22QhsnKV+26G2kyT165BhUVWfAD6R5J9V1SdH1JMkaZGZ1wXuqvpkkl8AVg1vU1U39tSXJGkRmVdYJPksg9tddwAvtnIBhoUkLQHzvXV2EjilqqrPZiRJi9N8v5T3APA3+2xEkrR4zffI4gTgwST3Ai8cKFbV23vpSpK0qMw3LD7UZxOSpMVtvndD/VnfjUiSFq/53g31bQZ3PwEcBRwJ/K+qel1fjUmSFo/5Hlm89sBy+82ndcAZfTUlSVpcXvZPlNfAfwLO6ZorSXp1mO9pqF8ZWn0Ng+9dfLeXjiRJi85874b6B0PL+xn8Wuy6Be9GkrQozfeaxSV9NyJJWrzm+/CjFUm+mOSp9vpCkhV9NydJWhzme4H7MwyeTvcj7fWfW21OSVYmuSvJg0l2JXlfqx+fZFuSR9rf41o9Sa5NMp1kZ5K3DO1rfZv/SJL1388HlSR9/+YbFhNV9Zmq2t9eNwATHdvsB369qk5hcJvtZUlOAa4A7qiq1cAdbR3gXAaPUl0NbACug0G4AFcBpwOnAVcdCBhJ0mjMNyyeTvLuJMva693A04faoKr2VtVftuVvAw8xeBTrOmBLm7YFOL8trwNubLfm3g0c257XfQ6wrar2VdUzwDZg7fw/oiTplZpvWPwT4J3AE8Be4B3AP57vmyRZBbwZuAc4sar2tqEngBPb8nLg8aHNdrfaXPWD32NDkqkkUzMzM/NtTZI0D/MNi6uB9VU1UVVvYBAeH57Phkl+CPgC8P6q+tbwWHs+xoI8I6OqNlXVZFVNTkx0nSGTJL0c8w2Ln2mngACoqn0MjhQOKcmRDILic1X1x638ZDu9RP
v7VKvvAVYObb6i1eaqS5JGZL5h8Zrhi8rtovMhv6PRfkPqeuChqvrdoaGtwIE7mtYDtwzVL253RZ0BPNdOV90OnJ3
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"sns.countplot(data=df,x='class')"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>class</th>\n",
" <th>cap-shape</th>\n",
" <th>cap-surface</th>\n",
" <th>cap-color</th>\n",
" <th>bruises</th>\n",
" <th>odor</th>\n",
" <th>gill-attachment</th>\n",
" <th>gill-spacing</th>\n",
" <th>gill-size</th>\n",
" <th>gill-color</th>\n",
" <th>...</th>\n",
" <th>stalk-surface-below-ring</th>\n",
" <th>stalk-color-above-ring</th>\n",
" <th>stalk-color-below-ring</th>\n",
" <th>veil-type</th>\n",
" <th>veil-color</th>\n",
" <th>ring-number</th>\n",
" <th>ring-type</th>\n",
" <th>spore-print-color</th>\n",
" <th>population</th>\n",
" <th>habitat</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>8124</td>\n",
" <td>8124</td>\n",
" <td>8124</td>\n",
" <td>8124</td>\n",
" <td>8124</td>\n",
" <td>8124</td>\n",
" <td>8124</td>\n",
" <td>8124</td>\n",
" <td>8124</td>\n",
" <td>8124</td>\n",
" <td>...</td>\n",
" <td>8124</td>\n",
" <td>8124</td>\n",
" <td>8124</td>\n",
" <td>8124</td>\n",
" <td>8124</td>\n",
" <td>8124</td>\n",
" <td>8124</td>\n",
" <td>8124</td>\n",
" <td>8124</td>\n",
" <td>8124</td>\n",
" </tr>\n",
" <tr>\n",
" <th>unique</th>\n",
" <td>2</td>\n",
" <td>6</td>\n",
" <td>4</td>\n",
" <td>10</td>\n",
" <td>2</td>\n",
" <td>9</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>12</td>\n",
" <td>...</td>\n",
" <td>4</td>\n",
" <td>9</td>\n",
" <td>9</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" <td>3</td>\n",
" <td>5</td>\n",
" <td>9</td>\n",
" <td>6</td>\n",
" <td>7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>top</th>\n",
" <td>e</td>\n",
" <td>x</td>\n",
" <td>y</td>\n",
" <td>n</td>\n",
" <td>f</td>\n",
" <td>n</td>\n",
" <td>f</td>\n",
" <td>c</td>\n",
" <td>b</td>\n",
" <td>b</td>\n",
" <td>...</td>\n",
" <td>s</td>\n",
" <td>w</td>\n",
" <td>w</td>\n",
" <td>p</td>\n",
" <td>w</td>\n",
" <td>o</td>\n",
" <td>p</td>\n",
" <td>w</td>\n",
" <td>v</td>\n",
" <td>d</td>\n",
" </tr>\n",
" <tr>\n",
" <th>freq</th>\n",
" <td>4208</td>\n",
" <td>3656</td>\n",
" <td>3244</td>\n",
" <td>2284</td>\n",
" <td>4748</td>\n",
" <td>3528</td>\n",
" <td>7914</td>\n",
" <td>6812</td>\n",
" <td>5612</td>\n",
" <td>1728</td>\n",
" <td>...</td>\n",
" <td>4936</td>\n",
" <td>4464</td>\n",
" <td>4384</td>\n",
" <td>8124</td>\n",
" <td>7924</td>\n",
" <td>7488</td>\n",
" <td>3968</td>\n",
" <td>2388</td>\n",
" <td>4040</td>\n",
" <td>3148</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>4 rows × 23 columns</p>\n",
"</div>"
],
"text/plain": [
" class cap-shape cap-surface cap-color bruises odor gill-attachment \\\n",
"count 8124 8124 8124 8124 8124 8124 8124 \n",
"unique 2 6 4 10 2 9 2 \n",
"top e x y n f n f \n",
"freq 4208 3656 3244 2284 4748 3528 7914 \n",
"\n",
" gill-spacing gill-size gill-color ... stalk-surface-below-ring \\\n",
"count 8124 8124 8124 ... 8124 \n",
"unique 2 2 12 ... 4 \n",
"top c b b ... s \n",
"freq 6812 5612 1728 ... 4936 \n",
"\n",
" stalk-color-above-ring stalk-color-below-ring veil-type veil-color \\\n",
"count 8124 8124 8124 8124 \n",
"unique 9 9 1 4 \n",
"top w w p w \n",
"freq 4464 4384 8124 7924 \n",
"\n",
" ring-number ring-type spore-print-color population habitat \n",
"count 8124 8124 8124 8124 8124 \n",
"unique 3 5 9 6 7 \n",
"top o p w v d \n",
"freq 7488 3968 2388 4040 3148 \n",
"\n",
"[4 rows x 23 columns]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.describe()"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>count</th>\n",
" <th>unique</th>\n",
" <th>top</th>\n",
" <th>freq</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>class</th>\n",
" <td>8124</td>\n",
" <td>2</td>\n",
" <td>e</td>\n",
" <td>4208</td>\n",
" </tr>\n",
" <tr>\n",
" <th>cap-shape</th>\n",
" <td>8124</td>\n",
" <td>6</td>\n",
" <td>x</td>\n",
" <td>3656</td>\n",
" </tr>\n",
" <tr>\n",
" <th>cap-surface</th>\n",
" <td>8124</td>\n",
" <td>4</td>\n",
" <td>y</td>\n",
" <td>3244</td>\n",
" </tr>\n",
" <tr>\n",
" <th>cap-color</th>\n",
" <td>8124</td>\n",
" <td>10</td>\n",
" <td>n</td>\n",
" <td>2284</td>\n",
" </tr>\n",
" <tr>\n",
" <th>bruises</th>\n",
" <td>8124</td>\n",
" <td>2</td>\n",
" <td>f</td>\n",
" <td>4748</td>\n",
" </tr>\n",
" <tr>\n",
" <th>odor</th>\n",
" <td>8124</td>\n",
" <td>9</td>\n",
" <td>n</td>\n",
" <td>3528</td>\n",
" </tr>\n",
" <tr>\n",
" <th>gill-attachment</th>\n",
" <td>8124</td>\n",
" <td>2</td>\n",
" <td>f</td>\n",
" <td>7914</td>\n",
" </tr>\n",
" <tr>\n",
" <th>gill-spacing</th>\n",
" <td>8124</td>\n",
" <td>2</td>\n",
" <td>c</td>\n",
" <td>6812</td>\n",
" </tr>\n",
" <tr>\n",
" <th>gill-size</th>\n",
" <td>8124</td>\n",
" <td>2</td>\n",
" <td>b</td>\n",
" <td>5612</td>\n",
" </tr>\n",
" <tr>\n",
" <th>gill-color</th>\n",
" <td>8124</td>\n",
" <td>12</td>\n",
" <td>b</td>\n",
" <td>1728</td>\n",
" </tr>\n",
" <tr>\n",
" <th>stalk-shape</th>\n",
" <td>8124</td>\n",
" <td>2</td>\n",
" <td>t</td>\n",
" <td>4608</td>\n",
" </tr>\n",
" <tr>\n",
" <th>stalk-root</th>\n",
" <td>8124</td>\n",
" <td>5</td>\n",
" <td>b</td>\n",
" <td>3776</td>\n",
" </tr>\n",
" <tr>\n",
" <th>stalk-surface-above-ring</th>\n",
" <td>8124</td>\n",
" <td>4</td>\n",
" <td>s</td>\n",
" <td>5176</td>\n",
" </tr>\n",
" <tr>\n",
" <th>stalk-surface-below-ring</th>\n",
" <td>8124</td>\n",
" <td>4</td>\n",
" <td>s</td>\n",
" <td>4936</td>\n",
" </tr>\n",
" <tr>\n",
" <th>stalk-color-above-ring</th>\n",
" <td>8124</td>\n",
" <td>9</td>\n",
" <td>w</td>\n",
" <td>4464</td>\n",
" </tr>\n",
" <tr>\n",
" <th>stalk-color-below-ring</th>\n",
" <td>8124</td>\n",
" <td>9</td>\n",
" <td>w</td>\n",
" <td>4384</td>\n",
" </tr>\n",
" <tr>\n",
" <th>veil-type</th>\n",
" <td>8124</td>\n",
" <td>1</td>\n",
" <td>p</td>\n",
" <td>8124</td>\n",
" </tr>\n",
" <tr>\n",
" <th>veil-color</th>\n",
" <td>8124</td>\n",
" <td>4</td>\n",
" <td>w</td>\n",
" <td>7924</td>\n",
" </tr>\n",
" <tr>\n",
" <th>ring-number</th>\n",
" <td>8124</td>\n",
" <td>3</td>\n",
" <td>o</td>\n",
" <td>7488</td>\n",
" </tr>\n",
" <tr>\n",
" <th>ring-type</th>\n",
" <td>8124</td>\n",
" <td>5</td>\n",
" <td>p</td>\n",
" <td>3968</td>\n",
" </tr>\n",
" <tr>\n",
" <th>spore-print-color</th>\n",
" <td>8124</td>\n",
" <td>9</td>\n",
" <td>w</td>\n",
" <td>2388</td>\n",
" </tr>\n",
" <tr>\n",
" <th>population</th>\n",
" <td>8124</td>\n",
" <td>6</td>\n",
" <td>v</td>\n",
" <td>4040</td>\n",
" </tr>\n",
" <tr>\n",
" <th>habitat</th>\n",
" <td>8124</td>\n",
" <td>7</td>\n",
" <td>d</td>\n",
" <td>3148</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" count unique top freq\n",
"class 8124 2 e 4208\n",
"cap-shape 8124 6 x 3656\n",
"cap-surface 8124 4 y 3244\n",
"cap-color 8124 10 n 2284\n",
"bruises 8124 2 f 4748\n",
"odor 8124 9 n 3528\n",
"gill-attachment 8124 2 f 7914\n",
"gill-spacing 8124 2 c 6812\n",
"gill-size 8124 2 b 5612\n",
"gill-color 8124 12 b 1728\n",
"stalk-shape 8124 2 t 4608\n",
"stalk-root 8124 5 b 3776\n",
"stalk-surface-above-ring 8124 4 s 5176\n",
"stalk-surface-below-ring 8124 4 s 4936\n",
"stalk-color-above-ring 8124 9 w 4464\n",
"stalk-color-below-ring 8124 9 w 4384\n",
"veil-type 8124 1 p 8124\n",
"veil-color 8124 4 w 7924\n",
"ring-number 8124 3 o 7488\n",
"ring-type 8124 5 p 3968\n",
"spore-print-color 8124 9 w 2388\n",
"population 8124 6 v 4040\n",
"habitat 8124 7 d 3148"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.describe().transpose()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAACP8AAAU+CAYAAAAMRwM5AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAB7CAAAewgFu0HU+AAEAAElEQVR4nOzdabRld1nn8d9TFAkShsiUggRkbhQQWCSBMAhKi0DQyKRiKwQTp26i0DLKICC0MigitAODJLQNCgIdDEFQRNAITYF0i0IIkkCChECgGWQKSf794p5qDtVVdevee4Z6zv181rpr7332f+/9VF5mfdfeNcYIAAAAAAAAAADQz45lDwAAAAAAAAAAAGyO+AcAAAAAAAAAAJoS/wAAAAAAAAAAQFPiHwAAAAAAAAAAaEr8AwAAAAAAAAAATYl/AAAAAAAAAACgKfEPAAAAAAAAAAA0Jf4BAAAAAAAAAICmxD8AAAAAAAAAANCU+AcAAAAAAAAAAJoS/wAAAAAAAAAAQFPiHwAAAAAAAAAAaEr8AwAAAAAAAAAATYl/AAAAAAAAAACgKfEPAAAAAAAAAAA0Jf4BAAAAAAAAAICmdi57ALamqg5PcvvJ4WeTXLHEcQAAAAAAAAAA2L+rJLn+ZP+DY4xvbPWG4p/+bp9k97KHAAAAAAAAAABgQ45L8r6t3sRnvwAAAAAAAAAAoClv/unvs3t23vve9+aGN7zhMmcBAAAAAAAAAGA/Lr744hx//PF7Dj97oLUHS/zT3xV7dm54wxvmmGOOWeYsAAAAAAAAAAAcnCvWX7I+n/0CAAAAAAAAAICmxD8AAAAAAAAAANCU+AcAAAAAAAAAAJoS/wAAAAAAAAAAQFPiHwAAAAAAAAAAaEr8AwAAAAAAAAAATYl/AAAAAAAAAACgKfEPAAAAAAAAAAA0Jf4BAAAAAAAAAICmxD8AAAAAAAAAANCU+AcAAAAAAAAAAJoS/wAAAAAAAAAAQFPiHwAAAAAAAAAAaEr8AwAAAAAAAAAATYl/AAAAAAAAAACgKfEPAAAAAAAAAAA0Jf4BAAAAAAAAAICmxD8AAAAAAAAAANCU+AcAAAAAAAAAAJoS/wAAAAAAAAAAQFPiHwAAAAAAAAAAaEr8AwAAAAAAAAAATYl/AAAAAAAAAACgKfEPAAAAAAAAAAA0Jf4BAAAAAAAAAICmxD8AAAAAAAAAANCU+AcAAAAAAAAAAJoS/wAAAAAAAAAAQFPiHwAAAAAAAAAAaEr8AwAAAAAAAAAATYl/AAAAAAAAAACgqZ3LHmBZquoGSY6f/B03+bvu5PQZY4yTD+IeV09yvyQ/mOTYJLdMco0kX0pyXpK3JvmDMcanZz0/AAAAAAAAAABs2/gnySVbubiqvjfJOVmLffZ2nSR3nfw9tqp+bozxp1t5HgAAAAAAAAAA7G07xz/TLkxybpL7buCaa+Vb4c85Sc5K8r4kn0ty/SQPTvKzk3X/vaq+NMZ4y8wmBgAAAAAAAABg29vO8c+zkuxOsnuMcUlV3TTJBRu4/sokr03yzDHGh/Zx/m1V9ZYkb0xylSQvrqpbjTHGFucGAAAAAAAAAIAk2zj+GWP82hav//skf7/OmjOr6g1JHpLkFknulOQftvJcAAAAAAAAAADYY8eyB9gG3jG1f4ulTQEAAAAAAAAAwMoR/8zf4VP7VyxtCgAAAAAAAAAAVs62/ezXAt1rav/DG724qo5ZZ8mujd4TAAAAAAAAAIDVIP6Zo6q6Q5ITJ4cfHGNsOP5JctEMRwIAAAAAAAAAYIWIf+akqg5P8vIkV5n89JQljgMAAAAAAAAAsCmfeclblj1CGzd49P0X/kzxz/y8JMmxk/0zxhh/vsn73Hid87uS7N7kvQEAAAAAAAAAaEz8MwdV9eQkp04Odyf5T5u91xjjk+s8a7O3BgAAAAAAAACguR3LHmDVVNXPJ/kvk8NzkzxgjPGVJY4EAAAAAAAAAMCKEv/MUF
U9PMnvTQ4/keQHxxiXLnEkAAAAAAAAAABWmPhnRqrqR5K8Kmv/TS9Ocp/1PtkFAAAAAAAAAABbIf6Zgaq6T5LXJtmZ5HNZe+PPx5Y7FQAAAAAAAAAAq078s0VVdbckZyY5PMkXk/zQGOOflzsVAAAAAAAAAADbgfhnC6rqjknenOSIJF9JcuIY4/1LHQoAAAAAAAAAgG1j57IHWJaqukeSW079dL2p/VtW1cnT68cYp+91/S2SvDXJkZOfnprki1V1uwM89jNjjM9scmQAAAAAAAAAAPg22zb+SXJqkkfu59zdJ3/TTt/r+J5JbjB1/MKDeOYzkzzjINYBAAAAAAAAAMC6fPYLAAAAAAAAAACa2rZv/hljnJzk5C1cf3r+/7cBAQAAAAAAAADAwnjzDwAAAAAAAAAANCX+AQAAAAAAAACApsQ/AAAAAAAAAADQlPgHAAAAAAAAAACaEv8AAAAAAAAAAEBT4h8AAAAAAAAAAGhK/AMAAAAAAAAAAE2JfwAAAAAAAAAAoCnxDwAAAAAAAAAANCX+AQAAAAAAAACApsQ/AAAAAAAAAADQlPgHAAAAAAAAAACaEv8AAAAAAAAAAEBT4h8AAAAAAAAAAGhK/AMAAAAAAAAAAE2JfwAAAAAAAAAAoCnxDwAAAAAAAAAANCX+AQAAAAAAAACApsQ/AAAAAAAAAADQlPgHAAAAAAAAAACaEv8AAAAAAAAAAEBT4h8AAAAAAAAAAGhK/AMAAAAAAAAAAE2JfwAAAAAAAAAAoCnxDwAAAAAAAAAANCX+AQAAAAAAAACApsQ/AAAAAAAAAADQlPgHAAAAAAAAAACaEv8AAAAAAAAAAEBT4h8AAAAAAAAAAGhK/AMAAAAAAAAAAE2JfwAAAAAAAAAAoCnxDwAAAAAAAAAANCX+AQAAAAAAAACApsQ/AAAAAAAAAADQlPgHAAAAAAAAAACaEv8AAAAAAAAAAEBT4h8AAAAAAAAAAGhK/AMAAAAAAAAAAE2JfwAAAAAAAAAAoCnxDwAAAAAAAAAANCX+AQAAAAAAAACApsQ/AAAAAAAAAADQlPgHAAAAAAAAAACaEv8AAAAAAAAAAEBT4h8AAAAAAAAAAGhK/AMAAAAAAAAAAE2JfwAAAAAAAAAAoCnxDwAAAAAAAAAANCX+AQAAAAAAAACApsQ/AAAAAAAAAADQlPgHAAAAAAAAAACaEv8AAAAAAAAAAEBT4h8AAAAAAAAAAGhK/AMAAAAAAAAAAE2JfwAAAAAAAAAAoCnxDwAAAAAAAAAANCX+AQAAAAAAAACApsQ/AAAAAAAAAADQlPgHAAAAAAAAAACaEv8AAAAAAAAAAEBT4h8AAAAAAAAAAGhK/AMAAAAAAAAAAE2JfwAAAAAAAAAAoCnxDwAAAAAAAAAANCX+AQAAAAAAAACApsQ/AAAAAAAAAADQlPgHAAAAAAAAAACaEv8AAAAAAAAAAEBT4h8AAAAAAAAAAGhK/AMAAAAAAAAAAE2JfwAAAAAAAAAAoCnxDwAAAAAAAAAANCX+AQAAAAAAAACApsQ/AAAAAAAAAADQlPgHAAAAAAAAAACaEv8AAAAAAAAAAEBT4h8AAAAAAAAAAGhK/AMAAAAAAAAAAE2JfwAAAAAAAAAAoCnxDwAAAAAAAAAANCX+AQAAAAAAAACApsQ/AAAAAAAAAADQlPgHAAAAAAAAAACaEv8AAAAAAAAAAEBT4h8AAAAAAAAAAGhK/AMAAAAAAAAAAE2JfwAAAAAAAAAAoCnxDwAAAAAAAAAANCX+AQAAAAAAAACApsQ/AAAAAAAAAADQlPgHAAAAAAAAAACaEv8AAAAAAAAAAEBT4h8AAAAAAAAAAGhK/AMAAAAAAAAAAE2JfwAAAAAAAAAAoCnxDwAAAAAAAAAANCX+AQAAAAAAAACApsQ/AAAAAAAAAADQlPgHAAAAAAAAAACaEv8AAAAAAAAAAEBT4h8AAAAAAAAAAGhK/AMAAAAAAAAAAE2Jfw
AAAAAAAAAAoCnxDwAAAAAAAAAANCX+AQAAAAAAAACApsQ/AAAAAAAAAADQlPgHAAAAAAAAAACaEv8AAAAAAAAAAEB
"text/plain": [
"<Figure size 2800x1200 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.figure(figsize=(14,6),dpi=200)\n",
"sns.barplot(data=df.describe().transpose().reset_index().sort_values('unique'),x='index',y='unique')\n",
"plt.xticks(rotation=90);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Train Test Split"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"X = df.drop('class',axis=1)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"X = pd.get_dummies(X,drop_first=True)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"y = df['class']"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.model_selection import train_test_split"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=101)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Modeling"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.ensemble import AdaBoostClassifier"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"model = AdaBoostClassifier(n_estimators=1)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AdaBoostClassifier(n_estimators=1)"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model.fit(X_train,y_train)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Evaluation"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.metrics import classification_report,plot_confusion_matrix,accuracy_score"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"predictions = model.predict(X_test)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['p', 'e', 'p', ..., 'p', 'p', 'e'], dtype=object)"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"predictions"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" precision recall f1-score support\n",
"\n",
" e 0.96 0.81 0.88 655\n",
" p 0.81 0.96 0.88 564\n",
"\n",
" accuracy 0.88 1219\n",
" macro avg 0.88 0.88 0.88 1219\n",
"weighted avg 0.89 0.88 0.88 1219\n",
"\n"
]
}
],
"source": [
"print(classification_report(y_test,predictions))"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
" 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
" 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
" 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
" 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
" 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model.feature_importances_"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"22"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model.feature_importances_.argmax()"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'odor_n'"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X.columns[22]"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<AxesSubplot:xlabel='odor', ylabel='count'>"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYsAAAEGCAYAAACUzrmNAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAZFklEQVR4nO3df5TVdb3v8edLROGkJchkyGBDHVLxB6gj6FWrK4nmXedgpmmlonUXHtOj3nN0XbO1xItQx2udrh1JFxxJUDr+KmtOxw6HwGtHbyYzxq+RjPEHMiwVGsw00gTf94/9GdvCzHw3ur/7u4d5Pdbaa3+/7++v94ZhXnx/bkUEZmZmfdmj6AbMzKz+OSzMzCyTw8LMzDI5LMzMLJPDwszMMu1ZdAN5GDFiRDQ1NRXdhplZv9LW1vbbiGjoadpuGRZNTU20trYW3YaZWb8iaX1v03I7DCVpiKTHJa2U1C7pf6X6HZKelbQivSakuiR9R1KHpFWSji5b1zRJ69JrWl49m5lZz/Lcs3gDODkiXpM0GHhE0k/TtKsj4v4d5v80MDa9JgG3ApMkDQdmAM1AAG2SWiLi5Rx7NzOzMrntWUTJa2l0cHr1dbv4VGBhWu4xYD9JI4FTgSURsSUFxBLgtLz6NjOzneV6zkLSIKAN+EtgTkT8UtIlwGxJ1wFLgWsi4g1gFLChbPHOVOutvuO2pgPTAQ466KAcPo0NRG+++SadnZ28/vrrRbfSqyFDhtDY2MjgwYOLbsV2Y7mGRURsByZI2g94QNLhwFeBF4G9gLnA/wRmVmFbc9P6aG5u9gOvrCo6OzvZd999aWpqQlLR7ewkIujq6qKzs5MxY8YU3Y7txmpyn0VE/A54CDgtIl5Ih5reAL4HTEyzbQRGly3WmGq91c1y9/rrr7P//vvXZVAASGL//fev6z0f2z3keTVUQ9qjQNJQ4BTg1+k8BCr96zsDWJMWaQEuSFdFHQe8EhEvAIuBKZKGSRoGTEk1s5qo16DoVu/92e4hz8NQI4EF6bzFHsC9EfETScskNQACVgB/k+Z/EDgd6AC2AhcBRMQWSTcAy9N8MyNiS459m5nZDnILi4hYBRzVQ/3kXuYP4NJeps0H5le1QbMcXX/99eyzzz5cddVVRbdiVhW75R3cZt2en3nEu172oOtWV7ETs/7NDxI0q4KFCxdy5JFHMn78eM4///x3TJs3bx7HHnss48eP57Of/Sxbt24F4L777uPwww9n/PjxfPzjHwegvb2diRMnMmHCBI488kjWrVtX889i1hOHhdl71N7ezqxZs1i2bBkrV67k5ptvfsf0M888k+XLl7Ny5UoOPfRQbr/9dgBmzpzJ4sWLWblyJS0tLQDcdtttXHHFFaxYsYLW1lYaGxtr/nnMeuKwMHuPli1bxtlnn82IESMAGD58+Dumr1mzhpNOOokjjjiCRYsW0d7eDsAJJ5zAhRdeyLx589i+fTsAxx9/PF//+te58cYbWb9+PUOHDq3thzHrhcPCLGcXXnght9xyC6tXr2bGjBlv3xNx2223MWvWLDZs2MAxxxxDV1cXX/jCF2hpaWHo0KGcfvrpLFu2rODuzUocFmbv0cknn8x9991HV1cXAFu2vPPK7ldffZWRI0fy5ptvsmjRorfrTz/9NJMmTWLmzJk0NDSwYcMGnnnmGT7ykY9w+eWXM3XqVFatWlXTz2LWG18NZfYeHXbYYXzta1/jE5/4BIMGDeKoo46i/Mu3brjhBiZNmkRDQwOTJk3i1VdfBeDqq69m3bp1RASTJ09m/Pjx3Hjjjdx5550MHjyYD33oQ1x77bUFfSqzd1Lp9obdS3Nzc/jLjwze+6Wza9eu5dBDD61iR/noL31afZPUFhHNPU3zYSgzM8vksDAzs0wOCzMzy+SwMDOzTA4LMzPL5LAwM7NMvs/CbBccc/XCqq6v7aYLqro+s7x4z8LMzDI5LMzq3HPPPcchhxzCF7/4RQ499F
DOOuustx9zblYrDguzfuCpp57iK1/5CmvXruX9738/3/3ud4tuyQYYh4VZPzB69GhOOOEEAM477zweeeSRgjuygcZhYdYPSOpz3CxvDguzfuD555/nF7/4BQDf//73OfHEEwvuyAaa3C6dlTQE+Dmwd9rO/RExQ9IY4G5gf6ANOD8i/iRpb2AhcAzQBZwTEc+ldX0V+DKwHbg8Ihbn1bdZX4q61PXggw9mzpw5fOlLX2LcuHFccsklhfRhA1ee91m8AZwcEa9JGgw8IumnwN8B346IuyXdRikEbk3vL0fEX0o6F7gROEfSOOBc4DDgQOBnkj4WEdtz7N2sruy5557cddddRbdhA1huh6Gi5LU0Oji9AjgZuD/VFwBnpOGpaZw0fbJKB2anAndHxBsR8SzQAUzMq28zM9tZrucsJA2StALYBCwBngZ+FxHb0iydwKg0PArYAJCmv0LpUNXb9R6WMdvtNTU1sWbNmqLbsAEu17CIiO0RMQFopLQ3cEhe25I0XVKrpNbNmzfntRkzswGpJldDRcTvgIeA44H9JHWfK2kENqbhjcBogDT9A5ROdL9d72GZ8m3MjYjmiGhuaGjI42OYmQ1YuYWFpAZJ+6XhocApwFpKoXFWmm0a8OM03JLGSdOXRekLwluAcyXtna6kGgs8nlffZma2szyvhhoJLJA0iFIo3RsRP5H0JHC3pFnAr4Db0/y3A3dK6gC2ULoCiohol3Qv8CSwDbjUV0KZmdVWbmEREauAo3qoP0MPVzNFxOvA2b2sazYwu9o9mu2q52ceUdX1HXTd6qquzywvvoPbzMwyOSzM6txdd93FxIkTmTBhAhdffDHbt/sorNWew8Ksjq1du5Z77rmHRx99lBUrVjBo0CAWLVpUdFs2APlrVc3q2NKlS2lra+PYY48F4I9//CMf/OAHC+7KBiKHhVkdiwimTZvGN77xjaJbsQHOh6HM6tjkyZO5//772bRpEwBbtmxh/fr1BXdlA5H3LMx2Qa0vdR03bhyzZs1iypQpvPXWWwwePJg5c+bw4Q9/uKZ9mDkszOrcOeecwznnnFN0GzbA+TCUmZllcliYmVkmh4VZhtLzLOtXvfdnuweHhVkfhgwZQldXV93+Qo4Iurq6GDJkSNGt2G7OJ7jN+tDY2EhnZyf1/IVaQ4YMobGxseg2bDfnsDDrw+DBgxkzZkzRbZgVzoehzMwsk8PCzMwyOSzMzCyTw8LMzDI5LMzMLJPDwszMMjkszMwsk8PCzMwy5RYWkkZLekjSk5LaJV2R6tdL2ihpRXqdXrbMVyV1SHpK0qll9dNSrUPSNXn1bGZmPcvzDu5twN9HxBOS9gXaJC1J074dEd8sn1nSOOBc4DDgQOBnkj6WJs8BTgE6geWSWiLiyRx7NzOzMrmFRUS8ALyQhl+VtBYY1cciU4G7I+IN4FlJHcDENK0jIp4BkHR3mtdhYWZWIzU5ZyGpCTgK+GUqXSZplaT5koal2ihgQ9linanWW33HbUyX1CqptZ4f+mZm1h/lHhaS9gF+AFwZEb8HbgU+CkygtOfxrWpsJyLmRkRzRDQ3NDRUY5VmZpbk+tRZSYMpBcWiiPghQES8VDZ9HvCTNLoRGF22eGOq0UfdzMxqIM+roQTcDqyNiH8sq48sm+0zwJo03AKcK2lvSWOAscDjwHJgrKQxkvaidBK8Ja++zcxsZ3nuWZwAnA+slrQi1a4FPi9pAhDAc8DFABHRLuleSieutwGXRsR2AEmXAYuBQcD8iGjPsW8zM9tBnldDPQKoh0kP9rHMbGB2D/UH+1rOzMzy5Tu4zcwsk8PCzMwyOSzMzCyTw8LMzDI5LMzMLJPDwszMMjkszMwsk8PCzMwyOSzMzCyTw8LMzDI5LMzMLJPDwszMMjkszMwsk8PCzMwyOSzMzCxTrl+rara7Oubqhe962babLqhiJ2a14T0LMzPL5LAwM7NMDgszM8vksDAzs0wOCzMzy5RbWEgaLekhSU9Kap
d0RaoPl7RE0rr0PizVJek7kjokrZJ0dNm6pqX510mallfPZmbWszz3LLYBfx8R44DjgEsljQOuAZZGxFhgaRoH+DQ
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"sns.countplot(data=df,x='odor',hue='class')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Analyzing performance as more weak learners are added."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"95"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(X.columns)"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [],
"source": [
"error_rates = []\n",
"\n",
"for n in range(1,96):\n",
" \n",
" model = AdaBoostClassifier(n_estimators=n)\n",
" model.fit(X_train,y_train)\n",
" preds = model.predict(X_test)\n",
" err = 1 - accuracy_score(y_test,preds)\n",
" \n",
" error_rates.append(err)"
]
},
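{
"cell_type": "markdown",
"metadata": {},
"source": [
"The loop above refits a new model for every value of `n_estimators`. As a hedged alternative (a sketch, not what this notebook ran), scikit-learn's `AdaBoostClassifier.staged_predict` yields the ensemble's predictions after each boosting round, so a single 95-estimator fit can trace a similar error curve. The names `staged_model` and `staged_error_rates` are illustrative, and the curve may differ slightly from the loop because of random tie-breaking between fits."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"from sklearn.ensemble import AdaBoostClassifier\n",
"from sklearn.metrics import accuracy_score\n",
"\n",
"# Assumes X_train, X_test, y_train, y_test come from the train/test split earlier in this notebook.\n",
"staged_model = AdaBoostClassifier(n_estimators=95)\n",
"staged_model.fit(X_train, y_train)\n",
"\n",
"# staged_predict yields the ensemble's test predictions after each boosting round.\n",
"staged_error_rates = [\n",
"    1 - accuracy_score(y_test, staged_preds)\n",
"    for staged_preds in staged_model.staged_predict(X_test)\n",
"]\n",
"\n",
"# Boosting can stop early, so size the x-axis from the results instead of hard-coding 95.\n",
"plt.plot(range(1, len(staged_error_rates) + 1), staged_error_rates)"
]
},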
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[<matplotlib.lines.Line2D at 0x289c33b1f70>]"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXoAAAD4CAYAAADiry33AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAaOElEQVR4nO3df3Rc5X3n8fdHI42ksbElG0HAtrADToohbQBj0iahKSSpabI43QUCTU9IDjl0z8K2u0lPl2y3JKXd7LInJ6Sn8XZLQ7I0JDEc0iY+jbc+4Ud/LEkcC+iGGIdEGOMfISD//iHLsuTv/jFXYhiP0UiWNNZ9Pq9zfJh77zPSM8P4M4+/997nUURgZmb51dToDpiZ2dRy0JuZ5ZyD3sws5xz0ZmY556A3M8u55kZ3oNqZZ54ZixcvbnQ3zMxmlCeffHJXRHTVOnbaBf3ixYvp6elpdDfMzGYUSS+e7JhLN2ZmOeegNzPLOQe9mVnOOejNzHLOQW9mlnMOejOznHPQm5nl3Gl3Hf1E9Q8O8b/+4fnX7Pv1i9/ARefObVCPzMxOD7kJ+iODw/z5472j2xHQ23eI//mhyxrYKzOzxqurdCNppaTnJPVKuqPG8SslPSVpSNJ1FfvfKul7kjZJ+qGkD05m5yvNn93KC//tfaN/ViyZx+5Dg1P168zMZowxg15SAVgNXAMsA26StKyq2TbgI8DXqvb3Ax+OiIuAlcDnJXWcYp/r0llqYV//sen4VWZmp7V6SjcrgN6I2AIgaQ2wCnh2pEFEbM2OHa98YkT8pOLxzyS9AnQB+06142PpaC+yt3/Kf42Z2WmvntLNAmB7xfaObN+4SFoBFIHnaxy7VVKPpJ6+vr7x/uiaOmaVR/ReE9fMUjctl1dKOgf4CvDRiDhefTwi7o2I5RGxvKur5iyb49ZZKjI4fJz+weFJ+XlmZjNVPUG/E1hUsb0w21cXSXOAbwN/GBHfH1/3Jq6z1ALA3n6fkDWztNUT9BuBpZKWSCoCNwJr6/nhWfu/Bf46Ih6eeDfHr6NUBPAJWTNL3phBHxFDwO3AemAz8FBEbJJ0l6RrASRdLmkHcD3wl5I2ZU+/AbgS+Iikf8n+vHUqXki1zizoPaI3s9TVdcNURKwD1lXtu7Pi8UbKJZ3q5z0APHCKfZyQkdKNR/RmlrrcznXzaunGI3ozS1uOg37kZKxH9GaWttwGfUuhiTNam12jN7Pk5Tbo4dWbpszMUpbroO8sFT2iN7Pk5TroO0pF1+jNLHn5Dvr2Fl91Y2bJy3XQd5Za2HvYQW9mact10HeUihwYGGJo+IR51MzMkpHroB+5O3b/EdfpzSxd+Q76WSPz3TjozSxduQ56T4NgZpbzoO/0NAhmZnkPek9VbGaW66DvGJ2q2EFvZunKddDPbm2muUku3ZhZ0nId9JLoKBU9ojezpOU66GHk7liP6M0sXQkEvWewNLO05T7o55Y8J72ZpS33Qd9ZamHfEY/ozSxdCQR9eU76iGh0V8zMGiL3Qd9RKjI4dJwjx4Yb3RUzs4aoK+glrZT0nKReSXfUOH6lpKckDUm6rurYzZJ+mv25ebI6Xi9Pg2BmqRsz6CUVgNXANcAy4CZJy6qabQM+Anyt6rnzgE8BVwArgE9J6jz1btdvZGIzL0BiZqmqZ0S/AuiNiC0RMQisAVZVNoiIrRHxQ6B6hY9fB74TEXsiYi/wHWDlJPS7bp2j0yB4RG9maaon6BcA2yu2d2T76lHXcyXdKqlHUk9fX1+dP7o+r85J7xG9maXptDgZGxH3RsTyiFje1dU1qT/bE5uZWerqCfqdwKKK7YXZvnqcynMnRUe7V5kys7TVE/QbgaWSlkgqAjcCa+v8+euB90rqzE7CvjfbN22KzU3Mbm126cbMkjVm0EfEEHA75YDeDDwUEZsk3SXpWgBJl0vaAVwP/KWkTdlz9wB/QvnLYiNwV7ZvWnV4GgQzS1hzPY0iYh2wrmrfnR
WPN1Iuy9R67peAL51CH09ZR6nFI3ozS9ZpcTJ2qo1Mg2BmlqIkgt6Lj5hZypII+jltzRwaGGp0N8zMGiKJoG9vKdA/6EnNzCxNSQR9qVjgyLFhT1VsZklKIujbigUAjg5VT8VjZpZ/SQR9e0s56F2+MbMUJRH0pWxE78VHzCxFSQR9WzaiPzLoK2/MLD1JBH2pWL4B+Miga/Rmlp4kgn6kRu/SjZmlKI2gL46cjHXpxszSk0bQZyP6AY/ozSxBaQR90ZdXmlm6kgh6X15pZilLIuhfvbzSQW9m6Uki6EdH9A56M0tQEkHfUmiiuUku3ZhZkpIIevBUxWaWrnSCvljw5ZVmlqSkgt6lGzNLUTpB79KNmSUqnaB36cbMElVX0EtaKek5Sb2S7qhxvFXSg9nxDZIWZ/tbJN0v6RlJmyV9cpL7X7dS0SN6M0vTmEEvqQCsBq4BlgE3SVpW1ewWYG9EXADcA9yd7b8eaI2ItwCXAb8z8iUw3dpbCr6O3sySVM+IfgXQGxFbImIQWAOsqmqzCrg/e/wwcLUkAQHMktQMtAODwIFJ6fk4tbW4dGNmaaon6BcA2yu2d2T7araJiCFgPzCfcugfBl4CtgGfjYg9p9jnCXHpxsxSNdUnY1cAw8C5wBLgE5LeWN1I0q2SeiT19PX1TUlH2lt8eaWZpameoN8JLKrYXpjtq9kmK9PMBXYDvwX8fUQci4hXgCeA5dW/ICLujYjlEbG8q6tr/K+iDu3FZtfozSxJ9QT9RmCppCWSisCNwNqqNmuBm7PH1wGPRURQLtdcBSBpFvA24MeT0fHxam8pMDh8nKFhrxtrZmkZM+izmvvtwHpgM/BQRGySdJeka7Nm9wHzJfUCHwdGLsFcDcyWtInyF8aXI+KHk/0i6tFeLL/UgSEHvZmlpbmeRhGxDlhXte/OiscDlC+lrH7eoVr7G6G9WH6p/YNDzG6t62WbmeVCOnfGjqwbO+gRvZmlJZmgH1l8pP/YUIN7YmY2vZIJ+nYvJ2hmiUon6L1AuJklKp2g94jezBKVTtB7RG9miUon6LMRvee7MbPUpBP02YjeM1iaWWqSCfqRyytdozez1CQT9G3NLt2YWZqSCfqmJtHa3OTSjZklJ5mgBy8+YmZpSirovfiImaUoraAveoFwM0tPekHvEb2ZJSapoC+1eDlBM0tPUkHfVizQ7xG9mSUmqaBvb2liwCN6M0tMUkFfKjZ74REzS05SQd/WUuCIlxI0s8QkFfSlYsF3xppZcpIK+vaWAv2DQ0REo7tiZjZt0gr6YoHjAYPDLt+YWTrSCnovJ2hmCaor6CWtlPScpF5Jd9Q43irpwez4BkmLK479oqTvSdok6RlJbZPY/3HxcoJmlqIxg15SAVgNXAMsA26StKyq2S3A3oi4ALgHuDt7bjPwAPBvI+Ii4F3AsUnr/Th58REzS1E9I/oVQG9EbImIQWANsKqqzSrg/uzxw8DVkgS8F/hhRPw/gIjYHRENS9k2rxtrZgmqJ+gXANsrtndk+2q2iYghYD8wH3gTEJLWS3pK0h/U+gWSbpXUI6mnr69vvK+hbiWvG2tmCZrqk7HNwDuAD2X//U1JV1c3ioh7I2J5RCzv6uqass60e0RvZgmqJ+h3Aosqthdm+2q2yeryc4HdlEf//xQRuyKiH1gHXHqqnZ6okdKNT8aaWUrqCfqNwFJJSyQVgRuBtVVt1gI3Z4+vAx6L8l1J64G3SCplXwC/Cjw7OV0fP5duzCxFzWM1iIghSbdTDu0C8KWI2CTpLqAnItYC9wFfkdQL7KH8ZUBE7JX0OcpfFgGsi4hvT9FrGdPI5ZUu3ZhZSsYMeoCIWEe57FK5786KxwPA9Sd57gOUL7FsuFJL+eX68kozS0lSd8a2Fcsv1zV6M0tJUkFfLDTRJI/ozSwtSQW9JErFZo/ozSwpSQU9lC+x9MlYM0tJckHvxUfMLDXJBf3I4iO1/MU/PM8//mTqpmAwM2uE9IK+WODIsd
oLj6x+vJdvPl1906+Z2cyWXtC3FBioUaPvHxzi0NEhDhxp2CzKZmZTIr2gLxboP3Zi6WbXwUEADgw46M0sX5IM+lr
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.plot(range(1,96),error_rates)"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"AdaBoostClassifier(n_estimators=95)"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0.01052632, 0. ,\n",
" 0. , 0.01052632, 0. , 0. , 0. ,\n",
" 0.01052632, 0. , 0.05263158, 0.03157895, 0.03157895,\n",
" 0. , 0. , 0.06315789, 0.02105263, 0. ,\n",
" 0. , 0. , 0.09473684, 0.09473684, 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0.01052632, 0.01052632, 0. , 0. , 0. ,\n",
" 0.06315789, 0. , 0. , 0. , 0. ,\n",
" 0.03157895, 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0. , 0. , 0.06315789, 0. , 0. ,\n",
" 0.01052632, 0. , 0. , 0. , 0. ,\n",
" 0. , 0.01052632, 0. , 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0. ,\n",
" 0.05263158, 0. , 0.16842105, 0. , 0.10526316,\n",
" 0. , 0. , 0.04210526, 0. , 0. ,\n",
" 0. , 0. , 0. , 0. , 0.01052632])"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model.feature_importances_"
]
},
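{
"cell_type": "markdown",
"metadata": {},
"source": [
"Every nonzero importance above is a multiple of 1/95 ≈ 0.010526, which is consistent with each of the 95 stumps splitting on a single feature and contributing equally. Assuming that holds for the installed scikit-learn version (an assumption, since estimator weighting depends on the boosting algorithm used), the following sketch counts how often each feature is chosen as a stump's split. `split_counts` is an illustrative name."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from collections import Counter\n",
"\n",
"# Assumes `model` is the fitted 95-stump AdaBoostClassifier and `X` the feature DataFrame from above.\n",
"# tree_.feature[0] is the column index used at a stump's root split (negative if the tree never split).\n",
"split_counts = Counter(\n",
"    X.columns[stump.tree_.feature[0]]\n",
"    for stump in model.estimators_\n",
"    if stump.tree_.feature[0] >= 0\n",
")\n",
"\n",
"# Most frequently chosen split features\n",
"split_counts.most_common(5)"
]
},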
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"feats = pd.DataFrame(index=X.columns,data=model.feature_importances_,columns=['Importance'])"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Importance</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>cap-shape_c</th>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>cap-shape_f</th>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>cap-shape_k</th>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>cap-shape_s</th>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>cap-shape_x</th>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>habitat_l</th>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>habitat_m</th>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>habitat_p</th>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>habitat_u</th>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>habitat_w</th>\n",
" <td>0.010526</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>95 rows × 1 columns</p>\n",
"</div>"
],
"text/plain": [
" Importance\n",
"cap-shape_c 0.000000\n",
"cap-shape_f 0.000000\n",
"cap-shape_k 0.000000\n",
"cap-shape_s 0.000000\n",
"cap-shape_x 0.000000\n",
"... ...\n",
"habitat_l 0.000000\n",
"habitat_m 0.000000\n",
"habitat_p 0.000000\n",
"habitat_u 0.000000\n",
"habitat_w 0.010526\n",
"\n",
"[95 rows x 1 columns]"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"feats"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [],
"source": [
"imp_feats = feats[feats['Importance']>0]"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Importance</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>cap-color_c</th>\n",
" <td>0.010526</td>\n",
" </tr>\n",
" <tr>\n",
" <th>cap-color_n</th>\n",
" <td>0.010526</td>\n",
" </tr>\n",
" <tr>\n",
" <th>cap-color_w</th>\n",
" <td>0.010526</td>\n",
" </tr>\n",
" <tr>\n",
" <th>bruises_t</th>\n",
" <td>0.052632</td>\n",
" </tr>\n",
" <tr>\n",
" <th>odor_c</th>\n",
" <td>0.031579</td>\n",
" </tr>\n",
" <tr>\n",
" <th>odor_f</th>\n",
" <td>0.031579</td>\n",
" </tr>\n",
" <tr>\n",
" <th>odor_n</th>\n",
" <td>0.063158</td>\n",
" </tr>\n",
" <tr>\n",
" <th>odor_p</th>\n",
" <td>0.021053</td>\n",
" </tr>\n",
" <tr>\n",
" <th>gill-spacing_w</th>\n",
" <td>0.094737</td>\n",
" </tr>\n",
" <tr>\n",
" <th>gill-size_n</th>\n",
" <td>0.094737</td>\n",
" </tr>\n",
" <tr>\n",
" <th>stalk-shape_t</th>\n",
" <td>0.010526</td>\n",
" </tr>\n",
" <tr>\n",
" <th>stalk-root_b</th>\n",
" <td>0.010526</td>\n",
" </tr>\n",
" <tr>\n",
" <th>stalk-surface-above-ring_k</th>\n",
" <td>0.063158</td>\n",
" </tr>\n",
" <tr>\n",
" <th>stalk-surface-below-ring_y</th>\n",
" <td>0.031579</td>\n",
" </tr>\n",
" <tr>\n",
" <th>stalk-color-below-ring_n</th>\n",
" <td>0.063158</td>\n",
" </tr>\n",
" <tr>\n",
" <th>stalk-color-below-ring_w</th>\n",
" <td>0.010526</td>\n",
" </tr>\n",
" <tr>\n",
" <th>ring-number_t</th>\n",
" <td>0.010526</td>\n",
" </tr>\n",
" <tr>\n",
" <th>spore-print-color_r</th>\n",
" <td>0.052632</td>\n",
" </tr>\n",
" <tr>\n",
" <th>spore-print-color_w</th>\n",
" <td>0.168421</td>\n",
" </tr>\n",
" <tr>\n",
" <th>population_c</th>\n",
" <td>0.105263</td>\n",
" </tr>\n",
" <tr>\n",
" <th>population_v</th>\n",
" <td>0.042105</td>\n",
" </tr>\n",
" <tr>\n",
" <th>habitat_w</th>\n",
" <td>0.010526</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Importance\n",
"cap-color_c 0.010526\n",
"cap-color_n 0.010526\n",
"cap-color_w 0.010526\n",
"bruises_t 0.052632\n",
"odor_c 0.031579\n",
"odor_f 0.031579\n",
"odor_n 0.063158\n",
"odor_p 0.021053\n",
"gill-spacing_w 0.094737\n",
"gill-size_n 0.094737\n",
"stalk-shape_t 0.010526\n",
"stalk-root_b 0.010526\n",
"stalk-surface-above-ring_k 0.063158\n",
"stalk-surface-below-ring_y 0.031579\n",
"stalk-color-below-ring_n 0.063158\n",
"stalk-color-below-ring_w 0.010526\n",
"ring-number_t 0.010526\n",
"spore-print-color_r 0.052632\n",
"spore-print-color_w 0.168421\n",
"population_c 0.105263\n",
"population_v 0.042105\n",
"habitat_w 0.010526"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"imp_feats"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [],
"source": [
"imp_feats = imp_feats.sort_values(\"Importance\")"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAACRoAAAU1CAYAAAB2xeoPAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAB7CAAAewgFu0HU+AAEAAElEQVR4nOzde7i2ZVkn/u/5+iI7IUVUVAz3gaMkgSbukOznYLjNUnPSYDTaKirhJsfSMZkZjRRNy5RAoyYtUzO1wykUxTBw11CASJqiuQUxeNnL+fvjud/hcbHWutf+gXd9PsdxH9d1ree8r+uE99/vcV/V3QEAAAAAAAAAAFjMllk3AAAAAAAAAAAA3PwJGgEAAAAAAAAAAKMEjQAAAAAAAAAAgFGCRgAAAAAAAAAAwChBIwAAAAAAAAAAYJSgEQAAAAAAAAAAMErQCAAAAAAAAAAAGCVoBAAAAAAAAAAAjBI0AgAAAAAAAAAARgkaAQAAAAAAAAAAowSNAAAAAAAAAACAUYJGAAAAAAAAAADAKEEjAAAAAAAAAABglKARAAAAAAAAAAAwStAIAAAAAAAAAAAYJWgEAAAAAAAAAACMEjQCAAAAAAAAAABGbZ11A9xyVNXOSR4wLL+d5PszbAcAAAAAAAAAgIXdKskdhvm53X3NajcUNGI5HpDknFk3AQAAAAAAAADAsjwoyadWu4mr0wAAAAAAAAAAgFG+aMRyfHv75Oyzz86d73znWfYCAAAAAAAAAMACvv71r+fBD37w9uW3F6tdKkEjluP72yd3vvOds++++86yFwAAAAAAAAAAlub74yXjXJ0GAAAAAAAAAACMEjQCAAAAAAAAAABGCRoBAAAAAAAAAACjBI0AAAAAAAAAAIBRgkYAAAAAAAAAAMAoQSMAAAAAAAAAAGCUoBEAAAAAAAAAADBK0AgAAAAAAAAAABglaAQAAAAAAAAAAIwSNAIAAAAAAAAAAEYJGgEAAAAAAAAAAKMEjQAAAAAAAAAAgFGCRgAAAAAAAAAAwChBIwAAAAAAAAAAYJSgEQAAAAAAAAAAMErQCAAAAAAAAAAAGCVoBAAAAAAAAAAAjBI0AgAAAAAAAAAARgkaAQAAAAAAAAAAowSNAAAAAAAAAACAUYJGAAAAAAAAAADAKEEjAAAAAAAAAABglKARAAAAAAAAAAAwStAIAAAAAAAAAAAYJWgEAAAAAAAAAACMEjQCAAAAAAAAAABGCRoBAAAAAAAAAACjBI0AAAAAAAAAAIBRgkYAAAAAAAAAAMAoQSMAAAAAAAAAAGCUoBEAAAAAAAAAADBK0AgAAAAAAAAAABglaAQAAAAAAAAAAIwSNAIAAAAAAAAAAEYJGgEAAAAAAAAAAKMEjQAAAAAAAAAAgFGCRgAAAAAAAAAAwChBIwAAAAAAAAAAYJSgEQAAAAAAAAAAMErQCAAAAAAAAAAAGLV11g0AAAAAAAAAALA5fev3/3bWLdxi3fHXj9jwM33RCAAAAAAAAAAAGCVoBAAAAAAAAAAAjBI0AgAAAAAAAAAARgkaAQAAAAAAAAAAowSNAAAAAAAAAACAUYJGAAAAAAAAAADAKEEjAAAAAAAAAABglKARAAAAAAAAAAAwStAIAAAAAAAAAAAYJWgEAAAAAAAAAACMEjQCAAAAAAAAAABGCRoBAAAAAAAAAACjBI0AAAAAAAAAAIBRgkYAAAAAAAAAAMAoQSMAAAAAAAAAAGCUoBEAAAAAAAAAADBK0AgAAAAAAAAAABglaAQAAAAAAAAAAIwSNAIAAAAAAAAAAEYJGgEAAAAAAAAAAKMEjQAAAAAAAAAAgFGCRgAAAAAAAAAAwChBIwAAAAAAAAAAYJSgEQAAAAAAAAAAMErQCAAAAAAAAAAAGCVoBAAAAAAAAAAAjBI0AgAAAAAAAAAARgkaAQAAAAAAAAAAowSNAAAAAAAAAACAUYJGAAAAAAAAAADAKEEjAAAAAAAAAABglK
ARAAAAAAAAAAAwStAoSVXtV1UnVtUFVbWtqi6tqnOq6viq2m2Ve2+pqvtV1VFV9eZh32uqqofnUSvYc/eq+rWq+vuq+tqw3zer6jNV9caqesxqegYAAAAAAAAAgLm2zrqBWauqxyc5LcmeU3/eLckhw/Ocqjqyuy9a4RHPTHLqqpqcUlWHJzklyX5zfrrj8ByU5BFJPrxWZwIAAAAAAAAAwKYOGlXVQUnemWTXJFck+R9JPjKsn57kF5PcN8kHquqQ7r58JcdMza9Lcm6SnZI8YAX9/mSS9yfZJcllSf4wyUeTfCuTcNQBSR6X5E4r6BMAAAAAAAAAABa0qYNGSU7KJFR0fZLHdPdZU7+dXlVfSPKaTMJGxyV5xQrOOC/J85Kck+Rz3X11Vb0iywwaVdUdkvx5JiGjzyU5oru/OafsE0neVlW3XkGfAAAAAAAAAACwoC2zbmBWqurBmVwxliQnzwkZbXdikvOH+bFVtdNyz+nus7v7jd39ye6+eoXtJpOvLd0+yZVJnjRPyGj6zGtXcQ4AAAAAAAAAANzEpg0aJXnS1PyU+Qq6+4Yk7xiWt01y+Pq2NL+qul2SZwzL07r7y7PoAwAAAAAAAACAzWszB40ePozbknx6kbozpuYPW792FvW4TK54S5K/3v7Hqtqtqu5dVftUVc2mNQAAAAAAAAAANoPNHDQ6YBgv6u7rF6m7YJ53NtpDpubnVtWDqurDSS5P8oUkX0/yzar6/aq600w6BAAAAAAAAABgh7Z11g3MQlXtkmTvYfnVxWq7+7tVtS3J7knutt69LeB+U/PDk7wtN/23u0OSX0vylKo6orv/abmHVNW+IyX7LHdPAAAAAAAAAAB2DJsyaJRkj6n5FUuo3x40us36tDNqr6n5HybpJP8tyTuSfDPJvZMcn+SoTMJA762qH+3u/1jmORevvlUAAAAAAAAAAHZEm/XqtF2m5tcuof6aYdx1HXpZit2n5rskeXZ3v7q7L+7ua7v7vO4+OskfDTV3T/IrG90kAAAAAAAAAAA7rs0aNLp6an7rJdTvPIxXrUMvSzHd7//t7j9ZoO43c2Mo6mkrOOduI8+DVrAnAAAAAAAAAAA7gM16ddrlU/OlXIe2/YtCS7lmbT1M9/vhhYq6+5Kq+lSShyX50aq6dXcv5YtN29//6mK/V9VStwIAAAAAAAAAYAezKb9o1N1XJ7lkWO67WG1V3S43Bo0uXs++FnHxAvPFarck2Wt92gEAAAAAAAAAYLPZlEGjwXnDeO+qWuzLTvtPzc9fx34W8y9T81uN1E7/fv069AIAAAAAAAAAwCa0mYNGZw7j7kkOXqTusKn5J9avnUV9bGp+z5Haew3j1UkuXZ92AAAAAAAAAADYbDZz0Oi9U/Oj5yuoqi1JnjUsL0vykfVtaUEfS/LtYf74qpr3q0ZVdY8kDxyWn+juGzagNwAAAAAAAAAANoFNGzTq7rOTfHxYPruqDp2n7LgkBwzzk7r7uukfq+pRVdXDc+o69vr9JL87LPdL8vK5NcP1b2/Ojf+mf7he/QAAAAAAAAAAsPlsnXUDM3ZsJteh7Zrkw1V1QiZfLdo1ydOTHDPUXZjkxJUeUlVHzfnTA6fmR1TV3afWF3X3mbmpNyR5WpIfS/LbVfUjSd6e5FuZXJf2giTbw1IfTPLulfYLAAAAAAAAAABzbeqgUXd/tqqeluS0JHsmOWGesguTHNndl6/iqFMW+e3Fc9ZvT3KToFF3X11Vj0vy/iQHZxKEevo8+30wydO7u1fYKwAAAAAAAAAA3MSmvTptu+5+f5IDk7wuk1DRlUkuS/KpTEJAB3X3RTNrcEp3fz3JQ5L8cpIzknw7yXVJvpHkr5P8dHevNhQFAAAAAAAAAAA3sam/aLRdd385yQuHZznvfTRJLaFutGYZZ16f5C3DAwAAAAAAAAAAG2LTf9EIAAAAAAAAAAAYJ2gEAAAAAAAAAACMEjQCAAAAAAAAAABGCRoBAAAAAAAAAA
CjBI0AAAAAAAAAAIBRgkYAAAAAAAAAAMAoQSMAAAAAAAAAAGCUoBEAAAAAAAAAADBK0AgAAAAAAAAAABglaAQAAAA
"text/plain": [
"<Figure size 2800x1200 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.figure(figsize=(14,6),dpi=200)\n",
"sns.barplot(data=imp_feats.sort_values('Importance'),x=imp_feats.sort_values('Importance').index,y='Importance')\n",
"\n",
"plt.xticks(rotation=90);"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<AxesSubplot:xlabel='habitat', ylabel='count'>"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYsAAAEGCAYAAACUzrmNAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAYQUlEQVR4nO3dfbRddX3n8ffHGA2tMCpERUJM7CASHhIml6BF1IpVZLT4DPgAFMegyKjTKa0PbWEC2mHUcWlBWGGkGImISBnTLhxKYaaIA0qiARIjQ3i+WRRo6AgdhEL4zh9nBw/xJvvecM859+a+X2uddff57d/+3S8PK5/svX/7t1NVSJK0Lc8adAGSpInPsJAktTIsJEmtDAtJUivDQpLU6tmDLqBXdtttt5ozZ86gy5CkSWPVqlX/WFUzR9q3w4bFnDlzWLly5aDLkKRJI8ldW9vnZShJUivDQpLUyrCQJLXaYe9ZSNIgPP744wwPD/Poo48OupStmjFjBrNmzWL69OmjPsawkKRxNDw8zM4778ycOXNIMuhyfk1VsXHjRoaHh5k7d+6oj/MylCSNo0cffZRdd911QgYFQBJ23XXXMZ/5GBaSNM4malBstj31GRaSpFaGhSRNAKeddhpf/OIXB13GVnmDW+qzu5fsP25jzf6zm8dtLGlbPLOQpAFYtmwZBxxwAPPnz+eDH/zg0/add955HHTQQcyfP593vetdPPLIIwBccskl7LfffsyfP5/Xvva1AKxdu5ZFixaxYMECDjjgAG699dae1GtYSFKfrV27ljPOOIOrr76aG2+8ka985StP2//Od76TG264gRtvvJF99tmHr3/96wAsWbKEK664ghtvvJEVK1YAcO655/KJT3yC1atXs3LlSmbNmtWTmg0LSeqzq6++mve85z3stttuALzwhS982v41a9Zw6KGHsv/++7N8+XLWrl0LwCGHHMLxxx/Peeedx6ZNmwB49atfzec//3nOPPNM7rrrLnbaaaee1GxYSNIEc/zxx3PWWWdx8803c+qppz71TMS5557LGWecwT333MPChQvZuHEj73vf+1ixYgU77bQTRxxxBFdffXVPajIsJKnP3vCGN3DJJZewceNGAB588MGn7X/44YfZfffdefzxx1m+fPlT7bfddhsHH3wwS5YsYebMmdxzzz3cfvvtvPzlL+fjH/84Rx55JDfddFNPanY2lCT12b777stnP/tZXve61zFt2jQOPPBAul/Wdvrpp3PwwQczc+ZMDj74YB5++GEATjnlFG699VaqisMOO4z58+dz5pln8s1vfpPp06fzkpe8hM985jM9qTlV1ZOBB21oaKh8+ZEmIqfO7tjWrVvHPvvsM+gyWo1UZ5JVVTU0Un8vQ0mSWhkWkqRWhoUkqVXPwiLJ+UnuT7Kmq+3iJKubz51JVjftc5L8smvfuV3HLExyc5L1Sb6aib6coyTtgHo5G+oC4Cxg2eaGqjpq83aSLwG/6Op/W1UtGGGcc4APAz8CLgcOB74//uVKkramZ2cWVXUN8OBI+5qzg/cCF21rjCS7A7tU1fXVmba1DHj7OJcqSWoxqOcsDgXuq6ruFa/mJvkp8BDwJ1X1A2APYLirz3DTNqIki4HFALNnzx73oiVprBaesqy90xis+sKx4zreaA3qBvcxPP2s4l5gdlUdCPwB8K0ku4x10KpaWlVDVTU0c+bMcSpVktT3sEjybOCdwMWb26rqsara2GyvAm4DXgFsALqXUJzVtEmStuLOO+/kla98Je9///vZZ599ePe73/3UMufbaxBnFm8Efl5VT11eSjIzybRm++XAXsDtVXUv8FCSVzX3OY4FvjeAmiVpUrnllls46aSTWLduHbvssgtf+9rXntF4vZw6exFwHbB3kuEkH2p2Hc2v39h+LXBTM5X2u8BHqmrzzfGTgP8GrKdzxuFMKElqseeee3LIIYcA8IEPfIBrr732GY3XsxvcVXXMVt
qPH6HtUuDSrfRfCew3rsVJ0g5uy0fSnukjaj7BLUk7oLvvvpvrrrsOgG9961u85jWveUbjuUS5JPXQoKa67r333px99tmccMIJzJs3j49+9KPPaDzDQpJ2QM9+9rO58MILx208L0NJkloZFpK0g5kzZw5r1qxp7zgGhoUkqZVhIUlqZVhIkloZFpKkVk6dlaQeunvJ/uM63uw/u3lcxxstzywkSa0MC0naAV144YUsWrSIBQsWcOKJJ7Jp06ZnNJ5hIUk7mHXr1nHxxRfzwx/+kNWrVzNt2jSWL1/+jMb0noUk7WCuuuoqVq1axUEHHQTAL3/5S170ohc9ozENC0nawVQVxx13HH/+538+bmN6GUqSdjCHHXYY3/3ud7n//vsBePDBB7nrrrue0ZieWUhSDw1iquu8efM444wzeNOb3sSTTz7J9OnTOfvss3nZy1623WMaFpK0AzrqqKM46qijxm28Xr6D+/wk9ydZ09V2WpINSVY3nyO69n06yfoktyR5c1f74U3b+iSf6lW9kqSt6+U9iwuAw0do/3JVLWg+lwMkmQccDezbHPO1JNOSTAPOBt4CzAOOafpKkvqoZ2FRVdcAD46y+5HAt6vqsaq6A1gPLGo+66vq9qr6F+DbTV9JmrCqatAlbNP21DeI2VAnJ7mpuUz1gqZtD+Cerj7DTdvW2iVpQpoxYwYbN26csIFRVWzcuJEZM2aM6bh+3+A+BzgdqObnl4ATxmvwJIuBxQCzZ88er2EladRmzZrF8PAwDzzwwKBL2aoZM2Ywa9asMR3T17Coqvs2byc5D/ib5usGYM+urrOaNrbRPtL4S4GlAENDQxMz1iXt0KZPn87cuXMHXca46+tlqCS7d319B7B5ptQK4Ogkz00yF9gL+DFwA7BXkrlJnkPnJviKftYsSerhmUWSi4DXA7slGQZOBV6fZAGdy1B3AicCVNXaJN8BfgY8AXysqjY145wMXAFMA86vqrW9qlmSNLKehUVVHTNC89e30f9zwOdGaL8cuHwcS5MkjZFrQ0mSWhkWkqRWhoUkqZVhIUlqZVhIklq5RPkUdPeS/cdtrEGs1S+p/zyzkCS1MiwkSa0MC0lSK8NCktTKsJAktTIsJEmtDAtJUivDQpLUyrCQJLUyLCRJrQwLSVIrw0KS1MqwkCS16llYJDk/yf1J1nS1fSHJz5PclOSyJM9v2uck+WWS1c3n3K5jFia5Ocn6JF9Nkl7VLEkaWS+XKL8AOAtY1tV2JfDpqnoiyZnAp4E/bvbdVlULRhjnHODDwI+Ay4HDge/3qGZpRAtPWdbeaZQu23nchpL6pmdnFlV1DfDgFm1/W1VPNF+vB2Zta4wkuwO7VNX1VVV0guftPShXkrQNg7xncQJPP0OYm+SnSf4+yaFN2x7AcFef4aZtREkWJ1mZZOUDDzww/hVL0hQ1kLBI8lngCWB503QvMLuqDgT+APhWkl3GOm5VLa2qoaoamjlz5vgVLElTXN9fq5rkeOCtwGHNpSWq6jHgsWZ7VZLbgFcAG3j6papZTZskqY/6emaR5HDgj4Dfq6pHutpnJpnWbL8c2Au4varuBR5K8qpmFtSxwPf6WbMkqYdnFkkuAl4P7JZkGDiVzuyn5wJXNjNgr6+qjwCvBZYkeRx4EvhIVW2+OX4SnZlVO9G5x+FMKEnqs56FRVUdM0Lz17fS91Lg0q3sWwnsN46lSZLGyCe4JUmtDAtJUivDQpLUyrCQJLUyLCRJrQwLSVIrw0KS1MqwkCS1MiwkSa0MC0lSK8NCktTKsJAktRpVWCS5ajRtkqQd0zZXnU0yA/gNOsuMvwBIs2sXtvF6U0nSjqVtifITgU8CLwVW8auweAg4q3dlSZImkm2GRVV9BfhKkn9fVX/Rp5okSRPMqF5+VFV/keS3gTndx1TVsh7VJUmaQEYVFkm+CfwWsBrY1DQXYFhI0hQw2teqDgHzqqrGMniS84G3AvdX1X5N2wuBi+mcpdwJvLeq/i
mdl3J/BTgCeAQ4vqp+0hxzHPAnzbBnVNU3xlKHJOmZGe1zFmuAl2zH+BcAh2/R9ingqqraC7iq+Q7wFmCv5rMYOAe
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"sns.countplot(data=df,x='habitat',hue='class')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Interesting to see how the importance of the features shift as more are allowed to be added in! But remember these are all weak learner stumps, and feature importance is available for all the tree methods!"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
}
},
"nbformat": 4,
"nbformat_minor": 1
}