{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"___\n",
"\n",
"<a href='http://www.pieriandata.com'><img src='../Pierian_Data_Logo.png'/></a>\n",
"___\n",
"<center><em>Copyright by Pierian Data Inc.</em></center>\n",
"<center><em>For more information, visit us at <a href='http://www.pieriandata.com'>www.pieriandata.com</a></em></center>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Linear Regression with SciKit-Learn\n",
"\n",
"We saw how to create a very simple best-fit line; now let's greatly expand our toolkit to consider overfitting, underfitting, model evaluation, and multiple features!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Imports"
]
},
{
"cell_type": "code",
"execution_count": 188,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Sample Data\n",
"\n",
"This sample data is from ISLR (*An Introduction to Statistical Learning*). It records sales (in thousands of units) of a particular product as a function of advertising budgets (in thousands of dollars) for TV, radio, and newspaper media."
]
},
{
"cell_type": "code",
"execution_count": 189,
"metadata": {},
"outputs": [],
"source": [
"df = pd.read_csv(\"Advertising.csv\")"
]
},
{
"cell_type": "code",
"execution_count": 190,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>TV</th>\n",
" <th>radio</th>\n",
" <th>newspaper</th>\n",
" <th>sales</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>230.1</td>\n",
" <td>37.8</td>\n",
" <td>69.2</td>\n",
" <td>22.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>44.5</td>\n",
" <td>39.3</td>\n",
" <td>45.1</td>\n",
" <td>10.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>17.2</td>\n",
" <td>45.9</td>\n",
" <td>69.3</td>\n",
" <td>9.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>151.5</td>\n",
" <td>41.3</td>\n",
" <td>58.5</td>\n",
" <td>18.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>180.8</td>\n",
" <td>10.8</td>\n",
" <td>58.4</td>\n",
" <td>12.9</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" TV radio newspaper sales\n",
"0 230.1 37.8 69.2 22.1\n",
"1 44.5 39.3 45.1 10.4\n",
"2 17.2 45.9 69.3 9.3\n",
"3 151.5 41.3 58.5 18.5\n",
"4 180.8 10.8 58.4 12.9"
]
},
"execution_count": 190,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Expanding the Questions\n",
"\n",
"Previously, we explored the question **Is there a relationship between *total* advertising spend and *sales*?** and predicted total sales for a given value of total spend. Now we want to expand this to **What is the relationship between each advertising channel (TV, radio, newspaper) and sales?**"
]
},
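{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick numeric first answer to the per-channel question is the Pearson correlation of each channel with sales. On the real data this is simply `df.corr()['sales']`. The self-contained sketch below runs the same call on a hypothetical synthetic stand-in (made-up coefficients and noise, not the ISLR values). Correlation only measures linear association one channel at a time, so treat it as a starting point rather than a substitute for a multiple regression."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical stand-in for the Advertising data (NOT the real ISLR values)\n",
"rng = np.random.default_rng(0)\n",
"demo = pd.DataFrame({\n",
"    'TV': rng.uniform(0, 300, 200),\n",
"    'radio': rng.uniform(0, 50, 200),\n",
"    'newspaper': rng.uniform(0, 100, 200),\n",
"})\n",
"# Made-up relationship: sales depend on TV and radio only, plus noise\n",
"demo['sales'] = 3 + 0.045 * demo['TV'] + 0.19 * demo['radio'] + rng.normal(0, 1.5, 200)\n",
"\n",
"# Correlation of each channel with sales, strongest first\n",
"channel_corr = demo.corr()['sales'].drop('sales').sort_values(ascending=False)\n",
"print(channel_corr)"
]
},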
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Multiple Features (N-Dimensional)"
]
},
{
"cell_type": "code",
"execution_count": 191,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 1152x432 with 3 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# One scatter plot per advertising channel, each against sales\n",
"fig,axes = plt.subplots(nrows=1,ncols=3,figsize=(16,6))\n",
"\n",
"axes[0].plot(df['TV'],df['sales'],'o')\n",
"axes[0].set_ylabel(\"Sales\")\n",
"axes[0].set_title(\"TV Spend\")\n",
"\n",
"axes[1].plot(df['radio'],df['sales'],'o')\n",
"axes[1].set_title(\"Radio Spend\")\n",
"axes[1].set_ylabel(\"Sales\")\n",
"\n",
"axes[2].plot(df['newspaper'],df['sales'],'o')\n",
"axes[2].set_title(\"Newspaper Spend\")\n",
"axes[2].set_ylabel(\"Sales\")\n",
"plt.tight_layout();"
]
},
{
"cell_type": "code",
"execution_count": 192,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<seaborn.axisgrid.PairGrid at 0x216014fb648>"
]
},
"execution_count": 192,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 720x720 with 20 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# Pairwise relationships between all columns; the diagonal shows each column's distribution as a KDE\n",
"sns.pairplot(df,diag_kind='kde')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introducing SciKit-Learn\n",
"\n",
"We will work a lot with the scikit-learn library, so get comfortable with its estimator syntax and explore its incredibly useful documentation!\n",
"\n",
"---"
]
},
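{
"cell_type": "markdown",
"metadata": {},
"source": [
"scikit-learn estimators share one workflow: import the class, create an instance (hyperparameters go in the constructor), call `.fit(X, y)` to learn the parameters, then call `.predict(X_new)`. The sketch below walks through that workflow on a tiny made-up dataset lying exactly on y = 2x + 1, so the learned `coef_` and `intercept_` are easy to check. Note that `X` must be 2D, shaped (n_samples, n_features), even when there is only one feature."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.linear_model import LinearRegression\n",
"\n",
"# Made-up data lying exactly on y = 2x + 1 (illustration only)\n",
"X_demo = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])  # shape (n_samples, n_features)\n",
"y_demo = 2 * X_demo[:, 0] + 1\n",
"\n",
"# 1) Create an instance; hyperparameters are set in the constructor\n",
"demo_model = LinearRegression(fit_intercept=True)\n",
"# 2) fit() learns coef_ and intercept_ from the data\n",
"demo_model.fit(X_demo, y_demo)\n",
"# 3) predict() applies the learned parameters to new inputs\n",
"print(demo_model.coef_, demo_model.intercept_)\n",
"print(demo_model.predict(np.array([[10.0]])))"
]
},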
{
"cell_type": "code",
"execution_count": 193,
"metadata": {},
"outputs": [],
"source": [
"# Features: every advertising column; label: sales\n",
"X = df.drop('sales',axis=1)\n",
"y = df['sales']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train | Test Split\n",
"\n",
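"Make sure you have watched the Machine Learning Overview videos on Supervised Learning to understand why we do this step. In short, we hold out part of the data so that the model is evaluated on examples it never saw during training."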
]
},
{
"cell_type": "code",
"execution_count": 194,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.model_selection import train_test_split"
]
},
{
"cell_type": "code",
"execution_count": 195,
"metadata": {},
"outputs": [],
"source": [
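"# test_size=0.3 holds out 30% of the rows for testing.\n",
"# random_state fixes the shuffling seed so the same split is produced on every run; see:\n",
"# https://stackoverflow.com/questions/28064634/random-state-pseudo-random-number-in-scikit-learn\n",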
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)"
]
},
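{
"cell_type": "markdown",
"metadata": {},
"source": [
"With `test_size=0.3`, the 200-row Advertising table is split into 140 training rows and 60 test rows, which matches the `140 rows` reported for `X_train`. Fixing `random_state` makes the shuffle reproducible. The sketch below checks both properties on a hypothetical 200-row stand-in DataFrame with the same column names, so it runs without `Advertising.csv`. The seed `7` is an arbitrary choice used only for contrast."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"# Hypothetical 200-row stand-in with the same columns as Advertising.csv\n",
"rng = np.random.default_rng(42)\n",
"demo = pd.DataFrame(rng.uniform(0, 100, size=(200, 4)),\n",
"                    columns=['TV', 'radio', 'newspaper', 'sales'])\n",
"X_demo = demo.drop('sales', axis=1)\n",
"y_demo = demo['sales']\n",
"\n",
"# Each call returns (X_train, X_test, y_train, y_test)\n",
"split_a = train_test_split(X_demo, y_demo, test_size=0.3, random_state=101)\n",
"split_b = train_test_split(X_demo, y_demo, test_size=0.3, random_state=101)\n",
"split_c = train_test_split(X_demo, y_demo, test_size=0.3, random_state=7)\n",
"\n",
"X_tr_a, X_te_a, y_tr_a, y_te_a = split_a\n",
"print(X_tr_a.shape, X_te_a.shape)\n",
"# Same seed: identical test rows; different seed: a different shuffle\n",
"print('same seed, same test rows:', split_a[1].index.equals(split_b[1].index))\n",
"print('different seed, same test rows:', split_a[1].index.equals(split_c[1].index))"
]
},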
{
"cell_type": "code",
"execution_count": 196,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>TV</th>\n",
" <th>radio</th>\n",
" <th>newspaper</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>85</th>\n",
" <td>193.2</td>\n",
" <td>18.4</td>\n",
" <td>65.7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>183</th>\n",
" <td>287.6</td>\n",
" <td>43.0</td>\n",
" <td>71.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>127</th>\n",
" <td>80.2</td>\n",
" <td>0.0</td>\n",
" <td>9.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>53</th>\n",
" <td>182.6</td>\n",
" <td>46.2</td>\n",
" <td>58.7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>100</th>\n",
" <td>222.4</td>\n",
" <td>4.3</td>\n",
" <td>49.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>63</th>\n",
" <td>102.7</td>\n",
" <td>29.6</td>\n",
" <td>8.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>70</th>\n",
" <td>199.1</td>\n",
" <td>30.6</td>\n",
" <td>38.7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>81</th>\n",
" <td>239.8</td>\n",
" <td>4.1</td>\n",
" <td>36.9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>214.7</td>\n",
" <td>24.0</td>\n",
" <td>4.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>95</th>\n",
" <td>163.3</td>\n",
" <td>31.6</td>\n",
" <td>52.9</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>140 rows × 3 columns</p>\n",
"</div>"
],
"text/plain": [
" TV radio newspaper\n",
"85 193.2 18.4 65.7\n",
"183 287.6 43.0 71.8\n",
"127 80.2 0.0 9.2\n",
"53 182.6 46.2 58.7\n",
"100 222.4 4.3 49.8\n",
".. ... ... ...\n",
"63 102.7 29.6 8.4\n",
"70 199.1 30.6 38.7\n",
"81 239.8 4.1 36.9\n",
"11 214.7 24.0 4.0\n",
"95 163.3 31.6 52.9\n",
"\n",
"[140 rows x 3 columns]"
]
},
"execution_count": 196,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X_train"
]
},
{
"cell_type": "code",
"execution_count": 197,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"85 15.2\n",
"183 26.2\n",
"127 8.8\n",
"53 21.2\n",
"100 11.7\n",
" ... \n",
"63 14.0\n",
"70 18.3\n",
"81 12.3\n",
"11 17.4\n",
"95 16.9\n",
"Name: sales, Length: 140, dtype: float64"
]
},
"execution_count": 197,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y_train"
]
},
{
"cell_type": "code",
"execution_count": 198,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>TV</th>\n",
" <th>radio</th>\n",
" <th>newspaper</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>37</th>\n",
" <td>74.7</td>\n",
" <td>49.4</td>\n",
" <td>45.7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>109</th>\n",
" <td>255.4</td>\n",
" <td>26.9</td>\n",
" <td>5.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>31</th>\n",
" <td>112.9</td>\n",
" <td>17.4</td>\n",
" <td>38.6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>89</th>\n",
" <td>109.8</td>\n",
" <td>47.8</td>\n",
" <td>51.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>66</th>\n",
" <td>31.5</td>\n",
" <td>24.6</td>\n",
" <td>2.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>119</th>\n",
" <td>19.4</td>\n",
" <td>16.0</td>\n",
" <td>22.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>54</th>\n",
" <td>262.7</td>\n",
" <td>28.8</td>\n",
" <td>15.9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>74</th>\n",
" <td>213.4</td>\n",
" <td>24.6</td>\n",
" <td>13.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>145</th>\n",
" <td>140.3</td>\n",
" <td>1.9</td>\n",
" <td>9.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>142</th>\n",
" <td>220.5</td>\n",
" <td>33.2</td>\n",
" <td>37.9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>148</th>\n",
" <td>38.0</td>\n",
" <td>40.3</td>\n",
" <td>11.9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>112</th>\n",
" <td>175.7</td>\n",
" <td>15.4</td>\n",
" <td>2.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>174</th>\n",
" <td>222.4</td>\n",
" <td>3.4</td>\n",
" <td>13.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>55</th>\n",
" <td>198.9</td>\n",
" <td>49.4</td>\n",
" <td>60.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>141</th>\n",
" <td>193.7</td>\n",
" <td>35.4</td>\n",
" <td>75.6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>149</th>\n",
" <td>44.7</td>\n",
" <td>25.8</td>\n",
" <td>20.6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25</th>\n",
" <td>262.9</td>\n",
" <td>3.5</td>\n",
" <td>19.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>34</th>\n",
" <td>95.7</td>\n",
" <td>1.4</td>\n",
" <td>7.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>170</th>\n",
" <td>50.0</td>\n",
" <td>11.6</td>\n",
" <td>18.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>39</th>\n",
" <td>228.0</td>\n",
" <td>37.7</td>\n",
" <td>32.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>172</th>\n",
" <td>19.6</td>\n",
" <td>20.1</td>\n",
" <td>17.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>153</th>\n",
" <td>171.3</td>\n",
" <td>39.7</td>\n",
" <td>37.7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>175</th>\n",
" <td>276.9</td>\n",
" <td>48.9</td>\n",
" <td>41.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>61</th>\n",
" <td>261.3</td>\n",
" <td>42.7</td>\n",
" <td>54.7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>65</th>\n",
" <td>69.0</td>\n",
" <td>9.3</td>\n",
" <td>0.9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50</th>\n",
" <td>199.8</td>\n",
" <td>3.1</td>\n",
" <td>34.6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>42</th>\n",
" <td>293.6</td>\n",
" <td>27.7</td>\n",
" <td>1.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>129</th>\n",
" <td>59.6</td>\n",
" <td>12.0</td>\n",
" <td>43.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>179</th>\n",
" <td>165.6</td>\n",
" <td>10.0</td>\n",
" <td>17.6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>17.2</td>\n",
" <td>45.9</td>\n",
" <td>69.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>23.8</td>\n",
" <td>35.1</td>\n",
" <td>65.9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>133</th>\n",
" <td>219.8</td>\n",
" <td>33.5</td>\n",
" <td>45.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>90</th>\n",
" <td>134.3</td>\n",
" <td>4.9</td>\n",
" <td>9.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>13.2</td>\n",
" <td>15.9</td>\n",
" <td>49.6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41</th>\n",
" <td>177.0</td>\n",
" <td>33.4</td>\n",
" <td>38.7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>32</th>\n",
" <td>97.2</td>\n",
" <td>1.5</td>\n",
" <td>30.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>125</th>\n",
" <td>87.2</td>\n",
" <td>11.8</td>\n",
" <td>25.9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>196</th>\n",
" <td>94.2</td>\n",
" <td>4.9</td>\n",
" <td>8.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>158</th>\n",
" <td>11.7</td>\n",
" <td>36.9</td>\n",
" <td>45.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>180</th>\n",
" <td>156.6</td>\n",
" <td>2.6</td>\n",
" <td>8.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>67.8</td>\n",
" <td>36.6</td>\n",
" <td>114.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>186</th>\n",
" <td>139.5</td>\n",
" <td>2.1</td>\n",
" <td>26.6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>144</th>\n",
" <td>96.2</td>\n",
" <td>14.8</td>\n",
" <td>38.9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>121</th>\n",
" <td>18.8</td>\n",
" <td>21.7</td>\n",
" <td>50.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>80</th>\n",
" <td>76.4</td>\n",
" <td>26.7</td>\n",
" <td>22.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>69.2</td>\n",
" <td>20.5</td>\n",
" <td>18.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>78</th>\n",
" <td>5.4</td>\n",
" <td>29.9</td>\n",
" <td>9.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>48</th>\n",
" <td>227.2</td>\n",
" <td>15.8</td>\n",
" <td>49.9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>180.8</td>\n",
" <td>10.8</td>\n",
" <td>58.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>195.4</td>\n",
" <td>47.7</td>\n",
" <td>52.9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>44.5</td>\n",
" <td>39.3</td>\n",
" <td>45.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>43</th>\n",
" <td>206.9</td>\n",
" <td>8.4</td>\n",
" <td>26.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>102</th>\n",
" <td>280.2</td>\n",
" <td>10.1</td>\n",
" <td>21.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>164</th>\n",
" <td>117.2</td>\n",
" <td>14.7</td>\n",
" <td>5.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>199.8</td>\n",
" <td>2.6</td>\n",
" <td>21.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>155</th>\n",
" <td>4.1</td>\n",
" <td>11.6</td>\n",
" <td>5.7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>36</th>\n",
" <td>266.9</td>\n",
" <td>43.8</td>\n",
" <td>5.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>190</th>\n",
" <td>39.5</td>\n",
" <td>41.1</td>\n",
" <td>5.8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>33</th>\n",
" <td>265.6</td>\n",
" <td>20.0</td>\n",
" <td>0.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>45</th>\n",
" <td>175.1</td>\n",
" <td>22.5</td>\n",
" <td>31.5</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" TV radio newspaper\n",
"37 74.7 49.4 45.7\n",
"109 255.4 26.9 5.5\n",
"31 112.9 17.4 38.6\n",
"89 109.8 47.8 51.4\n",
"66 31.5 24.6 2.2\n",
"119 19.4 16.0 22.3\n",
"54 262.7 28.8 15.9\n",
"74 213.4 24.6 13.1\n",
"145 140.3 1.9 9.0\n",
"142 220.5 33.2 37.9\n",
"148 38.0 40.3 11.9\n",
"112 175.7 15.4 2.4\n",
"174 222.4 3.4 13.1\n",
"55 198.9 49.4 60.0\n",
"141 193.7 35.4 75.6\n",
"149 44.7 25.8 20.6\n",
"25 262.9 3.5 19.5\n",
"34 95.7 1.4 7.4\n",
"170 50.0 11.6 18.4\n",
"39 228.0 37.7 32.0\n",
"172 19.6 20.1 17.0\n",
"153 171.3 39.7 37.7\n",
"175 276.9 48.9 41.8\n",
"61 261.3 42.7 54.7\n",
"65 69.0 9.3 0.9\n",
"50 199.8 3.1 34.6\n",
"42 293.6 27.7 1.8\n",
"129 59.6 12.0 43.1\n",
"179 165.6 10.0 17.6\n",
"2 17.2 45.9 69.3\n",
"12 23.8 35.1 65.9\n",
"133 219.8 33.5 45.1\n",
"90 134.3 4.9 9.3\n",
"22 13.2 15.9 49.6\n",
"41 177.0 33.4 38.7\n",
"32 97.2 1.5 30.0\n",
"125 87.2 11.8 25.9\n",
"196 94.2 4.9 8.1\n",
"158 11.7 36.9 45.2\n",
"180 156.6 2.6 8.3\n",
"16 67.8 36.6 114.0\n",
"186 139.5 2.1 26.6\n",
"144 96.2 14.8 38.9\n",
"121 18.8 21.7 50.4\n",
"80 76.4 26.7 22.3\n",
"18 69.2 20.5 18.3\n",
"78 5.4 29.9 9.4\n",
"48 227.2 15.8 49.9\n",
"4 180.8 10.8 58.4\n",
"15 195.4 47.7 52.9\n",
"1 44.5 39.3 45.1\n",
"43 206.9 8.4 26.4\n",
"102 280.2 10.1 21.4\n",
"164 117.2 14.7 5.4\n",
"9 199.8 2.6 21.2\n",
"155 4.1 11.6 5.7\n",
"36 266.9 43.8 5.0\n",
"190 39.5 41.1 5.8\n",
"33 265.6 20.0 0.3\n",
"45 175.1 22.5 31.5"
]
},
"execution_count": 198,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X_test"
]
},
{
"cell_type": "code",
"execution_count": 199,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"37 14.7\n",
"109 19.8\n",
"31 11.9\n",
"89 16.7\n",
"66 9.5\n",
"119 6.6\n",
"54 20.2\n",
"74 17.0\n",
"145 10.3\n",
"142 20.1\n",
"148 10.9\n",
"112 14.1\n",
"174 11.5\n",
"55 23.7\n",
"141 19.2\n",
"149 10.1\n",
"25 12.0\n",
"34 9.5\n",
"170 8.4\n",
"39 21.5\n",
"172 7.6\n",
"153 19.0\n",
"175 27.0\n",
"61 24.2\n",
"65 9.3\n",
"50 11.4\n",
"42 20.7\n",
"129 9.7\n",
"179 12.6\n",
"2 9.3\n",
"12 9.2\n",
"133 19.6\n",
"90 11.2\n",
"22 5.6\n",
"41 17.1\n",
"32 9.6\n",
"125 10.6\n",
"196 9.7\n",
"158 7.3\n",
"180 10.5\n",
"16 12.5\n",
"186 10.3\n",
"144 11.4\n",
"121 7.0\n",
"80 11.8\n",
"18 11.3\n",
"78 5.3\n",
"48 14.8\n",
"4 12.9\n",
"15 22.4\n",
"1 10.4\n",
"43 12.9\n",
"102 14.8\n",
"164 11.9\n",
"9 10.6\n",
"155 3.2\n",
"36 25.4\n",
"190 10.8\n",
"33 17.4\n",
"45 14.9\n",
"Name: sales, dtype: float64"
]
},
"execution_count": 199,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y_test"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Creating a Model (Estimator)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Import a model class from a model family"
]
},
{
"cell_type": "code",
"execution_count": 200,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.linear_model import LinearRegression"
]
},
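{
"cell_type": "markdown",
"metadata": {},
"source": [
"scikit-learn's own documentation for `LinearRegression` describes it as plain ordinary least squares wrapped as a predictor object. As a sanity check, the sketch below uses hypothetical synthetic data with three features to solve the same least-squares problem directly with `np.linalg.lstsq`, adding a column of ones for the intercept. It then compares that solution with the fitted `intercept_` and `coef_`; the two should agree up to floating-point error."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.linear_model import LinearRegression\n",
"\n",
"# Hypothetical data: 3 features, known coefficients plus noise (not the Advertising data)\n",
"rng = np.random.default_rng(1)\n",
"X_demo = rng.uniform(0, 100, size=(50, 3))\n",
"y_demo = X_demo @ np.array([0.05, 0.2, 0.0]) + 3 + rng.normal(0, 1, 50)\n",
"\n",
"# Ordinary least squares by hand: prepend a column of ones for the intercept\n",
"design = np.column_stack([np.ones(len(X_demo)), X_demo])\n",
"beta, *_ = np.linalg.lstsq(design, y_demo, rcond=None)\n",
"\n",
"# The same fit through the scikit-learn estimator\n",
"ols = LinearRegression().fit(X_demo, y_demo)\n",
"\n",
"print('lstsq  :', beta[0], beta[1:])\n",
"print('sklearn:', ols.intercept_, ols.coef_)"
]
},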
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Create an instance of the model with parameters"
]
},
{
"cell_type": "code",
"execution_count": 201,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Help on class LinearRegression in module sklearn.linear_model._base:\n",
"\n",
"class LinearRegression(sklearn.base.MultiOutputMixin, sklearn.base.RegressorMixin, LinearModel)\n",
" | LinearRegression(*, fit_intercept=True, normalize=False, copy_X=True, n_jobs=None)\n",
" | \n",
" | Ordinary least squares Linear Regression.\n",
" | \n",
" | LinearRegression fits a linear model with coefficients w = (w1, ..., wp)\n",
" | to minimize the residual sum of squares between the observed targets in\n",
" | the dataset, and the targets predicted by the linear approximation.\n",
" | \n",
" | Parameters\n",
" | ----------\n",
" | fit_intercept : bool, default=True\n",
" | Whether to calculate the intercept for this model. If set\n",
" | to False, no intercept will be used in calculations\n",
" | (i.e. data is expected to be centered).\n",
" | \n",
" | normalize : bool, default=False\n",
" | This parameter is ignored when ``fit_intercept`` is set to False.\n",
" | If True, the regressors X will be normalized before regression by\n",
" | subtracting the mean and dividing by the l2-norm.\n",
" | If you wish to standardize, please use\n",
" | :class:`sklearn.preprocessing.StandardScaler` before calling ``fit`` on\n",
" | an estimator with ``normalize=False``.\n",
" | \n",
" | copy_X : bool, default=True\n",
" | If True, X will be copied; else, it may be overwritten.\n",
" | \n",
" | n_jobs : int, default=None\n",
" | The number of jobs to use for the computation. This will only provide\n",
" | speedup for n_targets > 1 and sufficient large problems.\n",
" | ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.\n",
" | ``-1`` means using all processors. See :term:`Glossary <n_jobs>`\n",
" | for more details.\n",
" | \n",
" | Attributes\n",
" | ----------\n",
" | coef_ : array of shape (n_features, ) or (n_targets, n_features)\n",
" | Estimated coefficients for the linear regression problem.\n",
" | If multiple targets are passed during the fit (y 2D), this\n",
" | is a 2D array of shape (n_targets, n_features), while if only\n",
" | one target is passed, this is a 1D array of length n_features.\n",
" | \n",
" | rank_ : int\n",
" | Rank of matrix `X`. Only available when `X` is dense.\n",
" | \n",
" | singular_ : array of shape (min(X, y),)\n",
" | Singular values of `X`. Only available when `X` is dense.\n",
" | \n",
" | intercept_ : float or array of shape (n_targets,)\n",
" | Independent term in the linear model. Set to 0.0 if\n",
" | `fit_intercept = False`.\n",
" | \n",
" | See Also\n",
" | --------\n",
" | sklearn.linear_model.Ridge : Ridge regression addresses some of the\n",
" | problems of Ordinary Least Squares by imposing a penalty on the\n",
" | size of the coefficients with l2 regularization.\n",
" | sklearn.linear_model.Lasso : The Lasso is a linear model that estimates\n",
" | sparse coefficients with l1 regularization.\n",
" | sklearn.linear_model.ElasticNet : Elastic-Net is a linear regression\n",
" | model trained with both l1 and l2 -norm regularization of the\n",
" | coefficients.\n",
" | \n",
" | Notes\n",
" | -----\n",
" | From the implementation point of view, this is just plain Ordinary\n",
" | Least Squares (scipy.linalg.lstsq) wrapped as a predictor object.\n",
" | \n",
" | Examples\n",
" | --------\n",
" | >>> import numpy as np\n",
" | >>> from sklearn.linear_model import LinearRegression\n",
" | >>> X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])\n",
" | >>> # y = 1 * x_0 + 2 * x_1 + 3\n",
" | >>> y = np.dot(X, np.array([1, 2])) + 3\n",
" | >>> reg = LinearRegression().fit(X, y)\n",
" | >>> reg.score(X, y)\n",
" | 1.0\n",
" | >>> reg.coef_\n",
" | array([1., 2.])\n",
" | >>> reg.intercept_\n",
" | 3.0000...\n",
" | >>> reg.predict(np.array([[3, 5]]))\n",
" | array([16.])\n",
" | \n",
" | Method resolution order:\n",
" | LinearRegression\n",
" | sklearn.base.MultiOutputMixin\n",
" | sklearn.base.RegressorMixin\n",
" | LinearModel\n",
" | sklearn.base.BaseEstimator\n",
" | builtins.object\n",
" | \n",
" | Methods defined here:\n",
" | \n",
" | __init__(self, *, fit_intercept=True, normalize=False, copy_X=True, n_jobs=None)\n",
" | Initialize self. See help(type(self)) for accurate signature.\n",
" | \n",
" | fit(self, X, y, sample_weight=None)\n",
" | Fit linear model.\n",
" | \n",
" | Parameters\n",
" | ----------\n",
" | X : {array-like, sparse matrix} of shape (n_samples, n_features)\n",
" | Training data\n",
" | \n",
" | y : array-like of shape (n_samples,) or (n_samples, n_targets)\n",
" | Target values. Will be cast to X's dtype if necessary\n",
" | \n",
" | sample_weight : array-like of shape (n_samples,), default=None\n",
" | Individual weights for each sample\n",
" | \n",
" | .. versionadded:: 0.17\n",
" | parameter *sample_weight* support to LinearRegression.\n",
" | \n",
" | Returns\n",
" | -------\n",
" | self : returns an instance of self.\n",
" | \n",
" | ----------------------------------------------------------------------\n",
" | Data and other attributes defined here:\n",
" | \n",
" | __abstractmethods__ = frozenset()\n",
" | \n",
" | ----------------------------------------------------------------------\n",
" | Data descriptors inherited from sklearn.base.MultiOutputMixin:\n",
" | \n",
" | __dict__\n",
" | dictionary for instance variables (if defined)\n",
" | \n",
" | __weakref__\n",
" | list of weak references to the object (if defined)\n",
" | \n",
" | ----------------------------------------------------------------------\n",
" | Methods inherited from sklearn.base.RegressorMixin:\n",
" | \n",
" | score(self, X, y, sample_weight=None)\n",
" | Return the coefficient of determination R^2 of the prediction.\n",
" | \n",
" | The coefficient R^2 is defined as (1 - u/v), where u is the residual\n",
" | sum of squares ((y_true - y_pred) ** 2).sum() and v is the total\n",
" | sum of squares ((y_true - y_true.mean()) ** 2).sum().\n",
" | The best possible score is 1.0 and it can be negative (because the\n",
" | model can be arbitrarily worse). A constant model that always\n",
" | predicts the expected value of y, disregarding the input features,\n",
" | would get a R^2 score of 0.0.\n",
" | \n",
" | Parameters\n",
" | ----------\n",
" | X : array-like of shape (n_samples, n_features)\n",
" | Test samples. For some estimators this may be a\n",
" | precomputed kernel matrix or a list of generic objects instead,\n",
" | shape = (n_samples, n_samples_fitted),\n",
" | where n_samples_fitted is the number of\n",
" | samples used in the fitting for the estimator.\n",
" | \n",
" | y : array-like of shape (n_samples,) or (n_samples, n_outputs)\n",
" | True values for X.\n",
" | \n",
" | sample_weight : array-like of shape (n_samples,), default=None\n",
" | Sample weights.\n",
" | \n",
" | Returns\n",
" | -------\n",
" | score : float\n",
" | R^2 of self.predict(X) wrt. y.\n",
" | \n",
" | Notes\n",
" | -----\n",
" | The R2 score used when calling ``score`` on a regressor uses\n",
" | ``multioutput='uniform_average'`` from version 0.23 to keep consistent\n",
" | with default value of :func:`~sklearn.metrics.r2_score`.\n",
" | This influences the ``score`` method of all the multioutput\n",
" | regressors (except for\n",
" | :class:`~sklearn.multioutput.MultiOutputRegressor`).\n",
" | \n",
" | ----------------------------------------------------------------------\n",
" | Methods inherited from LinearModel:\n",
" | \n",
" | predict(self, X)\n",
" | Predict using the linear model.\n",
" | \n",
" | Parameters\n",
" | ----------\n",
" | X : array_like or sparse matrix, shape (n_samples, n_features)\n",
" | Samples.\n",
" | \n",
" | Returns\n",
" | -------\n",
" | C : array, shape (n_samples,)\n",
" | Returns predicted values.\n",
" | \n",
" | ----------------------------------------------------------------------\n",
" | Methods inherited from sklearn.base.BaseEstimator:\n",
" | \n",
" | __getstate__(self)\n",
" | \n",
" | __repr__(self, N_CHAR_MAX=700)\n",
" | Return repr(self).\n",
" | \n",
" | __setstate__(self, state)\n",
" | \n",
" | get_params(self, deep=True)\n",
" | Get parameters for this estimator.\n",
" | \n",
" | Parameters\n",
" | ----------\n",
" | deep : bool, default=True\n",
" | If True, will return the parameters for this estimator and\n",
" | contained subobjects that are estimators.\n",
" | \n",
" | Returns\n",
" | -------\n",
" | params : mapping of string to any\n",
" | Parameter names mapped to their values.\n",
" | \n",
" | set_params(self, **params)\n",
" | Set the parameters of this estimator.\n",
" | \n",
" | The method works on simple estimators as well as on nested objects\n",
" | (such as pipelines). The latter have parameters of the form\n",
" | ``<component>__<parameter>`` so that it's possible to update each\n",
" | component of a nested object.\n",
" | \n",
" | Parameters\n",
" | ----------\n",
" | **params : dict\n",
" | Estimator parameters.\n",
" | \n",
" | Returns\n",
" | -------\n",
" | self : object\n",
" | Estimator instance.\n",
"\n"
]
}
],
"source": [
"help(LinearRegression)"
]
},
{
"cell_type": "code",
"execution_count": 204,
"metadata": {},
"outputs": [],
"source": [
"model = LinearRegression()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Fit/Train the Model on the Training Data\n",
"\n",
"**Make sure you only fit to the training data, so that you can fairly evaluate the model's performance on data it has never seen (the test set).**"
]
},
{
"cell_type": "code",
"execution_count": 205,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"LinearRegression()"
]
},
"execution_count": 205,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model.fit(X_train,y_train)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Understanding and Utilizing the Model\n",
"\n",
"-----\n",
"\n",
"## Evaluation on the Test Set"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Metrics\n",
"\n",
"Make sure you've viewed the video on these metrics!\n",
"The three most common evaluation metrics for regression problems are:\n",
"\n",
"**Mean Absolute Error** (MAE) is the mean of the absolute value of the errors:\n",
"\n",
"$$\\frac 1n\\sum_{i=1}^n|y_i-\\hat{y}_i|$$\n",
"\n",
"**Mean Squared Error** (MSE) is the mean of the squared errors:\n",
"\n",
"$$\\frac 1n\\sum_{i=1}^n(y_i-\\hat{y}_i)^2$$\n",
"\n",
"**Root Mean Squared Error** (RMSE) is the square root of the mean of the squared errors:\n",
"\n",
"$$\\sqrt{\\frac 1n\\sum_{i=1}^n(y_i-\\hat{y}_i)^2}$$\n",
"\n",
"Comparing these metrics:\n",
"\n",
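"- **MAE** is the easiest to understand, because it is simply the average absolute error, expressed in the same units as y.\n",
"- **MSE** is more popular than MAE, because squaring the errors \"punishes\" larger errors more heavily, which tends to be useful in the real world. Its drawback is that it is expressed in squared y units.\n",
"- **RMSE** is even more popular than MSE, because it keeps MSE's heavier penalty on large errors while being interpretable in the original \"y\" units.\n",
"\n",
"All of these are **loss functions**: lower values are better, so we want to minimize them."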
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Calculate Performance on Test Set\n",
"\n",
"To evaluate our model fairly, we compute performance metrics on the test set (data the model never saw during training)."
]
},
{
"cell_type": "code",
"execution_count": 206,
"metadata": {},
"outputs": [],
"source": [
"# X_test"
]
},
{
"cell_type": "code",
"execution_count": 207,
"metadata": {},
"outputs": [],
"source": [
"# Pass in only the test features (no labels)\n",
"# The model returns its own predictions (y hat)\n",
"# We then compare these predictions to the true labels in y_test\n",
"test_predictions = model.predict(X_test)"
]
},
{
"cell_type": "code",
"execution_count": 208,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([15.74131332, 19.61062568, 11.44888935, 17.00819787, 9.17285676,\n",
" 7.01248287, 20.28992463, 17.29953992, 9.77584467, 19.22194224,\n",
" 12.40503154, 13.89234998, 13.72541098, 21.28794031, 18.42456638,\n",
" 9.98198406, 15.55228966, 7.68913693, 7.55614992, 20.40311209,\n",
" 7.79215204, 18.24214098, 24.68631904, 22.82199068, 7.97962085,\n",
" 12.65207264, 21.46925937, 8.05228573, 12.42315981, 12.50719678,\n",
" 10.77757812, 19.24460093, 10.070269 , 6.70779999, 17.31492147,\n",
" 7.76764327, 9.25393336, 8.27834697, 10.58105585, 10.63591128,\n",
" 13.01002595, 9.77192057, 10.21469861, 8.04572042, 11.5671075 ,\n",
" 10.08368001, 8.99806574, 16.25388914, 13.23942315, 20.81493419,\n",
" 12.49727439, 13.96615898, 17.56285075, 11.14537013, 12.56261468,\n",
" 5.50870279, 23.29465134, 12.62409688, 18.77399978, 15.18785675])"
]
},
"execution_count": 208,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"test_predictions"
]
},
{
"cell_type": "code",
"execution_count": 209,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.metrics import mean_absolute_error,mean_squared_error"
]
},
{
"cell_type": "code",
"execution_count": 210,
"metadata": {},
"outputs": [],
"source": [
"MAE = mean_absolute_error(y_test,test_predictions)\n",
"MSE = mean_squared_error(y_test,test_predictions)\n",
"RMSE = np.sqrt(MSE)"
]
},
{
"cell_type": "code",
"execution_count": 211,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1.213745773614481"
]
},
"execution_count": 211,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"MAE"
]
},
{
"cell_type": "code",
"execution_count": 212,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2.2987166978863796"
]
},
"execution_count": 212,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"MSE"
]
},
{
"cell_type": "code",
"execution_count": 213,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1.5161519375993884"
]
},
"execution_count": 213,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"RMSE"
]
},
{
"cell_type": "code",
"execution_count": 214,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"14.0225"
]
},
"execution_count": 214,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['sales'].mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
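"**Review our video to understand whether these values are \"good enough\".**\n",
"\n",
"As a quick sanity check, compare them to the mean of `sales` computed above (about 14.02): an RMSE of about 1.52 is roughly 11% of the average sales value."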
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Residuals\n",
"\n",
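"Revisiting Anscombe's Quartet (https://en.wikipedia.org/wiki/Anscombe%27s_quartet): four small datasets that share nearly identical summary statistics and the same least-squares line, yet look completely different when plotted. This is why it is worth plotting residuals instead of relying on summary metrics alone.\n",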
"\n",
"<img src=\"https://upload.wikimedia.org/wikipedia/commons/thumb/e/ec/Anscombe%27s_quartet_3.svg/850px-Anscombe%27s_quartet_3.svg.png\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"| Property | Value | Accuracy |\n",
"|---|---|---|\n",
"| Mean of $x$ | 9 | exact |\n",
"| Sample variance of $x$: $\\sigma^2$ | 11 | exact |\n",
"| Mean of $y$ | 7.50 | to 2 decimal places |\n",
"| Sample variance of $y$: $\\sigma^2$ | 4.125 | ±0.003 |\n",
"| Correlation between $x$ and $y$ | 0.816 | to 3 decimal places |\n",
"| Linear regression line | $y = 3.00 + 0.500x$ | to 2 and 3 decimal places, respectively |\n",
"| Coefficient of determination of the linear regression: $R^2$ | 0.67 | to 2 decimal places |"
]
},
{
"cell_type": "code",
"execution_count": 215,
"metadata": {},
"outputs": [],
"source": [
"quartet = pd.read_csv('anscombes_quartet1.csv')"
]
},
{
"cell_type": "code",
"execution_count": 216,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.collections.LineCollection at 0x21603321888>"
]
},
"execution_count": 216,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# y = 3.00 + 0.500x\n",
"quartet['pred_y'] = 3 + 0.5 * quartet['x']\n",
"quartet['residual'] = quartet['y'] - quartet['pred_y']\n",
"\n",
"sns.scatterplot(data=quartet,x='x',y='y')\n",
"sns.lineplot(data=quartet,x='x',y='pred_y',color='red')\n",
"plt.vlines(quartet['x'],quartet['y'],quartet['y']-quartet['residual'])"
]
},
{
"cell_type": "code",
"execution_count": 217,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<AxesSubplot:xlabel='residual', ylabel='Density'>"
]
},
"execution_count": 217,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYgAAAEGCAYAAAB/+QKOAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAxuklEQVR4nO3dd3xUdfb/8ddJT0iBkISSQoCEElApoSgdFcECq2vBioprd92u6659f+rqrq6r7q51XSuWtWBFVLDSQq8JIZQQWkggJIT08/tjhv3GGEgCmdzJzHk+HvPIzC0zb0pycu/n3vMRVcUYY4xpKMDpAMYYY7yTFQhjjDGNsgJhjDGmUVYgjDHGNMoKhDHGmEYFOR2gtcTFxWlqaqrTMYwxpl1ZunTpXlWNb2ydzxSI1NRUsrKynI5hjDHtiohsPdI6O8VkjDGmUVYgjDHGNMoKhDHGmEZZgTDGGNMoKxDGGGMaZQXCGGNMo6xAGGOMaZQVCGPaQEV1LbV11lrftC8+c6OcMd5EVVm8uZhZS/JZmFfEzpIKALrHhDG2TzyXjEjhxKSOzoY0pgkeLRAiMhl4HAgEnlPVhxqsvx64CagFyoBrVXWde93vgZnudT9X1TmezGpMaynYf4i731/D5+v3EBUWxMR+CfSOj6S2TsnZXcqHq3Yya0k+kzK68KdzB5IQFeZ0ZGMaJZ6aUU5EAoEc4HRgO7AEuPhwAXBvE62qB9zPpwI3qupkEckAXgeGA92Bz4E+qlp7pM/LzMxUa7VhnLYwr4jrX1lKZXUdvzw9nctHphIeEviDbUorqnnxuy08OS+XDqFB/H36YEanxzmU2Pg7EVmqqpmNrfPkGMRwIFdV81S1CpgFTKu/weHi4NYBOFytpgGzVLVSVTcDue73M8ZrzV23m8ufX0TnDiF8cusYrh3b+0fFASAqLJhbTk3no5+PJj4ylCv/vZj3VxQ4kNiYo/NkgUgE8uu93u5e9gMicpOIbAIeBn7ewn2vFZEsEckqLCxsteDGtNR3uXu56dVlZHSP4Z0bR5Ea16HJfdISonjz+pPJTO3EL95YwYerdrRBUmOaz/GrmFT1KVXtDdwG/LGF+z6jqpmqmhkf32i3WmM8bvPeg1z/ylJ6xXfgP1cNIyY8uNn7xoQH8+JVwxnWI5ZfvrGCbzfu9WBSY1rGkwWiAEiu9zrJvexIZgE/OcZ9jXFEeVUN17+8lKAA4bkZmXSMCGnxe4QFB/LsjEx6x0dy02vL2FZU7oGkxrScJwvEEiBdRHqKSAgwHZhdfwMRSa/38ixgo/v5bGC6iISKSE8gHVjswazGHJP/99F6cvaU8veLB5PUKeKY3ycmPJhnLneNE177chYV1Ue8HsOYNuOxAqGqNcDNwBxgPfCmqq4VkfvcVywB3Cwia0VkBfArYIZ737XAm8A64FPgpqNdwWSME77KKeTVRdv42ZhejEk//lOcKZ0jeHz6IDbsKuXBj9e3QkJjjo/HLnNta3aZq2lL5VU1nPbXr4gIDeLDW0YTFvzjq5WO1f0fruP5bzfz7yuHMaFfQqu9rzGNceoyV2N81pNf5rKjpIKHzjuhVYsDwG/P6EvfLlHc8e5qyiprWvW9jWkJKxDGtNCmwjKe/SaP84cmkZka2+rvHxYcyAPnncCuAxU8Njen1d/fmOayAmFMC/1lTjahQYHcPqWfxz5jaI9OXDw8hX9/t5k1BSUe+xxjjsYKhDEtsGr7fj5Zs4trxvQkLjLUo5912xn9iO0Qyh3vrrZOsMYRViCMaYG/fJZDp4hgZo7u6fHPiokI5s6z+7NqewmvLNzq8c8zpiErEMY008K8Ir7OKeSG8b2JCmv+3dLHY+pJ3RmV1pnHPs+hpLy6TT7TmMOsQBjTDKrKX+Zk0yU6lCtOTm2zzxUR/nBmBiWHqnlqfm6bfa4xYAXCmGaZn1NI1tZ93DIxvdUva21KRvdofjokiRe/20J+sbXhMG3HCoQxzf
DP+ZvoHhPGhZnJTW/sAb+e1IeAAPjLZ9mOfL7xT1YgjGnCivz9LN5czNWjexIS5My3TLeYcGaO7sn7K3awavt+RzIY/2MFwpgmPPt1HlFhQUwfnuJojuvH9aZzhxAe/HiDozmM/7ACYcxRbCsq55M1O7l0RA8iQz06hXuTosKCuXFCGgvyiliwqcjRLMY/WIEw5iie/zaPwADhqlGpTkcB4NIRKSREhfLY5zn4SqNN472sQBhzBPsOVvFm1namDUqkS3SY03EAV5+mmyaksXhzMd/bUYTxMCsQxhzBa4u3cai6lp+N6eV0lB+4aFgy3WLCeHSuHUUYz7ICYUwjamrreHXhVkanxdG3a5TTcX7g8FHE0q37+MbmsDYeZAXCmEZ8uWEPO0oquGxkD6ejNOrCzGQSO4bbUYTxKCsQxjTi5YVb6RYTxmn9vXNGt5CgAG6emMaK/P3Mzy50Oo7xUVYgjGkgr7CMbzbu5ZLhKQQFeu+3yPlDk0iODbcrmozHeO//fmMc8uqibQQHChcNd6atRnMFBwZwy8R0Vm0v4Yv1e5yOY3yQFQhj6jlUVctbWflMHtiNhCjvuLT1aM4bnEhKbASPf7HRjiJMq7MCYUw9s1cWcKCihsu9dHC6oaDAAG6ekMbqghK+3GBHEaZ1WYEwpp7XF+eTnhDJsNROTkdptnOHJJIcG25HEabVWYEwxi17Vykr8vdz0bBkRMTpOM0W7D6KWLW9xK5oMq3KCoQxbm8sySc4UDhvSJLTUVrsvCFJJHUK5292FGFakUcLhIhMFpFsEckVkdsbWf8rEVknIqtE5AsR6VFvXa2IrHA/ZnsypzGVNbW8s3w7kzK6EtshxOk4LRYcGMBNE9JYmb+fr3LsKMK0Do8VCBEJBJ4CpgAZwMUiktFgs+VApqqeCLwNPFxv3SFVHeR+TPVUTmMA5q7bzf7yai4a5t2Xth7NT4ckkdjRxiJM6/HkEcRwIFdV81S1CpgFTKu/garOU9XDk+wuBNrfsb3xCW8sySexYzij0+KcjnLMQoICuHFCb5Zv28/X1qPJtAJPFohEIL/e6+3uZUcyE/ik3uswEckSkYUi8pPGdhCRa93bZBUW2mG1OTbb95Xzbe5ezh+aREBA+xmcbswFQ5PpHhPG43Z3tWkFXjFILSKXAZnAI/UW91DVTOAS4G8i0rvhfqr6jKpmqmpmfHx8G6U1vuatrO0AXJDZ/g9gXUcRaSzbtp9vc+0owhwfTxaIAqD+Cd0k97IfEJHTgD8AU1W18vByVS1wf80D5gODPZjV+KnaOuWtrHxGp8WR1CnC6Tit4oLMJLrFhPH45zYWYY6PJwvEEiBdRHqKSAgwHfjB1UgiMhh4Gldx2FNveScRCXU/jwNGAes8mNX4qW9z97KjpILpw1KcjtJqQoMCuXF8b7K27rNZ58xx8ViBUNUa4GZgDrAeeFNV14rIfSJy+KqkR4BI4K0Gl7P2B7JEZCUwD3hIVa1AmFb3xpJtdIoI5rQM72zrfawuHJZM12g7ijDHJ8iTb66qHwMfN1h2V73npx1hv++BEzyZzZiiskrmrtvN5SNTCQ0KdDpOqwoNCuSG8b25e/ZaFmwq4pR2fHWWcY5XDFIb44R3lxdQXavt+t6Ho7loWDJdokP52xcbnY5i2ikrEMYvqSpvLMlnUHJHr5tzurWEBQdyw7jeLN5czAIbizDHwAqE8UvL8/ezcU8Z03306OGw6cNTSIgK5fEvcpyOYtohKxDGL72xOJ+IkEDOPqm701E8Kiw4kOvH9WZhXjGL8uwowrSMFQjjd8oqa/hg1Q7OPrEbkaEevU7DK1wyIoX4qFAet7EI00JWIIzf+WjVDsqran12cLqhsOBArhvbi+83FbF4c7HTcUw7YgXC+J03luTTO74DQ1Laz6xxx+vSET2Ii7SxCNMyViCMX9m4u5Rl2/YzfVhKu5o17niFhwRy/bhefJdbxPebrEeTaR4rEMavvL7YNWvcuUOO1ljYN1
02sgfdYsL486fZdne1aRYrEMZvVFT/36xxcZGhTsdpc2HBgfzytD6szN/PnLW7nI5j2gErEMZvzFm7i/3l1Uwf7h+
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"sns.kdeplot(quartet['residual'])"
]
},
{
"cell_type": "code",
"execution_count": 218,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.lines.Line2D at 0x216032d8a08>"
]
},
"execution_count": 218,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYoAAAEJCAYAAACKWmBmAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAZzklEQVR4nO3dfXRddZ3v8ffnlGAkbUXTtEVKDZ3pwFAHC+YiqLhwwLF0kPpEhXV1wOtMvI7cytS5XhAfZumM4rpOlxdxwIpc8GGQXpShSi1P4sAslUuKUSgVqJ0gLbQJQWgJnEvK+d4/zk4mTU92Th7O2efkfF5rZWU//LL3d52V9NP9+/323ooIzMzMxpLLugAzM6ttDgozM0vloDAzs1QOCjMzS+WgMDOzVA4KMzNLlVlQSDpK0l2SHpK0VdLHSrSRpMslbZf0a0knZlGrmVkjOyTDc+8HPh4R90uaA2yRdHtEPDSizZnA0uTrDcCVyXczM6uSzIIiIp4EnkyW90naBhwJjAyKVcC3onhX4C8kHS7piORnxzRv3rxob2+vUOVmZjPPli1bnoqItlL7sryiGCapHTgBuHfUriOBx0es70y2pQZFe3s7XV1d01mimdmMJumxsfZlPpgtaTbwfeCiiNg7heN0SuqS1NXX1zd9BZqZNbhMg0JSE8WQ+G5E/KBEk13AUSPWFyXbDhIR6yOiIyI62tpKXj2ZmdkkZDnrScA3gW0RsW6MZhuBv0hmP50MPDve+ISZmU2vLMco3gR8AHhAUney7ZPAYoCIuArYBKwEtgPPAx+sfplmZo0ty1lP/wZonDYBfLQ6FZmZWSk1MevJzMwmr1AIevoH2LM3z4K5zbS3tpDLpf4/fEIcFGZmdaxQCDZv3c3aDd3kBws0N+VYt3o5K5YtnLawyHx6rJmZTV5P/8BwSADkBwus3dBNT//AtJ3DQWFmVsf27M0Ph8SQ/GCB3n35aTuHg8LMrI4tmNtMc9OB/5Q3N+WYP6d52s7hoDAzq2PtrS2sW718OCyGxijaW1um7RwezDYzq2O5nFixbCHHrjmV3n155s/xrCczMxsllxNL2mazpG12ZY5fkaOamdmM4aAwM7NUDgozM0vloDAzs1QOCjMzS+WgMDOzVA4KMzNL5aAwM7NUDgozM0uVaVBIukZSr6QHx9h/mqRnJXUnX5+pdo1mZo0u60d4XAtcAXwrpc09EXFWdcoxM7PRMr2iiIi7gaezrMHMzNLVwxjFKZJ+JenHkpZlXYyZWaPJuutpPPcDr4mI5yStBP4FWFqqoaROoBNg8eLFVSvQzGymq+kriojYGxHPJcubgCZJ88Zouz4iOiKio62trap1mpnNZDUdFJIWSlKyfBLFevuzrcrMrLFk2vUk6XrgNGCepJ3AZ4EmgIi4Cngv8BFJ+4EXgHMjIjIq18ysIWUaFBFx3jj7r6A4fdbMzDJS011PZmaWPQeFmZmlqvXpsWZWRwqFoKd/gD178yyY20x7awu5nLIuy6bIQWFm06JQCDZv3c3aDd3kBws0N+VYt3o5K5YtdFjUOXc9mdm06OkfGA4JgPxggbUbuunpH8i4MpsqB4WZTYs9e/PDITEkP1igd18+o4psurjryWwC3Ac/tgVzm2luyh0QFs1NOebPac6wKpsOvqIwK9NQH/zKy+/hvG/cy8rL72Hz1t0UCr4HFKC9tYV1q5fT3FT8Z2VojKK9tSXjymyqNBNvdO7o6Iiurq6sy7AZZkffc6y8/J6D/se8ac2pLGmbnWFltWPoiqt3X575c3zFVU8kbYmIjlL73PVkVqa0PngHRVEuJ5a0zfbnMcO468msTEN98CO5D94agYPCrEzug7dG5a4nszLlcmLFsoUcu+ZU98FbQ3FQmE2A++CtEbnryczMUjkozMwslYPCzMxSOSjMzCxVpkEh6RpJvZIeHGO/JF0uabukX0s6sdo1mpk1uqyvKK4FVqTsPxNYmnx1AldWoS
YzMxsh06CIiLuBp1OarAK+FUW/AA6XdER1qjMzM8j+imI8RwKPj1jfmWw7iKROSV2Suvr6+qpSnJlZI6j1oChbRKyPiI6I6Ghra8u6HDOzGaPWg2IXcNSI9UXJNjMzq5JaD4qNwF8ks59OBp6NiCezLsrMrJFk+qwnSdcDpwHzJO0EPgs0AUTEVcAmYCWwHXge+GA2lZqZNa5MgyIizhtnfwAfrVI5ZmZWQq13PZmZWcYcFGZmlspBYWZmqRwUZmaWykFhZmapHBRmZpbKQWFmZqkcFGZmlspBYWZmqRwUZmaWykFhZmapHBRmZpbKQWFmZqkcFGZmlspBYWZmqRwUZmaWykFhZmapMg0KSSskPSxpu6SLS+y/QFKfpO7k6y+zqNPMrJFl9ipUSbOArwFvA3YC90naGBEPjWp6Q0RcWPUCzcwMyPaK4iRge0TsiIgXge8BqzKsx8zMSsgyKI4EHh+xvjPZNtp7JP1a0o2SjqpOaWZmNqTWB7N/CLRHxPHA7cB1YzWU1CmpS1JXX19f1Qo0M5vpsgyKXcDIK4RFybZhEdEfEf8vWb0aeP1YB4uI9RHREREdbW1t016smVmjyjIo7gOWSjpa0qHAucDGkQ0kHTFi9WxgWxXrMzMzMpz1FBH7JV0I3ArMAq6JiK2SPgd0RcRGYI2ks4H9wNPABVnVa2bWqBQRWdcw7To6OqKrqyvrMszM6oakLRHRUWpfrQ9mm5lZxhwUZmaWykFhZmapHBRmZpbKQWFmZqkcFGZmlir1PgpJ+4BS82cFRETMrUhVZmZWM1KDIiLmVKsQMzOrTRO6M1vSfKB5aD0ifjftFZmZWU0pa4xC0tmSHgX+HfhXoAf4cQXrsgwUCsGOvuf4+W+fYkffcxQKM++ufTObuHKvKD4PnAzcEREnSHor8P7KlWXVVigEm7fuZu2GbvKDBZqbcqxbvZwVyxaSyynr8swsQ+XOehqMiH4gJykXEXcBJZ8JYvWpp39gOCQA8oMF1m7opqd/IOPKzCxr5V5RPCNpNnA38F1JvYD/BZlB9uzND4fEkPxggd59eZa0zc6oKjOrBeVeUawCXgD+BtgM/BZ4R6WKsupbMLeZ5qYDfx2am3LMn9M8xk+YWaMoKygiYiAiXoqI/RFxXURcnnRF2QzR3trCutXLh8NiaIyivbUl48rMLGtldT2NuvHuUKAJGPANdzNHLidWLFvIsWtOpXdfnvlzmmlvbfFAtpmVFxQjb7yTJIpdUSdXqijLRi4nlrTN9phEnSkUgp7+AfbszbNgrgPept+En/UURf8CvH2qJ5e0QtLDkrZLurjE/pdJuiHZf6+k9qme02wmGZrWvPLyezjvG/ey8vJ72Lx1t++BsWlVbtfTu0es5ihOjc1P5cSSZgFfA94G7ATuk7QxIh4a0exDwO8j4g8lnQt8CXjfVM5rNpOMNa352DWn+srQpk2502NHznDaT/HO7FVTPPdJwPaI2AEg6XvJMUcGxSrg75LlG4ErJClm4ou+zSbB05qtGsodo/hgBc59JPD4iPWdwBvGahMR+yU9C7QCT6Ue+eGH4bTTDty2ejX89V/D88/DypUH/8wFFxS/nnoK3vveg/d/5CPwvvfB44/DBz5w8P6Pfxze8Y7iuT/84YP3f+pTcMYZ0N0NF1108P4vfAHe+Eb42c/gk588eP9XvgLLl8Mdd8Df//3B+7/+dTjmGPjhD+Ef//Hg/d/+Nhx1FNxwA1x55cH7b7wR5s2Da68tfo22aRMcdhj80z/Bhg0H7//pT4vfv/xl+NGPDtz38pfDj5Mnvnz+83DnnQfub22F73+/uHzJJfDznx+4f9Ei+M53issXXVT8DEf6oz+C9euLy52d8MgjB+5fvrz4+QG8//2wc+eB+085Bb74xeLye94D/aMm9J1+Onz608XlM8+EF144cP9ZZ8Hf/m1xefTvHVT0d2/54Eucueh0fnz0f2JJ/06+cOsV5CT+5J5XQNOsYiP/7hWXJ/
m7V/jWt+npH+Dln/g4hz/yEM1NsxgeAZpJv3spxnvM+Fcp/ZhxACJiTerRq0hSJ9AJcPzLXpZxNVbrAsgPvsSL+ws
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"sns.scatterplot(data=quartet,x='y',y='residual')\n",
"plt.axhline(y=0, color='r', linestyle='--')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "code",
"execution_count": 219,
"metadata": {},
"outputs": [],
"source": [
"quartet = pd.read_csv('anscombes_quartet2.csv')"
]
},
{
"cell_type": "code",
"execution_count": 220,
"metadata": {},
"outputs": [],
"source": [
"quartet.columns = ['x','y']"
]
},
{
"cell_type": "code",
"execution_count": 221,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.collections.LineCollection at 0x21603475dc8>"
]
},
"execution_count": 221,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAX4AAAEGCAYAAABiq/5QAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAbOklEQVR4nO3de3RV9Zn/8fdzIPbIzUsMaLUYUasIU9GmDnZqR0Arv45LrJ2io3bsTVwdBStWtNOxzm9qC96o11oB74qzAEWdrhkERatdv4oFRWug1REhFbkEWiVED8ac5/fHPoEQEglwzv6es/fntRYryckx+9mKn7PzPc9+vubuiIhIemRCFyAiIvFS8IuIpIyCX0QkZRT8IiIpo+AXEUmZnqEL6I4DDjjAa2trQ5chIlJRlixZssHdazo+XhHBX1tby+LFi0OXISJSUcxsVWePa6lHRCRlFPwiIimj4BcRSRkFv4hIyij4RURSpmTBb2b3mNl6M3u93WP7m9kCM3uz8HG/Uh1fREQ6V8or/vuA0R0euwp4xt2PBJ4pfC0iIjEqWfC7+/PAXzo8PAa4v/D5/cCZpTq+iEhFq6+Hyy6Djz8u+o+Oe41/gLuvKXy+FhjQ1RPNbJyZLTazxY2NjfFUJyIS2osvwpgxMHQouV/+Cl59teiHCPbmrkc7wHS5C4y7T3P3Onevq6nZ4Y5jEZHkcIennoKTT4YTT4Tf/pbZp3+Xi38+Fz7/+aIfLu7gX2dmBwEUPq6P+fgiIuWjtRVmzYrCffRoeOst+MUvYNUq5pz+XTb32ackh407+J8ELih8fgHwRMzHFxEJb8sWmD4djj4azj4bPvgA7rknCv4f/AD69Cnp4Us2pM3MHgFOBg4ws3eAa4ApwCwz+y6wChhbquOLiJSdpia46y6YOhXWrIG6Onj00WhNv0eP2MooWfC7+z918a1RpTqmiEhZamyEW26BO+6A996DUaPgwQdh5Egwi72cihjLLCJSkVatghtvhLvvhlwOzjoLrrwSvvCFoGUp+EVEiq2+Hq67DmbOhEwGvvlNuOKKaE2/DCj4RUSK5cUXYfJkePJJ6NULJkyIbsL6zGdCV7YdBb+IyJ5wh/nzo8D/zW9g//3h3/8dLrkEqqtDV9cpBb+IyO5obY06cqZMgVdegUMOiXrwv/e9krdj7ikFv4jIrtiyBR54AK6/Hv73f+Goo6Ie/PPOg732Cl1dtyj4RUS6o0x68ItBwS8i8kkaG+HWW+H228uiB78YFPwiIp1ZtQpuuglmzCirHvxiUPCLiLRXXx+t38+cCWYs/MJp/NdXzuUX13Q1jKDyKPhFRCDqwZ8yBZ54IurBHz8eLruMu/77ndCVFZ2CX0TSq60Hf8oUeO65LnrwFfwiklL5vLNyYzPrNuUY0C9LbXVvMpnKfHOzknvwi0HBLyI7lc878+rXMnHWUnItebJVGaaOHcboIQdWVvgnoAe/GIJtvSgilWPlxuatoQ+Qa8kzcdZSVm5sDlxZNzU1RVMyDzsMxo2DffeNrvjr6+Hb305V6IOu+EWkG9Ztym0N/Ta5ljzrm3IMqinjpZHOevAfeCD6WKE9+MWg4BepMCHW2gf0y5KtymwX/tmqDP37Zkt63Da7fM4de/C/9jW46qpE9OAXg4JfpIKEWmuvre7N1LHDuHjmy7iz9bi11b1Ldsw2u3TO7XvwIZqDP2lS2czBLxda4xepIKHW2jMZY/SQA/ncwfsw+KC+/PeEk2J7Y7db5/zii3DmmTB0KMyZE7VjrlgRvXGr0N+Bgl+kgnzSWnupZTJGtqoH/bJVDKrpE1s3T5fnvOlDeOopGDECTjwRXngBrrkGGhqi1swy2/yknARZ6jGzS4ELAQOmu/vNIeoQqTSh19pD6HjOmXwrZ7z1O4772o/hD6/CwQdHEzMvvDAVPfjFEPsVv5kNJQr9E4BjgdPN7Ii46xCpRG1r7W0NKXGutYfSds6fam3hnKXzWD
jj+9z82BT22vJhtIn5ihXR9oYK/W4LccU/GFjk7h8AmNlvgLOA6wPUIlJR2q+1f9Sa545zj6/sO2i7IdO8mdHzHmL4tOvYf9NGthx7HPm7biFz1tcqbg5+uQixxv86cJKZVZtZL+CrwA6LcWY2zswWm9nixsbG2IsUKVeh1tpj19gIV18NAweSmTSJ/euOhQUL+NQrS8h84x8V+nsg9it+d19uZtcB84FmYCnQ2snzpgHTAOrq6jzOGkUkIPXgl1yQN3fd/W7gbgAz+zlJHH8nIrtGPfixCdXV09/d15vZQKL1/eEh6hDZXYmaVBlaxzn4l1wCEyeqHbOEQt25+6iZVQMtwMXu/l6gOkR2WWImVYbkDgsWwOTJ2+bgX3NNtPnJ1jn44eXzTq6llY9a86xo3JyYF/ggN3C5+0nufoy7H+vuz4SoQWR3VfykypBaW2H2bKirg9NOgzffjHrwV62KNkAps9CfV7+W11a/z/I1TXz11heYV7+WfL7y33LUnbsiuyjk3bMVa8uW6M3awYNh7FjYvLnse/DbXuC9kPNJeoFX8IvsorY7SdtL+t2zu639HPwLL4R+/aJZOsuWwXe+U9Zz8JP8Aq/gF9lFabx7dpe168HniivgmGOiNf3f/x6+/vWK6MFP8gu8gl9kF4WcVFn2GhpgwgQ49FD42c9g5EhYtAiefhpOOaWiNj9pe4FvC/8kvcBrHr/Ibmi7ezZb1aO8d6CKy7JlcN112/fgX3FFtKZfodpe4I+ecBLrm3L075uctl0Fv4jsvkWLopbMhPbgZzLGoJo+iXtxV/CLyK5xh/nzt/Xg77df1IN/ySVwwAGhq5NuUPCLSPe0tjJ8yULGPPUgNPxJc/ArmIJfRD7Zli3w4INw/fVc9uab8NnPRj34550Hn/pU6OpkNyj4RaRzTU1w113RVf2aNfD5z0c9+GeeWRHtmNI1Bb+IbK+xEW69FW6/Hd57D0aNggceiD5WUDumdE3BLyKRhoboLtv2c/CvvBJOOCF0ZVJkCn6RtEtgD758MgW/SFolvAdfuqbgF0mTjnPw1YOfSgp+kTRobYXHHot2unr5ZfXgp5yCXyqatkDciXY9+KgHXwoU/FKxtAXiJ2hqgmnToqv6d99VD36FKfWWjxrLLBVLWyB2orERfvKTaCzyD38YdeZU2Bz8tItjy8cgwW9ml5lZvZm9bmaPmFnl72wgsUvyDkm7rKEBLr00Cvxrr4URIyp2Dn7axbHlY+zBb2YHAxOAOncfCvQAzom7Dql8Sd4hqduWLYNvfQsOPxx++Us45xyor4dHH9WNVxUqjguaUEs9PYG9zawn0At4N1AdUsHSvAXiEW/Xc/mdV8GQITB7dtSOuWIF3HOPbryqcHFc0MT+5q67rzazG4EG4ENgvrvPj7sOqXztt0D8qDXPHecen+yunrYe/ClT+Nmzz7K5V1/14CdQ2wVNx6aFYl7QxB78ZrYfMAY4DHgPmG1m57v7Qx2eNw4YBzBw4MC4y5QKkYotEDv24H/603DTTfQZN049+AkUx5aPIZZ6TgHedvdGd28BHgO+2PFJ7j7N3evcva6mpib2IkWC27IlGpg2eDCMHQubN0dfr1gRjVZQ6CdW25aPwwcdwKCaPkX/LTZEH38DMNzMehEt9YwCFgeoQ6Q8ddaDP3t2NC1T7ZhSBCHW+BeZ2RzgZeBj4BVgWtx1iJSdxka47bZoDv5f/wojR8L992sOvhRdkDt33f0a4JoQxxYpOw0NcNNNMH06fPhhdGV/1VVqx5SS0cgGkVCWLYtm6Dz8cPT1+efDpElqx5SSU/CLxG3RoqhD5/HHozn4F18Ml1+uOfgSGwW/SBza9eDz7LPRHPyf/ATGj1cPvsROwS9SSl304KMefAlIwS9SCp3NwZ8xI1rH1xx8CUzBL1JM6sGXCqDgFykG9eBLBVHwi+yB6r+sjebgqwdfKoiCX2R3LFvG9++7li+99BRkTD34UlG09aLIrli0KLqqHz
KEk197jp7jC3Pw771XoS8VQ1f8IjujHnxJGAW/SFfUgy8JpeAX6Ug9+JJwCn7ZY/m8s3JjM+s25RjQr/i7BcVGPfi
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# y = 3.00 + 0.500x\n",
"quartet['pred_y'] = 3 + 0.5 * quartet['x']\n",
"quartet['residual'] = quartet['y'] - quartet['pred_y']\n",
"\n",
"sns.scatterplot(data=quartet,x='x',y='y')\n",
"sns.lineplot(data=quartet,x='x',y='pred_y',color='red')\n",
"plt.vlines(quartet['x'],quartet['y'],quartet['y']-quartet['residual'])"
]
},
{
"cell_type": "code",
"execution_count": 222,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<AxesSubplot:xlabel='residual', ylabel='Density'>"
]
},
"execution_count": 222,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYgAAAEGCAYAAAB/+QKOAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAxSUlEQVR4nO3dd3yV9fn/8deVDRlAyAkjBMJI2DsyVMCBiAvUulBb96bW+q2tq2rVVlvbOqp+q1/rqgNxoIgoirIcCGETVsJOCJAwkhCyc/3+OIf+YnogCeTkPufkej4eeeSce5zzVnJy5b4/S1QVY4wxpq4QpwMYY4zxT1YgjDHGeGUFwhhjjFdWIIwxxnhlBcIYY4xXYU4HaCoJCQmakpLidAxjjAkoy5YtK1BVl7d9QVMgUlJSyMjIcDqGMcYEFBHZfrR9dovJGGOMV1YgjDHGeGUFwhhjjFdWIIwxxnhlBcIYY4xXViCMMcZ4ZQXCGGOMV1YgjDGUVVZTWFpJVXWN01GMHwmagXLGmIarqVF+2LKPD5fnsGTrfnIOlAIQERpCz8QYJvTrwKXpXejSrrXDSY2TrEAY08J8n13Anz5fz9rcIuKiwhiT6uLy9GRaRYRScKiClTsP8Nw3Wfzv/M1cc3I3fn1WGq0j7FdFS+TTf3URmQg8C4QCr6jqk3X23wrcAVQDh4CbVXWdZ999wA2efXeq6hxfZjUm2B2uqOLxz9bzzo87SGrbiqcuGcQFgzsTFR76X8fmHizlma828cq3W5m/MZ8XrxpGaodYB1IbJ4mvlhwVkVBgE3AWkAMsBaYcKQCeY+JUtcjzeBJwu6pOFJF+wLvACKAzMBdIU9Xqo71fenq62lxMxniXe7CUG9/IYMPuIm4e04Nfn5XmtTDU9W1WAXe9t4KKqhreuH4EQ7u2a4a0pjmJyDJVTfe2z5eN1COAbFXdoqoVwDRgcu0DjhQHj2jgSLWaDExT1XJV3Qpke17PGNNIG3YXMfn578g5cJjXrxvBfef2bVBxADg1NYGP7ziFdtERXP3Kj6zYccDHaY0/8WWBSAJ21nqe49n2EyJyh4hsBv4C3NnIc28WkQwRycjPz2+y4MYEi7W5hVzx8mLCQoSPbjuZcWleZ3U+pi7tWjP9ltHEx0Rw05sZ7Nx/2AdJjT9yvJurqr6gqj2B3wEPNvLcl1U1XVXTXa7G/+AbE8x27DvMta8tIToijOm3jD6hNoQOcVG8du1JVFTVcNObGZRVHvVurwkiviwQuUByreddPNuOZhpw4XGea4yp5UBJBde+toTKauWN60fQtf2Jd1ftlRjLc1OGsmF3MX/8bH0TpDT+zpcFYimQKiLdRSQCuAKYWfsAEUmt9fQ8IMvzeCZwhYhEikh3IBVY4sOsxgSNsspqbnwzg5yDpbxyTTq9EmOa7LVP653ITWO68+/F2/l6/Z4me13jn3zWzVVVq0RkKjAHdzfXV1U1U0QeBTJUdSYwVUTGA5XAAeAaz7mZIjIdWAdUAXccqweTMcZNVbn3w9Us33GA56cM46SU+CZ/j3vO7sPCTQU8+PFaRvZoT0ykjZEIVj7r5trcrJurMTBtyQ7u/WgNd5+Vxp1nptZ/wnFatv0Al/zze64ZncIjk/r77H2M7znVzdUY04zW5xXx8MxMxqQmcMfpvXz6XsO7teOqkV1584dtbNpT7NP3Ms6xAmFMEDhUXsUdby+nTatwnr58CKEh4vP3vPus3kRHhvGn2dZgHaysQBgTBB76ZC3b9pXw3JShJMRENst7xkdHcOcZqczfmM/CTTYOKRhZgTAmwH2ZuZuPlucy9fRejOrRvlnf+xcndyOpbSv+9tUmgqU90/x/ViCMCWD7Syq4f8Ya+nWKY+oZvmuUPprIsFCmntGLVTsPMn+jXUUEGysQxgSw33+ylsLSSv5++WAiwpz5OF8yvAtd2rXi6bl2FRFsrEAYE6A+W53HZ6vzuGt8Gn06xjmWIzw0hD
tO78XqnEK+37zPsRym6VmBMCYAHTxcwUOfrGVQlzbcMraH03G4aGgSCTGRvLRwi9NRTBOyAmFMAHpi9gYOllby558NIizU+Y9xVHgo152SwsJN+azbVVT/CSYgOP+TZYxplMVb9vFexk5uHNOdvp2cu7VU19Uju9E6IpTXvtvqdBTTRKxAGBNAyququX/GGpLjW3HXmWlOx/mJNq3DmTwkiU9X76LwcKXTcUwTsAJhTAD53/mb2ZJfwuMXDqRVRMNWhWtOV43sSlllDR+tyHE6imkCViCMCRDb95Xw4vzNXDC483GtDNccBiS1YUhyW97+cYd1eQ0CViCMCRCPzVpHeIjw4Hl9nY5yTFeN7Er23kP8uHW/01HMCbICYUwA+GbDHuau38udZ6bSIS7K6TjHdMHgzsRFhfHW4u1ORzEnyAqEMX6urLKaP3y6jp6uaK47pbvTceoVFR7KJcOTmZO5m/zicqfjmBNgBcIYP/fKoi1s33eYRyb1d2w6jca6alRXKquV95ftdDqKOQGB8dNmTAuVe7CU5+dlc86AjoxJ9c+GaW96umI4KaUdHy3PtcbqAGYFwhg/9sfP1gHw4Pn9HE7SeBcOTSJ77yEybWR1wLICYYyf+jargNlrdjP19F4ktW3ldJxGO29gJ8JDhU9W5jodxRwnKxDG+KGKqhoenrmWbu1bc+MY5yfjOx5tW0dwWu9EPlm5i+oau80UiKxAGOOHXv9+K5vzS3j4gn5EhfvfiOmGumhoEnuLy/nBpgEPSFYgjPEze4rKeHZuFmf2SeSMPh2cjnNCzuiTSGxkGDNW2G2mQGQFwhg/86fZ66msUR66IPAapuuKCg/lnIEd+WJtHqUV1U7HMY3k0wIhIhNFZKOIZIvIvV723y0i60RktYh8LSLdau2rFpGVnq+ZvsxpjL/4ccs+Plm5i1vH9qBb+2in4zSJC4cmUVJRzVfr9zgdxTSSzwqEiIQCLwDnAP2AKSJS90+iFUC6qg4CPgD+UmtfqaoO8XxN8lVOY/xFVXUND8/MJKltK247rZfTcZrMqO7t6RgXxcyVu5yOYhrJl1cQI4BsVd2iqhXANGBy7QNUdZ6qHvY8XQx08WEeY/zaW4u3s2F3Mb8/v69fTuV9vEJChIkDOrIwK59D5VVOxzGN4MsCkQTUHmef49l2NDcAn9d6HiUiGSKyWEQu9EE+Y/xGfnE5f/tqE2NSEzi7f0en4zS5cwZ0pKKqhnkb9jodxTSCXzRSi8jVQDrwVK3N3VQ1HbgSeEZEeno572ZPEcnIz89vprTGNL0nPl9PWWU1j0zqj4g4HafJpafEkxATwRdrdzsdxTSCLwtELpBc63kXz7afEJHxwAPAJFX9z9SPqprr+b4FmA8MrXuuqr6squmqmu5yBc48NcbUtmTrfj5ansvNY3vQ0xXjdByfCA0RJvTvyLyNeymrtN5MgcKXBWIpkCoi3UUkArgC+ElvJBEZCryEuzjsrbW9nYhEeh4nAKcA63yY1RhHVFbX8PuP15LUthVTT091Oo5PnTOgI4crqlmwya72A4XPCoSqVgFTgTnAemC6qmaKyKMicqRX0lNADPB+ne6sfYEMEVkFzAOeVFUrECbovPH9NjbuKeahC/oFVcO0N6N6tKdNq3C7zRRAwnz54qo6G5hdZ9tDtR6PP8p53wMDfZnNGKftLizj6a82cXpvFxP6BfaI6YYIDw3hrH4dmJO5m/KqaiLDgrsgBgO/aKQ2piV6/LN1VNZo0DZMe3PuwI4Ul1XxfbbNzRQIrEAY44DvsguYtTqP20/rGTQjphvilF4JxEaG2W2mAGEFwphmVl5Vze8/cU/lfeu4/+q9HdQiw0IZ19vF1xv2UmNTgPs9KxDGNLNXFm1lS34Jj0zqH9BTeR+v8X07UHConFU5B52OYuphBcKYZrRz/2H+8U0WZ/fvwOm9E52O44jTersIDRG+Xm+jqv2dFQhjmomqcv+MNYSK8PAF/Z2O45i2rSMY3q
0dc212V79nBcKYZjJjRS6Lsgr43Tl96ByAa0w3pfF9E9mwu5icA4frP9g4xgqEMc2g4FA5j85ax7Cubbl6ZLf6Twh
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"sns.kdeplot(quartet['residual'])"
]
},
{
"cell_type": "code",
"execution_count": 223,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.lines.Line2D at 0x21603410fc8>"
]
},
"execution_count": 223,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYoAAAEGCAYAAAB7DNKzAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAW3ElEQVR4nO3dfZQdd33f8fdnbcFiWQpUlmT8QNY+dUjsNhXOHvMQwiGxIbIDdniwgFMopqUiBApESQsESFLIA5yCS5sQQDUEFwigGlwMGAPmISSHmHgFIlgYsHAElsCSEAEpMltL7Ld/3CtlJa1G+3R37u6+X+fcs/O0d76je2c/mt/8ZiZVhSRJJzLQdgGSpP5mUEiSGhkUkqRGBoUkqZFBIUlqdGrbBfTCGWecUUNDQ22XIUnzxubNm79fVSsnmrcgg2JoaIiRkZG2y5CkeSPJt080z6YnSVIjg0KS1MigkCQ1MigkSY0MCklSowXZ60mS+snYWLF97wF27Rtl9fJBhlYsZWAgbZc1aQaFJPXQ2Fhxy9Z72bBpC6MHxxhcMsC169aw9qIz501Y2PQkST20fe+BIyEBMHpwjA2btrB974GWK5s8g0KSemjXvtEjIXHY6MExdu8fbamiqTMoJKmHVi8fZHDJ0X9qB5cMsGrZYEsVTZ1BIUk9NLRiKdeuW3MkLA6foxhasbTlyibPk9mS1EMDA2HtRWfysy/5JXbvH2XVMns9SZKOMTAQzl95OuevPL3tUqbFoJC0KMz3axmgvW0wKCQteAvhWoY2t8GT2ZIWvIVwLUOb22BQSFrwFsK1DG1ug0EhacFbCNcytLkNBoWkBW8hXMvQ5jakqnq+krk2PDxcPjNb0niHewzN12sZoLfbkGRzVQ1PNM9eT5IWhfl+LQO0tw02PUmSGrUaFEnemWR3kjtOMD9J/meSbUn+PsnFc12jJC12bR9RvAtY2zD/cuCC7ms98NY5qEmSNE6rQVFVnwd+0LDIVcD/ro7bgAcneejcVCdJgvaPKE7mbOCeceM7utOOk2R9kpEkI3v27JmT4iRpMej3oJi0qtpYVcNVNbxy5cq2y5GkBaPfg2IncO648XO60yRJc6Tfg+Im4N91ez89CvhRVX2v7aIkaTFp9YK7JO8DHg+ckWQH8PvAEoCqehtwM3AFsA24D3heO5VK0uLValBU1bNOMr+AF81ROZKkCfR705MkqWUGhSSpkUEhSWpkUEiSGhkUkqRGBoUkqZEPLpI0LYeftrZr3yirl8/PJ8YtFL3+LAwKSVM2NlbcsvVeNmzawujBsSPPb1570ZmGxRybi8/CpidJU7Z974Ejf5gARg+OsWHTFrbvPdByZYvPXHwWBoWkKdu1b/TIH6bDRg+OsXv/aEsVLV5z8VkYFJKmbPXyQQaXHP3nY3DJAKuWDbZU0eI1F5+FQSFpyoZWLOXadWuO/IE63C4+tGJpy5UtPnPxWaRz372FZXh4uEZGRtouQ1rQDve02b1/lFXL7PXUptn4LJJsrqrhiebZ60nStAwMhPNXns75K09vu5RFr9efhU1PkqRGBoUkqZFBIUlqZFBIkhoZFJKkRgaFJKmRQSFJamRQSJIatRoUSdYm+UaSbUleMcH8a5LsSbKl+3p+G3VK0mLW2pXZSU4B3gI8AdgB3J7kpqr62jGLfqCqXjznBUqSgHaPKC4BtlXV3VV1P/B+4KoW65EkTaDNoDgbuGfc+I7utGM9LcnfJ7khybknerMk65OMJBnZs2fPbNcqSYtWv5/M/ggwVFU/D3wKuP5EC1bVxqoarqrhlStXzlmBkrTQtRkUO4HxRwjndKcdUVV7q+r/dUevA35hjmqTJHW1GRS3AxckOS/JA4BnAjeNXyDJQ8eNXgncOYf1SZJosddTVR1K8mLgE8ApwDuramuS1wIjVXUT8JIkVwKHgB8A17RVryQtVj7hTpLU+IS7fj+ZLUlqmUEhSWpkUEiSGhkUkqRGBo
UkqZFBIUlqZFBIkhoZFJKkRgaFJKmRQSFJamRQSJIaGRSSpEYGhSSpkUEhSWrU2vMopMVgbKzYvvcAu/aNsnr5IEMrljIwkLbL0jzV1vfJoJB6ZGysuGXrvWzYtIXRg2MMLhng2nVrWHvRmYaFpqzN75NNT1KPbN974MhODTB6cIwNm7awfe+BlivTfNTm98mgkHpk177RIzv1YaMHx9i9f7SlijSftfl9MiikHlm9fJDBJUfvYoNLBli1bLClijSftfl9MiikHhlasZRr1605snMfblMeWrG05co0H7X5fUpV9Xwlc214eLhGRkbaLkM60ktl9/5RVi2z15NmppffpySbq2p4onn2epJ6aGAgnL/ydM5feXrbpWgBaOv71GrTU5K1Sb6RZFuSV0ww/4FJPtCd/8UkQy2UKUmLWmtBkeQU4C3A5cCFwLOSXHjMYv8B+Meq+pfAfwfeMLdVSpLabHq6BNhWVXcDJHk/cBXwtXHLXAX8QXf4BuDPkqROdmLlG9+Axz/+6Gnr1sFv/ibcdx9cccXxv3PNNZ3X978PT3/68fNf+EJ4xjPgnnvgOc85fv5v/zY8+cmddb/gBcfPf/Wr4bLLYMsWeNnLjp//x38Mj3kMfOEL8Lu/e/z8N78Z1qyBW2+FP/zD4+e//e3w8IfDRz4Cb3rT8fPf/W4491z4wAfgrW89fv4NN8AZZ8C73tV5Hevmm+G00+DP/xw2bTp+/uc+1/n5xjfCRz969LwHPQg+/vHO8OteB5/+9NHzV6yAD36wM/zKV8Lf/u3R8885B97zns7wy17W+Tcc72d+BjZu7AyvXw/f/ObR89es6fz7ATz72bBjx9HzH/1o+JM/6Qw/7Wmwd+/R8y+9FF7zms7w5ZfDj3989PwnPQl+53c6w8d+78Dvnt+9znC/f/catNn0dDZwz7jxHd1pEy5TVYeAHwErJnqzJOuTjCQZOXjwYA/KlaTFqbVeT0meDqytqud3x58DPLKqXjxumTu6y+zojn+ru0xj/NnrSZKmpqnXU5tHFDuBc8eNn9OdNuEySU4Ffgo45thMktRLbQbF7cAFSc5L8gDgmcBNxyxzE/Dc7vDTgc+c9PyEJGlWNZ7MTrIfmOgPc4CqquXTXXFVHUryYuATwCnAO6tqa5LXAiNVdRPwDuDdSbYBP6ATJpKkOdQYFFW1rJcrr6qbgZuPmfZ744ZHgat7WYMkqdmUuscmWQUcuQNVVX1n1iuSJPWVSZ2jSHJlkruAfwD+CtgOfLyHdUmS+sRkT2a/DngU8M2qOg+4FLitZ1VJkvrGZIPiYFXtBQaSDFTVZ4EJ+9tKkhaWyZ6j+GGS04HPA+9NshvweY6StAhM9ojiKuDHwG8BtwDfAp7cq6IkSf1jUkcUVTX+6OH6HtUiSepDkwqKYy68ewCwBDgwkwvuJEnzw2SPKI5ceJckdJqiHtWroiRJ/WPK93qqjv8L/OrslyNJ6jeTbXp66rjRATpdY0d7UpEkqa9Mtnvs+B5Oh+hcmX3VrFcjSeo7kz1H8bxeFyJJ6k8nu834nzLxbcYBqKqXzHpFkqS+crKT2SPAZjp3jL0YuKv7WkOnm6wkaYE72fMorgdI8kLgsVV1qDv+NuCve1+eJKltk+0e+xBg/MV1p3enSZIWuMn2eno98OUkn6XzGNTHAX/Qq6IkSf1jsr2e/iLJx4FHdie9vKru7V1ZkqR+0dj0lORnuz8vBs4C7um+zupOkyQtcCc7otgArAfeNMG8An5l1iuSJPWVk/V6Wt/9+ctzU44kqd9MqtdTkquTLOsOvzrJh5I8YrorTfIvknwqyV3dnxP2oErykyRbuq+bprs+SdL0TbZ77Guqan+SxwKXAe8A3jaD9b4C+HRVXQB8ujs+kR9X1Zru68oZrE+SNE2TDYqfdH/+GrCxqj7GzK7Mvop/flLe9cCvz+C9JEk9NNmg2Jnk7cAzgJuTPHAKvzuR1VX1ve7wvcDqEyw3mGQkyW1Jfn0G65
MkTdNkL7hbB6wF3lhVP0zyUOA/N/1CkluBMyeY9arxI1VVSU5048GfrqqdSc4HPpPkq1X1rROsbz2dHlo87GEPa94
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"sns.scatterplot(data=quartet,x='y',y='residual')\n",
"plt.axhline(y=0, color='r', linestyle='--')"
]
},
{
"cell_type": "code",
"execution_count": 224,
"metadata": {},
"outputs": [],
"source": [
"quartet = pd.read_csv('anscombes_quartet4.csv')"
]
},
{
"cell_type": "code",
"execution_count": 225,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>x</th>\n",
" <th>y</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>8.0</td>\n",
" <td>6.58</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>8.0</td>\n",
" <td>5.76</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>8.0</td>\n",
" <td>7.71</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>8.0</td>\n",
" <td>8.84</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>8.0</td>\n",
" <td>8.47</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>8.0</td>\n",
" <td>7.04</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>8.0</td>\n",
" <td>5.25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>19.0</td>\n",
" <td>12.50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>8.0</td>\n",
" <td>5.56</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>8.0</td>\n",
" <td>7.91</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>8.0</td>\n",
" <td>6.89</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" x y\n",
"0 8.0 6.58\n",
"1 8.0 5.76\n",
"2 8.0 7.71\n",
"3 8.0 8.84\n",
"4 8.0 8.47\n",
"5 8.0 7.04\n",
"6 8.0 5.25\n",
"7 19.0 12.50\n",
"8 8.0 5.56\n",
"9 8.0 7.91\n",
"10 8.0 6.89"
]
},
"execution_count": 225,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"quartet"
]
},
{
"cell_type": "code",
"execution_count": 226,
"metadata": {},
"outputs": [],
"source": [
"# Best-fit line shared by all four datasets in Anscombe's quartet: y = 3.00 + 0.500x\n",
"quartet['pred_y'] = 3 + 0.5 * quartet['x']"
]
},
{
"cell_type": "code",
"execution_count": 227,
"metadata": {},
"outputs": [],
"source": [
"quartet['residual'] = quartet['y'] - quartet['pred_y']"
]
},
{
"cell_type": "code",
"execution_count": 228,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.collections.LineCollection at 0x216035bf808>"
]
},
"execution_count": 228,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAX4AAAEGCAYAAABiq/5QAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAgN0lEQVR4nO3deXiU1fn/8fcdCAaQTQiIWwPu4oIardpS6vpDq6K1ImqrVBTFBQF3BWnZtC5o+QIqoqJVUUoFt0rBXVs3VLQgKAJRQZaIKJuBJHP//jiJBSSsM/PMPPN5XZdXMpNhzv1c6ocnZ865j7k7IiKSO/KiLkBERNJLwS8ikmMU/CIiOUbBLyKSYxT8IiI5pnbUBWyOZs2aeVFRUdRliIhklffff/8bdy9c//msCP6ioiKmTJkSdRkiIlnFzL7Y0POa6hERyTEKfhGRHKPgFxHJMQp+EZEco+AXEckxWbGqR0Qk1yQSTsmSlSxaVkaLhgUUNa1PXp4l5b0V/CIiGSaRcCZOX0jvsVMpK09QkJ/HkE5t6dBmx6SEv6Z6REQyTMmSldx2/2SumXgfDctWUFaeoPfYqZQsWZmU91fwi4hkkpIS6l15OZOGd+UPHzzPYfOmA1BWnmDx8rKkDKGpHhGRTDBnDgweDA8/TIu8PMYc/P8YftgZzG/UHICC/DyaNyhIylC64xcRidKsWdClC+y1Fzz6KHTvjs/6nMYP3c+SZjsC/DjHX9S0flKG1B2/iEgUZs6EQYPg8cehTh244gq49lpo2ZI8oEPC2adHOxYvL6N5A63qERHJXtOnw8CB8OSTULcu9O4NV10FO+64zsvy8ozWhdvTunD7pJeg4BcRSYePP4YBA2DcONh+e7juuhD6hT/pmpxyCn4RkVT68EPo3x8mTICGDaFPH+jZE5o2jawkBb+ISCq89164w3/2WWjUCPr1gyuvhCZNoq5MwS8iklRvvRUC/4UXQsgPGBA+uG3UKOrKfqTgFxFJhjffDFM6kyeHaZxbboFLLw3TOxlGwS8isi1efTUE/iuvQPPmcPvtcMkl4QPcDJWyDVxm9qCZLTazaWs9d7uZzTSzj81svJk1TtX4IiIp4w4vvQTt28PRR8OMGTBkCMydC1dfndGhD6nduTsa6LDec5OB/d39QOAz4IYUji8iklzu8K9/wS9/CccdB59/DkOHhnYLvXpBvXpRV7hZUhb87v468O16z01y94qqh28Du6RqfBGRpHGH55+HI46ADh3gq69gxAiYPTt8cFu3btQVbpEoe/VcALxQ0w/NrJuZTTGzKaWlpWksS0Skijs8/TQUF8PJJ8PixTByZLjT794dCpLTNC3dIgl+M7sJqAAeq+k17j7S3Yvdvbgwgp1tIpLDEgl46ik4+GA47TT47jt44AH47DO46KLQWyeLpT34zawLcDJwrrt7uscXEalRZSWMHQsHHQRnnAGrVsHDD8Onn8IFF0B+ftQVJkVag9/MOgDXAqe6+6p0ji0iUqPKytAl84AD4KyzoKICHnssrNY57zyoHa+V76lczjkGeAvY28zmmVlXYBjQAJhsZlPN7N5UjS8iskkVFfDII7DffnDuuZCXF7pmTpsG55wDtWpFXWFKpOyvMXc/ewNPP5Cq8URENlt5eTj0ZNCgsDLnwAND18zTTw/hH3Px+v1FRGRj1qwJc/aDB0NJCRxySOiaecopORH41XLnSkUkd61eDffcA3vsAd26hR74zz0HU6ZAx445FfqgO34RibOyMhg1Cm69FebPhyOPhPvvhxNOAEvOMYbZSMEvIvGzalXYaHXbbbBgQWixMHo0HHtsTgd+NQW/iMTHypVhSuf228Mu26OPDss027dX4K9FwS8i2W/5chg+HO68E775JjRQu/lmaNcu6soykoJfRLLX99/DsGGhJfK334YGan37wlFHRV1ZRlPwi0j2Wbo0tEO+++7QR+fkk0PgH3541JVlBQW/iGSPJUtC2A
8dCsuWhQZqffuG9fiy2RT8IpL5SkvDdM6wYbBiBfzud9CnT2imJltMwS8imWvRovCB7YgRYYlmp04h8PffP+rKspqCX0Qyz4IFYUnmvfeGXbdnnw033QT77ht1ZbGg4BeRzDFvXth0NXJk6Jz5+9/DjTfCXntFXVmsKPhFJHpffhnaKjzwQDj96rzz4IYbQm8dSToFv4hEp6QkdMocPTo8/uMf4frroVWrKKuKPQW/iKTf7Nkh8B95JHTGvOgiuO462G23qCvLCQp+EUmfzz4Lh5889lg4v/bSS+Haa2HnnaOuLKco+EUk9WbMCIE/Zgxstx306AHXXAMtW0ZdWU5K5Zm7D5rZYjObttZzZ5rZdDNLmFlxqsYWkQwxbRp07gxt2sD48XDVVTB3btiMpdCPTCqPnRkNdFjvuWnAb4HXUziuiETto4/C7toDDoDnnw8f2JaUhKWaLVpEXV3OS+Vh66+bWdF6z80AMPXFFomn99+HAQPg6aehYcPQR6dnT9hhh6grk7Vk7By/mXUDugHspk/6RTLbu+9C//7h7r5xY/jTn8I8fpMmUVcmG5CxJwy7+0h3L3b34sLCwqjLEZENeestOPFE+PnPw/cDB4YpnX79FPoZLGPv+EUkg73xRrjDf/FFaNYs7Lq99FJo0CDqymQzKPhFZPO4w6uvhsB/9VVo3hzuuAMuuQTq14+6OtkCqVzOOQZ4C9jbzOaZWVczO93M5gFHAs+b2b9SNb6IJIl7uLNv3x6OOQZmzoS77grLMq+6SqGfhVK5qufsGn40PlVjikgSucO//hXu8N96K+yu/b//g65doW7dqKuTbZCxH+6KSETc4bnnwge2J54I8+fDPfeE/jqXX67QjwEFv4gE7jBhAhx6KJxySjju8P77YdasMI+/3XZRVyhJouAXyXWJBPzjH3DwwXD66eEQ8wcfDA3VLrwQ6tSJukJJMgW/SK6qrIQnn4QDDwztFX74IbRJnjkz9MXPz4+6QkkRBb9IrqmoCG2R998/NFBLJODxx+GTT+APf4DaWuUddwp+kVxRUQEPPwz77RfOsq1dG8aODR00zz4batWKukJJEwW/SNyVl4c5+733hi5doF69MKf/0Udw5pnhBCzJKfqdTiSu1qwJZ9neckvon3PooaFr5imngDrk5jT9VS8SN2VlMGIE7LEHXHxxaK3w/PPw3ntw6qkKfdEdv0hs/PBDWHf/l7/A11/DUUfBqFFw/PEKe1mHgl8k261aBffdF063WrgQ2rULyzKPOUaBLxuk4BfJVitWhFYKd9wBixeHoH/iidBMTWQjFPwi2WbZMhg+HO68E5YsCVM5N98Mv/xl1JVJllDwi2SL774L3THvuguWLg0N1Pr2hSOPjLoyyTIKfpFMt3Qp/PWvcPfd8P33YTlm375w2GFRVyZZSsEvkqmWLAl390OHwvLloYFa376hmZrINlDwi2Sa0tIwfz98OKxcGRqo9ekTmqmJJIGCXyRTLFoUVuiMGBHW5J91Vgj8Nm2irkxiJpVn7j5oZovNbNpaz+1gZpPNbFbV1yapGl8ka3z9NfTqBUVFMGQI/Pa3oVPmmDEKfUmJVLZsGA10WO+564GX3H1P4KWqxyK5ad48uOIKaN06rNbp3Dn0wv/b32CffaKuTmIsZcHv7q8D3673dEfg4arvHwZOS9X4Ihnriy+ge3fYfXe4997QIvnTT+Ghh2DPPaOuTnJAuuf4W7j7gqrvFwItanqhmXUDugHstttuaShNJMXmzoXBg0PHTDO44AK4/vowxSOSRpF153R3B3wjPx/p7sXuXlxYWJjGykSS7PPPQ8jvuWfooXPxxTB7drjbV+hLBNJ9x7/IzFq6+wIzawksTvP4Iunz6acwaFA45rBOHbj8crjmGth556grkxyX7jv+Z4Dzq74/H3g6zeOLpN4nn8A558C++8K4cdCzZ5jmuftuhb5khJTd8ZvZGODXQDMzmwf0A24FxppZV+ALoFOqxhdJu/
/+FwYOhL//PRxveM01cNVV4SAUkQySsuB397Nr+NGxqRpTJBJTp8KAAfDUU9CgAdxwQ1iX36xZ1JWJbJB27opsrSl
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"sns.scatterplot(data=quartet,x='x',y='y')\n",
"sns.lineplot(data=quartet,x='x',y='pred_y',color='red')\n",
"plt.vlines(quartet['x'],quartet['y'],quartet['y']-quartet['residual'])"
]
},
{
"cell_type": "code",
"execution_count": 229,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<AxesSubplot:xlabel='residual', ylabel='Density'>"
]
},
"execution_count": 229,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYgAAAEGCAYAAAB/+QKOAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAwtUlEQVR4nO3dd3zV5d3/8dcnO4EEyIIQIAmbyCbsUVGhqAhqQUFxW22rtlV739XaqrXedx0d9nb81Kp1oALO4kREnAwTQPYKEEgChLBC9vz8/jiHNtIDBMjJ94zP8/E4j+R8x8k7kJNPru91fa9LVBVjjDHmWCFOBzDGGOObrEAYY4zxyAqEMcYYj6xAGGOM8cgKhDHGGI/CnA7QXBITEzU9Pd3pGMYY41dWrFixX1WTPO0LmAKRnp5OTk6O0zGMMcaviMjO4+2zS0zGGGM8sgJhjDHGIysQxhhjPLICYYwxxiMrEMYYYzyyAmGMMcYjKxDGGGM8sgJhjJepKuXVddTUNTgdxZhTEjA3yhnjKw6V17BwYxFfbClmfWEJBYcqqWtwrbsSGxlGt+TWDMuI55zeyQxLjyckRBxObIxnViCMaSa5+0p5avE23l+zh5r6BjrERTGoS1su6JdCXHQ4dfUN7CutZtPeUl78Jo9nv9xOl/gYrh+dzoxhXYgKD3X6WzDme6xAGHOGDlfU8PDHm5ibnU9UeCgzhnXmsqzOnNUxDhHPrYPy6joWbihi9rKd3P/eBp75cju/m5zJ+X07HPccY1qaBMqSo1lZWWpzMZmW9sn6vdz99loOV9Zy7ah0fnZ2NxJaRzb5fFVl6bYDPPjBRjbsOcKEzPY8/KP+xLeK8GJqY/5NRFaoapbHfVYgjDl1dfUNPPjBRl5ckkff1DgenTaAPilxZ/R6//gmj0cXbKZtTDjPXZNF/05tmy+wMcdxogJho5iMOUWlVbXc8FIOLy7J4/rRGbz909FnVBwAwkJD+PG4rrx7y2giwkK47JmlfLxubzMlNub0WIEw5hTsPlzJ9KeX8nXufv54aT/uvSiTiLDmextldozjnZ+5Cs5PX13BM19sI1Ba+cb/WIEwpol2HijnR/9vCYWHKnnxuqHMHNbFK18nKTaS1388ggv6pfDHjzbxt0VbvfJ1jDkZG8VkTBPkH6xg5rPLqKytZ+7NI8nseGaXlE4mKjyUx2cMIjo8lMc+3UpEWAg/O7u7V7+mMceyAmHMSRQermTm35dRXlPPqzcO93pxOCokRHj4R/2prW/gkY83ExEawo1ju7bI1zYGrEAYc0L7y6q54u/LKKms5bUbR9A3tU2Lfv3QEOHP0wdQ6x41Fd8qgksHd2rRDCZ4WR+EMcdRWVPPDS/lUHSkipeuH0a/Ti1bHI4KCw3hscsHMbJrAne9tZbsvIOO5DDBxwqEMR7UNyg/n7OKNQWH+b8ZgxjcpZ2jeSLCQnh61hA6tYvmppdz2Hmg3NE8JjhYgTDGgz+8v4GFG4q4b3ImE8/q4HQcANrEhPPCtUNR4PoXsymprHU6kglwViCMOcbr3+76101w147OcDrO96QntuKZWUPYeaCC/3pjtd0jYbzKCoQxjazcdYj7/rmesT0SuefCPk7H8Wh41wTuOr83n2wo4vmvdzgdxwQwKxDGuO0rreKns1fQvk0kj88cRKgPr9Nww5gMJp3VgT9+tIkc67Q2XuLVAiEik0Rks4jkishdHvbfISIbRGSNiCwSkbRG++pF5Dv3Y743cxpTU9fALa+upKSylmdmZdE2xrdnUxURHpnen87tornltZUcKKt2OpIJQF4rECISCjwJnA9kAjNFJPOYw1YBWaraH3gTeKTRvkpVHeh+TPFWTmMAHvxgA9l5h3j4R/1b7Ea4MxUXFc5TVw7hUEUtv35rrfVHmGbnzRbEMCBXVberag0wB5ja+ABVXayqFe6nywC7A8i0uDdXFPDy0p38eGwGUwemOh3nlGR2jOPXk3rz6c
YiXv823+k4JsB4s0CkAo1/Ygvc247nBuCjRs+jRCRHRJaJyMWeThCRm9zH5BQXF59xYBN8thSV8tt31zKyawK/ntTb6Tin5bpR6Yztkcgf3t/A9uIyp+OYAOITndQiMgvIAh5ttDnNvYjFFcBjItLt2PNU9VlVzVLVrKSkpBZKawJFRU0dt7y6ktaRYfxt5kDCQn3i7XDKQkKEP00fQGR4CLfP/Y7a+ganI5kA4c13RCHQudHzTu5t3yMi5wH3AFNU9V89bapa6P64HfgcGOTFrCYI3T9/PbnFZTx2+SCSY6OcjnNG2sdF8dCl/VhdUMJTi7c5HccECG8WiGygh4hkiEgEMAP43mgkERkEPIOrOOxrtL2diES6P08ERgMbvJjVBJl3VhUwL6eAW87uzpgeiU7HaRaT+qYwdWBHnli8lc17S52OYwKA1wqEqtYBtwILgI3APFVdLyIPiMjRUUmPAq2BN44ZztoHyBGR1cBi4CFVtQJhmkXuvjLueWcdw9Lj+eV5PZyO06zuu+gs4qLC+e83V1Nnl5rMGZJAGRqXlZWlOTk5TscwPq6qtp6Ln/yGoiNVfPiLsaS0iXY6UrN7b/Vubnt9Fb+5oDc3jfuPrjtjvkdEVrj7e/+Df/bKGXOaHnh/A5v2lvKXywcGZHEAmNw/hQmZ7fnzJ1vYsd9mfTWnzwqECRrvrd7Na8t3cfMPujK+V7LTcbxGRHjw4r5EhIXw67fW0NAQGFcJTMuzAmGCQt7+cu5+ey2Du7TlVxN7OR3H69rHRfG7CzP5dsdBXl2+0+k4xk9ZgTABr7qunlteW0loiPD4FYMJ99P7HU7V9KxOjO2RyEMfbWJvSZXTcYwfCo53iglqD320ifW7j/Cn6QNIbRuY/Q6eiAj/c3E/6hqUBz+wQYDm1FmBMAFt0cYi/vFNHteOSmdCZnun47S4Lgkx3DK+O++v2cNXW206GnNqrECYgFV0pIr/enMNfVLiuOt8/5xnqTncNK4r6Qkx3PvP9VTX1Tsdx/gRKxAmINU3KL+c8x2VNfU8PnMQUeGhTkdyTFR4KA9M7cuO/eU8+8V2p+MYP2IFwgSkp7/YxtLtB7h/Sibdk1s7Hcdx43omcWG/FJ5YnEv+wYqTn2AMViBMAFq56xB/WbiFyf1TuCyr88lPCBK/m5xJWIhw3/z1triQaRIrECagHKmq5eevryKlTRT/c0k/RHx3XemW1qFNFLdP6Mlnm/axcEOR03GMH7ACYQKGqvKbt9eyp6SKv80YRJvocKcj+ZxrRqXTq30sD36w0TqszUlZgTAB442cAt5fs4c7JvRkSFo7p+P4pPDQEH43OZNdByt44es8p+MYH2cFwgSE3H1l3Dd/PaO6JfCTH9gMpicypkci5/VpzxOfbWVfqd1hbY7PCoTxe1W19dz2+iqiwkP46+UDCQ2xfoeTuefCPtTUN/CnBZudjmJ8mBUI4/ce/ngTG/e4ptJoH+ffS4e2lIzEVlw7Kp03VhSwrrDE6TjGR1mBMH7t6FQa141O59w+wTeVxpm47dwexMdE8MB7G2zYq/HICoTxW0VHqvjVG6vJDPKpNE5XXFQ4d07sxbd5B/lw7V6n4xgfZAXC+KWGBuX2ud9RVdvA41cMIjIseKfSOBOXD+1M7w6x/O+HG6mqtWGv5vusQBi/9PzXO1iyzTWVRrckm0rjdIWGCPdOzqTwcCUvLclzOo7xMVYgjN/ZuOcIjy7YzA/Pam9TaTSDUd0TObtXEk8uzuVwRY3TcYwPsQJh/EpVbT2/nPMdbWLC+eOl/W0qjWby60m9Ka2u46nPtzkdxfgQKxDGr/xpwWY2F5XyyLT+xLeKcDpOwOiTEselgzrx4pI8Cg9XOh3H+AgrEMZvLMndz3Nf7+DqkWmM75XsdJyAc8fEngD8+RO7ec64WIEwfqGkopY731hNt6RW3H1+H6fjBKTUttFcNyqdd1YVsmH3EafjGB9gBcL4hfvfW09xaTWPXT6I6Agb0u
otPzu7O3FR4Tz88SanoxgfYAXC+LxFG4t4Z1Uht4zvTr9ObZyOE9DaxIRzy/hufLGlmCW5+52OYxxmBcL4tJLKWn7
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"sns.kdeplot(quartet['residual'])"
]
},
{
"cell_type": "code",
"execution_count": 230,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.lines.Line2D at 0x21603641688>"
]
},
"execution_count": 230,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYoAAAEJCAYAAACKWmBmAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAYQ0lEQVR4nO3df5TcdX3v8edryMJCfggumwQIcckthRKvBtyDYMFDC62Bi+T6gwBHveDVblE5SKO3FX/0h7bVntq01VokRS6gFs1F0agx/CoWW8HrRlchpEhIF5NAsmEREoKrG+bdP+a7y2Z39ruT3Z35zM68HufMme+vzLzOZJPXfr+f73y/igjMzMzGU0gdwMzM6puLwszMcrkozMwsl4vCzMxyuSjMzCyXi8LMzHIlKwpJx0u6V9LDkjZJem+ZbSTpU5K2SPqJpNNSZDUza2azEr73fuB9EfFDSXOBjZLuioiHR2xzPnBi9ng1cF32bGZmNZKsKCLiSeDJbHqvpM3AccDIolgB3BKlbwU+IOlIScdkf3ZcRx99dHR0dFQpuZlZ49m4ceNTEdFebl3KPYphkjqAU4Hvj1p1HLBtxPz2bNmYopDUBXQBLF68mO7u7qpkNTNrRJIeH29d8sFsSXOArwDXRMSeyb5ORKyJiM6I6GxvL1uKZmY2CUmLQlILpZL4YkR8tcwmO4DjR8wvypaZmVmNpDzrScDngM0RsXqczdYB/ys7++kM4NmJxifMzGx6pRyj+E3gbcCDknqyZR8EFgNExGeB9cAFwBbgeeDttY9pZtbcUp719G+AJtgmgPfUJpGZmZVTF2c9mTWyYjHo7d/Hrj0DLJjXSkfbbAqF3N+RzOqKi8KsiorFYMOmnaxa28PAYJHWlgKrVy5j+dKFLgubMZKfHmvWyHr79w2XBMDAYJFVa3vo7d+XOJlZ5VwUZlW0a8/AcEkMGRgs0rd3IFEis4PnojCrogXzWmltOfCfWWtLgflzWxMlMjt4LgqzKupom83qlcuGy2JojKKjbXbiZGaV82C2WRUVCmL50oWcfPXZ9O0dYP5cn/VkM4+LwqzKCgWxpH0OS9rnpI5iNik+9GRmZrlcFGZmlstFYWZmuVwUZmaWy0VhZma5XBRmZpbLRWFmZrlcFGZmlstFYWZmuVwUZmaWy0VhZma5khaFpBsl9Ul6aJz150h6VlJP9vjjWmc0M2t2qS8KeBPwD8AtOdt8NyIurE0cMzMbLWlRRMR9kjpSZjCbSLEY9PbvY9eeARbM82XCrfmk3qOoxJmSfgw8Abw/IjaV20hSF9AFsHjx4hrGs0ZWLAYbNu0cvu/10I2Hli9d6LKwplHvg9k/BF4WEa8EPg18bbwNI2JNRHRGRGd7e3ut8lmD6+3fN1wSULrf9aq1PfT270uczKx26rooImJPRDyXTa8HWiQdnTiWNZFdewaGS2LIwGCRvr0DiRKZ1V5dF4WkhZKUTZ9OKW9/2lTWTBbMax2+3/WQ1pYC8+e2JkpkVnupT4+9FbgfOEnSdknvkHSlpCuzTd4MPJSNUXwKuDQiIlVeaz4dbbNZvXLZcFkMjVF0tM1OnMysdtSI/+92dnZGd3d36hjWIIbOeurbO8D8uT7ryRqTpI0R0Vlu3Uw468ksqUJBLGmfw5L2OamjmCVR12MUZmaWnovCzMxyuSjMzCyXi8LMzHK5KMzMLJeLwszMcrkozMwsl4vCzMxyuSjMzCyXi8LMzHK5KMzMLJeLwszMcrkozMwsl4vCzMxyuSjMzCyXi8LMzHK5KMzMLFfqe2bfKKlP0kPjrJekT0naIuknkk6rdUYzs2aXeo/iJmB5zvrzgROzRxdwXQ0ymZnZCEmLIiLuA57O2WQFcEuUPAAcKemY2qQzMzNIv0cxkeOAbSPmt2fLxpDUJalbUvfu3btrEs7MrBnUe1FULCLWRERnRHS2t7enjmNm1jBmpQ4wgR3A8SPmF2XLzIYVi0Fv/z527R
lgwbxWOtpmUygodSyzhlHvRbEOuErSl4BXA89GxJOJM1kdKRaDDZt2smptDwODRVpbCqxeuYzlSxe6LMymSerTY28F7gdOkrRd0jskXSnpymyT9cBWYAvwT8C7E0W1OtXbv2+4JAAGBousWttDb/++xMnMGkfSPYqIuGyC9QG8p0ZxbAbatWdguCSGDAwW6ds7wJL2OYlSmTWWhhnMtua0YF4rrS0H/hi3thSYP7c1USKzxuOisBmto202q1cuGy6LoTGKjrbZiZOZNY56H8w2y1UoiOVLF3Ly1WfTt3eA+XN91pPZdHNR2IxXKIgl7XM8JmFWJT70ZGZmuVwUZmaWy0VhZma5PEZhdc+X6DBLy0Vhdc2X6DBLz4eerK75Eh1m6bkorK7lXaLDzGrDRWF1zZfoMEvPRWF1zZfoMEvPg9lW13yJDrP0XBRW93yJDrO0fOjJzMxyuSjMzCyXi8LMzHK5KMzMLFfSopC0XNIjkrZI+kCZ9VdI2i2pJ3u8M0VOM7NmluysJ0mHAJ8BfgfYDvxA0rqIeHjUpl+OiKtqHtDMzIC0exSnA1siYmtE/Ar4ErAiYR4zMysjZVEcB2wbMb89WzbamyT9RNJtko4f78UkdUnqltS9e/fu6c5qZta06n0w+xtAR0S8ArgLuHm8DSNiTUR0RkRne3t7zQKamTW6lEWxAxi5h7AoWzYsIvoj4pfZ7A3Aq2qUzczMMimL4gfAiZJOkHQocCmwbuQGko4ZMXsRsLmG+czMjIRnPUXEfklXAXcAhwA3RsQmSR8FuiNiHXC1pIuA/cDTwBWp8pqZNStFROoM066zszO6u7tTxzAzmzEkbYyIznLr6n0w28zMEnNRmJlZLheFmZnlyh3MlrQXKDeIISAiYl5VUpmZWd3ILYqImFurIGZmVp8O6vRYSfOB1qH5iPjZtCcyM7O6UtEYhaSLJD0K/Cfwr0Av8O0q5jIzszpR6WD2x4AzgJ9GxAnAucADVUtlZmZ1o9KiGIyIfqAgqRAR9wJlv5hhZmaNpdIximckzQHuA74oqQ/YV71YZmZWLyrdo1gB/AL4A2AD8Bjw+mqFMjOz+lHRHkVEjNx7GPeeEGZm1ngqKopRX7w7FGgB9vkLd2Zmja/SPYrhL95JEqVDUWdUK5SZmdWPg77WU5R8DXjd9McxM7N6U+mhpzeOmC1QOjV2oCqJzMysrlR6euzIM5z2U/pm9oppT2MzQrEY9PbvY9eeARbMa6WjbTaFglLHMrMqqXSM4u3VDmIzQ7EYbNi0k1VrexgYLNLaUmD1ymUsX7rQZWHWoCa6zPinKX+ZcQAi4uqpvLmk5cDfU7pn9g0R8YlR6w8DbgFeBfQDl0RE71Te06amt3/fcEkADAwWWbW2h5OvPpsl7XMSpzOzaphoMLsb2EjpirGnAY9mj2WUTpOdNEmHAJ8BzgdOAS6TdMqozd4B/Dwifg34W+CvpvKeNnW79gwMl8SQgcEifXs9ZGWWSrEYbN39HPc/9hRbdz9HsTju7/eTMtH9KG4GkPQu4KyI2J/Nfxb47hTf+3RgS0RszV7zS5TGPR4esc0K4E+z6duAf5CkiJjeT8EqtmBeK60thQPKorWlwPy5rTl/ysyqpRaHgysdzD4KmAc8nc3PyZZNxXHAthHz24FXj7dNROyX9CzQBjyV+8qPPALnnHPgspUr4d3vhuefhwsuGPtnrrii9HjqKXjzm8euf9e74JJLYNs2eNvbxq5/3/vg9a8vvffv//7Y9R/+MJx3HvT0wDXXjF3/l38Jr3kNfO978MEPjl3/d38Hy5bB3XfDn//52PXXXw8nnQTf+Ab8zd+MXf/5z8Pxx8OXvwzXXTd2/W23wdFHw003lR6jrV8PRxxBx9qb+fcNX+SxvucoRlCQ+G/z53DUn/17abtPfhK++c0D/+zhh8O3s6vSf+xjcM89B65va4OvfKU0fe21cP/9B65ftAi+8IXS9DXXlD7DkX7912HNmtJ0Vxf89K
cHrl+2rPT5Abz1rbB9+4HrzzwTPv7x0vSb3gT9/QeuP/dc+MhHStPnnw+/+MWB6y+8EN7//tL06J878M/eNP3s8Y/
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"sns.scatterplot(data=quartet,x='y',y='residual')\n",
"plt.axhline(y=0, color='r', linestyle='--')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Plotting Residuals\n",
"\n",
"It's also important to plot the residuals and check whether they are approximately normally distributed; this helps us judge whether Linear Regression was a valid model choice."
]
},
{
"cell_type": "code",
"execution_count": 231,
"metadata": {},
"outputs": [],
"source": [
"# Predictions on the test set\n",
"# Computing residuals on the test set separately can alert us to issues with the train/test split\n",
"test_predictions = model.predict(X_test)"
]
},
{
"cell_type": "code",
"execution_count": 232,
"metadata": {},
"outputs": [],
"source": [
"# If our model were perfect, these residuals would all be zero\n",
"test_res = y_test - test_predictions"
]
},
{
"cell_type": "code",
"execution_count": 233,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.lines.Line2D at 0x216036b5308>"
]
},
"execution_count": 233,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYAAAAEGCAYAAABsLkJ6AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAaF0lEQVR4nO3de5xcZX3H8e9vSTQhIQGTQAIkhMhNE0LEfSFYQG7aEKCAYCitVNR2vdGQCopcvFQuSrmUplZKqBblooBcFAgoIJZYkbKhuYIEiQkkJrAJsmEJKUnm1z9mJuxlZnZ295x5zuXzfr3yysycndnnzOyc33l+z/P8jrm7AAD50xS6AQCAMAgAAJBTBAAAyCkCAADkFAEAAHJqUOgG9MXo0aN94sSJoZsBAKmyYMGC9e4+pvvjqQoAEydOVGtra+hmAECqmNmqSo+TAgKAnCIAAEBOEQAAIKcIAACQUwQAAMipVM0CAoAoFAqulRve0MsbN2u3EUM0cdQwNTVZ6GY1HAEAQK4UCq6Hlq3TF+9YqM1bChoyuEnXzpym6ZPH5i4IkAICkCsrN7yx/eAvSZu3FPTFOxZq5YY3Ares8QgAAHLl5Y2btx/8yzZvKeiV1zcHalFthYJrRVuHnnhhvVa0dahQiO4aLqSAAOTKbiOGaMjgpi5BYMjgJu2605CAraos7nQVPQAAuTJx1DBdO3OahgwuHv7KB9WJo4YFbllPcaer6AEAyJWmJtP0yWN1wKwj9Mrrm7XrTsmdBVQrXTVpzPABvz4BAEDuNDWZJo0ZHslBNE5xp6tIAQFAQsWdrqIHAAAJFXe6igAAABGJY4VxnOkqAgAARCCNK4wZAwCACKRxhXGwAGBm483sMTN7xsyWmdm5odoCAAOVthXGUtgU0FZJ57n702a2k6QFZvawuz8TsE0A0C9pWmFcFqwH4O5r3f3p0u3XJT0raY9Q7QGAgUjTCuOyRAwCm9lESe+T9GTgpgBAv6RphXFZ8ABgZsMl3SVptrtvrLC9RVKLJE2YMKHBrQOA+qVlhXFZ0FlAZjZYxYP/re5+d6Wfcfe57t7s7s1jxoxpbAMBIMOC9QDMzCR9T9Kz7n5tqHagPlxCD8iekCmgP5N0lqQlZraw9NhF7j4vXJNQydatBT2wdK0uuGtxaha4AOhdsADg7r+WxNEj4QoF129WbNh+8JfeXuBywKwjUpPrBNATK4FR08oNb6h11aupW+ACoHcEANT08sbNKri2z20uS/oCFwC9IwCgpt1GDNF9i9Zo1jH7dlngcuVpUxO9wAVA74KvA0CyTRw1TBdMf4+ufOhZnXP0Ptp1p3dq7MghGr/LjqGbBmCACACoqby68b3jdtLTL76mi+5ZwkwgICNIAaFXTU2mgmv7wV9KR6lbZFeh4FrR1qEnXlivFW0dKhQ8dJNSiR4A6lKr1C1TQdFIabzwSlLRA0BdyqVuO2MmEEJI44VXkooAgLqksdQtsimNF15JKlJAqEsaS90im9J44ZWkogeAPnPG2xAQvdHo0ANAXRh4Q1LQG40OPQDUhYE3JEn5wiuHThqtSWOGR3rwz9MUU3oAqEt/poFyDQGkTd56ugQAVNT94L3rTn0beMvbFwnZUK2nm9XS56SA0EP54D1jznydeeOTmjFnvv6woaNPA2+kjJBGeZtiSg8APVQ6eJ9z2//qoXOP0Lw6B95YOYw0ytsUU3oA6KHawXvdxs11D7zVu3I4TwNuSL68TTGlB4AeojgLKn+Ruo8BdP4iVRonuPK0qTphyjgNGsS5CRovb1NMzVO0qqe5udlbW1tDNyPzohrALQ8kV/sirWjr0Iw583sEmrlnNevwfUZn9ksHNJqZLXD35u6P0wNAD1GdBZXnalfK+ZeDQ6VUU+uqV7XnLkMZKwBiRgBARbUO3gNV7mE8t25jxVTTtoIYLM4Z1oyEQaIVDVeeZXRH62p99cT3dhlwm3XMvrp/8ZrMzrpAT5WmHT
+0bB0TAhqAHgAarjzLaG37Zt3+Py/q2pnT9Lt1G7WtIN3e+qIumP6ezM66QE95W3yVJEF7AGb2fTN7xcyWhmwHGqvzFNHFazbq0vufkSQ177Wz/vPsQ1gtnDN5W3yVJKFTQDdJmh64DWiw7nOt/7TpLR0wdoQ+tN+ukRf2QvJxtblwgqaA3P1xM5sYsg1ovLzNtUZt9awZQTyCrwMoBYD73X1Kle0tklokacKECe9ftWpVA1sHoBF6WzOCgUntOgB3nytprlRcCBa4OQA6iWr6ZpzTjlFd4gMAosNca0SJkt/pF3oQGA3CXGtEjZLf6Rd6GuiPJD0haX8zW21mnw7Znizjy4p61Vuhlemb6Rd6FtCZIX9/nqSxPj8pq8bp/F5v3ea65KdLtGrDmzXTOnmrnZ9FpIByIm1zrUlZNU739/rvbm7VGc0TNG7kkJo9xbzVzs8iAkBOpO3LSsqqcSq913N++bw+evCe2+9XSuuU13PMm3WEftzyAc2bdQQDwCnDLKCcSNviqzSmrNKq2nttpT+NWj1Fpm+mGwEgR9L0ZSW/3DjV3mv35PcUMTCkgJBIaUtZpVml9/rK06bq6P1Hk9bJuOClIPqCS0LmC+UBGof3OttSWwoC6bF1a0HL1rZrbftmjRs5VJPHjRjQxd3TlLJKO97rfCIAIBJbtxZ076I1uuTepdvLAlx2yhSdctAeAwoCAOLDNxORWLa2ffvBXyrOIrnk3qVatrY9cMsAVEMAQCTWtleeSriuvf9lAeotSQCgf0gBIRLjRg6tOJVw7Mj+Tduk0iQQP3oAiMTkcSN02SlTukwlvOyUKZo8bmS/Xo+VwED86AEgEoMGNemUg/bQvrsO17r2zRo7cogmjxvZ7wFgVgInC4X5sokAgF7V++UfNKhJB43fRQeNH/jvZCVwcpCOyy5SQKgpVFVOVgInB+m47KIHgJqqffkPmHVErKmYtBWvyzLScdlFDwA1hbrqU+e0Ewf/sNJ2LQnUjwCAmkJ8+UNeDIa1Bz1VSsddceqBajLx/qQcxeBQU4gBwBVtHZoxZ36PAeB5FdJOUc5OYbCzukLB9Yf1b+jZdRu1/OXXdWfrav1p01u8PylBMTj0S2+5+DimB9abc476gB1qvCMNmppMZtL5dy7q8tnw/qQbKSD0qlwp8tBJozVpzPAuB/84UjX1pp2inp0SarwjLXh/socAgH6La3pgvVNAoz4gMdhZG+9P9hAA0G9xnRHWe7HxqA9IrD2ojfcne4KOAZjZdEn/ImkHSf/h7t8O2R70TZyrdeu5QEn5gNR9DKC/ByTWHtTG+5M9wWYBmdkOkpZL+rCk1ZKeknSmuz9T7TnMAkqWJMya4VKGQO+SOAvoEEm/d/cVkmRmP5Z0sqSqAUDPPScddVTXx2bOlD7/eWnTJmnGjJ7POfvs4r/166XTT++5/XOfk844Q3rpJemss3puP+886aSTir/7M5/puf2SS6TjjpMWLpRmz+65/YorpA9+UPrNb6SLLuq5/brrpGnTpEcekS67rOf2G26Q9t9fuu8+6Zprem6/+WZp/Hjp9tul66/vuf0nP5FGj5Zuuqn4r7t586Qdd5S++13pjjt6bv/Vr4r/X321dP/9XTY1DR2q6Q/M0wGzjtDgKy7X6Pm/1pD5O2j74XfUKOmuu4q3L7xQeuKJrq+9557SLbcUb8+eXXwPO9tvP2nu3OLtlhZp+fKu26dNU9N11xV7Cv/wWWn16q7bDztM+ta3irdPO03asKHr9mOPlb761eLt44+X3nyz6/YTT5TOP794u/vfncTfXsC/PQ0dKj34YPH2pZdKjz7adXsD/vZ03XXF2x//ePr+9kpCjgHsIemlTvdXlx7rwsxazKzVzFq3bNnSsMahPuVUzfh37aihgzsd/JErLunNLdvU/uYWvbllm9KzuijfQqaATpc03d3/tnT/LEkfcPdzqj2HFBCQPElIBaK2aimgkD2ANZI6Fw7es/QYMoCSCvlBtd
D0CjkG8JSkfc1sbxUP/H8p6a8CtieVknihDs4I84VqoekVLAC4+1YzO0fSz1WcBvp9d18Wqj1plNQDLSUVopfEQF/
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"sns.scatterplot(x=y_test,y=test_res)\n",
"plt.axhline(y=0, color='r', linestyle='--')"
]
},
{
"cell_type": "code",
"execution_count": 234,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"60"
]
},
"execution_count": 234,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(test_res)"
]
},
{
"cell_type": "code",
"execution_count": 235,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<seaborn.axisgrid.FacetGrid at 0x2160370e708>"
]
},
"execution_count": 235,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 360x360 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"sns.displot(test_res,bins=25,kde=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Still unsure whether normality is a reasonable approximation? We can check the residuals against a [normal probability plot](https://en.wikipedia.org/wiki/Normal_probability_plot)."
]
},
{
"cell_type": "code",
"execution_count": 236,
"metadata": {},
"outputs": [],
"source": [
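"import scipy as sp\n",
"# Explicitly import the stats submodule so that sp.stats is available below\n",
"import scipy.stats"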
]
},
{
"cell_type": "code",
"execution_count": 237,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 600x800 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# Create a figure and axis to plot on\n",
"fig, ax = plt.subplots(figsize=(6,8),dpi=100)\n",
"# probplot returns the raw values if needed\n",
"# we just want to see the plot, so we assign these values to _\n",
"_ = sp.stats.probplot(test_res,plot=ax)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"-----------\n",
"\n",
"## Retraining Model on Full Data\n",
"\n",
"If we're satisfied with the performance on the test data, we should retrain the model on all of our data before deploying it to the real world. (If we were not satisfied, we could update the model's parameters or choose another model, something we'll discuss later on.)"
]
},
{
"cell_type": "code",
"execution_count": 241,
"metadata": {},
"outputs": [],
"source": [
"final_model = LinearRegression()"
]
},
{
"cell_type": "code",
"execution_count": 242,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"LinearRegression()"
]
},
"execution_count": 242,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"final_model.fit(X,y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that it doesn't really make sense to recalculate RMSE metrics here: the model has already seen all of the data, so measuring RMSE on data it was trained on is not a fair judgement of performance. That was the purpose of examining test performance beforehand."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Deployment, Predictions, and Model Attributes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Final Model Fit\n",
"\n",
"Note: we can only visualize the fit this way because we have just 3 features; with many more features, plotting each one separately becomes unreasonable."
]
},
{
"cell_type": "code",
"execution_count": 243,
"metadata": {},
"outputs": [],
"source": [
"y_hat = final_model.predict(X)"
]
},
{
"cell_type": "code",
"execution_count": 244,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 1152x432 with 3 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"fig,axes = plt.subplots(nrows=1,ncols=3,figsize=(16,6))\n",
"\n",
"axes[0].plot(df['TV'],df['sales'],'o')\n",
"axes[0].plot(df['TV'],y_hat,'o',color='red')\n",
"axes[0].set_ylabel(\"Sales\")\n",
"axes[0].set_title(\"TV Spend\")\n",
"\n",
"axes[1].plot(df['radio'],df['sales'],'o')\n",
"axes[1].plot(df['radio'],y_hat,'o',color='red')\n",
"axes[1].set_title(\"Radio Spend\")\n",
"axes[1].set_ylabel(\"Sales\")\n",
"\n",
"axes[2].plot(df['newspaper'],df['sales'],'o')\n",
"axes[2].plot(df['newspaper'],y_hat,'o',color='red')\n",
"axes[2].set_title(\"Newspaper Spend\");\n",
"axes[2].set_ylabel(\"Sales\")\n",
"plt.tight_layout();"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Residuals\n",
"\n",
"The residuals should be approximately normally distributed, as discussed in the video."
]
},
{
"cell_type": "code",
"execution_count": 247,
"metadata": {},
"outputs": [],
"source": [
"residuals = y - y_hat"
]
},
{
"cell_type": "code",
"execution_count": 248,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.lines.Line2D at 0x216039d7ac8>"
]
},
"execution_count": 248,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"sns.scatterplot(x=y,y=residuals)\n",
"plt.axhline(y=0, color='r', linestyle='--')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Coefficients"
]
},
{
"cell_type": "code",
"execution_count": 249,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 0.04576465, 0.18853002, -0.00103749])"
]
},
"execution_count": 249,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"final_model.coef_"
]
},
{
"cell_type": "code",
"execution_count": 250,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Coefficient</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>TV</th>\n",
" <td>0.045765</td>\n",
" </tr>\n",
" <tr>\n",
" <th>radio</th>\n",
" <td>0.188530</td>\n",
" </tr>\n",
" <tr>\n",
" <th>newspaper</th>\n",
" <td>-0.001037</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Coefficient\n",
"TV 0.045765\n",
"radio 0.188530\n",
"newspaper -0.001037"
]
},
"execution_count": 250,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"coeff_df = pd.DataFrame(final_model.coef_,X.columns,columns=['Coefficient'])\n",
"coeff_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Interpreting the coefficients:\n",
"\n",
"---\n",
"* Holding all other features fixed, a 1 unit (one thousand dollars) increase in TV spend is associated with an increase in sales of about 0.046 \"sales units\", in this case thousands of units.\n",
"* In other words, for every $1000 spent on TV ads, we could expect roughly 46 more units sold.\n",
"----"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"---\n",
"* Holding all other features fixed, a 1 unit (one thousand dollars) increase in radio spend is associated with an increase in sales of about 0.189 \"sales units\", in this case thousands of units.\n",
"* In other words, for every $1000 spent on radio ads, we could expect roughly 189 more units sold.\n",
"----\n",
"----"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Holding all other features fixed, a 1 unit (one thousand dollars) increase in newspaper spend is associated with a **decrease** in sales of about 0.001 \"sales units\", in this case thousands of units.\n",
"* In other words, for every $1000 spent on newspaper ads, we could actually expect to sell about 1 fewer unit. Because the coefficient is so close to 0, this strongly suggests that newspaper spend has no real effect on sales.\n",
"---\n",
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note! In this case, all of our features share the same unit (1 unit = $1000 of ad spend). In other datasets the units may differ; for example, a housing dataset might try to predict sale price using both the number of bedrooms and the total area in square feet. In that case it would make more sense to *normalize* the data so that features and results can be compared clearly. We will cover normalization later on.**"
]
},
{
"cell_type": "code",
"execution_count": 251,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>TV</th>\n",
" <th>radio</th>\n",
" <th>newspaper</th>\n",
" <th>sales</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>TV</th>\n",
" <td>1.000000</td>\n",
" <td>0.054809</td>\n",
" <td>0.056648</td>\n",
" <td>0.782224</td>\n",
" </tr>\n",
" <tr>\n",
" <th>radio</th>\n",
" <td>0.054809</td>\n",
" <td>1.000000</td>\n",
" <td>0.354104</td>\n",
" <td>0.576223</td>\n",
" </tr>\n",
" <tr>\n",
" <th>newspaper</th>\n",
" <td>0.056648</td>\n",
" <td>0.354104</td>\n",
" <td>1.000000</td>\n",
" <td>0.228299</td>\n",
" </tr>\n",
" <tr>\n",
" <th>sales</th>\n",
" <td>0.782224</td>\n",
" <td>0.576223</td>\n",
" <td>0.228299</td>\n",
" <td>1.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" TV radio newspaper sales\n",
"TV 1.000000 0.054809 0.056648 0.782224\n",
"radio 0.054809 1.000000 0.354104 0.576223\n",
"newspaper 0.056648 0.354104 1.000000 0.228299\n",
"sales 0.782224 0.576223 0.228299 1.000000"
]
},
"execution_count": 251,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.corr()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prediction on New Data\n",
"\n",
"Recall that the X_test data set looks *exactly* like brand new data, so we simply need to call .predict() just as before to predict sales for a new advertising campaign.\n",
"\n",
"**Our next ad campaign will spend 149k on TV, 22k on radio, and 12k on newspaper ads. How many units could we expect to sell as a result?**"
]
},
{
"cell_type": "code",
"execution_count": 252,
"metadata": {},
"outputs": [],
"source": [
"# Feature order must match the training columns: TV, radio, newspaper\n",
"campaign = [[149,22,12]]"
]
},
{
"cell_type": "code",
"execution_count": 253,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([13.893032])"
]
},
"execution_count": 253,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"final_model.predict(campaign)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**How accurate is this prediction? There is no real way to know! We only truly know our model's performance on the test data, which is why we had to be satisfied with it first, before training our full model.**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"-----\n",
"\n",
"## Model Persistence (Saving and Loading a Model)"
]
},
{
"cell_type": "code",
"execution_count": 254,
"metadata": {},
"outputs": [],
"source": [
"from joblib import dump, load"
]
},
{
"cell_type": "code",
"execution_count": 255,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['sales_model.joblib']"
]
},
"execution_count": 255,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dump(final_model, 'sales_model.joblib') "
]
},
{
"cell_type": "code",
"execution_count": 256,
"metadata": {},
"outputs": [],
"source": [
"loaded_model = load('sales_model.joblib')"
]
},
{
"cell_type": "code",
"execution_count": 257,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([13.893032])"
]
},
"execution_count": 257,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"loaded_model.predict(campaign)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Up next...\n",
"### Is this the best possible performance? It's still a simple model, so let's expand on the linear regression model by taking a further look at regularization!\n",
"\n",
"-------\n",
"--------"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 1
}