{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"___\n",
"\n",
"<a href='http://www.pieriandata.com'><img src='../Pierian_Data_Logo.png'/></a>\n",
"___\n",
"<center><em>Copyright by Pierian Data Inc.</em></center>\n",
"<center><em>For more information, visit us at <a href='http://www.pieriandata.com'>www.pieriandata.com</a></em></center>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction to Simple Linear Regression\n",
"\n",
"In this example, we'll explore how to create a simple fit line, the classic case of y=mx+b. We'll go carefully through each step, so you can see what type of question a simple fit line can answer. Keep in mind that this case is heavily simplified and is not the approach we'll take later on; it's just here to get you thinking about linear regression in perhaps the same way [Galton](https://en.wikipedia.org/wiki/Francis_Galton) did."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Imports"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Sample Data\n",
"\n",
"This sample data is from ISLR. It displays sales (in thousands of units) for a particular product as a function of advertising budgets (in thousands of dollars) for TV, radio, and newspaper media."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"df = pd.read_csv(\"Advertising.csv\")"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>TV</th>\n",
" <th>radio</th>\n",
" <th>newspaper</th>\n",
" <th>sales</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>230.1</td>\n",
" <td>37.8</td>\n",
" <td>69.2</td>\n",
" <td>22.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>44.5</td>\n",
" <td>39.3</td>\n",
" <td>45.1</td>\n",
" <td>10.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>17.2</td>\n",
" <td>45.9</td>\n",
" <td>69.3</td>\n",
" <td>9.3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>151.5</td>\n",
" <td>41.3</td>\n",
" <td>58.5</td>\n",
" <td>18.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>180.8</td>\n",
" <td>10.8</td>\n",
" <td>58.4</td>\n",
" <td>12.9</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" TV radio newspaper sales\n",
"0 230.1 37.8 69.2 22.1\n",
"1 44.5 39.3 45.1 10.4\n",
"2 17.2 45.9 69.3 9.3\n",
"3 151.5 41.3 58.5 18.5\n",
"4 180.8 10.8 58.4 12.9"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Is there a relationship between *total* advertising spend and *sales*?**"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"df['total_spend'] = df['TV'] + df['radio'] + df['newspaper']"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
"<AxesSubplot:xlabel='total_spend', ylabel='sales'>"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"sns.scatterplot(x='total_spend',y='sales',data=df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Least Squares Line\n",
"\n",
"Full formulas are available on Wikipedia (https://en.wikipedia.org/wiki/Linear_regression), as well as in the ISLR reading."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Understanding what a line of best fit answers.**\n",
"If someone were to spend a total of \\$200 (spend is recorded in thousands of dollars in this dataset), what would the expected sales be? We have simplified this quite a bit by combining all the features into \"total spend\", but we will come back to individual features later on. For now, let's focus on understanding what a linear regression line can help answer.\n",
"\n",
"**Our next ad campaign will have a total spend of \\$200. How many units do we expect to sell as a result?**"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<AxesSubplot:xlabel='total_spend', ylabel='sales'>"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# Basically, we want to figure out how to create this line\n",
"sns.regplot(x='total_spend',y='sales',data=df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's go ahead and start solving: $$y=mx+b$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We simply need to solve for m and b. Remember that, as shown in the video, we are solving in a generalized form:\n",
"\n",
"$$ \\hat{y} = \\beta_0 + \\beta_1X$$\n",
"\n",
"Here $\\beta_0$ plays the role of the intercept b and $\\beta_1$ the role of the slope m. X is capitalized to signal that we are dealing with a matrix of values: we have a known matrix of labels (sales numbers) Y and a known matrix of total_spend values (X). We are going to solve for the *beta* coefficients, which, as we expand to more than a single feature, will help us understand which features have the most predictive power. We write y hat to indicate a prediction or estimate; plain y would be a true label/known value.\n",
"\n",
"We can use NumPy for this (if you really wanted to, you could solve this by [hand](https://towardsdatascience.com/linear-regression-by-hand-ee7fe5a751bf))."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"X = df['total_spend']\n",
"y = df['sales']"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Help on function polyfit in module numpy:\n",
"\n",
"polyfit(x, y, deg, rcond=None, full=False, w=None, cov=False)\n",
" Least squares polynomial fit.\n",
" \n",
" Fit a polynomial ``p(x) = p[0] * x**deg + ... + p[deg]`` of degree `deg`\n",
" to points `(x, y)`. Returns a vector of coefficients `p` that minimises\n",
" the squared error in the order `deg`, `deg-1`, ... `0`.\n",
" \n",
" The `Polynomial.fit <numpy.polynomial.polynomial.Polynomial.fit>` class\n",
" method is recommended for new code as it is more stable numerically. See\n",
" the documentation of the method for more information.\n",
" \n",
" Parameters\n",
" ----------\n",
" x : array_like, shape (M,)\n",
" x-coordinates of the M sample points ``(x[i], y[i])``.\n",
" y : array_like, shape (M,) or (M, K)\n",
" y-coordinates of the sample points. Several data sets of sample\n",
" points sharing the same x-coordinates can be fitted at once by\n",
" passing in a 2D-array that contains one dataset per column.\n",
" deg : int\n",
" Degree of the fitting polynomial\n",
" rcond : float, optional\n",
" Relative condition number of the fit. Singular values smaller than\n",
" this relative to the largest singular value will be ignored. The\n",
" default value is len(x)*eps, where eps is the relative precision of\n",
" the float type, about 2e-16 in most cases.\n",
" full : bool, optional\n",
" Switch determining nature of return value. When it is False (the\n",
" default) just the coefficients are returned, when True diagnostic\n",
" information from the singular value decomposition is also returned.\n",
" w : array_like, shape (M,), optional\n",
" Weights to apply to the y-coordinates of the sample points. For\n",
" gaussian uncertainties, use 1/sigma (not 1/sigma**2).\n",
" cov : bool or str, optional\n",
" If given and not `False`, return not just the estimate but also its\n",
" covariance matrix. By default, the covariance are scaled by\n",
" chi2/sqrt(N-dof), i.e., the weights are presumed to be unreliable\n",
" except in a relative sense and everything is scaled such that the\n",
" reduced chi2 is unity. This scaling is omitted if ``cov='unscaled'``,\n",
" as is relevant for the case that the weights are 1/sigma**2, with\n",
" sigma known to be a reliable estimate of the uncertainty.\n",
" \n",
" Returns\n",
" -------\n",
" p : ndarray, shape (deg + 1,) or (deg + 1, K)\n",
" Polynomial coefficients, highest power first. If `y` was 2-D, the\n",
" coefficients for `k`-th data set are in ``p[:,k]``.\n",
" \n",
" residuals, rank, singular_values, rcond\n",
" Present only if `full` = True. Residuals is sum of squared residuals\n",
" of the least-squares fit, the effective rank of the scaled Vandermonde\n",
" coefficient matrix, its singular values, and the specified value of\n",
" `rcond`. For more details, see `linalg.lstsq`.\n",
" \n",
" V : ndarray, shape (M,M) or (M,M,K)\n",
" Present only if `full` = False and `cov`=True. The covariance\n",
" matrix of the polynomial coefficient estimates. The diagonal of\n",
" this matrix are the variance estimates for each coefficient. If y\n",
" is a 2-D array, then the covariance matrix for the `k`-th data set\n",
" are in ``V[:,:,k]``\n",
" \n",
" \n",
" Warns\n",
" -----\n",
" RankWarning\n",
" The rank of the coefficient matrix in the least-squares fit is\n",
" deficient. The warning is only raised if `full` = False.\n",
" \n",
" The warnings can be turned off by\n",
" \n",
" >>> import warnings\n",
" >>> warnings.simplefilter('ignore', np.RankWarning)\n",
" \n",
" See Also\n",
" --------\n",
" polyval : Compute polynomial values.\n",
" linalg.lstsq : Computes a least-squares fit.\n",
" scipy.interpolate.UnivariateSpline : Computes spline fits.\n",
" \n",
" Notes\n",
" -----\n",
" The solution minimizes the squared error\n",
" \n",
" .. math ::\n",
" E = \\sum_{j=0}^k |p(x_j) - y_j|^2\n",
" \n",
" in the equations::\n",
" \n",
" x[0]**n * p[0] + ... + x[0] * p[n-1] + p[n] = y[0]\n",
" x[1]**n * p[0] + ... + x[1] * p[n-1] + p[n] = y[1]\n",
" ...\n",
" x[k]**n * p[0] + ... + x[k] * p[n-1] + p[n] = y[k]\n",
" \n",
" The coefficient matrix of the coefficients `p` is a Vandermonde matrix.\n",
" \n",
" `polyfit` issues a `RankWarning` when the least-squares fit is badly\n",
" conditioned. This implies that the best fit is not well-defined due\n",
" to numerical error. The results may be improved by lowering the polynomial\n",
" degree or by replacing `x` by `x` - `x`.mean(). The `rcond` parameter\n",
" can also be set to a value smaller than its default, but the resulting\n",
" fit may be spurious: including contributions from the small singular\n",
" values can add numerical noise to the result.\n",
" \n",
" Note that fitting polynomial coefficients is inherently badly conditioned\n",
" when the degree of the polynomial is large or the interval of sample points\n",
" is badly centered. The quality of the fit should always be checked in these\n",
" cases. When polynomial fits are not satisfactory, splines may be a good\n",
" alternative.\n",
" \n",
" References\n",
" ----------\n",
" .. [1] Wikipedia, \"Curve fitting\",\n",
" https://en.wikipedia.org/wiki/Curve_fitting\n",
" .. [2] Wikipedia, \"Polynomial interpolation\",\n",
" https://en.wikipedia.org/wiki/Polynomial_interpolation\n",
" \n",
" Examples\n",
" --------\n",
" >>> import warnings\n",
" >>> x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])\n",
" >>> y = np.array([0.0, 0.8, 0.9, 0.1, -0.8, -1.0])\n",
" >>> z = np.polyfit(x, y, 3)\n",
" >>> z\n",
" array([ 0.08703704, -0.81349206, 1.69312169, -0.03968254]) # may vary\n",
" \n",
" It is convenient to use `poly1d` objects for dealing with polynomials:\n",
" \n",
" >>> p = np.poly1d(z)\n",
" >>> p(0.5)\n",
" 0.6143849206349179 # may vary\n",
" >>> p(3.5)\n",
" -0.34732142857143039 # may vary\n",
" >>> p(10)\n",
" 22.579365079365115 # may vary\n",
" \n",
" High-order polynomials may oscillate wildly:\n",
" \n",
" >>> with warnings.catch_warnings():\n",
" ... warnings.simplefilter('ignore', np.RankWarning)\n",
" ... p30 = np.poly1d(np.polyfit(x, y, 30))\n",
" ...\n",
" >>> p30(4)\n",
" -0.80000000000000204 # may vary\n",
" >>> p30(5)\n",
" -0.99999999999999445 # may vary\n",
" >>> p30(4.5)\n",
" -0.10547061179440398 # may vary\n",
" \n",
" Illustration:\n",
" \n",
" >>> import matplotlib.pyplot as plt\n",
" >>> xp = np.linspace(-2, 6, 100)\n",
" >>> _ = plt.plot(x, y, '.', xp, p(xp), '-', xp, p30(xp), '--')\n",
" >>> plt.ylim(-2,2)\n",
" (-2, 2)\n",
" >>> plt.show()\n",
"\n"
]
}
],
"source": [
"help(np.polyfit)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([0.04868788, 4.24302822])"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# np.polyfit returns the highest-order coefficient first: [slope, intercept] for degree 1\n",
"np.polyfit(X,y,1)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"# Potential Future Spend Budgets\n",
"potential_spend = np.linspace(0,500,100)"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [],
"source": [
"# y = m*x + b, with the slope and intercept copied from the np.polyfit output above\n",
"predicted_sales = 0.04868788*potential_spend + 4.24302822"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[<matplotlib.lines.Line2D at 0x1a948c95408>]"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.plot(potential_spend,predicted_sales)"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[<matplotlib.lines.Line2D at 0x1a948dc6bc8>]"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"sns.scatterplot(x='total_spend',y='sales',data=df)\n",
"plt.plot(potential_spend,predicted_sales,color='red')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Our next ad campaign will have a total spend of \\$200. How many units do we expect to sell as a result?**"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [],
"source": [
"# Plug a single spend value into y = m*x + b, using the degree-1 coefficients from np.polyfit\n",
"spend = 200\n",
"predicted_sales = 0.04868788*spend + 4.24302822"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"13.98060422"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"predicted_sales"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Further considerations...which we will explore in much more depth!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Overfitting, Underfitting, and Measuring Performance\n",
"\n",
"Notice that we fit to order=1, essentially a straight line. We can begin to explore higher orders, but does a higher order mean an overall better fit? Is it possible to fit too much? Too little? How would we know, and how do we even define a good fit?"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 3.07615033e-07, -1.89392449e-04, 8.20886302e-02, 2.70495053e+00])"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.polyfit(X,y,3)"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [],
"source": [
"# Potential Future Spend Budgets\n",
"potential_spend = np.linspace(0,500,100)"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [],
"source": [
"# Coefficients typed in from np.polyfit(X,y,3) above, highest power first\n",
"predicted_sales = 3.07615033e-07*potential_spend**3 - 1.89392449e-04*potential_spend**2 + 8.20886302e-02*potential_spend + 2.70495053e+00"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[<matplotlib.lines.Line2D at 0x1a945c52908>]"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"sns.scatterplot(x='total_spend',y='sales',data=df)\n",
"plt.plot(potential_spend,predicted_sales,color='red')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Is this better than our straight line fit? What are good ways of measuring this?**"
]
},
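{
"cell_type": "markdown",
"metadata": {},
"source": [
"One common way to put a number on a fit is the root mean squared error (RMSE) of its residuals. The sketch below is an illustration on synthetic stand-in data (the `demo_` names are assumptions, not part of the dataset), comparing order 1 against order 3. On the same data used for fitting, the higher order will not have a larger least-squares error (up to rounding), so training error alone cannot tell us which model will predict better on new data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Hypothetical stand-in data (NOT the Advertising.csv values)\n",
"rng = np.random.default_rng(0)\n",
"demo_spend = np.linspace(0, 500, 50)\n",
"demo_sales = 0.05*demo_spend + 4 + rng.normal(0, 1, size=50)\n",
"\n",
"for demo_degree in (1, 3):\n",
"    demo_coeffs = np.polyfit(demo_spend, demo_sales, demo_degree)\n",
"    # Residuals: observed values minus the fitted polynomial's predictions\n",
"    demo_residuals = demo_sales - np.polyval(demo_coeffs, demo_spend)\n",
"    demo_rmse = np.sqrt(np.mean(demo_residuals**2))\n",
"    print(demo_degree, demo_rmse)"
]
},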
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Multiple Features\n",
"\n",
"The real data has 3 separate features (TV, radio, and newspaper) rather than a single total spend. Could we repeat the process with all three features and get a more accurate result?"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [],
"source": [
"X = df[['TV','radio','newspaper']]\n",
"y = df['sales']"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
{
"ename": "TypeError",
"evalue": "expected 1D vector for x",
"output_type": "error",
"traceback": [
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[1;31mTypeError\u001b[0m Traceback (most recent call last)",
"\u001b[1;32m<ipython-input-41-f24479bbc916>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m\u001b[0m\n\u001b[0;32m 1\u001b[0m \u001b[1;31m# Note here we're passing in 3 which matches up with 3 unique features, so we're not polynomial yet\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 2\u001b[1;33m \u001b[0mnp\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mpolyfit\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mX\u001b[0m\u001b[1;33m,\u001b[0m\u001b[0my\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;36m1\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[1;32m<__array_function__ internals>\u001b[0m in \u001b[0;36mpolyfit\u001b[1;34m(*args, **kwargs)\u001b[0m\n",
"\u001b[1;32mc:\\users\\marcial\\anaconda3\\envs\\ml_master\\lib\\site-packages\\numpy\\lib\\polynomial.py\u001b[0m in \u001b[0;36mpolyfit\u001b[1;34m(x, y, deg, rcond, full, w, cov)\u001b[0m\n\u001b[0;32m 597\u001b[0m \u001b[1;32mraise\u001b[0m \u001b[0mValueError\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m\"expected deg >= 0\"\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 598\u001b[0m \u001b[1;32mif\u001b[0m \u001b[0mx\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mndim\u001b[0m \u001b[1;33m!=\u001b[0m \u001b[1;36m1\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 599\u001b[1;33m \u001b[1;32mraise\u001b[0m \u001b[0mTypeError\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m\"expected 1D vector for x\"\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 600\u001b[0m \u001b[1;32mif\u001b[0m \u001b[0mx\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0msize\u001b[0m \u001b[1;33m==\u001b[0m \u001b[1;36m0\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 601\u001b[0m \u001b[1;32mraise\u001b[0m \u001b[0mTypeError\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m\"expected non-empty vector for x\"\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
"\u001b[1;31mTypeError\u001b[0m: expected 1D vector for x"
]
}
],
"source": [
"# Note: X now holds 3 feature columns while the degree is still 1, so this is not a polynomial fit yet\n",
"np.polyfit(X,y,1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Uh oh! Polyfit only works with a 1D X array! We'll need to move on to a more powerful library...**"
]
},
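{
"cell_type": "markdown",
"metadata": {},
"source": [
"For the curious, NumPy can still solve this particular problem: `np.linalg.lstsq` fits a linear model with several features when given a design matrix. This is only a hedged aside on synthetic stand-in data (the `demo_` names and the chosen coefficients are assumptions); the notebook itself moves on to a more powerful library instead."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Hypothetical stand-in data with 3 features (NOT the Advertising.csv values)\n",
"rng = np.random.default_rng(0)\n",
"demo_X = rng.uniform(0, 100, size=(50, 3))\n",
"demo_true_coefs = np.array([0.05, 0.2, 0.0])\n",
"demo_y = demo_X @ demo_true_coefs + 3 + rng.normal(0, 0.5, size=50)\n",
"\n",
"# Append a column of ones so the last coefficient acts as the intercept\n",
"demo_design = np.column_stack([demo_X, np.ones(len(demo_X))])\n",
"\n",
"# Least-squares solution: three feature coefficients followed by the intercept\n",
"demo_coefs, *_ = np.linalg.lstsq(demo_design, demo_y, rcond=None)\n",
"demo_coefs"
]
},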
{
"cell_type": "markdown",
"metadata": {},
"source": [
"-------\n",
"--------"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 1
}