{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"___\n",
"\n",
"<a href='http://www.pieriandata.com'><img src='../Pierian_Data_Logo.png'/></a>\n",
"___\n",
"<center><em>Copyright by Pierian Data Inc.</em></center>\n",
"<center><em>For more information, visit us at <a href='http://www.pieriandata.com'>www.pieriandata.com</a></em></center>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Dealing with Outliers\n",
"\n",
"In statistics, an outlier is a data point that differs significantly from other observations. An outlier may be due to variability in the measurement, or it may indicate experimental error; the latter are sometimes excluded from the data set. An outlier can cause serious problems in statistical analyses.\n",
"\n",
"Remember that even if a data point is an outlier, it's still a data point! Carefully consider your data, its sources, and your goals whenever deciding to remove an outlier. Each case is different!\n",
"\n",
"## Lecture Goals\n",
"* Understand different mathematical definitions of outliers\n",
"* Use Python tools to recognize outliers and remove them\n",
"\n",
"### Useful Links\n",
"\n",
"* [Wikipedia Article](https://en.wikipedia.org/wiki/Outlier)\n",
"* [NIST Outlier Links](https://www.itl.nist.gov/div898/handbook/prc/section1/prc16.htm)\n",
"\n",
"-------------"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Imports"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Generating Data"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# Choose a mean, standard deviation, and number of samples\n",
"\n",
"def create_ages(mu=50,sigma=13,num_samples=100,seed=42):\n",
"\n",
" # Set a random seed in the same cell as the random call to get the same values as us\n",
" # We set seed to 42 (42 is an arbitrary choice from Hitchhiker's Guide to the Galaxy)\n",
" np.random.seed(seed)\n",
"\n",
" sample_ages = np.random.normal(loc=mu,scale=sigma,size=num_samples)\n",
" sample_ages = np.round(sample_ages,decimals=0)\n",
" \n",
" return sample_ages"
]
},
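{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an aside, `np.random.seed` seeds NumPy's legacy global random state. Newer NumPy code often uses a local `Generator` created by `np.random.default_rng` instead. The hypothetical `create_ages_rng` below sketches that alternative. It is not the method used in this lecture. Because `default_rng(42)` draws a different stream than `np.random.seed(42)`, its ages will not match the outputs shown in this notebook. It only demonstrates that a fixed seed makes the draw reproducible."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Hypothetical variant of create_ages using NumPy's Generator API.\n",
"# NOTE: default_rng(42) does not reproduce np.random.seed(42), so these\n",
"# ages differ from the sample printed elsewhere in this notebook.\n",
"def create_ages_rng(mu=50, sigma=13, num_samples=100, seed=42):\n",
"    rng = np.random.default_rng(seed)  # local generator; global state untouched\n",
"    sample_ages = rng.normal(loc=mu, scale=sigma, size=num_samples)\n",
"    return np.round(sample_ages, decimals=0)\n",
"\n",
"# Same seed, same values: the draw is reproducible\n",
"a = create_ages_rng()\n",
"b = create_ages_rng()\n",
"print(np.array_equal(a, b))  # True"
]
},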
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"sample = create_ages()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([56., 48., 58., 70., 47., 47., 71., 60., 44., 57., 44., 44., 53.,\n",
" 25., 28., 43., 37., 54., 38., 32., 69., 47., 51., 31., 43., 51.,\n",
" 35., 55., 42., 46., 42., 74., 50., 36., 61., 34., 53., 25., 33.,\n",
" 53., 60., 52., 48., 46., 31., 41., 44., 64., 54., 27., 54., 45.,\n",
" 41., 58., 63., 62., 39., 46., 54., 63., 44., 48., 36., 34., 61.,\n",
" 68., 49., 63., 55., 42., 55., 70., 50., 70., 16., 61., 51., 46.,\n",
" 51., 24., 47., 55., 69., 43., 39., 43., 62., 54., 43., 57., 51.,\n",
" 63., 41., 46., 45., 31., 54., 53., 50., 47.])"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sample"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Visualize and Describe the Data"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<AxesSubplot:>"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"sns.histplot(sample, bins=10)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<AxesSubplot:>"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAWAAAAD4CAYAAADSIzzWAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy86wFpkAAAACXBIWXMAAAsTAAALEwEAmpwYAAAJjElEQVR4nO3df6jd913H8dc7924snRuzN7WU1C2UK9tkuG6WOZmI1nZkMvafMEEa9s/+kdusKKL+p+C/Ynv/EMpEEvzFnIoyQlg6HQz/UJLZkW1t2WHL3OLWZre4iqmTm3z845y6EFa22+Xkfb8njweEe+73hnzf79xzn/meb3JJjTECwM13oHsAgFuVAAM0EWCAJgIM0ESAAZqs7+UnHzp0aBw5cmRJowCspnPnzn1rjHHH9cf3FOAjR47k7NmzN24qgFtAVX31ex13CwKgiQADNBFggCYCDNBEgAGaCDBAEwEGaCLAAE0EGKCJAAM0EWCAJgIM0ESAAZoIMEATAQZoIsAATQQYoIkAAzQRYIAme/o/4WAZtre3M5vN2s5/8eLFJMnhw4fbZng5m5ub2dra6h6DJRFg2s1mszz5+ady5bbbW86/dvnbSZJvfmd/fTmsXX6+ewSWbH8947hlXbnt9rz4ll9uOffBp08lSdv5X85Lc7G63AMGaCLAAE0EGKCJAAM0EWCAJgIM0ESAAZoIMEATAQZoIsAATQQYoIkAAzQRYIAmAgzQRIABmggwQBMBBmgiwABNBBigiQADNBFggCYCDNBEgAGaCDBAEwEGaCLAAE0EGKCJAAM0EWCAJgIM0ESAAZoIMEATAQZoIsAATQQYoIkAAzQRYIAmAgzQRIABmggwQBMBBmgiwNfY3t7O9vZ29xjAPrLMLqwv5VedqNls1j0CsM8sswuugAGaCDBAEwEGaCLAAE0EGKCJAAM0EWCAJgIM0ESAAZoIMEATAQZoIsAATQQYoIkAAzQRYIAmAgzQRIABmggwQBMBBmgiwABNBBigiQADNBFggCYCDNBEgAGaCDBAEwEGaCLAAE0EGKCJAAM0EWCAJgIM0ESAAZoIMEATAQZoIsAATQQYoIkAAzS5KQHe2dnJww8/nJ2dnZtxOoBJuCkBPnHiRM6fP5+TJ0/ejNMBTMLSA7yzs5PTp09njJHTp0+7CgZYWF/2CU6cOJGrV68mSa5cuZKTJ0/mkUceWfZpX5GLFy/mxRdfzPHjx7tHuaXMZrMc+N/RPca+c+B/Xshs9l+ej81ms1kOHjy4lF/7+14BV9WHq+psVZ29dOnSnk/wxBNPZHd3N0myu7ubM2fO7H1KgBX0fa+AxxiPJ3k8Se677749X6Y88MADOXXqVHZ3d7O+vp4HH3zwFYx5cxw+fDhJ8uijjzZPcms5fvx4zn352e4x9p2rr3l9Nu+50/Ox2TJfgSz9HvCxY8dy4MD8NGtra3nooYeWfUqASVh6gDc2NnL06NFUVY4ePZqNjY1lnxJgEpb+l3DJ/Cr4woULrn4BrnFTAryxsZHHHnvsZpwKYDJ8KzJAEwEGaCLAAE0EGKCJAAM0EWCAJgIM0ESAAZoIMEATAQZoIsAATQQYoIkAAzQRYIAmAgzQRIABmggwQBMBBmgiwABNBBigiQADNBFggCYCDNBEgAGaCDBAEwEGaCLAAE0EGKCJAAM0EWCAJgIM0ESAAZoIMEATAQZoIsAATQQYoIkAAzQRYIAm690D7Cebm5vdIwD7zDK7IMDX2Nra6h4B2GeW2QW3IACaCDBAEwEGaCLAAE0EGKCJAAM0EWCAJgIM0ESAAZoIMEATAQZoIsAATQQYoIkAAzQRYIAmAgzQRIABmggwQBMBBmgiwABNBBigiQADNBFggCYCDNBEgAGaCDBAEwEGaCLAAE0EGKCJAAM0EWCAJgIM0ESAAZoIMEATAQZoIsAATQQYoIkAAzQRYIAmAgzQZL17AEiStcvP5+DTp5rOvZMkbed/OWuXn0
9yZ/cYLJEA025zc7P1/Bcv7iZJDh/eb7G7s/33huUSYNptbW11jwAt3AMGaCLAAE0EGKCJAAM0EWCAJgIM0ESAAZoIMEATAQZoIsAATQQYoIkAAzQRYIAmAgzQRIABmggwQBMBBmgiwABNBBigiQADNKkxxg/+k6suJfnq8sbZs0NJvtU9xA20avskq7fTqu2TrN5O+3GfN40x7rj+4J4CvN9U1dkxxn3dc9woq7ZPsno7rdo+yertNKV93IIAaCLAAE2mHuDHuwe4wVZtn2T1dlq1fZLV22ky+0z6HjDAlE39ChhgsgQYoMkkAlxVP15V/1RVT1XVF6rq+OL47VV1pqq+tHj7o92z/qCq6jVV9a9V9bnFTr+3OD7ZnZKkqtaq6t+q6hOL96e+z4WqOl9VT1bV2cWxye5UVW+oqo9X1dOLr6efnfg+b158bl768UJVfWQqO00iwEl2k/zGGOOtSd6d5Ner6ieT/HaST40xfiLJpxbvT8V3ktw/xnh7knuTHK2qd2faOyXJ8SRPXfP+1PdJkl8cY9x7zb8tnfJOjyY5PcZ4S5K3Z/65muw+Y4xnFp+be5P8dJLLSf4uU9lpjDG5H0n+PsmDSZ5Jctfi2F1Jnume7RXuc1uSzyb5mSnvlOTuzJ/s9yf5xOLYZPdZzHwhyaHrjk1ypySvT/KVLP7yfer7fI/93pvkn6e001SugP9fVR1J8o4k/5LkzjHGN5Jk8fbHGkfbs8XL9SeTPJfkzBhj6jv9UZLfSnL1mmNT3idJRpJPVtW5qvrw4thUd7onyaUkf7q4TfTRqnptprvP9T6Y5C8Xjyex06QCXFU/kuRvknxkjPFC9zw/rDHGlTF/6XR3kndV1duaR3rFqur9SZ4bY5zrnuUGe88Y451J3pf5ra+f7x7oh7Ce5J1J/niM8Y4k/539+tJ8j6rq1Uk+kOSvu2fZi8kEuKpelXl8/3yM8beLw89W1V2Lj9+V+ZXk5Iwx/jPJp5MczXR3ek+SD1TVhSR/leT+qvqzTHefJMkY4z8Wb5/L/N7iuzLdnb6e5OuLV1pJ8vHMgzzVfa71viSfHWM8u3h/EjtNIsBVVUn+JMlTY4w/vOZD/5Dk2OLxsczvDU9CVd1RVW9YPD6Y5IEkT2eiO40xfmeMcfcY40jmLwX/cYzxa5noPklSVa+tqte99Djze4yfz0R3GmN8M8nXqurNi0O/lOSLmeg+1/nVfPf2QzKRnSbxnXBV9XNJPpPkfL57f/F3M78P/LEkb0zy70l+ZYzxfMuQe1RVP5XkRJK1zP8g/NgY4/eraiMT3eklVfULSX5zjPH+Ke9TVfdkftWbzF++/8UY4w8mvtO9ST6a5NVJvpzkQ1k8/zLBfZKkqm5L8rUk94wxvr04NonP0SQCDLCKJnELAmAVCTBAEwEGaCLAAE0EGKCJAAM0EWCAJv8HuKD3YaBd99gAAAAASUVORK5CYII=\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"sns.boxplot(x=sample)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"count 100.00000\n",
"mean 48.66000\n",
"std 11.82039\n",
"min 16.00000\n",
"25% 42.00000\n",
"50% 48.00000\n",
"75% 55.25000\n",
"max 74.00000\n",
"dtype: float64"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ser = pd.Series(sample)\n",
"ser.describe()"
]
},
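{
"cell_type": "markdown",
"metadata": {},
"source": [
"The isolated point in the boxplot comes from seaborn's default whisker rule (`whis=1.5`). Values more than 1.5 × IQR below the first quartile or above the third quartile (IQR = Q3 - Q1) are plotted individually. With the quartiles from `ser.describe()` (25% = 42.0, 75% = 55.25), IQR = 13.25. The fences are therefore 42 - 1.5 × 13.25 = 22.125 and 55.25 + 1.5 × 13.25 = 75.125. Only the age of 16 lies outside them; the maximum, 74, is inside. Below is a minimal sketch of the same rule. `iqr_bounds` is a hypothetical helper and the demo Series is made up; on this notebook's data you could call `iqr_bounds(ser)` instead."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical helper: Tukey fences at k * IQR beyond the quartiles\n",
"def iqr_bounds(values, k=1.5):\n",
"    q1 = values.quantile(0.25)\n",
"    q3 = values.quantile(0.75)\n",
"    iqr = q3 - q1\n",
"    return q1 - k * iqr, q3 + k * iqr\n",
"\n",
"# Made-up data with one obvious extreme value\n",
"demo = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])\n",
"low, high = iqr_bounds(demo)\n",
"print(low, high)  # -3.5 14.5\n",
"\n",
"# Flag (rather than drop) anything outside the fences\n",
"outliers = demo[(demo < low) | (demo > high)]\n",
"print(outliers.tolist())  # [100]"
]
},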
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Trimming or Fixing Based on Domain Knowledge\n",
"\n",
"If we know we're dealing with a dataset of voting-age adults (18 or older in the USA), then it makes sense either to drop anything below 18 or to fix values lower than 18 by pushing them up to 18."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 56.0\n",
"1 48.0\n",
"2 58.0\n",
"3 70.0\n",
"4 47.0\n",
" ... \n",
"95 31.0\n",
"96 54.0\n",
"97 53.0\n",
"98 50.0\n",
"99 47.0\n",
"Length: 99, dtype: float64"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ser[ser >= 18]"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"99"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# It dropped one person\n",
"len(ser[ser >= 18])"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"def fix_values(age):\n",
" \n",
" if age < 18:\n",
" return 18\n",
" else:\n",
" return age"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 56.0\n",
"1 48.0\n",
"2 58.0\n",
"3 70.0\n",
"4 47.0\n",
" ... \n",
"95 31.0\n",
"96 54.0\n",
"97 53.0\n",
"98 50.0\n",
"99 47.0\n",
"Length: 100, dtype: float64"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# \"Fixes\" one person's age\n",
"ser.apply(fix_values)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"100"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(ser.apply(fix_values))"
]
},
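{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `apply` call with `fix_values` above works one element at a time. pandas also offers vectorized equivalents. `Series.clip(lower=18)` raises every value below 18 up to 18. A boolean mask such as `ages >= 18` keeps only the rows that satisfy the rule. The sketch below uses a small made-up Series so that it runs on its own; in this notebook you could pass `ser` instead."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Made-up ages, including one below the US voting age and one exactly 18\n",
"ages = pd.Series([56.0, 16.0, 48.0, 18.0, 70.0])\n",
"\n",
"# Fixing: same result as ages.apply(fix_values), but vectorized\n",
"fixed = ages.clip(lower=18)\n",
"print(fixed.tolist())  # [56.0, 18.0, 48.0, 18.0, 70.0]\n",
"\n",
"# Trimming: >= keeps the valid age of exactly 18, which > 18 would drop\n",
"trimmed = ages[ages >= 18]\n",
"print(len(trimmed))  # 4"
]
},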
{
"cell_type": "markdown",
"metadata": {},
"source": [
"--------"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are many ways to identify and remove outliers:\n",
"* Trimming based on a provided value\n",
"* Capping based on the IQR or the standard deviation (STD)\n",
"* https://towardsdatascience.com/ways-to-detect-and-remove-the-outliers-404d16608dba\n",
"* https://towardsdatascience.com/5-ways-to-detect-outliers-that-every-data-scientist-should-know-python-code-70a54335a623"
]
},
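{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sketch of capping based on the standard deviation, `Series.clip` can pull any value further than `n_std` standard deviations from the mean back to that boundary, instead of removing it. The helper name `cap_by_std` and the cutoff of 3 standard deviations are assumptions for illustration; 3 is a common convention, not a rule. An extreme value also inflates the mean and standard deviation it is judged against, which is one reason the quartile-based IQR rule is sometimes preferred, especially for small samples."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical helper: cap values at mean +/- n_std standard deviations\n",
"def cap_by_std(values, n_std=3.0):\n",
"    mean = values.mean()\n",
"    std = values.std()  # pandas default: sample standard deviation (ddof=1)\n",
"    return values.clip(lower=mean - n_std * std, upper=mean + n_std * std)\n",
"\n",
"# Made-up data: nineteen 10s and one extreme 100\n",
"demo = pd.Series([10.0] * 19 + [100.0])\n",
"capped = cap_by_std(demo)\n",
"\n",
"# Mean is 14.5 and the sample std is sqrt(405), so the upper cap is about 74.87\n",
"print(round(capped.max(), 2))  # 74.87\n",
"print(len(capped))  # 20 (nothing is dropped, only capped)"
]
},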
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Ames Data Set\n",
"\n",
"Let's explore any extreme outliers in our Ames Housing Data Set."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"df = pd.read_csv(\"../DATA/Ames_Housing_Data.csv\")"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PID</th>\n",
" <th>MS SubClass</th>\n",
" <th>MS Zoning</th>\n",
" <th>Lot Frontage</th>\n",
" <th>Lot Area</th>\n",
" <th>Street</th>\n",
" <th>Alley</th>\n",
" <th>Lot Shape</th>\n",
" <th>Land Contour</th>\n",
" <th>Utilities</th>\n",
" <th>...</th>\n",
" <th>Pool Area</th>\n",
" <th>Pool QC</th>\n",
" <th>Fence</th>\n",
" <th>Misc Feature</th>\n",
" <th>Misc Val</th>\n",
" <th>Mo Sold</th>\n",
" <th>Yr Sold</th>\n",
" <th>Sale Type</th>\n",
" <th>Sale Condition</th>\n",
" <th>SalePrice</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>526301100</td>\n",
" <td>20</td>\n",
" <td>RL</td>\n",
" <td>141.0</td>\n",
" <td>31770</td>\n",
" <td>Pave</td>\n",
" <td>NaN</td>\n",
" <td>IR1</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>5</td>\n",
" <td>2010</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>215000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>526350040</td>\n",
" <td>20</td>\n",
" <td>RH</td>\n",
" <td>80.0</td>\n",
" <td>11622</td>\n",
" <td>Pave</td>\n",
" <td>NaN</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>MnPrv</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>6</td>\n",
" <td>2010</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>105000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>526351010</td>\n",
" <td>20</td>\n",
" <td>RL</td>\n",
" <td>81.0</td>\n",
" <td>14267</td>\n",
" <td>Pave</td>\n",
" <td>NaN</td>\n",
" <td>IR1</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Gar2</td>\n",
" <td>12500</td>\n",
" <td>6</td>\n",
" <td>2010</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>172000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>526353030</td>\n",
" <td>20</td>\n",
" <td>RL</td>\n",
" <td>93.0</td>\n",
" <td>11160</td>\n",
" <td>Pave</td>\n",
" <td>NaN</td>\n",
" <td>Reg</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>4</td>\n",
" <td>2010</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>244000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>527105010</td>\n",
" <td>60</td>\n",
" <td>RL</td>\n",
" <td>74.0</td>\n",
" <td>13830</td>\n",
" <td>Pave</td>\n",
" <td>NaN</td>\n",
" <td>IR1</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>MnPrv</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>2010</td>\n",
" <td>WD</td>\n",
" <td>Normal</td>\n",
" <td>189900</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 81 columns</p>\n",
"</div>"
],
"text/plain": [
" PID MS SubClass MS Zoning Lot Frontage Lot Area Street Alley \\\n",
"0 526301100 20 RL 141.0 31770 Pave NaN \n",
"1 526350040 20 RH 80.0 11622 Pave NaN \n",
"2 526351010 20 RL 81.0 14267 Pave NaN \n",
"3 526353030 20 RL 93.0 11160 Pave NaN \n",
"4 527105010 60 RL 74.0 13830 Pave NaN \n",
"\n",
" Lot Shape Land Contour Utilities ... Pool Area Pool QC Fence Misc Feature \\\n",
"0 IR1 Lvl AllPub ... 0 NaN NaN NaN \n",
"1 Reg Lvl AllPub ... 0 NaN MnPrv NaN \n",
"2 IR1 Lvl AllPub ... 0 NaN NaN Gar2 \n",
"3 Reg Lvl AllPub ... 0 NaN NaN NaN \n",
"4 IR1 Lvl AllPub ... 0 NaN MnPrv NaN \n",
"\n",
" Misc Val Mo Sold Yr Sold Sale Type Sale Condition SalePrice \n",
"0 0 5 2010 WD Normal 215000 \n",
"1 0 6 2010 WD Normal 105000 \n",
"2 12500 6 2010 WD Normal 172000 \n",
"3 0 4 2010 WD Normal 244000 \n",
"4 0 3 2010 WD Normal 189900 \n",
"\n",
"[5 rows x 81 columns]"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<AxesSubplot:>"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 432x288 with 2 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"sns.heatmap(df.corr(numeric_only=True))"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"PID -0.246521\n",
"Enclosed Porch -0.128787\n",
"Kitchen AbvGr -0.119814\n",
"Overall Cond -0.101697\n",
"MS SubClass -0.085092\n",
"Low Qual Fin SF -0.037660\n",
"Bsmt Half Bath -0.035835\n",
"Yr Sold -0.030569\n",
"Misc Val -0.015691\n",
"BsmtFin SF 2 0.005891\n",
"3Ssn Porch 0.032225\n",
"Mo Sold 0.035259\n",
"Pool Area 0.068403\n",
"Screen Porch 0.112151\n",
"Bedroom AbvGr 0.143913\n",
"Bsmt Unf SF 0.182855\n",
"Lot Area 0.266549\n",
"2nd Flr SF 0.269373\n",
"Bsmt Full Bath 0.276050\n",
"Half Bath 0.285056\n",
"Open Porch SF 0.312951\n",
"Wood Deck SF 0.327143\n",
"Lot Frontage 0.357318\n",
"BsmtFin SF 1 0.432914\n",
"Fireplaces 0.474558\n",
"TotRms AbvGrd 0.495474\n",
"Mas Vnr Area 0.508285\n",
"Garage Yr Blt 0.526965\n",
"Year Remod/Add 0.532974\n",
"Full Bath 0.545604\n",
"Year Built 0.558426\n",
"1st Flr SF 0.621676\n",
"Total Bsmt SF 0.632280\n",
"Garage Area 0.640401\n",
"Garage Cars 0.647877\n",
"Gr Liv Area 0.706780\n",
"Overall Qual 0.799262\n",
"SalePrice 1.000000\n",
"Name: SalePrice, dtype: float64"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.corr(numeric_only=True)['SalePrice'].sort_values()"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<AxesSubplot:xlabel='SalePrice'>"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"sns.histplot(df[\"SalePrice\"], kde=True)"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<AxesSubplot:xlabel='Overall Qual', ylabel='SalePrice'>"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"sns.scatterplot(x='Overall Qual',y='SalePrice',data=df)"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PID</th>\n",
" <th>MS SubClass</th>\n",
" <th>MS Zoning</th>\n",
" <th>Lot Frontage</th>\n",
" <th>Lot Area</th>\n",
" <th>Street</th>\n",
" <th>Alley</th>\n",
" <th>Lot Shape</th>\n",
" <th>Land Contour</th>\n",
" <th>Utilities</th>\n",
" <th>...</th>\n",
" <th>Pool Area</th>\n",
" <th>Pool QC</th>\n",
" <th>Fence</th>\n",
" <th>Misc Feature</th>\n",
" <th>Misc Val</th>\n",
" <th>Mo Sold</th>\n",
" <th>Yr Sold</th>\n",
" <th>Sale Type</th>\n",
" <th>Sale Condition</th>\n",
" <th>SalePrice</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1182</th>\n",
" <td>533350090</td>\n",
" <td>60</td>\n",
" <td>RL</td>\n",
" <td>NaN</td>\n",
" <td>24572</td>\n",
" <td>Pave</td>\n",
" <td>NaN</td>\n",
" <td>IR1</td>\n",
" <td>Lvl</td>\n",
" <td>AllPub</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>6</td>\n",
" <td>2008</td>\n",
" <td>WD</td>\n",
" <td>Family</td>\n",
" <td>150000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1498</th>\n",
" <td>908154235</td>\n",
" <td>60</td>\n",
" <td>RL</td>\n",
" <td>313.0</td>\n",
" <td>63887</td>\n",
" <td>Pave</td>\n",
" <td>NaN</td>\n",
" <td>IR3</td>\n",
" <td>Bnk</td>\n",
" <td>AllPub</td>\n",
" <td>...</td>\n",
" <td>480</td>\n",
" <td>Gd</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>2008</td>\n",
" <td>New</td>\n",
" <td>Partial</td>\n",
" <td>160000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2180</th>\n",
" <td>908154195</td>\n",
" <td>20</td>\n",
" <td>RL</td>\n",
" <td>128.0</td>\n",
" <td>39290</td>\n",
" <td>Pave</td>\n",
" <td>NaN</td>\n",
" <td>IR1</td>\n",
" <td>Bnk</td>\n",
" <td>AllPub</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Elev</td>\n",
" <td>17000</td>\n",
" <td>10</td>\n",
" <td>2007</td>\n",
" <td>New</td>\n",
" <td>Partial</td>\n",
" <td>183850</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2181</th>\n",
" <td>908154205</td>\n",
" <td>60</td>\n",
" <td>RL</td>\n",
" <td>130.0</td>\n",
" <td>40094</td>\n",
" <td>Pave</td>\n",
" <td>NaN</td>\n",
" <td>IR1</td>\n",
" <td>Bnk</td>\n",
" <td>AllPub</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>10</td>\n",
" <td>2007</td>\n",
" <td>New</td>\n",
" <td>Partial</td>\n",
" <td>184750</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>4 rows × 81 columns</p>\n",
"</div>"
],
"text/plain": [
" PID MS SubClass MS Zoning Lot Frontage Lot Area Street Alley \\\n",
"1182 533350090 60 RL NaN 24572 Pave NaN \n",
"1498 908154235 60 RL 313.0 63887 Pave NaN \n",
"2180 908154195 20 RL 128.0 39290 Pave NaN \n",
"2181 908154205 60 RL 130.0 40094 Pave NaN \n",
"\n",
" Lot Shape Land Contour Utilities ... Pool Area Pool QC Fence \\\n",
"1182 IR1 Lvl AllPub ... 0 NaN NaN \n",
"1498 IR3 Bnk AllPub ... 480 Gd NaN \n",
"2180 IR1 Bnk AllPub ... 0 NaN NaN \n",
"2181 IR1 Bnk AllPub ... 0 NaN NaN \n",
"\n",
" Misc Feature Misc Val Mo Sold Yr Sold Sale Type Sale Condition \\\n",
"1182 NaN 0 6 2008 WD Family \n",
"1498 NaN 0 1 2008 New Partial \n",
"2180 Elev 17000 10 2007 New Partial \n",
"2181 NaN 0 10 2007 New Partial \n",
"\n",
" SalePrice \n",
"1182 150000 \n",
"1498 160000 \n",
"2180 183850 \n",
"2181 184750 \n",
"\n",
"[4 rows x 81 columns]"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Houses with a high overall quality rating (> 8) that sold for under $200,000\n",
"df[(df['Overall Qual']>8) & (df['SalePrice']<200000)]"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<AxesSubplot:xlabel='Gr Liv Area', ylabel='SalePrice'>"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"sns.scatterplot(x='Gr Liv Area',y='SalePrice',data=df)"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PID</th>\n",
" <th>MS SubClass</th>\n",
" <th>MS Zoning</th>\n",
" <th>Lot Frontage</th>\n",
" <th>Lot Area</th>\n",
" <th>Street</th>\n",
" <th>Alley</th>\n",
" <th>Lot Shape</th>\n",
" <th>Land Contour</th>\n",
" <th>Utilities</th>\n",
" <th>...</th>\n",
" <th>Pool Area</th>\n",
" <th>Pool QC</th>\n",
" <th>Fence</th>\n",
" <th>Misc Feature</th>\n",
" <th>Misc Val</th>\n",
" <th>Mo Sold</th>\n",
" <th>Yr Sold</th>\n",
" <th>Sale Type</th>\n",
" <th>Sale Condition</th>\n",
" <th>SalePrice</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1498</th>\n",
" <td>908154235</td>\n",
" <td>60</td>\n",
" <td>RL</td>\n",
" <td>313.0</td>\n",
" <td>63887</td>\n",
" <td>Pave</td>\n",
" <td>NaN</td>\n",
" <td>IR3</td>\n",
" <td>Bnk</td>\n",
" <td>AllPub</td>\n",
" <td>...</td>\n",
" <td>480</td>\n",
" <td>Gd</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>2008</td>\n",
" <td>New</td>\n",
" <td>Partial</td>\n",
" <td>160000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2180</th>\n",
" <td>908154195</td>\n",
" <td>20</td>\n",
" <td>RL</td>\n",
" <td>128.0</td>\n",
" <td>39290</td>\n",
" <td>Pave</td>\n",
" <td>NaN</td>\n",
" <td>IR1</td>\n",
" <td>Bnk</td>\n",
" <td>AllPub</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>Elev</td>\n",
" <td>17000</td>\n",
" <td>10</td>\n",
" <td>2007</td>\n",
" <td>New</td>\n",
" <td>Partial</td>\n",
" <td>183850</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2181</th>\n",
" <td>908154205</td>\n",
" <td>60</td>\n",
" <td>RL</td>\n",
" <td>130.0</td>\n",
" <td>40094</td>\n",
" <td>Pave</td>\n",
" <td>NaN</td>\n",
" <td>IR1</td>\n",
" <td>Bnk</td>\n",
" <td>AllPub</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0</td>\n",
" <td>10</td>\n",
" <td>2007</td>\n",
" <td>New</td>\n",
" <td>Partial</td>\n",
" <td>184750</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>3 rows × 81 columns</p>\n",
"</div>"
],
"text/plain": [
" PID MS SubClass MS Zoning Lot Frontage Lot Area Street Alley \\\n",
"1498 908154235 60 RL 313.0 63887 Pave NaN \n",
"2180 908154195 20 RL 128.0 39290 Pave NaN \n",
"2181 908154205 60 RL 130.0 40094 Pave NaN \n",
"\n",
" Lot Shape Land Contour Utilities ... Pool Area Pool QC Fence \\\n",
"1498 IR3 Bnk AllPub ... 480 Gd NaN \n",
"2180 IR1 Bnk AllPub ... 0 NaN NaN \n",
"2181 IR1 Bnk AllPub ... 0 NaN NaN \n",
"\n",
" Misc Feature Misc Val Mo Sold Yr Sold Sale Type Sale Condition \\\n",
"1498 NaN 0 1 2008 New Partial \n",
"2180 Elev 17000 10 2007 New Partial \n",
"2181 NaN 0 10 2007 New Partial \n",
"\n",
" SalePrice \n",
"1498 160000 \n",
"2180 183850 \n",
"2181 184750 \n",
"\n",
"[3 rows x 81 columns]"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Very large above-grade living areas (> 4000 sq ft) that sold for under $400,000\n",
"df[(df['Gr Liv Area']>4000) & (df['SalePrice']<400000)]"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Int64Index([1498, 2180, 2181], dtype='int64')"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[(df['Gr Liv Area']>4000) & (df['SalePrice']<400000)].index"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [],
"source": [
"# Store the index labels of the outlier rows so they can be dropped\n",
"ind_drop = df[(df['Gr Liv Area']>4000) & (df['SalePrice']<400000)].index"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [],
"source": [
"# Drop the outlier rows by index label (axis=0 means rows)\n",
"df = df.drop(ind_drop, axis=0)"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<AxesSubplot:xlabel='Gr Liv Area', ylabel='SalePrice'>"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"sns.scatterplot(x='Gr Liv Area',y='SalePrice',data=df)"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<AxesSubplot:xlabel='Overall Qual', ylabel='SalePrice'>"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"sns.scatterplot(x='Overall Qual',y='SalePrice',data=df)"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [],
"source": [
"# Save the cleaned data; index=False keeps the row index out of the CSV\n",
"df.to_csv(\"../DATA/Ames_outliers_removed.csv\", index=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"----"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 1
}