___

Copyright by Pierian Data Inc. For more information, visit us at www.pieriandata.com

Linear Regression Project Exercise - Solutions

Now that we have learned about feature engineering, cross validation, and grid search, let's test your new skills with a machine learning project exercise. This exercise takes a more guided approach; later ML projects will be more open-ended. We'll start with the final version of the Ames Housing dataset that we worked on throughout the feature engineering section of the course. Your goal is to create a linear regression model, train it on the data with the optimal parameters found by a grid search, and then evaluate the model's performance on a test set.

Complete the tasks in bold

TASK: Run the cells under the Imports and Data section to make sure you have imported the correct general libraries as well as the correct datasets. Later on you may need to run further imports from scikit-learn.

Imports

In [ ]:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

Data

In [5]:

df = pd.read_csv("../DATA/AMES_Final_DF.csv")

In [6]:

df.head()

Out[6]:

	Lot Frontage	Lot Area	Overall Qual	Overall Cond	Year Built	Year Remod/Add	Mas Vnr Area	BsmtFin SF 1	BsmtFin SF 2	Bsmt Unf SF	...	Sale Type_WD	Sale Condition_Normal
0	141.0	31770	6	5	1960	1960	112.0	639.0	0.0	441.0	...	1	1
1	80.0	11622	5	6	1961	1961	0.0	468.0	144.0	270.0	...	1	1
2	81.0	14267	6	6	1958	1958	108.0	923.0	0.0	406.0	...	1	1
3	93.0	11160	7	5	1968	1968	0.0	1065.0	0.0	1045.0	...	1	1
4	74.0	13830	5	5	1997	1998	0.0	791.0	0.0	137.0	...	1	1

5 rows × 274 columns

In [7]:

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2925 entries, 0 to 2924
Columns: 274 entries, Lot Frontage to Sale Condition_Partial
dtypes: float64(11), int64(263)
memory usage: 6.1 MB

TASK: The label we are trying to predict is the SalePrice column. Separate the data into X features and y labels.

In [33]:

X = df.drop('SalePrice',axis=1)
y = df['SalePrice']

TASK: Use scikit-learn to split up X and y into a training set and test set. Since we will later be using a Grid Search strategy, set your test proportion to 10%. To get the same data split as the solutions notebook, you can specify random_state = 101

In [34]:

from sklearn.model_selection import train_test_split

In [35]:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.10, random_state=101)

TASK: The dataset's features have a variety of scales and units. For optimal regression performance, scale the X features. Take careful note of what to use for .fit() vs what to use for .transform().

In [36]:

from sklearn.preprocessing import StandardScaler

In [37]:

scaler = StandardScaler()

In [38]:

scaled_X_train = scaler.fit_transform(X_train)
scaled_X_test = scaler.transform(X_test)
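The fit/transform distinction matters because the scaler's means and standard deviations should come from the training rows only; the test rows are then transformed with those same training statistics. A minimal sketch of this on synthetic data follows. The names `X_demo`, `demo_scaler`, `scaled_tr`, and `scaled_te` are illustrative assumptions, not part of the solutions notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a feature matrix with very different column scales
rng = np.random.default_rng(101)
X_demo = rng.normal(loc=[5.0, 2000.0, 30000.0], scale=[1.0, 20.0, 5000.0], size=(200, 3))

X_tr, X_te = train_test_split(X_demo, test_size=0.10, random_state=101)

demo_scaler = StandardScaler()
scaled_tr = demo_scaler.fit_transform(X_tr)  # learns mean_ and scale_ from training rows only
scaled_te = demo_scaler.transform(X_te)      # reuses the training statistics, learns nothing new

# Training columns are centered at 0 with unit variance; test columns are only close to that
print(scaled_tr.mean(axis=0).round(6), scaled_tr.std(axis=0).round(6))
print(scaled_te.mean(axis=0).round(3))
```

Calling fit_transform on the test set instead would let test-set statistics influence the preprocessing, which is a form of data leakage.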

TASK: We will use an Elastic Net model. Create an instance of a default ElasticNet model with scikit-learn.

In [50]:

from sklearn.linear_model import ElasticNet

In [51]:

base_elastic_model = ElasticNet()

TASK: The Elastic Net model has two main parameters, alpha and the L1 ratio. Create a dictionary parameter grid of values for the ElasticNet. Feel free to experiment with these values, keeping in mind that your choices may not match the solution's exactly.

In [52]:

param_grid = {'alpha':[0.1,1,5,10,50,100],
              'l1_ratio':[.1, .5, .7, .9, .95, .99, 1]}
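Every value of alpha is paired with every value of l1_ratio, so the size of the search grows multiplicatively. As a quick sanity check, scikit-learn's ParameterGrid can count the combinations before any model is fit. The sketch below assumes 5-fold cross validation when estimating the total number of fits; `n_candidates` and `n_fits` are illustrative names.

```python
from sklearn.model_selection import ParameterGrid

param_grid = {'alpha': [0.1, 1, 5, 10, 50, 100],
              'l1_ratio': [.1, .5, .7, .9, .95, .99, 1]}

# 6 alpha values x 7 l1_ratio values
n_candidates = len(ParameterGrid(param_grid))

# Each candidate is fit once per fold (assuming cv=5)
n_fits = n_candidates * 5

print(n_candidates, n_fits)
```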

TASK: Using scikit-learn, create a GridSearchCV object and run a grid search for the best parameters for your model based on your scaled training data. You may receive convergence warnings for certain parameter combinations.

In [53]:

from sklearn.model_selection import GridSearchCV

In [42]:

# The verbose level is a personal preference
grid_model = GridSearchCV(estimator=base_elastic_model,
                          param_grid=param_grid,
                          scoring='neg_mean_squared_error',
                          cv=5,
                          verbose=1)

In [43]:

grid_model.fit(scaled_X_train,y_train)

Fitting 5 folds for each of 42 candidates, totalling 210 fits

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:475: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 323508201435.64636, tolerance: 135520669252.76785
  positive)
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:475: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 336948879951.1281, tolerance: 130791380565.88454
  positive)
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:475: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 357509968449.699, tolerance: 141505694000.6106
  positive)
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:475: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 388683651672.28687, tolerance: 143819804008.8288
  positive)
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:475: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 292809907400.36176, tolerance: 134568001825.51236
  positive)
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:475: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 354244712120.8123, tolerance: 135520669252.76785
  positive)
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:475: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 363490863623.8195, tolerance: 130791380565.88454
  positive)
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:475: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 408539308800.79, tolerance: 141505694000.6106
  positive)
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:475: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 407083927692.8737, tolerance: 143819804008.8288
  positive)
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:475: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 355296428563.4605, tolerance: 134568001825.51236
  positive)
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:475: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 349811975084.80225, tolerance: 135520669252.76785
  positive)
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:475: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 359107413789.8297, tolerance: 130791380565.88454
  positive)
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:475: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 406611694447.3531, tolerance: 141505694000.6106
  positive)
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:475: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 402834204598.12994, tolerance: 143819804008.8288
  positive)
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:475: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 368658319394.37177, tolerance: 134568001825.51236
  positive)
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:475: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 302531521066.7632, tolerance: 135520669252.76785
  positive)
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:475: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 312733820929.5414, tolerance: 130791380565.88454
  positive)
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:475: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 347531367248.84485, tolerance: 141505694000.6106
  positive)
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:475: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 352595740834.96906, tolerance: 143819804008.8288
  positive)
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:475: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 255889875128.09943, tolerance: 134568001825.51236
  positive)
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:475: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 170731423283.76883, tolerance: 135520669252.76785
  positive)
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:475: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 186853289260.3824, tolerance: 130791380565.88454
  positive)
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:475: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 196463479537.0528, tolerance: 141505694000.6106
  positive)
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:475: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 212820336042.79935, tolerance: 143819804008.8288
  positive)
[Parallel(n_jobs=1)]: Done 210 out of 210 | elapsed:   19.3s finished

Out[43]:

GridSearchCV(cv=5, error_score='raise-deprecating',
             estimator=ElasticNet(alpha=1.0, copy_X=True, fit_intercept=True,
                                  l1_ratio=0.5, max_iter=1000, normalize=False,
                                  positive=False, precompute=False,
                                  random_state=None, selection='cyclic',
                                  tol=0.01, warm_start=False),
             iid='warn', n_jobs=None,
             param_grid={'alpha': [0.1, 1, 5, 10, 50, 100],
                         'l1_ratio': [0.1, 0.5, 0.7, 0.9, 0.95, 0.99, 1]},
             pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
             scoring='neg_mean_squared_error', verbose=1)
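The ConvergenceWarning messages mean that coordinate descent hit max_iter before the duality gap fell below the tolerance for some parameter combinations. One possible response, suggested by the warning text itself, is to give the solver more iterations. The sketch below shows the idea on small synthetic data; the value 100_000 and the names `patient_model` and `demo_grid` are assumptions for illustration, not settings used in the solutions notebook.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV

# Small synthetic regression problem (illustrative only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(120, 8))
y_demo = 4.0 * X_demo[:, 0] - 2.0 * X_demo[:, 3] + rng.normal(scale=0.5, size=120)

# Same estimator type as the notebook, but with a larger iteration budget
patient_model = ElasticNet(max_iter=100_000)

demo_grid = GridSearchCV(estimator=patient_model,
                         param_grid={'alpha': [0.1, 1], 'l1_ratio': [0.5, 1]},
                         scoring='neg_mean_squared_error',
                         cv=5)
demo_grid.fit(X_demo, y_demo)

# n_iter_ reports how many iterations the refit best model actually needed
print(demo_grid.best_estimator_.n_iter_)
```

A larger budget makes the search slower, so it is a trade-off rather than a free fix.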

TASK: Display the best combination of parameters for your model

In [54]:

grid_model.best_params_

Out[54]:

{'alpha': 100, 'l1_ratio': 1}
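Here the best alpha is the largest value in the grid and the best l1_ratio is 1 (a pure L1 penalty), so the optimum sits on the edge of the search space; extending the alpha range might be worth trying. Beyond best_params_, a fitted GridSearchCV also exposes best_score_ and cv_results_ for inspecting how the candidates compare. The following is a sketch on synthetic data, with `demo_grid`, `cv_rmse`, and `ranked` as illustrative names.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV

# Synthetic data standing in for the scaled training set
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(150, 6))
y_demo = 5.0 * X_demo[:, 0] + rng.normal(size=150)

demo_grid = GridSearchCV(ElasticNet(max_iter=10_000),
                         param_grid={'alpha': [0.01, 0.1, 1], 'l1_ratio': [0.5, 1]},
                         scoring='neg_mean_squared_error',
                         cv=5)
demo_grid.fit(X_demo, y_demo)

# best_score_ is a negated MSE, so flip the sign before taking the square root
cv_rmse = np.sqrt(-demo_grid.best_score_)

# cv_results_ is a dict of arrays that converts cleanly to a DataFrame
results = pd.DataFrame(demo_grid.cv_results_)
ranked = results.sort_values('rank_test_score')[
    ['param_alpha', 'param_l1_ratio', 'mean_test_score']]

print(demo_grid.best_params_, cv_rmse)
print(ranked.head())
```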

TASK: Evaluate your model's performance on the unseen 10% scaled test set. In the solutions notebook we achieved an MAE of $14149 and an RMSE of $20532.

In [45]:

y_pred = grid_model.predict(scaled_X_test)

In [46]:

from sklearn.metrics import mean_absolute_error,mean_squared_error

In [47]:

mean_absolute_error(y_test,y_pred)

Out[47]:

14149.055026374837

In [48]:

np.sqrt(mean_squared_error(y_test,y_pred))

Out[48]:

20532.890234901013

In [49]:

np.mean(df['SalePrice'])

Out[49]:

180815.53743589742
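Comparing the errors to the mean sale price puts them in context. A short worked calculation using the three values reported above (the variable names are illustrative):

```python
# Values copied from the notebook outputs above
mae = 14149.055026374837
rmse = 20532.890234901013
mean_price = 180815.53743589742

# Express each error as a percentage of the average sale price
mae_pct = 100 * mae / mean_price
rmse_pct = 100 * rmse / mean_price

print(f"MAE is {mae_pct:.1f}% of the mean sale price")
print(f"RMSE is {rmse_pct:.1f}% of the mean sale price")
```

The RMSE being noticeably larger than the MAE suggests that a few houses are predicted with large errors, since squaring weights big misses more heavily.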

Great work!
