
___

Copyright by Pierian Data Inc. For more information, visit us at www.pieriandata.com

Multi-Class Logistic Regression

Students often ask how to perform non-binary classification with Logistic Regression. Fortunately, the process with scikit-learn is pretty much the same as for binary classification. To expand our understanding, we'll work through a simple data set and see how to use LogisticRegression with a manual GridSearchCV (instead of LogisticRegressionCV). Make sure to watch the video to understand the one-vs-all (one-vs-rest) process that is occurring "under the hood".
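As a quick illustration of the idea (a minimal sketch, not part of the original lesson), scikit-learn's OneVsRestClassifier makes the strategy explicit: it fits one binary logistic model per class and assigns each sample to the class whose model scores highest. LogisticRegression with multi_class='ovr' does the same thing internally.

# Minimal one-vs-rest sketch: one binary logistic model per class
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X_demo, y_demo = load_iris(return_X_y=True)
ovr = OneVsRestClassifier(LogisticRegression(max_iter=5000))
ovr.fit(X_demo, y_demo)
print(len(ovr.estimators_))  # 3 --> one fitted binary classifier per class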

Imports

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

Data

We will work with the classic Iris Data Set. The Iris flower data set (or Fisher's Iris data set) is a multivariate data set introduced by the British statistician, eugenicist, and biologist Ronald Fisher in his 1936 paper "The use of multiple measurements in taxonomic problems" as an example of linear discriminant analysis.

Full Details: https://en.wikipedia.org/wiki/Iris_flower_data_set
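If you don't have the CSV file locally, a very similar DataFrame can be built from scikit-learn's bundled copy of the data set (a sketch; note the column names differ slightly from the CSV and the target is numeric):

from sklearn.datasets import load_iris
iris = load_iris(as_frame=True)  # as_frame requires scikit-learn >= 0.23
sk_df = iris.frame               # feature columns plus a numeric 'target' column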

In [3]:
df = pd.read_csv('../DATA/iris.csv')
In [4]:
df.head()
Out[4]:
sepal_length sepal_width petal_length petal_width species
0 5.1 3.5 1.4 0.2 setosa
1 4.9 3.0 1.4 0.2 setosa
2 4.7 3.2 1.3 0.2 setosa
3 4.6 3.1 1.5 0.2 setosa
4 5.0 3.6 1.4 0.2 setosa

Exploratory Data Analysis and Visualization

Feel free to explore the data further on your own.

In [5]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   sepal_length  150 non-null    float64
 1   sepal_width   150 non-null    float64
 2   petal_length  150 non-null    float64
 3   petal_width   150 non-null    float64
 4   species       150 non-null    object 
dtypes: float64(4), object(1)
memory usage: 6.0+ KB
In [6]:
df.describe()
Out[6]:
sepal_length sepal_width petal_length petal_width
count 150.000000 150.000000 150.000000 150.000000
mean 5.843333 3.054000 3.758667 1.198667
std 0.828066 0.433594 1.764420 0.763161
min 4.300000 2.000000 1.000000 0.100000
25% 5.100000 2.800000 1.600000 0.300000
50% 5.800000 3.000000 4.350000 1.300000
75% 6.400000 3.300000 5.100000 1.800000
max 7.900000 4.400000 6.900000 2.500000
In [8]:
df['species'].value_counts()
Out[8]:
setosa        50
versicolor    50
virginica     50
Name: species, dtype: int64
In [9]:
sns.countplot(x='species', data=df)
Out[9]:
<AxesSubplot:xlabel='species', ylabel='count'>
In [11]:
sns.scatterplot(x='sepal_length',y='sepal_width',data=df,hue='species')
Out[11]:
<AxesSubplot:xlabel='sepal_length', ylabel='sepal_width'>
In [12]:
sns.scatterplot(x='petal_length',y='petal_width',data=df,hue='species')
Out[12]:
<AxesSubplot:xlabel='petal_length', ylabel='petal_width'>
In [13]:
sns.pairplot(df,hue='species')
Out[13]:
<seaborn.axisgrid.PairGrid at 0x2a1a26a4908>
In [14]:
# correlate only the numeric feature columns
sns.heatmap(df.drop('species', axis=1).corr(), annot=True)
Out[14]:
<AxesSubplot:>

You can easily discover new plot types with a Google search! Searching for "3d matplotlib scatter plot" quickly takes you to: https://matplotlib.org/3.1.1/gallery/mplot3d/scatter3d.html

In [17]:
df['species'].unique()
Out[17]:
array(['setosa', 'versicolor', 'virginica'], dtype=object)
In [19]:
from mpl_toolkits.mplot3d import Axes3D 
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
colors = df['species'].map({'setosa':0, 'versicolor':1, 'virginica':2})
ax.scatter(df['sepal_width'],df['petal_width'],df['petal_length'],c=colors);

Train | Test Split and Scaling

In [32]:
X = df.drop('species',axis=1)
y = df['species']
In [33]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
In [77]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=101)
In [78]:
scaler = StandardScaler()
In [79]:
scaled_X_train = scaler.fit_transform(X_train)
scaled_X_test = scaler.transform(X_test)
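Note that the scaler is fit only on the training data and then applied to the test data with transform(), so no information about the test set's distribution leaks into the scaling parameters.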

Multi-Class Logistic Regression Model

In [80]:
from sklearn.linear_model import LogisticRegression
In [81]:
from sklearn.model_selection import GridSearchCV
In [96]:
# Depending on warnings you may need to adjust max iterations allowed 
# Or experiment with different solvers
log_model = LogisticRegression(solver='saga',multi_class="ovr",max_iter=5000)
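For reference, scikit-learn can also fit a single multinomial (softmax) model over all classes instead of one binary model per class; this alternative is not used below, just shown for comparison:

# Alternative (not used below): one softmax model over all three classes
softmax_model = LogisticRegression(solver='saga', multi_class='multinomial', max_iter=5000)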

GridSearch for Best Hyper-Parameters

The main hyperparameter choices are the type of regularization penalty and the regularization strength C.

In [97]:
# Penalty Type
penalty = ['l1', 'l2']

# Use logarithmically spaced C values (recommended in official docs)
C = np.logspace(0, 4, 10)
In [98]:
grid_model = GridSearchCV(log_model,param_grid={'C':C,'penalty':penalty})
In [99]:
grid_model.fit(scaled_X_train,y_train)
Out[99]:
GridSearchCV(estimator=LogisticRegression(max_iter=5000, multi_class='ovr',
                                          solver='saga'),
             param_grid={'C': array([1.00000000e+00, 2.78255940e+00, 7.74263683e+00, 2.15443469e+01,
       5.99484250e+01, 1.66810054e+02, 4.64158883e+02, 1.29154967e+03,
       3.59381366e+03, 1.00000000e+04]),
                         'penalty': ['l1', 'l2']})
In [100]:
grid_model.best_params_
Out[100]:
{'C': 7.742636826811269, 'penalty': 'l1'}
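The fitted grid search object also exposes the best model itself (refit on the full training set) and its cross-validated score:

best_log_model = grid_model.best_estimator_   # LogisticRegression refit with the best parameters
print(grid_model.best_score_)                 # mean cross-validated accuracy for those parameters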

Model Performance on Classification Tasks

In [101]:
from sklearn.metrics import accuracy_score,confusion_matrix,classification_report,plot_confusion_matrix
In [102]:
y_pred = grid_model.predict(scaled_X_test)
In [103]:
accuracy_score(y_test,y_pred)
Out[103]:
0.9736842105263158
In [104]:
confusion_matrix(y_test,y_pred)
Out[104]:
array([[10,  0,  0],
       [ 0, 17,  0],
       [ 0,  1, 10]], dtype=int64)
In [108]:
plot_confusion_matrix(grid_model,scaled_X_test,y_test)
Out[108]:
<sklearn.metrics._plot.confusion_matrix.ConfusionMatrixDisplay at 0x2a1a83ac0c8>
In [109]:
# Scaled so highest value=1
plot_confusion_matrix(grid_model,scaled_X_test,y_test,normalize='true')
Out[109]:
<sklearn.metrics._plot.confusion_matrix.ConfusionMatrixDisplay at 0x2a1a843ac48>
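Note: plot_confusion_matrix was deprecated in scikit-learn 1.0 and removed in 1.2. On newer versions, the equivalent call is:

from sklearn.metrics import ConfusionMatrixDisplay
ConfusionMatrixDisplay.from_estimator(grid_model, scaled_X_test, y_test, normalize='true')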
In [110]:
print(classification_report(y_test,y_pred))
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        10
  versicolor       0.94      1.00      0.97        17
   virginica       1.00      0.91      0.95        11

    accuracy                           0.97        38
   macro avg       0.98      0.97      0.97        38
weighted avg       0.98      0.97      0.97        38

Evaluating Curves and AUC

Make sure to watch the video on this! We need to create the ROC plots manually for a multi-class situation. Fortunately, scikit-learn's documentation already has plenty of examples of this.

Source: https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html

In [116]:
from sklearn.metrics import roc_curve, auc
In [122]:
def plot_multiclass_roc(clf, X_test, y_test, n_classes, figsize=(5,5)):
    y_score = clf.decision_function(X_test)

    # structures
    fpr = dict()
    tpr = dict()
    roc_auc = dict()

    # calculate dummies once
    y_test_dummies = pd.get_dummies(y_test, drop_first=False).values
    for i in range(n_classes):
        fpr[i], tpr[i], _ = roc_curve(y_test_dummies[:, i], y_score[:, i])
        roc_auc[i] = auc(fpr[i], tpr[i])

    # roc for each class
    fig, ax = plt.subplots(figsize=figsize)
    ax.plot([0, 1], [0, 1], 'k--')
    ax.set_xlim([0.0, 1.0])
    ax.set_ylim([0.0, 1.05])
    ax.set_xlabel('False Positive Rate')
    ax.set_ylabel('True Positive Rate')
    ax.set_title('Receiver operating characteristic example')
    for i in range(n_classes):
        ax.plot(fpr[i], tpr[i], label='ROC curve (area = %0.2f) for label %i' % (roc_auc[i], i))
    ax.legend(loc="best")
    ax.grid(alpha=.4)
    sns.despine()
    plt.show()
In [123]:
plot_multiclass_roc(grid_model, scaled_X_test, y_test, n_classes=3, figsize=(16, 10))
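If you just want a single summary number instead of the full set of curves, roc_auc_score can compute a one-vs-rest average directly from predicted probabilities (a sketch, assuming scikit-learn >= 0.22):

from sklearn.metrics import roc_auc_score
y_proba = grid_model.predict_proba(scaled_X_test)
print(roc_auc_score(y_test, y_proba, multi_class='ovr'))  # macro-averaged OvR AUC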