udemy-ML/18-Naive-Bayes-and-NLP/02-Text-Classification-Asse...

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "___\n",
    "\n",
    "<a href='http://www.pieriandata.com'> <img src='../Pierian_Data_Logo.png' /></a>\n",
    "___"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Text Classification Assessment \n",
    "\n",
    "### Goal: Given a set of text movie reviews that have been labeled negative or positive\n",
    "\n",
    "For more information on this dataset visit http://ai.stanford.edu/~amaas/data/sentiment/\n",
    "\n",
    "## Complete the tasks in bold below!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Task: Perform imports and load the dataset into a pandas DataFrame**\n",
    "For this exercise you can load the dataset from `'../DATA/moviereviews.csv'`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# CODE HERE"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.read_csv('../DATA/moviereviews.csv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>label</th>\n",
       "      <th>review</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>neg</td>\n",
       "      <td>how do films like mouse hunt get into theatres...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>neg</td>\n",
       "      <td>some talented actresses are blessed with a dem...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>pos</td>\n",
       "      <td>this has been an extraordinary year for austra...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>pos</td>\n",
       "      <td>according to hollywood movies made in last few...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>neg</td>\n",
       "      <td>my first press screening of 1998 and already i...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  label                                             review\n",
       "0   neg  how do films like mouse hunt get into theatres...\n",
       "1   neg  some talented actresses are blessed with a dem...\n",
       "2   pos  this has been an extraordinary year for austra...\n",
       "3   pos  according to hollywood movies made in last few...\n",
       "4   neg  my first press screening of 1998 and already i..."
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**TASK: Check to see if there are any missing values in the dataframe.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "#CODE HERE"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "label      0\n",
       "review    35\n",
       "dtype: int64"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**TASK: Remove any reviews that are NaN**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**TASK: Check to see if any reviews are blank strings and not just NaN. Note: This means a review text could just be: \"\" or \"  \" or some other larger blank string. How would you check for this? Note: There are many ways! Once you've discovered the reviews that are blank strings, go ahead and remove them as well. [Click me for a big hint](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.isspace.html)**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "27"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>label</th>\n",
       "      <th>review</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>57</th>\n",
       "      <td>neg</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>71</th>\n",
       "      <td>pos</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>147</th>\n",
       "      <td>pos</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>151</th>\n",
       "      <td>pos</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>283</th>\n",
       "      <td>pos</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>307</th>\n",
       "      <td>pos</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>313</th>\n",
       "      <td>neg</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>323</th>\n",
       "      <td>pos</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>343</th>\n",
       "      <td>pos</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>351</th>\n",
       "      <td>neg</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>427</th>\n",
       "      <td>pos</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>501</th>\n",
       "      <td>neg</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>633</th>\n",
       "      <td>pos</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>675</th>\n",
       "      <td>neg</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>815</th>\n",
       "      <td>neg</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>851</th>\n",
       "      <td>neg</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>977</th>\n",
       "      <td>neg</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1079</th>\n",
       "      <td>neg</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1299</th>\n",
       "      <td>pos</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1455</th>\n",
       "      <td>neg</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1493</th>\n",
       "      <td>pos</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1525</th>\n",
       "      <td>neg</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1531</th>\n",
       "      <td>neg</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1763</th>\n",
       "      <td>neg</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1851</th>\n",
       "      <td>neg</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1905</th>\n",
       "      <td>pos</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1993</th>\n",
       "      <td>pos</td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     label review\n",
       "57     neg       \n",
       "71     pos       \n",
       "147    pos       \n",
       "151    pos       \n",
       "283    pos       \n",
       "307    pos       \n",
       "313    neg       \n",
       "323    pos       \n",
       "343    pos       \n",
       "351    neg       \n",
       "427    pos       \n",
       "501    neg       \n",
       "633    pos       \n",
       "675    neg       \n",
       "815    neg       \n",
       "851    neg       \n",
       "977    neg       \n",
       "1079   neg       \n",
       "1299   pos       \n",
       "1455   neg       \n",
       "1493   pos       \n",
       "1525   neg       \n",
       "1531   neg       \n",
       "1763   neg       \n",
       "1851   neg       \n",
       "1905   pos       \n",
       "1993   pos       "
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "Int64Index: 1938 entries, 0 to 1999\n",
      "Data columns (total 2 columns):\n",
      " #   Column  Non-Null Count  Dtype \n",
      "---  ------  --------------  ----- \n",
      " 0   label   1938 non-null   object\n",
      " 1   review  1938 non-null   object\n",
      "dtypes: object(2)\n",
      "memory usage: 45.4+ KB\n"
     ]
    }
   ],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**TASK: Confirm the value counts per label:**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "#CODE HERE"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pos    969\n",
       "neg    969\n",
       "Name: label, dtype: int64"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## EDA on Bag of Words\n",
    "\n",
    "**Bonus Task: Can you figure out how to use a CountVectorizer model to get the top 20 words (that are not english stop words) per label type? Note, this is a bonus task as we did not show this in the lectures. But a quick cursory Google search should put you on the right path.  [Click me for a big hint](https://stackoverflow.com/questions/16288497/find-the-most-common-term-in-scikit-learn-classifier)**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [],
   "source": [
    "#CODE HERE"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Top 20 words used for Negative reviews.\n",
      "[('film', 4063), ('movie', 3131), ('like', 1808), ('just', 1480), ('time', 1127), ('good', 1117), ('bad', 997), ('character', 926), ('story', 908), ('plot', 888), ('characters', 838), ('make', 813), ('really', 743), ('way', 734), ('little', 696), ('don', 683), ('does', 666), ('doesn', 648), ('action', 635), ('scene', 634)]\n"
     ]
    }
   ],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Top 20 words used for Positive reviews.\n",
      "[('film', 5002), ('movie', 2389), ('like', 1721), ('just', 1273), ('story', 1199), ('good', 1193), ('time', 1175), ('character', 1037), ('life', 1032), ('characters', 957), ('way', 864), ('films', 851), ('does', 828), ('best', 788), ('people', 769), ('make', 764), ('little', 751), ('really', 731), ('man', 728), ('new', 702)]\n"
     ]
    }
   ],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Training and Data\n",
    "\n",
    "**TASK: Split the data into features and a label (X and y) and then preform a train/test split. You may use whatever settings you like. To compare your results to the solution notebook, use `test_size=0.20, random_state=101`**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#CODE HERE"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Training a Mode\n",
    "\n",
    "**TASK: Create a PipeLine that will both create a TF-IDF Vector out of the raw text data and fit a supervised learning model of your choice. Then fit that pipeline on the training data.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#CODE HERE"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Pipeline(steps=[('tfidf', TfidfVectorizer()), ('svc', LinearSVC())])"
      ]
     },
     "execution_count": 76,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**TASK: Create a classification report and plot a confusion matrix based on the results of your PipeLine.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "metadata": {},
   "outputs": [],
   "source": [
    "#CODE HERE"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 79,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "              precision    recall  f1-score   support\n",
      "\n",
      "         neg       0.81      0.86      0.83       191\n",
      "         pos       0.85      0.81      0.83       197\n",
      "\n",
      "    accuracy                           0.83       388\n",
      "   macro avg       0.83      0.83      0.83       388\n",
      "weighted avg       0.83      0.83      0.83       388\n",
      "\n"
     ]
    }
   ],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<sklearn.metrics._plot.confusion_matrix.ConfusionMatrixDisplay at 0x1f370d0b790>"
      ]
     },
     "execution_count": 81,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAUUAAAEGCAYAAADyuIefAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAc8klEQVR4nO3de5xVdb3/8dd7hpuAgjCoKCioqOEFU8TbyYNSeeuEXTxi6fGoRZppeTlZnVOWaXlOdowyNRR+al5KzdSsMDXNyxEVEFFQlAIBRQEREEQuM5/fH2sNbIZhZs2w1+zZm/fTx3rM2t+99lqfcetnvrf1XYoIzMwsUVXqAMzM2hMnRTOzAk6KZmYFnBTNzAo4KZqZFehQ6gC2RE2v6hjQv2Opw7AWeG1a11KHYC30Pu8tjog+rf38sUd3i3eX1GY6dvK01Q9FxHGtvVYxlHVSHNC/I8891L/UYVgLHLvzgaUOwVrokbjnjS35/LtLannuoV0zHVvd9/WaLblWMZR1UjSz9i+AOupKHUZmTopmlqsgWBvZms/tgQdazCx3dRn/aY6k8ZIWSnq5Qfn5kl6VNF3S/xSUf1vSLEkzJR2bJVbXFM0sV0FQW7zbiW8GrgVurS+QdDQwEhgSEasl7ZCWDwZGAfsCOwOPSNoroulqq2uKZpa7OiLT1pyIeAJY0qD4XOCqiFidHrMwLR8J/CYiVkfEbGAWMKy5azgpmlmuAqglMm1AjaRJBdvoDJfYC/iYpGcl/U3SIWn5LsC8guPmp2VNcvPZzHKXpRaYWhwRQ1t4+g5AL+Aw4BDgLkm7t/AcG53MzCw3AazNd4nC+cC9kayD+JykOqAGeBMonMjcLy1rkpvPZparyNh0rs1em2zoPuBoAEl7AZ2AxcADwChJnSUNBAYBzzV3MtcUzSxfAbVFqihKuhMYTtL3OB+4DBgPjE+n6awBzkhrjdMl3QXMANYB5zU38gxOimaWs+SOliKdK+LUzbx12maOvxK4siXXcFI0s5yJWlTqIDJzUjSzXCUDLU6KZmZA/TxFJ0Uzs/XqXFM0M0u4pmhmViAQtWU0JdpJ0cxy5+azmVkqEGuiutRhZOakaGa5SiZvu/lsZraeB1rMzFIRojZcUzQzW6/ONUUzs0Qy0FI+qaZ8IjWzsuSBFjOzBmo9T9HMLOE7WszMGqjz6LOZWSJZEMJJ0cwMSJrPa32bn5lZIgJP3jYz20CevG1mVi8or5pi+URqZmWrlqpMW3MkjZe0MH3Gc8P3LpYUkmrS15L0c0mzJE2TdFCWWJ0UzSxXgaiLbFsGNwPHNSyU1B/4JDC3oPh4YFC6jQauz3IBJ0Uzy1XyiNMOmbZmzxXxBLCkkbeuAb6ZXq7eSODWSEwEekrq29w13KdoZjlTS9ZTrJE0qeD12IgY2+TZpZHAmxHxorTRdXYB5hW8np+WLWjqfE6KZparoEV3tCyOiKFZD5bUFfgOSdO5KJwUzSx3Oa68vQcwEKivJfYDpkgaBrwJ9C84tl9a1iQnRTPLVYRyu/c5Il4Cdqh/LWkOMDQiFkt6APiapN8AhwLLIqLJpjM4KZpZzpKBluLc5ifpTmA4Sd/jfOCyiBi3mcP/BJwAzAI+AM7Mcg0nRTPLWfGe0RIRpzbz/oCC/QDOa+k1nBTNLFfJQItv8zMzW89Lh5mZpervaCkXTopmljs/uMrMLBUBa+ucFM3MgPrms5Oimdl6Od7RUnROiiXw0wv78+wj29GzZh1jH5u5vvz+cTU8cHMNVdXBoSOW86Xvbph8v3B+R748fB9Ou/htTj53USnCtlSfndfwH2Pm0rPPOgj40229uW9cH75zwxz67bEagG7b1bJyeTVf/cTeJY629Dwlx5r1yVOW8OkzF/OTr++6vmzq0935v4d6cP0jM+nUOVi6eOOv5lc/2IVDjnm/rUO1RtSuE2Mv35lZL3Vlm261XDvhNaY8sS0/OmfA+mNGf+8tVr5fPk3GfJVX87l8Iq0g+x+2km23r92o7MFbe3PK196hU+dkObieNevWv/d/f+7BTv3XsNteH7ZpnNa4JQs7MuulrgCsWlnNvFldqOm7tuCI4KhPL+Wx+7YvTYDtUF36nJbmtvYgt6QoaYCkVyTdKGm6pL9I2kbSHpImSJos6UlJ+6TH7yFpoqSXJF0haUVesbVHb/69Cy8/250LThzEJZ/dk5lTtwFg1coq7rpuB067+O0SR2iN2bHfGvbYbxWvTum6vmy/Q1fy3qIOvDW7cwkjaz+S0efqTFt7kHdNcRDwy4jYF1gKfA4YC5wfEQcDlwDXpceOAcZExP4ki0E2StJoSZMkTVr0bu3mDis7tbXw/tJqxjz4Ol/67ltc+ZUBRMCvr96Jz3x5Edt0qyt1iNZAl661fPemOdzwvZ35YMWG/6GPPmkpj9/Xs3SBtTNFfhxB7vLuU5wdEVPT/cnAAOAI4O6CFXLr/5weDpyU7t8BXN3YCdNVeMcCDB3SJRo7phzV9F3LkScsQ4J9PvoBVVWwbEk1r77Qlaf+2JNxV+zMiuXVqCro1DkYedbiUoe8VavuEHz3pjn89d7tefrPPdeXV1UHR56wjK8dN6h0wbVD7aVpnEXeSXF1wX4tsCOwNCIOzPm6ZeeI45bx4tPdOfDIFcz/e2fWrhE9etXyv/fNWn/Mr6/eiS7dap0QSy646KfzmPd6F+4d22ejdw762PvMm9WZxQs6lSi29qfcRp/beqBlOTBb0smw/hGEQ9L3JpI0rwFGtXFcberH5+7Ghf8yiPl/78IXDx7MhDt6ceyoJbw9txOjj96bH5+7G/8xZi4qn/+Otir7DlvJx09+jyFHruC6h2dy3cMzOeSY5QD880g3nRtTF1WZtvagFFNyvghcL+m/gI7Ab4AXgW8At0n6T2ACsKwEsbWJb1//RqPll147t9Hyeqdf4sGW9mD6c905duchjb730wt3bbR8axYh1rWThJdFbkkxIuYA+xW8Luwj3OS5rSTPTjgsIkLSKMCzXs0qRDk1n9vT5O2DgWuVjMAsBc4qbThmVgzl1qfYbpJiRDwJNN4mMbOy5qRoZpbyIrNmZg14nqKZWSoC1nmRWTOzDcqp+Vw+6dvMylIx732WNF7SQkkvF5T9RNKrkqZJ+r2kngXvfVvSLEkzJR2bJV4nRTPLXYQybRnczKbznB8G9ouIA4DXgG8DSBpMcnfcvulnrpPU7FI8TopmlrtiracYEU8ASxqU/SUi6hcgnQj0S/dHAr+JiNURMRuYBQxr7hruUzSzXEW0qE+xRtKkgtdj05WxsjoL+G26vwtJkqw3Py1rkpOimeVM1GYffV4cEUNbdZVk3YR1wO2t+Xw9J0Uzy13G/sJWk/TvwKeAERFRv87qm0D/gsP6pWVNcp+imeWq/t7nvFbelnQc8E3g0xHxQcFbDwCjJHWWNJDkSQDPNXc+1xTNLF+R9CsWg6Q7geEkfY/zgctIRps7Aw+nK/pPjIhzImK6pLuAGSTN6vMiotlnmDgpmlnuinWbX0Sc2kjxuCaOvxK4siXXcFI0s1xFywZaSs5J0cxyV6zmc1twUjSz3OU9+lxMTopmlqsIJ0Uzs42U0yo5Topmljv3KZqZpQJR59FnM7MNyqii6KRoZjnzQIuZWQNlVFV0UjSz3FVETVHSL2giv0fEBblEZGYVJYC6ugpIisCkJt4zM8smgEqoKUbELYWvJXVtsFaZmVkm5TRPsdnJQ5IOlzQDeDV9PUTSdblHZmaVIzJu7UCWGZU/A44F3gWIiBeBo3KMycwqSrbHm7aXwZhMo88RMS9d0bZes6vXmpmt105qgVlkSYrzJB0BhKSOwNeBV/INy8wqRkCU0ehzlubzOcB5JM9LfQs4MH1tZpaRMm6l12xNMSIWA19sg1jMrFKVUfM5y+jz7pL+IGmRpIWS7pe0e1sEZ2YVosJGn+8A7gL6AjsDdwN35hmUmVWQ+snbWbZ2IEtS7BoRv46Idel2G9Al78DMrHJEZNvag80mRUm9JPUC/izpW5IGSNpN0jeBP7VdiGZW9uqUbWuGpPFpN97LBWW9JD0s6fX05/ZpuST9XNIsSdM
      "text/plain": [
       "<Figure size 432x288 with 2 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Great job!"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
Initial commit 2 years ago			`{`
			`"cells": [`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {`
			`"collapsed": true`
			`},`
			`"source": [`
			`"___\n",`
			`"\n",`
			`"<a href='http://www.pieriandata.com'> <img src='../Pierian_Data_Logo.png' /></a>\n",`
			`"___"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"# Text Classification Assessment \n",`
			`"\n",`
			`"### Goal: Given a set of text movie reviews that have been labeled negative or positive\n",`
			`"\n",`
			`"For more information on this dataset visit http://ai.stanford.edu/~amaas/data/sentiment/\n",`
			`"\n",`
			`"## Complete the tasks in bold below!"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"Task: Perform imports and load the dataset into a pandas DataFrame\n",`
			"For this exercise you can load the dataset from `'../DATA/moviereviews.csv'`."
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": 1,`
			`"metadata": {},`
			`"outputs": [],`
			`"source": [`
			`"# CODE HERE"`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": 2,`
			`"metadata": {},`
			`"outputs": [],`
			`"source": [`
			`"import numpy as np\n",`
			`"import pandas as pd"`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": 3,`
			`"metadata": {},`
			`"outputs": [],`
			`"source": [`
			`"df = pd.read_csv('../DATA/moviereviews.csv')"`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": 4,`
			`"metadata": {},`
			`"outputs": [`
			`{`
			`"data": {`
			`"text/html": [`
			`"<div>\n",`
			`"<style scoped>\n",`
			`" .dataframe tbody tr th:only-of-type {\n",`
			`" vertical-align: middle;\n",`
			`" }\n",`
			`"\n",`
			`" .dataframe tbody tr th {\n",`
			`" vertical-align: top;\n",`
			`" }\n",`
			`"\n",`
			`" .dataframe thead th {\n",`
			`" text-align: right;\n",`
			`" }\n",`
			`"</style>\n",`
			`"<table border=\"1\" class=\"dataframe\">\n",`
			`" <thead>\n",`
			`" <tr style=\"text-align: right;\">\n",`
			`" <th></th>\n",`
			`" <th>label</th>\n",`
			`" <th>review</th>\n",`
			`" </tr>\n",`
			`" </thead>\n",`
			`" <tbody>\n",`
			`" <tr>\n",`
			`" <th>0</th>\n",`
			`" <td>neg</td>\n",`
			`" <td>how do films like mouse hunt get into theatres...</td>\n",`
			`" </tr>\n",`
			`" <tr>\n",`
			`" <th>1</th>\n",`
			`" <td>neg</td>\n",`
			`" <td>some talented actresses are blessed with a dem...</td>\n",`
			`" </tr>\n",`
			`" <tr>\n",`
			`" <th>2</th>\n",`
			`" <td>pos</td>\n",`
			`" <td>this has been an extraordinary year for austra...</td>\n",`
			`" </tr>\n",`
			`" <tr>\n",`
			`" <th>3</th>\n",`
			`" <td>pos</td>\n",`
			`" <td>according to hollywood movies made in last few...</td>\n",`
			`" </tr>\n",`
			`" <tr>\n",`
			`" <th>4</th>\n",`
			`" <td>neg</td>\n",`
			`" <td>my first press screening of 1998 and already i...</td>\n",`
			`" </tr>\n",`
			`" </tbody>\n",`
			`"</table>\n",`
			`"</div>"`
			`],`
			`"text/plain": [`
			`" label review\n",`
			`"0 neg how do films like mouse hunt get into theatres...\n",`
			`"1 neg some talented actresses are blessed with a dem...\n",`
			`"2 pos this has been an extraordinary year for austra...\n",`
			`"3 pos according to hollywood movies made in last few...\n",`
			`"4 neg my first press screening of 1998 and already i..."`
			`]`
			`},`
			`"execution_count": 4,`
			`"metadata": {},`
			`"output_type": "execute_result"`
			`}`
			`],`
			`"source": [`
			`"df.head()"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"TASK: Check to see if there are any missing values in the dataframe."`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": 7,`
			`"metadata": {},`
			`"outputs": [],`
			`"source": [`
			`"#CODE HERE"`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": 8,`
			`"metadata": {},`
			`"outputs": [`
			`{`
			`"data": {`
			`"text/plain": [`
			`"label 0\n",`
			`"review 35\n",`
			`"dtype: int64"`
			`]`
			`},`
			`"execution_count": 8,`
			`"metadata": {},`
			`"output_type": "execute_result"`
			`}`
			`],`
			`"source": []`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"TASK: Remove any reviews that are NaN"`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": 10,`
			`"metadata": {},`
			`"outputs": [],`
			`"source": []`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"TASK: Check to see if any reviews are blank strings and not just NaN. Note: This means a review text could just be: \"\" or \" \" or some other larger blank string. How would you check for this? Note: There are many ways! Once you've discovered the reviews that are blank strings, go ahead and remove them as well. [Click me for a big hint](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.isspace.html)"`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": null,`
			`"metadata": {},`
			`"outputs": [],`
			`"source": []`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": 18,`
			`"metadata": {},`
			`"outputs": [`
			`{`
			`"data": {`
			`"text/plain": [`
			`"27"`
			`]`
			`},`
			`"execution_count": 18,`
			`"metadata": {},`
			`"output_type": "execute_result"`
			`}`
			`],`
			`"source": []`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": 19,`
			`"metadata": {},`
			`"outputs": [`
			`{`
			`"data": {`
			`"text/html": [`
			`"<div>\n",`
			`"<style scoped>\n",`
			`" .dataframe tbody tr th:only-of-type {\n",`
			`" vertical-align: middle;\n",`
			`" }\n",`
			`"\n",`
			`" .dataframe tbody tr th {\n",`
			`" vertical-align: top;\n",`
			`" }\n",`
			`"\n",`
			`" .dataframe thead th {\n",`
			`" text-align: right;\n",`
			`" }\n",`
			`"</style>\n",`
			`"<table border=\"1\" class=\"dataframe\">\n",`
			`" <thead>\n",`
			`" <tr style=\"text-align: right;\">\n",`
			`" <th></th>\n",`
			`" <th>label</th>\n",`
			`" <th>review</th>\n",`
			`" </tr>\n",`
			`" </thead>\n",`
			`" <tbody>\n",`
			`" <tr>\n",`
			`" <th>57</th>\n",`
			`" <td>neg</td>\n",`
			`" <td></td>\n",`
			`" </tr>\n",`
			`" <tr>\n",`
			`" <th>71</th>\n",`
			`" <td>pos</td>\n",`
			`" <td></td>\n",`
			`" </tr>\n",`
			`" <tr>\n",`
			`" <th>147</th>\n",`
			`" <td>pos</td>\n",`
			`" <td></td>\n",`
			`" </tr>\n",`
			`" <tr>\n",`
			`" <th>151</th>\n",`
			`" <td>pos</td>\n",`
			`" <td></td>\n",`
			`" </tr>\n",`
			`" <tr>\n",`
			`" <th>283</th>\n",`
			`" <td>pos</td>\n",`
			`" <td></td>\n",`
			`" </tr>\n",`
			`" <tr>\n",`
			`" <th>307</th>\n",`
			`" <td>pos</td>\n",`
			`" <td></td>\n",`
			`" </tr>\n",`
			`" <tr>\n",`
			`" <th>313</th>\n",`
			`" <td>neg</td>\n",`
			`" <td></td>\n",`
			`" </tr>\n",`
			`" <tr>\n",`
			`" <th>323</th>\n",`
			`" <td>pos</td>\n",`
			`" <td></td>\n",`
			`" </tr>\n",`
			`" <tr>\n",`
			`" <th>343</th>\n",`
			`" <td>pos</td>\n",`
			`" <td></td>\n",`
			`" </tr>\n",`
			`" <tr>\n",`
			`" <th>351</th>\n",`
			`" <td>neg</td>\n",`
			`" <td></td>\n",`
			`" </tr>\n",`
			`" <tr>\n",`
			`" <th>427</th>\n",`
			`" <td>pos</td>\n",`
			`" <td></td>\n",`
			`" </tr>\n",`
			`" <tr>\n",`
			`" <th>501</th>\n",`
			`" <td>neg</td>\n",`
			`" <td></td>\n",`
			`" </tr>\n",`
			`" <tr>\n",`
			`" <th>633</th>\n",`
			`" <td>pos</td>\n",`
			`" <td></td>\n",`
			`" </tr>\n",`
			`" <tr>\n",`
			`" <th>675</th>\n",`
			`" <td>neg</td>\n",`
			`" <td></td>\n",`
			`" </tr>\n",`
			`" <tr>\n",`
			`" <th>815</th>\n",`
			`" <td>neg</td>\n",`
			`" <td></td>\n",`
			`" </tr>\n",`
			`" <tr>\n",`
			`" <th>851</th>\n",`
			`" <td>neg</td>\n",`
			`" <td></td>\n",`
			`" </tr>\n",`
			`" <tr>\n",`
			`" <th>977</th>\n",`
			`" <td>neg</td>\n",`
			`" <td></td>\n",`
			`" </tr>\n",`
			`" <tr>\n",`
			`" <th>1079</th>\n",`
			`" <td>neg</td>\n",`
			`" <td></td>\n",`
			`" </tr>\n",`
			`" <tr>\n",`
			`" <th>1299</th>\n",`
			`" <td>pos</td>\n",`
			`" <td></td>\n",`
			`" </tr>\n",`
			`" <tr>\n",`
			`" <th>1455</th>\n",`
			`" <td>neg</td>\n",`
			`" <td></td>\n",`
			`" </tr>\n",`
			`" <tr>\n",`
			`" <th>1493</th>\n",`
			`" <td>pos</td>\n",`
			`" <td></td>\n",`
			`" </tr>\n",`
			`" <tr>\n",`
			`" <th>1525</th>\n",`
			`" <td>neg</td>\n",`
			`" <td></td>\n",`
			`" </tr>\n",`
			`" <tr>\n",`
			`" <th>1531</th>\n",`
			`" <td>neg</td>\n",`
			`" <td></td>\n",`
			`" </tr>\n",`
			`" <tr>\n",`
			`" <th>1763</th>\n",`
			`" <td>neg</td>\n",`
			`" <td></td>\n",`
			`" </tr>\n",`
			`" <tr>\n",`
			`" <th>1851</th>\n",`
			`" <td>neg</td>\n",`
			`" <td></td>\n",`
			`" </tr>\n",`
			`" <tr>\n",`
			`" <th>1905</th>\n",`
			`" <td>pos</td>\n",`
			`" <td></td>\n",`
			`" </tr>\n",`
			`" <tr>\n",`
			`" <th>1993</th>\n",`
			`" <td>pos</td>\n",`
			`" <td></td>\n",`
			`" </tr>\n",`
			`" </tbody>\n",`
			`"</table>\n",`
			`"</div>"`
			`],`
			`"text/plain": [`
			`" label review\n",`
			`"57 neg \n",`
			`"71 pos \n",`
			`"147 pos \n",`
			`"151 pos \n",`
			`"283 pos \n",`
			`"307 pos \n",`
			`"313 neg \n",`
			`"323 pos \n",`
			`"343 pos \n",`
			`"351 neg \n",`
			`"427 pos \n",`
			`"501 neg \n",`
			`"633 pos \n",`
			`"675 neg \n",`
			`"815 neg \n",`
			`"851 neg \n",`
			`"977 neg \n",`
			`"1079 neg \n",`
			`"1299 pos \n",`
			`"1455 neg \n",`
			`"1493 pos \n",`
			`"1525 neg \n",`
			`"1531 neg \n",`
			`"1763 neg \n",`
			`"1851 neg \n",`
			`"1905 pos \n",`
			`"1993 pos "`
			`]`
			`},`
			`"execution_count": 19,`
			`"metadata": {},`
			`"output_type": "execute_result"`
			`}`
			`],`
			`"source": []`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": 21,`
			`"metadata": {},`
			`"outputs": [],`
			`"source": []`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": 22,`
			`"metadata": {},`
			`"outputs": [`
			`{`
			`"name": "stdout",`
			`"output_type": "stream",`
			`"text": [`
			`"<class 'pandas.core.frame.DataFrame'>\n",`
			`"Int64Index: 1938 entries, 0 to 1999\n",`
			`"Data columns (total 2 columns):\n",`
			`" # Column Non-Null Count Dtype \n",`
			`"--- ------ -------------- ----- \n",`
			`" 0 label 1938 non-null object\n",`
			`" 1 review 1938 non-null object\n",`
			`"dtypes: object(2)\n",`
			`"memory usage: 45.4+ KB\n"`
			`]`
			`}`
			`],`
			`"source": []`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"TASK: Confirm the value counts per label:"`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": 23,`
			`"metadata": {},`
			`"outputs": [],`
			`"source": [`
			`"#CODE HERE"`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": 24,`
			`"metadata": {},`
			`"outputs": [`
			`{`
			`"data": {`
			`"text/plain": [`
			`"pos 969\n",`
			`"neg 969\n",`
			`"Name: label, dtype: int64"`
			`]`
			`},`
			`"execution_count": 24,`
			`"metadata": {},`
			`"output_type": "execute_result"`
			`}`
			`],`
			`"source": []`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"## EDA on Bag of Words\n",`
			`"\n",`
			`"Bonus Task: Can you figure out how to use a CountVectorizer model to get the top 20 words (that are not english stop words) per label type? Note, this is a bonus task as we did not show this in the lectures. But a quick cursory Google search should put you on the right path. [Click me for a big hint](https://stackoverflow.com/questions/16288497/find-the-most-common-term-in-scikit-learn-classifier)"`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": 45,`
			`"metadata": {},`
			`"outputs": [],`
			`"source": [`
			`"#CODE HERE"`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": 46,`
			`"metadata": {},`
			`"outputs": [],`
			`"source": []`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": 55,`
			`"metadata": {},`
			`"outputs": [`
			`{`
			`"name": "stdout",`
			`"output_type": "stream",`
			`"text": [`
			`"Top 20 words used for Negative reviews.\n",`
			`"[('film', 4063), ('movie', 3131), ('like', 1808), ('just', 1480), ('time', 1127), ('good', 1117), ('bad', 997), ('character', 926), ('story', 908), ('plot', 888), ('characters', 838), ('make', 813), ('really', 743), ('way', 734), ('little', 696), ('don', 683), ('does', 666), ('doesn', 648), ('action', 635), ('scene', 634)]\n"`
			`]`
			`}`
			`],`
			`"source": []`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": 56,`
			`"metadata": {},`
			`"outputs": [`
			`{`
			`"name": "stdout",`
			`"output_type": "stream",`
			`"text": [`
			`"Top 20 words used for Positive reviews.\n",`
			`"[('film', 5002), ('movie', 2389), ('like', 1721), ('just', 1273), ('story', 1199), ('good', 1193), ('time', 1175), ('character', 1037), ('life', 1032), ('characters', 957), ('way', 864), ('films', 851), ('does', 828), ('best', 788), ('people', 769), ('make', 764), ('little', 751), ('really', 731), ('man', 728), ('new', 702)]\n"`
			`]`
			`}`
			`],`
			`"source": []`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"### Training and Data\n",`
			`"\n",`
			"TASK: Split the data into features and a label (X and y) and then preform a train/test split. You may use whatever settings you like. To compare your results to the solution notebook, use `test_size=0.20, random_state=101`"
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": null,`
			`"metadata": {},`
			`"outputs": [],`
			`"source": [`
			`"#CODE HERE"`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": 57,`
			`"metadata": {},`
			`"outputs": [],`
			`"source": []`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"### Training a Mode\n",`
			`"\n",`
			`"TASK: Create a PipeLine that will both create a TF-IDF Vector out of the raw text data and fit a supervised learning model of your choice. Then fit that pipeline on the training data."`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": null,`
			`"metadata": {},`
			`"outputs": [],`
			`"source": [`
			`"#CODE HERE"`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": 74,`
			`"metadata": {},`
			`"outputs": [],`
			`"source": []`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": 76,`
			`"metadata": {},`
			`"outputs": [`
			`{`
			`"data": {`
			`"text/plain": [`
			`"Pipeline(steps=[('tfidf', TfidfVectorizer()), ('svc', LinearSVC())])"`
			`]`
			`},`
			`"execution_count": 76,`
			`"metadata": {},`
			`"output_type": "execute_result"`
			`}`
			`],`
			`"source": []`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"TASK: Create a classification report and plot a confusion matrix based on the results of your PipeLine."`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": 77,`
			`"metadata": {},`
			`"outputs": [],`
			`"source": [`
			`"#CODE HERE"`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": 78,`
			`"metadata": {},`
			`"outputs": [],`
			`"source": []`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": 79,`
			`"metadata": {},`
			`"outputs": [],`
			`"source": []`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": 80,`
			`"metadata": {},`
			`"outputs": [`
			`{`
			`"name": "stdout",`
			`"output_type": "stream",`
			`"text": [`
			`" precision recall f1-score support\n",`
			`"\n",`
			`" neg 0.81 0.86 0.83 191\n",`
			`" pos 0.85 0.81 0.83 197\n",`
			`"\n",`
			`" accuracy 0.83 388\n",`
			`" macro avg 0.83 0.83 0.83 388\n",`
			`"weighted avg 0.83 0.83 0.83 388\n",`
			`"\n"`
			`]`
			`}`
			`],`
			`"source": []`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": 81,`
			`"metadata": {},`
			`"outputs": [`
			`{`
			`"data": {`
			`"text/plain": [`
			`"<sklearn.metrics._plot.confusion_matrix.ConfusionMatrixDisplay at 0x1f370d0b790>"`
			`]`
			`},`
			`"execution_count": 81,`
			`"metadata": {},`
			`"output_type": "execute_result"`
			`},`
			`{`
			`"data": {`
			"image/png": "iVBORw0KGgoAAAANSUhEUgAAAUUAAAEGCAYAAADyuIefAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAc8klEQVR4nO3de5xVdb3/8dd7hpuAgjCoKCioqOEFU8TbyYNSeeuEXTxi6fGoRZppeTlZnVOWaXlOdowyNRR+al5KzdSsMDXNyxEVEFFQlAIBRQEREEQuM5/fH2sNbIZhZs2w1+zZm/fTx3rM2t+99lqfcetnvrf1XYoIzMwsUVXqAMzM2hMnRTOzAk6KZmYFnBTNzAo4KZqZFehQ6gC2RE2v6hjQv2Opw7AWeG1a11KHYC30Pu8tjog+rf38sUd3i3eX1GY6dvK01Q9FxHGtvVYxlHVSHNC/I8891L/UYVgLHLvzgaUOwVrokbjnjS35/LtLannuoV0zHVvd9/WaLblWMZR1UjSz9i+AOupKHUZmTopmlqsgWBvZms/tgQdazCx3dRn/aY6k8ZIWSnq5Qfn5kl6VNF3S/xSUf1vSLEkzJR2bJVbXFM0sV0FQW7zbiW8GrgVurS+QdDQwEhgSEasl7ZCWDwZGAfsCOwOPSNoroulqq2uKZpa7OiLT1pyIeAJY0qD4XOCqiFidHrMwLR8J/CYiVkfEbGAWMKy5azgpmlmuAqglMm1AjaRJBdvoDJfYC/iYpGcl/U3SIWn5LsC8guPmp2VNcvPZzHKXpRaYWhwRQ1t4+g5AL+Aw4BDgLkm7t/AcG53MzCw3AazNd4nC+cC9kayD+JykOqAGeBMonMjcLy1rkpvPZparyNh0rs1em2zoPuBoAEl7AZ2AxcADwChJnSUNBAYBzzV3MtcUzSxfAbVFqihKuhMYTtL3OB+4DBgPjE+n6awBzkhrjdMl3QXMANYB5zU38gxOimaWs+SOliKdK+LUzbx12maOvxK4siXXcFI0s5yJWlTqIDJzUjSzXCUDLU6KZmZA/TxFJ0Uzs/XqXFM0M0u4pmhmViAQtWU0JdpJ0cxy5+azmVkqEGuiutRhZOakaGa5SiZvu/lsZraeB1rMzFIRojZcUzQzW6/ONUUzs0Qy0FI+qaZ8IjWzsuSBFjOzBmo9T9HMLOE7WszMGqjz6LOZWSJZEMJJ0cwMSJrPa32bn5lZIgJP3jYz20CevG1mVi8or5pi+URqZmWrlqpMW3MkjZe0MH3Gc8P3LpYUkmrS15L0c0mzJE2TdFCWWJ0UzSxXgaiLbFsGNwPHNSyU1B/4JDC3oPh4YFC6jQauz3IBJ0Uzy1XyiNMOmbZmzxXxBLCkkbeuAb6ZXq7eSODWSEwEekrq29w13KdoZjlTS9ZTrJE0qeD12IgY2+TZpZHAmxHxorTRdXYB5hW8np+WLWjqfE6KZparoEV3tCyOiKFZD5bUFfgOSdO5KJwUzSx3Oa68vQcwEKivJfYDpkgaBrwJ9C84tl9a1iQnRTPLVYRyu/c5Il4Cdqh/LWkOMDQiFkt6APiapN8AhwLLIqLJpjM4KZpZzpKBluLc5ifpTmA4Sd/jfOCyiBi3mcP/BJwAzAI+AM7Mcg0nRTPLWfGe0RIRpzbz/oCC/QDOa+k1nBTNLFfJQItv8zMzW89Lh5mZpervaCkXTopmljs/uMrMLBUBa+ucFM3MgPrms5Oimdl6Od7RUnROiiXw0wv78+wj29GzZh1jH5u5vvz+cTU8cHMNVdXBoSOW86Xvbph8v3B+R748fB9Ou/htTj53USnCtlSfndfwH2Pm0rPPOgj40229uW9cH75zwxz67bEagG7b1bJyeTVf/cTeJY629Dwlx5r1yVOW8OkzF/OTr++6vmzq0935v4d6cP0jM+nUOVi6eOOv5lc/2IVDjnm/rUO1RtSuE2Mv35lZL3Vlm261XDvhNaY8sS0/OmfA+mNGf+8tVr5fPk3GfJVX87l8Iq0g+x+2km23r92o7MFbe3PK196hU+dkObieNevWv/d/f+7BTv3XsNteH7ZpnNa4JQs7MuulrgCsWlnNvFldqOm7tuCI4KhPL+Wx+7YvTYDtUF36nJbmtvYgt6QoaYCkVyTdKGm6pL9I2kbSHpImSJos6UlJ+6TH7yFpoqSXJF0haUVesbVHb/69Cy8/250LThzEJZ/dk5lTtwFg1coq7rpuB067+O0SR2iN2bHfGvbYbxWvTum6vmy/Q1fy3qIOvDW7cwkjaz+S0efqTFt7kHdNcRDwy4jYF1gKfA4YC5wfEQcDlwDXpceOAcZExP4ki0E2StJoSZMkTVr0bu3mDis7tbXw/tJqxjz4Ol/67ltc+ZUBRMCvr96Jz3x5Edt0qyt1iNZAl661fPemOdzwvZ35YMWG/6GPPmkpj9/Xs3SBtTNFfhxB7vLuU5wdEVPT/cnAAOAI4O6CFXLr/5weDpyU7t8BXN3YCdNVeMcCDB3SJRo7phzV9F3LkScsQ4J9PvoBVVWwbEk1r77Qlaf+2JNxV+zMiuXVqCro1DkYedbiUoe8VavuEHz3pjn89d7tefrPPdeXV1UHR56wjK8dN6h0wbVD7aVpnEXeSXF1wX4tsCOwNCIOzPm6ZeeI45bx4tPdOfDIFcz/e2fWrhE9etXyv/fNWn/Mr6/eiS7dap0QSy646KfzmPd6F+4d22ejdw762PvMm9WZxQs6lSi29qfcRp/beqBlOTBb0smw/hGEQ9L3JpI0rwFGtXFcberH5+7Ghf8yiPl/78IXDx7MhDt6ceyoJbw9txOjj96bH5+7G/8xZi4qn/+Otir7DlvJx09+jyFHruC6h2dy3cMzOeSY5QD880g3nRtTF1WZtvagFFNyvghcL+m/gI7Ab4AXgW8At0n6T2ACsKwEsbWJb1//RqPll147t9Hyeqdf4sGW9mD6c905duchjb730wt3bbR8axYh1rWThJdFbkkxIuYA+xW8Luwj3OS5rSTPTjgsIkLSKMCzXs0qRDk1n9vT5O2DgWuVjMAsBc4qbThmVgzl1qfYbpJiRDwJNN4mMbOy5qRoZpbyIrNmZg14nqKZWSoC1nmRWTOzDcqp+Vw+6dvMylIx732WNF7SQkkvF5T9RNKrkqZJ+r2kngXvfVvSLEkzJR2bJV4nRTPLXYQybRnczKbznB8G9ouIA4DXgG8DSBpMcnfcvulnrpPU7FI8TopmlrtiracYEU8ASxqU/SUi6hcgnQj0S/dHAr+JiNURMRuYBQxr7hruUzSzXEW0qE+xRtKkgtdj05WxsjoL+G26vwtJkqw3Py1rkpOimeVM1GYffV4cEUNbdZVk3YR1wO2t+Xw9J0Uzy13G/sJWk/TvwKeAERFRv87qm0D/gsP6pWVNcp+imeWq/t7nvFbelnQc8E3g0xHxQcFbDwCjJHWWNJDkSQDPNXc+1xTNLF+R9CsWg6Q7geEkfY/zgctIRps7Aw+nK/pPjIhzImK6pLuAGSTN6vMiotlnmDgpmlnuinWbX0Sc2kjxuCaOvxK4siXXcFI0s1xFywZaSs5J0cxyV6zmc1twUjSz3OU9+lxMTopmlqsIJ0Uzs42U0yo5Topmljv3KZqZpQJR59FnM7MNyqii6KRoZjnzQIuZWQNlVFV0UjSz3FVETVHSL2giv0fEBblEZGYVJYC6ugpIisCkJt4zM8smgEqoKUbELYWvJXVtsFaZmVkm5TRPsdnJQ5IOlzQDeDV9PUTSdblHZmaVIzJu7UCWGZU/A44F3gWIiBeBo3KMycwqSrbHm7aXwZhMo88RMS9d0bZes6vXmpmt105qgVlkSYrzJB0BhKSOwNeBV/INy8wqRkCU0ehzlubzOcB5JM9LfQs4MH1tZpaRMm6l12xNMSIWA19sg1jMrFKVUfM5y+jz7pL+IGmRpIWS7pe0e1sEZ2YVosJGn+8A7gL6AjsDdwN35hmUmVWQ+snbWbZ2IEtS7BoRv46Idel2G9Al78DMrHJEZNvag80mRUm9JPUC/izpW5IGSNpN0jeBP7VdiGZW9uqUbWuGpPFpN97LBWW9JD0s6fX05/ZpuST9XNIsSdM
			`"text/plain": [`
			`"<Figure size 432x288 with 2 Axes>"`
			`]`
			`},`
			`"metadata": {`
			`"needs_background": "light"`
			`},`
			`"output_type": "display_data"`
			`}`
			`],`
			`"source": []`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"metadata": {},`
			`"source": [`
			`"## Great job!"`
			`]`
			`}`
			`],`
			`"metadata": {`
			`"kernelspec": {`
			`"display_name": "Python 3",`
			`"language": "python",`
			`"name": "python3"`
			`},`
			`"language_info": {`
			`"codemirror_mode": {`
			`"name": "ipython",`
			`"version": 3`
			`},`
			`"file_extension": ".py",`
			`"mimetype": "text/x-python",`
			`"name": "python",`
			`"nbconvert_exporter": "python",`
			`"pygments_lexer": "ipython3",`
			`"version": "3.8.5"`
			`}`
			`},`
			`"nbformat": 4,`
			`"nbformat_minor": 2`
			`}`