You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
736 lines
27 KiB
736 lines
27 KiB
2 years ago
|
{
|
||
|
"cells": [
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {
|
||
|
"collapsed": true
|
||
|
},
|
||
|
"source": [
|
||
|
"___\n",
|
||
|
"\n",
|
||
|
"<a href='http://www.pieriandata.com'> <img src='../Pierian_Data_Logo.png' /></a>\n",
|
||
|
"___"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"# Text Classification Assessment \n",
|
||
|
"\n",
|
||
|
"### Goal: Given a set of text movie reviews that have been labeled negative or positive\n",
|
||
|
"\n",
|
||
|
"For more information on this dataset visit http://ai.stanford.edu/~amaas/data/sentiment/\n",
|
||
|
"\n",
|
||
|
"## Complete the tasks in bold below!"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"**Task: Perform imports and load the dataset into a pandas DataFrame**\n",
|
||
|
"For this exercise you can load the dataset from `'../DATA/moviereviews.csv'`."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 1,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"# CODE HERE"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 2,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"import numpy as np\n",
|
||
|
"import pandas as pd"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 3,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"df = pd.read_csv('../DATA/moviereviews.csv')"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 4,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/html": [
|
||
|
"<div>\n",
|
||
|
"<style scoped>\n",
|
||
|
" .dataframe tbody tr th:only-of-type {\n",
|
||
|
" vertical-align: middle;\n",
|
||
|
" }\n",
|
||
|
"\n",
|
||
|
" .dataframe tbody tr th {\n",
|
||
|
" vertical-align: top;\n",
|
||
|
" }\n",
|
||
|
"\n",
|
||
|
" .dataframe thead th {\n",
|
||
|
" text-align: right;\n",
|
||
|
" }\n",
|
||
|
"</style>\n",
|
||
|
"<table border=\"1\" class=\"dataframe\">\n",
|
||
|
" <thead>\n",
|
||
|
" <tr style=\"text-align: right;\">\n",
|
||
|
" <th></th>\n",
|
||
|
" <th>label</th>\n",
|
||
|
" <th>review</th>\n",
|
||
|
" </tr>\n",
|
||
|
" </thead>\n",
|
||
|
" <tbody>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>0</th>\n",
|
||
|
" <td>neg</td>\n",
|
||
|
" <td>how do films like mouse hunt get into theatres...</td>\n",
|
||
|
" </tr>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>1</th>\n",
|
||
|
" <td>neg</td>\n",
|
||
|
" <td>some talented actresses are blessed with a dem...</td>\n",
|
||
|
" </tr>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>2</th>\n",
|
||
|
" <td>pos</td>\n",
|
||
|
" <td>this has been an extraordinary year for austra...</td>\n",
|
||
|
" </tr>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>3</th>\n",
|
||
|
" <td>pos</td>\n",
|
||
|
" <td>according to hollywood movies made in last few...</td>\n",
|
||
|
" </tr>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>4</th>\n",
|
||
|
" <td>neg</td>\n",
|
||
|
" <td>my first press screening of 1998 and already i...</td>\n",
|
||
|
" </tr>\n",
|
||
|
" </tbody>\n",
|
||
|
"</table>\n",
|
||
|
"</div>"
|
||
|
],
|
||
|
"text/plain": [
|
||
|
" label review\n",
|
||
|
"0 neg how do films like mouse hunt get into theatres...\n",
|
||
|
"1 neg some talented actresses are blessed with a dem...\n",
|
||
|
"2 pos this has been an extraordinary year for austra...\n",
|
||
|
"3 pos according to hollywood movies made in last few...\n",
|
||
|
"4 neg my first press screening of 1998 and already i..."
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 4,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"df.head()"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"**TASK: Check to see if there are any missing values in the dataframe.**"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 7,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"#CODE HERE"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 8,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"label 0\n",
|
||
|
"review 35\n",
|
||
|
"dtype: int64"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 8,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": []
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"**TASK: Remove any reviews that are NaN**"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 10,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": []
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"**TASK: Check to see if any reviews are blank strings and not just NaN. Note: This means a review text could just be: \"\" or \" \" or some other larger blank string. How would you check for this? Note: There are many ways! Once you've discovered the reviews that are blank strings, go ahead and remove them as well. [Click me for a big hint](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.isspace.html)**"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": []
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 18,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"27"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 18,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": []
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 19,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/html": [
|
||
|
"<div>\n",
|
||
|
"<style scoped>\n",
|
||
|
" .dataframe tbody tr th:only-of-type {\n",
|
||
|
" vertical-align: middle;\n",
|
||
|
" }\n",
|
||
|
"\n",
|
||
|
" .dataframe tbody tr th {\n",
|
||
|
" vertical-align: top;\n",
|
||
|
" }\n",
|
||
|
"\n",
|
||
|
" .dataframe thead th {\n",
|
||
|
" text-align: right;\n",
|
||
|
" }\n",
|
||
|
"</style>\n",
|
||
|
"<table border=\"1\" class=\"dataframe\">\n",
|
||
|
" <thead>\n",
|
||
|
" <tr style=\"text-align: right;\">\n",
|
||
|
" <th></th>\n",
|
||
|
" <th>label</th>\n",
|
||
|
" <th>review</th>\n",
|
||
|
" </tr>\n",
|
||
|
" </thead>\n",
|
||
|
" <tbody>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>57</th>\n",
|
||
|
" <td>neg</td>\n",
|
||
|
" <td></td>\n",
|
||
|
" </tr>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>71</th>\n",
|
||
|
" <td>pos</td>\n",
|
||
|
" <td></td>\n",
|
||
|
" </tr>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>147</th>\n",
|
||
|
" <td>pos</td>\n",
|
||
|
" <td></td>\n",
|
||
|
" </tr>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>151</th>\n",
|
||
|
" <td>pos</td>\n",
|
||
|
" <td></td>\n",
|
||
|
" </tr>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>283</th>\n",
|
||
|
" <td>pos</td>\n",
|
||
|
" <td></td>\n",
|
||
|
" </tr>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>307</th>\n",
|
||
|
" <td>pos</td>\n",
|
||
|
" <td></td>\n",
|
||
|
" </tr>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>313</th>\n",
|
||
|
" <td>neg</td>\n",
|
||
|
" <td></td>\n",
|
||
|
" </tr>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>323</th>\n",
|
||
|
" <td>pos</td>\n",
|
||
|
" <td></td>\n",
|
||
|
" </tr>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>343</th>\n",
|
||
|
" <td>pos</td>\n",
|
||
|
" <td></td>\n",
|
||
|
" </tr>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>351</th>\n",
|
||
|
" <td>neg</td>\n",
|
||
|
" <td></td>\n",
|
||
|
" </tr>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>427</th>\n",
|
||
|
" <td>pos</td>\n",
|
||
|
" <td></td>\n",
|
||
|
" </tr>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>501</th>\n",
|
||
|
" <td>neg</td>\n",
|
||
|
" <td></td>\n",
|
||
|
" </tr>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>633</th>\n",
|
||
|
" <td>pos</td>\n",
|
||
|
" <td></td>\n",
|
||
|
" </tr>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>675</th>\n",
|
||
|
" <td>neg</td>\n",
|
||
|
" <td></td>\n",
|
||
|
" </tr>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>815</th>\n",
|
||
|
" <td>neg</td>\n",
|
||
|
" <td></td>\n",
|
||
|
" </tr>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>851</th>\n",
|
||
|
" <td>neg</td>\n",
|
||
|
" <td></td>\n",
|
||
|
" </tr>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>977</th>\n",
|
||
|
" <td>neg</td>\n",
|
||
|
" <td></td>\n",
|
||
|
" </tr>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>1079</th>\n",
|
||
|
" <td>neg</td>\n",
|
||
|
" <td></td>\n",
|
||
|
" </tr>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>1299</th>\n",
|
||
|
" <td>pos</td>\n",
|
||
|
" <td></td>\n",
|
||
|
" </tr>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>1455</th>\n",
|
||
|
" <td>neg</td>\n",
|
||
|
" <td></td>\n",
|
||
|
" </tr>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>1493</th>\n",
|
||
|
" <td>pos</td>\n",
|
||
|
" <td></td>\n",
|
||
|
" </tr>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>1525</th>\n",
|
||
|
" <td>neg</td>\n",
|
||
|
" <td></td>\n",
|
||
|
" </tr>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>1531</th>\n",
|
||
|
" <td>neg</td>\n",
|
||
|
" <td></td>\n",
|
||
|
" </tr>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>1763</th>\n",
|
||
|
" <td>neg</td>\n",
|
||
|
" <td></td>\n",
|
||
|
" </tr>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>1851</th>\n",
|
||
|
" <td>neg</td>\n",
|
||
|
" <td></td>\n",
|
||
|
" </tr>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>1905</th>\n",
|
||
|
" <td>pos</td>\n",
|
||
|
" <td></td>\n",
|
||
|
" </tr>\n",
|
||
|
" <tr>\n",
|
||
|
" <th>1993</th>\n",
|
||
|
" <td>pos</td>\n",
|
||
|
" <td></td>\n",
|
||
|
" </tr>\n",
|
||
|
" </tbody>\n",
|
||
|
"</table>\n",
|
||
|
"</div>"
|
||
|
],
|
||
|
"text/plain": [
|
||
|
" label review\n",
|
||
|
"57 neg \n",
|
||
|
"71 pos \n",
|
||
|
"147 pos \n",
|
||
|
"151 pos \n",
|
||
|
"283 pos \n",
|
||
|
"307 pos \n",
|
||
|
"313 neg \n",
|
||
|
"323 pos \n",
|
||
|
"343 pos \n",
|
||
|
"351 neg \n",
|
||
|
"427 pos \n",
|
||
|
"501 neg \n",
|
||
|
"633 pos \n",
|
||
|
"675 neg \n",
|
||
|
"815 neg \n",
|
||
|
"851 neg \n",
|
||
|
"977 neg \n",
|
||
|
"1079 neg \n",
|
||
|
"1299 pos \n",
|
||
|
"1455 neg \n",
|
||
|
"1493 pos \n",
|
||
|
"1525 neg \n",
|
||
|
"1531 neg \n",
|
||
|
"1763 neg \n",
|
||
|
"1851 neg \n",
|
||
|
"1905 pos \n",
|
||
|
"1993 pos "
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 19,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": []
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 21,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": []
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 22,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"name": "stdout",
|
||
|
"output_type": "stream",
|
||
|
"text": [
|
||
|
"<class 'pandas.core.frame.DataFrame'>\n",
|
||
|
"Int64Index: 1938 entries, 0 to 1999\n",
|
||
|
"Data columns (total 2 columns):\n",
|
||
|
" # Column Non-Null Count Dtype \n",
|
||
|
"--- ------ -------------- ----- \n",
|
||
|
" 0 label 1938 non-null object\n",
|
||
|
" 1 review 1938 non-null object\n",
|
||
|
"dtypes: object(2)\n",
|
||
|
"memory usage: 45.4+ KB\n"
|
||
|
]
|
||
|
}
|
||
|
],
|
||
|
"source": []
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"**TASK: Confirm the value counts per label:**"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 23,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"#CODE HERE"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 24,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"pos 969\n",
|
||
|
"neg 969\n",
|
||
|
"Name: label, dtype: int64"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 24,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": []
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"## EDA on Bag of Words\n",
|
||
|
"\n",
|
||
|
"**Bonus Task: Can you figure out how to use a CountVectorizer model to get the top 20 words (that are not english stop words) per label type? Note, this is a bonus task as we did not show this in the lectures. But a quick cursory Google search should put you on the right path. [Click me for a big hint](https://stackoverflow.com/questions/16288497/find-the-most-common-term-in-scikit-learn-classifier)**"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 45,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"#CODE HERE"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 46,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": []
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 55,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"name": "stdout",
|
||
|
"output_type": "stream",
|
||
|
"text": [
|
||
|
"Top 20 words used for Negative reviews.\n",
|
||
|
"[('film', 4063), ('movie', 3131), ('like', 1808), ('just', 1480), ('time', 1127), ('good', 1117), ('bad', 997), ('character', 926), ('story', 908), ('plot', 888), ('characters', 838), ('make', 813), ('really', 743), ('way', 734), ('little', 696), ('don', 683), ('does', 666), ('doesn', 648), ('action', 635), ('scene', 634)]\n"
|
||
|
]
|
||
|
}
|
||
|
],
|
||
|
"source": []
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 56,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"name": "stdout",
|
||
|
"output_type": "stream",
|
||
|
"text": [
|
||
|
"Top 20 words used for Positive reviews.\n",
|
||
|
"[('film', 5002), ('movie', 2389), ('like', 1721), ('just', 1273), ('story', 1199), ('good', 1193), ('time', 1175), ('character', 1037), ('life', 1032), ('characters', 957), ('way', 864), ('films', 851), ('does', 828), ('best', 788), ('people', 769), ('make', 764), ('little', 751), ('really', 731), ('man', 728), ('new', 702)]\n"
|
||
|
]
|
||
|
}
|
||
|
],
|
||
|
"source": []
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"### Training and Data\n",
|
||
|
"\n",
|
||
|
"**TASK: Split the data into features and a label (X and y) and then preform a train/test split. You may use whatever settings you like. To compare your results to the solution notebook, use `test_size=0.20, random_state=101`**"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"#CODE HERE"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 57,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": []
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"### Training a Mode\n",
|
||
|
"\n",
|
||
|
"**TASK: Create a PipeLine that will both create a TF-IDF Vector out of the raw text data and fit a supervised learning model of your choice. Then fit that pipeline on the training data.**"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"#CODE HERE"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 74,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": []
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 76,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"Pipeline(steps=[('tfidf', TfidfVectorizer()), ('svc', LinearSVC())])"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 76,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": []
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"**TASK: Create a classification report and plot a confusion matrix based on the results of your PipeLine.**"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 77,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"#CODE HERE"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 78,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": []
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 79,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": []
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 80,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"name": "stdout",
|
||
|
"output_type": "stream",
|
||
|
"text": [
|
||
|
" precision recall f1-score support\n",
|
||
|
"\n",
|
||
|
" neg 0.81 0.86 0.83 191\n",
|
||
|
" pos 0.85 0.81 0.83 197\n",
|
||
|
"\n",
|
||
|
" accuracy 0.83 388\n",
|
||
|
" macro avg 0.83 0.83 0.83 388\n",
|
||
|
"weighted avg 0.83 0.83 0.83 388\n",
|
||
|
"\n"
|
||
|
]
|
||
|
}
|
||
|
],
|
||
|
"source": []
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 81,
|
||
|
"metadata": {},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"<sklearn.metrics._plot.confusion_matrix.ConfusionMatrixDisplay at 0x1f370d0b790>"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 81,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
},
|
||
|
{
|
||
|
"data": {
|
||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAUUAAAEGCAYAAADyuIefAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAc8klEQVR4nO3de5xVdb3/8dd7hpuAgjCoKCioqOEFU8TbyYNSeeuEXTxi6fGoRZppeTlZnVOWaXlOdowyNRR+al5KzdSsMDXNyxEVEFFQlAIBRQEREEQuM5/fH2sNbIZhZs2w1+zZm/fTx3rM2t+99lqfcetnvrf1XYoIzMwsUVXqAMzM2hMnRTOzAk6KZmYFnBTNzAo4KZqZFehQ6gC2RE2v6hjQv2Opw7AWeG1a11KHYC30Pu8tjog+rf38sUd3i3eX1GY6dvK01Q9FxHGtvVYxlHVSHNC/I8891L/UYVgLHLvzgaUOwVrokbjnjS35/LtLannuoV0zHVvd9/WaLblWMZR1UjSz9i+AOupKHUZmTopmlqsgWBvZms/tgQdazCx3dRn/aY6k8ZIWSnq5Qfn5kl6VNF3S/xSUf1vSLEkzJR2bJVbXFM0sV0FQW7zbiW8GrgVurS+QdDQwEhgSEasl7ZCWDwZGAfsCOwOPSNoroulqq2uKZpa7OiLT1pyIeAJY0qD4XOCqiFidHrMwLR8J/CYiVkfEbGAWMKy5azgpmlmuAqglMm1AjaRJBdvoDJfYC/iYpGcl/U3SIWn5LsC8guPmp2VNcvPZzHKXpRaYWhwRQ1t4+g5AL+Aw4BDgLkm7t/AcG53MzCw3AazNd4nC+cC9kayD+JykOqAGeBMonMjcLy1rkpvPZparyNh0rs1em2zoPuBoAEl7AZ2AxcADwChJnSUNBAYBzzV3MtcUzSxfAbVFqihKuhMYTtL3OB+4DBgPjE+n6awBzkhrjdMl3QXMANYB5zU38gxOimaWs+SOliKdK+LUzbx12maOvxK4siXXcFI0s5yJWlTqIDJzUjSzXCUDLU6KZmZA/TxFJ0Uzs/XqXFM0M0u4pmhmViAQtWU0JdpJ0cxy5+azmVkqEGuiutRhZOakaGa5SiZvu/lsZraeB1rMzFIRojZcUzQzW6/ONUUzs0Qy0FI+qaZ8IjWzsuSBFjOzBmo9T9HMLOE7WszMGqjz6LOZWSJZEMJJ0cwMSJrPa32bn5lZIgJP3jYz20CevG1mVi8or5pi+URqZmWrlqpMW3MkjZe0MH3Gc8P3LpYUkmrS15L0c0mzJE2TdFCWWJ0UzSxXgaiLbFsGNwPHNSyU1B/4JDC3oPh4YFC6jQauz3IBJ0Uzy1XyiNMOmbZmzxXxBLCkkbeuAb6ZXq7eSODWSEwEekrq29w13KdoZjlTS9ZTrJE0qeD12IgY2+TZpZHAmxHxorTRdXYB5hW8np+WLWjqfE6KZparoEV3tCyOiKFZD5bUFfgOSdO5KJwUzSx3Oa68vQcwEKivJfYDpkgaBrwJ9C84tl9a1iQnRTPLVYRyu/c5Il4Cdqh/LWkOMDQiFkt6APiapN8AhwLLIqLJpjM4KZpZzpKBluLc5ifpTmA4Sd/jfOCyiBi3mcP/BJwAzAI+AM7Mcg0nRTPLWfGe0RIRpzbz/oCC/QDOa+k1nBTNLFfJQItv8zMzW89Lh5mZpervaCkXTopmljs/uMrMLBUBa+ucFM3MgPrms5Oimdl6Od7RUnROiiXw0wv78+wj29GzZh1jH5u5vvz+cTU8cHMNVdXBoSOW86Xvbph8v3B+R748fB9Ou/htTj53USnCtlSfndfwH2Pm0rPPOgj40229uW9cH75zwxz67bEagG7b1bJyeTVf/cTeJY629Dwlx5r1yVOW8OkzF/OTr++6vmzq0935v4d6cP0jM+nUOVi6eOOv5lc/2IVDjnm/rUO1RtSuE2Mv35lZL3Vlm261XDvhNaY8sS0/OmfA+mNGf+8tVr5fPk3GfJVX87l8Iq0g+x+2km23r92o7MFbe3PK196hU+dkObieNevWv/d/f+7BTv3XsNteH7ZpnNa4JQs7MuulrgCsWlnNvFldqOm7tuCI4KhPL+Wx+7YvTYDtUF36nJbmtvYgt6QoaYCkVyTdKGm6pL9I2kbSHpImSJos6UlJ+6TH7yFpoqSXJF0haUVesbVHb/69Cy8/250LThzEJZ/dk5lTtwFg1coq7rpuB067+O0SR2iN2bHfGvbYbxWvTum6vmy/Q1fy3qIOvDW7cwkjaz+S0efqTFt7kHdNcRDwy4jYF1gKfA4YC5wfEQcDlwDXpceOAcZExP4ki0E2StJoSZMkTVr0bu3mDis7tbXw/tJqxjz4Ol/67ltc+ZUBRMCvr96Jz3x5Edt0qyt1iNZAl661fPemOdzwvZ35YMWG/6GPPmkpj9/Xs3SBtTNFfhxB7vLuU5wdEVPT/cnAAOAI4O6CFXLr/5weDpyU7t8BXN3YCdNVeMcCDB3SJRo7phzV9F3LkScsQ4J9PvoBVVWwbEk1r77Qlaf+2JNxV+zMiuXVqCro1DkYedbiUoe8VavuEHz3pjn89d7tefrPPdeXV1UHR56wjK8dN6h0wbVD7aVpnEXeSXF1wX4tsCOwNCIOzPm6ZeeI45bx4tPdOfDIFcz/e2fWrhE9etXyv/fNWn/Mr6/eiS7dap0QSy646KfzmPd6F+4d22ejdw762PvMm9WZxQs6lSi29qfcRp/beqBlOTBb0smw/hGEQ9L3JpI0rwFGtXFcberH5+7Ghf8yiPl/78IXDx7MhDt6ceyoJbw9txOjj96bH5+7G/8xZi4qn/+Otir7DlvJx09+jyFHruC6h2dy3cMzOeSY5QD880g3nRtTF1WZtvagFFNyvghcL+m/gI7Ab4AXgW8At0n6T2ACsKwEsbWJb1//RqPll147t9Hyeqdf4sGW9mD6c905duchjb730wt3bbR8axYh1rWThJdFbkkxIuYA+xW8Luwj3OS5rSTPTjgsIkLSKMCzXs0qRDk1n9vT5O2DgWuVjMAsBc4qbThmVgzl1qfYbpJiRDwJNN4mMbOy5qRoZpbyIrNmZg14nqKZWSoC1nmRWTOzDcqp+Vw+6dvMylIx732WNF7SQkkvF5T9RNKrkqZJ+r2kngXvfVvSLEkzJR2bJV4nRTPLXYQybRnczKbznB8G9ouIA4DXgG8DSBpMcnfcvulnrpPU7FI8TopmlrtiracYEU8ASxqU/SUi6hcgnQj0S/dHAr+JiNURMRuYBQxr7hruUzSzXEW0qE+xRtKkgtdj05WxsjoL+G26vwtJkqw3Py1rkpOimeVM1GYffV4cEUNbdZVk3YR1wO2t+Xw9J0Uzy13G/sJWk/TvwKeAERFRv87qm0D/gsP6pWVNcp+imeWq/t7nvFbelnQc8E3g0xHxQcFbDwCjJHWWNJDkSQDPNXc+1xTNLF+R9CsWg6Q7geEkfY/zgctIRps7Aw+nK/pPjIhzImK6pLuAGSTN6vMiotlnmDgpmlnuinWbX0Sc2kjxuCaOvxK4siXXcFI0s1xFywZaSs5J0cxyV6zmc1twUjSz3OU9+lxMTopmlqsIJ0Uzs42U0yo5Topmljv3KZqZpQJR59FnM7MNyqii6KRoZjnzQIuZWQNlVFV0UjSz3FVETVHSL2giv0fEBblEZGYVJYC6ugpIisCkJt4zM8smgEqoKUbELYWvJXVtsFaZmVkm5TRPsdnJQ5IOlzQDeDV9PUTSdblHZmaVIzJu7UCWGZU/A44F3gWIiBeBo3KMycwqSrbHm7aXwZhMo88RMS9d0bZes6vXmpmt105qgVlkSYrzJB0BhKSOwNeBV/INy8wqRkCU0ehzlubzOcB5JM9LfQs4MH1tZpaRMm6l12xNMSIWA19sg1jMrFKVUfM5y+jz7pL+IGmRpIWS7pe0e1sEZ2YVosJGn+8A7gL6AjsDdwN35hmUmVWQ+snbWbZ2IEtS7BoRv46Idel2G9Al78DMrHJEZNvag80mRUm9JPUC/izpW5IGSNpN0jeBP7VdiGZW9uqUbWuGpPFpN97LBWW9JD0s6fX05/ZpuST9XNIsSdM
|
||
|
"text/plain": [
|
||
|
"<Figure size 432x288 with 2 Axes>"
|
||
|
]
|
||
|
},
|
||
|
"metadata": {
|
||
|
"needs_background": "light"
|
||
|
},
|
||
|
"output_type": "display_data"
|
||
|
}
|
||
|
],
|
||
|
"source": []
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"## Great job!"
|
||
|
]
|
||
|
}
|
||
|
],
|
||
|
"metadata": {
|
||
|
"kernelspec": {
|
||
|
"display_name": "Python 3",
|
||
|
"language": "python",
|
||
|
"name": "python3"
|
||
|
},
|
||
|
"language_info": {
|
||
|
"codemirror_mode": {
|
||
|
"name": "ipython",
|
||
|
"version": 3
|
||
|
},
|
||
|
"file_extension": ".py",
|
||
|
"mimetype": "text/x-python",
|
||
|
"name": "python",
|
||
|
"nbconvert_exporter": "python",
|
||
|
"pygments_lexer": "ipython3",
|
||
|
"version": "3.8.5"
|
||
|
}
|
||
|
},
|
||
|
"nbformat": 4,
|
||
|
"nbformat_minor": 2
|
||
|
}
|