{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"___\n",
"\n",
"<a href='http://www.pieriandata.com'><img src='../Pierian_Data_Logo.png'/></a>\n",
"___\n",
"<center><em>Copyright by Pierian Data Inc.</em></center>\n",
"<center><em>For more information, visit us at <a href='http://www.pieriandata.com'>www.pieriandata.com</a></em></center>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Useful Methods\n",
"\n",
"Let's cover some useful methods and functions built into pandas. This is just a small sample of the functions and methods available in pandas, but they are some of the most commonly used.\n",
"The [documentation](https://pandas.pydata.org/pandas-docs/stable/reference/index.html) is a great resource for exploring more methods and functions (we will introduce more later in the course).\n",
"Here is a list of the functions and methods we'll cover (click on one to jump to that section of this notebook):\n",
"\n",
"* [apply() method](#apply_method)\n",
"* [apply() with a function](#apply_function)\n",
"* [apply() with a lambda expression](#apply_lambda)\n",
"* [apply() on multiple columns](#apply_multiple)\n",
"* [describe()](#describe)\n",
"* [sort_values()](#sort)\n",
"* [corr()](#corr)\n",
"* [idxmin and idxmax](#idx)\n",
"* [value_counts](#v_c)\n",
"* [replace](#replace)\n",
"* [unique and nunique](#uni)\n",
"* [map](#map)\n",
"* [duplicated and drop_duplicates](#dup)\n",
"* [between](#bet)\n",
"* [sample](#sample)\n",
"* [nlargest](#n)\n",
"\n",
"Make sure to view the video lessons to get the full explanation!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id='apply_method'></a>\n",
"\n",
"## The .apply() method\n",
"\n",
"Here we will learn about a very useful method known as **apply**. It lets us apply a custom function to every value in a DataFrame column (a Series) and collect the results in a new Series."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"df = pd.read_csv('tips.csv')"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>total_bill</th>\n",
" <th>tip</th>\n",
" <th>sex</th>\n",
" <th>smoker</th>\n",
" <th>day</th>\n",
" <th>time</th>\n",
" <th>size</th>\n",
" <th>price_per_person</th>\n",
" <th>Payer Name</th>\n",
" <th>CC Number</th>\n",
" <th>Payment ID</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>16.99</td>\n",
" <td>1.01</td>\n",
" <td>Female</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>8.49</td>\n",
" <td>Christy Cunningham</td>\n",
" <td>3560325168603410</td>\n",
" <td>Sun2959</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>10.34</td>\n",
" <td>1.66</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>3</td>\n",
" <td>3.45</td>\n",
" <td>Douglas Tucker</td>\n",
" <td>4478071379779230</td>\n",
" <td>Sun4608</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>21.01</td>\n",
" <td>3.50</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>3</td>\n",
" <td>7.00</td>\n",
" <td>Travis Walters</td>\n",
" <td>6011812112971322</td>\n",
" <td>Sun4458</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>23.68</td>\n",
" <td>3.31</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>11.84</td>\n",
" <td>Nathaniel Harris</td>\n",
" <td>4676137647685994</td>\n",
" <td>Sun5260</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>24.59</td>\n",
" <td>3.61</td>\n",
" <td>Female</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>4</td>\n",
" <td>6.15</td>\n",
" <td>Tonya Carter</td>\n",
" <td>4832732618637221</td>\n",
" <td>Sun2251</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" total_bill tip sex smoker day time size price_per_person \\\n",
"0 16.99 1.01 Female No Sun Dinner 2 8.49 \n",
"1 10.34 1.66 Male No Sun Dinner 3 3.45 \n",
"2 21.01 3.50 Male No Sun Dinner 3 7.00 \n",
"3 23.68 3.31 Male No Sun Dinner 2 11.84 \n",
"4 24.59 3.61 Female No Sun Dinner 4 6.15 \n",
"\n",
" Payer Name CC Number Payment ID \n",
"0 Christy Cunningham 3560325168603410 Sun2959 \n",
"1 Douglas Tucker 4478071379779230 Sun4608 \n",
"2 Travis Walters 6011812112971322 Sun4458 \n",
"3 Nathaniel Harris 4676137647685994 Sun5260 \n",
"4 Tonya Carter 4832732618637221 Sun2251 "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id='apply_function'></a>\n",
"### apply with a function"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 244 entries, 0 to 243\n",
"Data columns (total 11 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 total_bill 244 non-null float64\n",
" 1 tip 244 non-null float64\n",
" 2 sex 244 non-null object \n",
" 3 smoker 244 non-null object \n",
" 4 day 244 non-null object \n",
" 5 time 244 non-null object \n",
" 6 size 244 non-null int64 \n",
" 7 price_per_person 244 non-null float64\n",
" 8 Payer Name 244 non-null object \n",
" 9 CC Number 244 non-null int64 \n",
" 10 Payment ID 244 non-null object \n",
"dtypes: float64(3), int64(2), object(6)\n",
"memory usage: 21.1+ KB\n"
]
}
],
"source": [
"df.info()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def last_four(num):\n",
"    # Convert to a string so we can slice off the last four digits\n",
"    return str(num)[-4:]"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"3560325168603410"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['CC Number'][0]"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'3410'"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"last_four(3560325168603410)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"df['last_four'] = df['CC Number'].apply(last_four)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>total_bill</th>\n",
" <th>tip</th>\n",
" <th>sex</th>\n",
" <th>smoker</th>\n",
" <th>day</th>\n",
" <th>time</th>\n",
" <th>size</th>\n",
" <th>price_per_person</th>\n",
" <th>Payer Name</th>\n",
" <th>CC Number</th>\n",
" <th>Payment ID</th>\n",
" <th>last_four</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>16.99</td>\n",
" <td>1.01</td>\n",
" <td>Female</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>8.49</td>\n",
" <td>Christy Cunningham</td>\n",
" <td>3560325168603410</td>\n",
" <td>Sun2959</td>\n",
" <td>3410</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>10.34</td>\n",
" <td>1.66</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>3</td>\n",
" <td>3.45</td>\n",
" <td>Douglas Tucker</td>\n",
" <td>4478071379779230</td>\n",
" <td>Sun4608</td>\n",
" <td>9230</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>21.01</td>\n",
" <td>3.50</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>3</td>\n",
" <td>7.00</td>\n",
" <td>Travis Walters</td>\n",
" <td>6011812112971322</td>\n",
" <td>Sun4458</td>\n",
" <td>1322</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>23.68</td>\n",
" <td>3.31</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>11.84</td>\n",
" <td>Nathaniel Harris</td>\n",
" <td>4676137647685994</td>\n",
" <td>Sun5260</td>\n",
" <td>5994</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>24.59</td>\n",
" <td>3.61</td>\n",
" <td>Female</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>4</td>\n",
" <td>6.15</td>\n",
" <td>Tonya Carter</td>\n",
" <td>4832732618637221</td>\n",
" <td>Sun2251</td>\n",
" <td>7221</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" total_bill tip sex smoker day time size price_per_person \\\n",
"0 16.99 1.01 Female No Sun Dinner 2 8.49 \n",
"1 10.34 1.66 Male No Sun Dinner 3 3.45 \n",
"2 21.01 3.50 Male No Sun Dinner 3 7.00 \n",
"3 23.68 3.31 Male No Sun Dinner 2 11.84 \n",
"4 24.59 3.61 Female No Sun Dinner 4 6.15 \n",
"\n",
" Payer Name CC Number Payment ID last_four \n",
"0 Christy Cunningham 3560325168603410 Sun2959 3410 \n",
"1 Douglas Tucker 4478071379779230 Sun4608 9230 \n",
"2 Travis Walters 6011812112971322 Sun4458 1322 \n",
"3 Nathaniel Harris 4676137647685994 Sun5260 5994 \n",
"4 Tonya Carter 4832732618637221 Sun2251 7221 "
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Using .apply() with more complex functions"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"19.78594262295082"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['total_bill'].mean()"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def yelp(price):\n",
"    if price < 10:\n",
"        return '$'\n",
"    elif price >= 10 and price < 30:\n",
"        return '$$'\n",
"    else:\n",
"        return '$$$'"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"df['Expensive'] = df['total_bill'].apply(yelp)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id='apply_lambda'></a>\n",
"### apply with lambda"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def simple(num):\n",
"    return num*2"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<function __main__.<lambda>(num)>"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"lambda num: num*2"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 3.0582\n",
"1 1.8612\n",
"2 3.7818\n",
"3 4.2624\n",
"4 4.4262\n",
" ... \n",
"239 5.2254\n",
"240 4.8924\n",
"241 4.0806\n",
"242 3.2076\n",
"243 3.3804\n",
"Name: total_bill, Length: 244, dtype: float64"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['total_bill'].apply(lambda bill:bill*0.18)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id='apply_multiple'></a>\n",
"## apply that uses multiple columns\n",
"\n",
"Note, there are several ways to do this:\n",
"\n",
"https://stackoverflow.com/questions/19914937/applying-function-with-multiple-arguments-to-create-a-new-pandas-column"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>total_bill</th>\n",
" <th>tip</th>\n",
" <th>sex</th>\n",
" <th>smoker</th>\n",
" <th>day</th>\n",
" <th>time</th>\n",
" <th>size</th>\n",
" <th>price_per_person</th>\n",
" <th>Payer Name</th>\n",
" <th>CC Number</th>\n",
" <th>Payment ID</th>\n",
" <th>last_four</th>\n",
" <th>Expensive</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>16.99</td>\n",
" <td>1.01</td>\n",
" <td>Female</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>8.49</td>\n",
" <td>Christy Cunningham</td>\n",
" <td>3560325168603410</td>\n",
" <td>Sun2959</td>\n",
" <td>3410</td>\n",
" <td>$$</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>10.34</td>\n",
" <td>1.66</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>3</td>\n",
" <td>3.45</td>\n",
" <td>Douglas Tucker</td>\n",
" <td>4478071379779230</td>\n",
" <td>Sun4608</td>\n",
" <td>9230</td>\n",
" <td>$$</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>21.01</td>\n",
" <td>3.50</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>3</td>\n",
" <td>7.00</td>\n",
" <td>Travis Walters</td>\n",
" <td>6011812112971322</td>\n",
" <td>Sun4458</td>\n",
" <td>1322</td>\n",
" <td>$$</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>23.68</td>\n",
" <td>3.31</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>11.84</td>\n",
" <td>Nathaniel Harris</td>\n",
" <td>4676137647685994</td>\n",
" <td>Sun5260</td>\n",
" <td>5994</td>\n",
" <td>$$</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>24.59</td>\n",
" <td>3.61</td>\n",
" <td>Female</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>4</td>\n",
" <td>6.15</td>\n",
" <td>Tonya Carter</td>\n",
" <td>4832732618637221</td>\n",
" <td>Sun2251</td>\n",
" <td>7221</td>\n",
" <td>$$</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" total_bill tip sex smoker day time size price_per_person \\\n",
"0 16.99 1.01 Female No Sun Dinner 2 8.49 \n",
"1 10.34 1.66 Male No Sun Dinner 3 3.45 \n",
"2 21.01 3.50 Male No Sun Dinner 3 7.00 \n",
"3 23.68 3.31 Male No Sun Dinner 2 11.84 \n",
"4 24.59 3.61 Female No Sun Dinner 4 6.15 \n",
"\n",
" Payer Name CC Number Payment ID last_four Expensive \n",
"0 Christy Cunningham 3560325168603410 Sun2959 3410 $$ \n",
"1 Douglas Tucker 4478071379779230 Sun4608 9230 $$ \n",
"2 Travis Walters 6011812112971322 Sun4458 1322 $$ \n",
"3 Nathaniel Harris 4676137647685994 Sun5260 5994 $$ \n",
"4 Tonya Carter 4832732618637221 Sun2251 7221 $$ "
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def quality(total_bill,tip):\n",
"    if tip/total_bill > 0.25:\n",
"        return \"Generous\"\n",
"    else:\n",
"        return \"Other\""
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"df['Tip Quality'] = df[['total_bill','tip']].apply(lambda row: quality(row['total_bill'],row['tip']),axis=1)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>total_bill</th>\n",
" <th>tip</th>\n",
" <th>sex</th>\n",
" <th>smoker</th>\n",
" <th>day</th>\n",
" <th>time</th>\n",
" <th>size</th>\n",
" <th>price_per_person</th>\n",
" <th>Payer Name</th>\n",
" <th>CC Number</th>\n",
" <th>Payment ID</th>\n",
" <th>last_four</th>\n",
" <th>Expensive</th>\n",
" <th>Tip Quality</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>16.99</td>\n",
" <td>1.01</td>\n",
" <td>Female</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>8.49</td>\n",
" <td>Christy Cunningham</td>\n",
" <td>3560325168603410</td>\n",
" <td>Sun2959</td>\n",
" <td>3410</td>\n",
" <td>$$</td>\n",
" <td>Other</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>10.34</td>\n",
" <td>1.66</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>3</td>\n",
" <td>3.45</td>\n",
" <td>Douglas Tucker</td>\n",
" <td>4478071379779230</td>\n",
" <td>Sun4608</td>\n",
" <td>9230</td>\n",
" <td>$$</td>\n",
" <td>Other</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>21.01</td>\n",
" <td>3.50</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>3</td>\n",
" <td>7.00</td>\n",
" <td>Travis Walters</td>\n",
" <td>6011812112971322</td>\n",
" <td>Sun4458</td>\n",
" <td>1322</td>\n",
" <td>$$</td>\n",
" <td>Other</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>23.68</td>\n",
" <td>3.31</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>11.84</td>\n",
" <td>Nathaniel Harris</td>\n",
" <td>4676137647685994</td>\n",
" <td>Sun5260</td>\n",
" <td>5994</td>\n",
" <td>$$</td>\n",
" <td>Other</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>24.59</td>\n",
" <td>3.61</td>\n",
" <td>Female</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>4</td>\n",
" <td>6.15</td>\n",
" <td>Tonya Carter</td>\n",
" <td>4832732618637221</td>\n",
" <td>Sun2251</td>\n",
" <td>7221</td>\n",
" <td>$$</td>\n",
" <td>Other</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" total_bill tip sex smoker day time size price_per_person \\\n",
"0 16.99 1.01 Female No Sun Dinner 2 8.49 \n",
"1 10.34 1.66 Male No Sun Dinner 3 3.45 \n",
"2 21.01 3.50 Male No Sun Dinner 3 7.00 \n",
"3 23.68 3.31 Male No Sun Dinner 2 11.84 \n",
"4 24.59 3.61 Female No Sun Dinner 4 6.15 \n",
"\n",
" Payer Name CC Number Payment ID last_four Expensive \\\n",
"0 Christy Cunningham 3560325168603410 Sun2959 3410 $$ \n",
"1 Douglas Tucker 4478071379779230 Sun4608 9230 $$ \n",
"2 Travis Walters 6011812112971322 Sun4458 1322 $$ \n",
"3 Nathaniel Harris 4676137647685994 Sun5260 5994 $$ \n",
"4 Tonya Carter 4832732618637221 Sun2251 7221 $$ \n",
"\n",
" Tip Quality \n",
"0 Other \n",
"1 Other \n",
"2 Other \n",
"3 Other \n",
"4 Other "
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"df['Tip Quality'] = np.vectorize(quality)(df['total_bill'], df['tip'])"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>total_bill</th>\n",
" <th>tip</th>\n",
" <th>sex</th>\n",
" <th>smoker</th>\n",
" <th>day</th>\n",
" <th>time</th>\n",
" <th>size</th>\n",
" <th>price_per_person</th>\n",
" <th>Payer Name</th>\n",
" <th>CC Number</th>\n",
" <th>Payment ID</th>\n",
" <th>last_four</th>\n",
" <th>Expensive</th>\n",
" <th>Tip Quality</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>16.99</td>\n",
" <td>1.01</td>\n",
" <td>Female</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>8.49</td>\n",
" <td>Christy Cunningham</td>\n",
" <td>3560325168603410</td>\n",
" <td>Sun2959</td>\n",
" <td>3410</td>\n",
" <td>$$</td>\n",
" <td>Other</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>10.34</td>\n",
" <td>1.66</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>3</td>\n",
" <td>3.45</td>\n",
" <td>Douglas Tucker</td>\n",
" <td>4478071379779230</td>\n",
" <td>Sun4608</td>\n",
" <td>9230</td>\n",
" <td>$$</td>\n",
" <td>Other</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>21.01</td>\n",
" <td>3.50</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>3</td>\n",
" <td>7.00</td>\n",
" <td>Travis Walters</td>\n",
" <td>6011812112971322</td>\n",
" <td>Sun4458</td>\n",
" <td>1322</td>\n",
" <td>$$</td>\n",
" <td>Other</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>23.68</td>\n",
" <td>3.31</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>11.84</td>\n",
" <td>Nathaniel Harris</td>\n",
" <td>4676137647685994</td>\n",
" <td>Sun5260</td>\n",
" <td>5994</td>\n",
" <td>$$</td>\n",
" <td>Other</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>24.59</td>\n",
" <td>3.61</td>\n",
" <td>Female</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>4</td>\n",
" <td>6.15</td>\n",
" <td>Tonya Carter</td>\n",
" <td>4832732618637221</td>\n",
" <td>Sun2251</td>\n",
" <td>7221</td>\n",
" <td>$$</td>\n",
" <td>Other</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" total_bill tip sex smoker day time size price_per_person \\\n",
"0 16.99 1.01 Female No Sun Dinner 2 8.49 \n",
"1 10.34 1.66 Male No Sun Dinner 3 3.45 \n",
"2 21.01 3.50 Male No Sun Dinner 3 7.00 \n",
"3 23.68 3.31 Male No Sun Dinner 2 11.84 \n",
"4 24.59 3.61 Female No Sun Dinner 4 6.15 \n",
"\n",
" Payer Name CC Number Payment ID last_four Expensive \\\n",
"0 Christy Cunningham 3560325168603410 Sun2959 3410 $$ \n",
"1 Douglas Tucker 4478071379779230 Sun4608 9230 $$ \n",
"2 Travis Walters 6011812112971322 Sun4458 1322 $$ \n",
"3 Nathaniel Harris 4676137647685994 Sun5260 5994 $$ \n",
"4 Tonya Carter 4832732618637221 Sun2251 7221 $$ \n",
"\n",
" Tip Quality \n",
"0 Other \n",
"1 Other \n",
"2 Other \n",
"3 Other \n",
"4 Other "
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So, which one is faster?"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import timeit\n",
"\n",
"# Setup code: runs once per timeit call and is not included in the timing\n",
"setup = '''\n",
"import numpy as np\n",
"import pandas as pd\n",
"df = pd.read_csv('tips.csv')\n",
"def quality(total_bill,tip):\n",
"    if tip/total_bill > 0.25:\n",
"        return \"Generous\"\n",
"    else:\n",
"        return \"Other\"\n",
"'''\n",
"\n",
"# Statements whose execution time is measured\n",
"stmt_one = '''\n",
"df['Tip Quality'] = df[['total_bill','tip']].apply(lambda row: quality(row['total_bill'],row['tip']),axis=1)\n",
"'''\n",
"\n",
"stmt_two = '''\n",
"df['Tip Quality'] = np.vectorize(quality)(df['total_bill'], df['tip'])\n",
"'''"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"5.0198852999999986"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"timeit.timeit(setup = setup, \n",
" stmt = stmt_one, \n",
" number = 1000) "
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.21840849999999534"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"timeit.timeit(setup = setup, \n",
" stmt = stmt_two, \n",
" number = 1000) "
]
},
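{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this run, **np.vectorize()** was more than 20 times faster than the row-wise **apply** (about 0.22 s versus 5.0 s for 1000 repetitions), so keep it in mind for the future. Note that the NumPy documentation describes `np.vectorize` as a convenience wrapper that is essentially a Python loop, not a performance tool; the gain here comes largely from avoiding the per-row overhead of `apply` with `axis=1`.\n",
"\n",
"Full Details:\n",
"https://docs.scipy.org/doc/numpy/reference/generated/numpy.vectorize.html"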
]
},
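{
"cell_type": "markdown",
"metadata": {},
"source": [
"The comparison above can be taken one step further. As a minimal sketch, assuming a simple two-way rule like `quality`, the same labels can be computed with a whole-column NumPy expression instead of one Python call per row. The `toy` DataFrame below is hypothetical stand-in data, not `tips.csv`, and no timing is claimed for it.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data standing in for tips.csv\n",
"toy = pd.DataFrame({'total_bill': [10.0, 20.0, 4.0],\n",
"                    'tip': [3.0, 2.0, 1.5]})\n",
"\n",
"# Same rule as quality(): a tip above 25% of the bill is 'Generous'\n",
"toy['Tip Quality'] = np.where(toy['tip'] / toy['total_bill'] > 0.25,\n",
"                              'Generous', 'Other')\n",
"print(toy['Tip Quality'].tolist())\n",
"```\n",
"\n",
"`np.where` handles two outcomes directly; a rule with more branches, such as `yelp`, would need a different tool (for example `np.select`)."
]
},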
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id='describe'></a>\n",
"### df.describe for statistical summaries"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>total_bill</th>\n",
" <th>tip</th>\n",
" <th>size</th>\n",
" <th>price_per_person</th>\n",
" <th>CC Number</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>244.000000</td>\n",
" <td>244.000000</td>\n",
" <td>244.000000</td>\n",
" <td>244.000000</td>\n",
" <td>2.440000e+02</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>19.785943</td>\n",
" <td>2.998279</td>\n",
" <td>2.569672</td>\n",
" <td>7.888197</td>\n",
" <td>2.563496e+15</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>8.902412</td>\n",
" <td>1.383638</td>\n",
" <td>0.951100</td>\n",
" <td>2.914234</td>\n",
" <td>2.369340e+15</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>3.070000</td>\n",
" <td>1.000000</td>\n",
" <td>1.000000</td>\n",
" <td>2.880000</td>\n",
" <td>6.040679e+10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>13.347500</td>\n",
" <td>2.000000</td>\n",
" <td>2.000000</td>\n",
" <td>5.800000</td>\n",
" <td>3.040731e+13</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>17.795000</td>\n",
" <td>2.900000</td>\n",
" <td>2.000000</td>\n",
" <td>7.255000</td>\n",
" <td>3.525318e+15</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>24.127500</td>\n",
" <td>3.562500</td>\n",
" <td>3.000000</td>\n",
" <td>9.390000</td>\n",
" <td>4.553675e+15</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>50.810000</td>\n",
" <td>10.000000</td>\n",
" <td>6.000000</td>\n",
" <td>20.270000</td>\n",
" <td>6.596454e+15</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" total_bill tip size price_per_person CC Number\n",
"count 244.000000 244.000000 244.000000 244.000000 2.440000e+02\n",
"mean 19.785943 2.998279 2.569672 7.888197 2.563496e+15\n",
"std 8.902412 1.383638 0.951100 2.914234 2.369340e+15\n",
"min 3.070000 1.000000 1.000000 2.880000 6.040679e+10\n",
"25% 13.347500 2.000000 2.000000 5.800000 3.040731e+13\n",
"50% 17.795000 2.900000 2.000000 7.255000 3.525318e+15\n",
"75% 24.127500 3.562500 3.000000 9.390000 4.553675e+15\n",
"max 50.810000 10.000000 6.000000 20.270000 6.596454e+15"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.describe()"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>count</th>\n",
" <th>mean</th>\n",
" <th>std</th>\n",
" <th>min</th>\n",
" <th>25%</th>\n",
" <th>50%</th>\n",
" <th>75%</th>\n",
" <th>max</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>total_bill</th>\n",
" <td>244.0</td>\n",
" <td>1.978594e+01</td>\n",
" <td>8.902412e+00</td>\n",
" <td>3.070000e+00</td>\n",
" <td>1.334750e+01</td>\n",
" <td>1.779500e+01</td>\n",
" <td>2.412750e+01</td>\n",
" <td>5.081000e+01</td>\n",
" </tr>\n",
" <tr>\n",
" <th>tip</th>\n",
" <td>244.0</td>\n",
" <td>2.998279e+00</td>\n",
" <td>1.383638e+00</td>\n",
" <td>1.000000e+00</td>\n",
" <td>2.000000e+00</td>\n",
" <td>2.900000e+00</td>\n",
" <td>3.562500e+00</td>\n",
" <td>1.000000e+01</td>\n",
" </tr>\n",
" <tr>\n",
" <th>size</th>\n",
" <td>244.0</td>\n",
" <td>2.569672e+00</td>\n",
" <td>9.510998e-01</td>\n",
" <td>1.000000e+00</td>\n",
" <td>2.000000e+00</td>\n",
" <td>2.000000e+00</td>\n",
" <td>3.000000e+00</td>\n",
" <td>6.000000e+00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>price_per_person</th>\n",
" <td>244.0</td>\n",
" <td>7.888197e+00</td>\n",
" <td>2.914234e+00</td>\n",
" <td>2.880000e+00</td>\n",
" <td>5.800000e+00</td>\n",
" <td>7.255000e+00</td>\n",
" <td>9.390000e+00</td>\n",
" <td>2.027000e+01</td>\n",
" </tr>\n",
" <tr>\n",
" <th>CC Number</th>\n",
" <td>244.0</td>\n",
" <td>2.563496e+15</td>\n",
" <td>2.369340e+15</td>\n",
" <td>6.040679e+10</td>\n",
" <td>3.040731e+13</td>\n",
" <td>3.525318e+15</td>\n",
" <td>4.553675e+15</td>\n",
" <td>6.596454e+15</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" count mean std min \\\n",
"total_bill 244.0 1.978594e+01 8.902412e+00 3.070000e+00 \n",
"tip 244.0 2.998279e+00 1.383638e+00 1.000000e+00 \n",
"size 244.0 2.569672e+00 9.510998e-01 1.000000e+00 \n",
"price_per_person 244.0 7.888197e+00 2.914234e+00 2.880000e+00 \n",
"CC Number 244.0 2.563496e+15 2.369340e+15 6.040679e+10 \n",
"\n",
" 25% 50% 75% max \n",
"total_bill 1.334750e+01 1.779500e+01 2.412750e+01 5.081000e+01 \n",
"tip 2.000000e+00 2.900000e+00 3.562500e+00 1.000000e+01 \n",
"size 2.000000e+00 2.000000e+00 3.000000e+00 6.000000e+00 \n",
"price_per_person 5.800000e+00 7.255000e+00 9.390000e+00 2.027000e+01 \n",
"CC Number 3.040731e+13 3.525318e+15 4.553675e+15 6.596454e+15 "
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.describe().transpose()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id='sort'></a>\n",
"### sort_values()"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>total_bill</th>\n",
" <th>tip</th>\n",
" <th>sex</th>\n",
" <th>smoker</th>\n",
" <th>day</th>\n",
" <th>time</th>\n",
" <th>size</th>\n",
" <th>price_per_person</th>\n",
" <th>Payer Name</th>\n",
" <th>CC Number</th>\n",
" <th>Payment ID</th>\n",
" <th>last_four</th>\n",
" <th>Expensive</th>\n",
" <th>Tip Quality</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>67</th>\n",
" <td>3.07</td>\n",
" <td>1.00</td>\n",
" <td>Female</td>\n",
" <td>Yes</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>1</td>\n",
" <td>3.07</td>\n",
" <td>Tiffany Brock</td>\n",
" <td>4359488526995267</td>\n",
" <td>Sat3455</td>\n",
" <td>5267</td>\n",
" <td>$</td>\n",
" <td>Generous</td>\n",
" </tr>\n",
" <tr>\n",
" <th>236</th>\n",
" <td>12.60</td>\n",
" <td>1.00</td>\n",
" <td>Male</td>\n",
" <td>Yes</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>6.30</td>\n",
" <td>Matthew Myers</td>\n",
" <td>3543676378973965</td>\n",
" <td>Sat5032</td>\n",
" <td>3965</td>\n",
" <td>$$</td>\n",
" <td>Other</td>\n",
" </tr>\n",
" <tr>\n",
" <th>92</th>\n",
" <td>5.75</td>\n",
" <td>1.00</td>\n",
" <td>Female</td>\n",
" <td>Yes</td>\n",
" <td>Fri</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>2.88</td>\n",
" <td>Leah Ramirez</td>\n",
" <td>3508911676966392</td>\n",
" <td>Fri3780</td>\n",
" <td>6392</td>\n",
" <td>$</td>\n",
" <td>Other</td>\n",
" </tr>\n",
" <tr>\n",
" <th>111</th>\n",
" <td>7.25</td>\n",
" <td>1.00</td>\n",
" <td>Female</td>\n",
" <td>No</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>1</td>\n",
" <td>7.25</td>\n",
" <td>Terri Jones</td>\n",
" <td>3559221007826887</td>\n",
" <td>Sat4801</td>\n",
" <td>6887</td>\n",
" <td>$</td>\n",
" <td>Other</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>16.99</td>\n",
" <td>1.01</td>\n",
" <td>Female</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>8.49</td>\n",
" <td>Christy Cunningham</td>\n",
" <td>3560325168603410</td>\n",
" <td>Sun2959</td>\n",
" <td>3410</td>\n",
" <td>$$</td>\n",
" <td>Other</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>141</th>\n",
" <td>34.30</td>\n",
" <td>6.70</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Thur</td>\n",
" <td>Lunch</td>\n",
" <td>6</td>\n",
" <td>5.72</td>\n",
" <td>Steven Carlson</td>\n",
" <td>3526515703718508</td>\n",
" <td>Thur1025</td>\n",
" <td>8508</td>\n",
" <td>$$$</td>\n",
" <td>Other</td>\n",
" </tr>\n",
" <tr>\n",
" <th>59</th>\n",
" <td>48.27</td>\n",
" <td>6.73</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>4</td>\n",
" <td>12.07</td>\n",
" <td>Brian Ortiz</td>\n",
" <td>6596453823950595</td>\n",
" <td>Sat8139</td>\n",
" <td>0595</td>\n",
" <td>$$$</td>\n",
" <td>Other</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td>39.42</td>\n",
" <td>7.58</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>4</td>\n",
" <td>9.86</td>\n",
" <td>Lance Peterson</td>\n",
" <td>3542584061609808</td>\n",
" <td>Sat239</td>\n",
" <td>9808</td>\n",
" <td>$$$</td>\n",
" <td>Other</td>\n",
" </tr>\n",
" <tr>\n",
" <th>212</th>\n",
" <td>48.33</td>\n",
" <td>9.00</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>4</td>\n",
" <td>12.08</td>\n",
" <td>Alex Williamson</td>\n",
" <td>676218815212</td>\n",
" <td>Sat4590</td>\n",
" <td>5212</td>\n",
" <td>$$$</td>\n",
" <td>Other</td>\n",
" </tr>\n",
" <tr>\n",
" <th>170</th>\n",
" <td>50.81</td>\n",
" <td>10.00</td>\n",
" <td>Male</td>\n",
" <td>Yes</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>3</td>\n",
" <td>16.94</td>\n",
" <td>Gregory Clark</td>\n",
" <td>5473850968388236</td>\n",
" <td>Sat1954</td>\n",
" <td>8236</td>\n",
" <td>$$$</td>\n",
" <td>Other</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>244 rows × 14 columns</p>\n",
"</div>"
],
"text/plain": [
" total_bill tip sex smoker day time size price_per_person \\\n",
"67 3.07 1.00 Female Yes Sat Dinner 1 3.07 \n",
"236 12.60 1.00 Male Yes Sat Dinner 2 6.30 \n",
"92 5.75 1.00 Female Yes Fri Dinner 2 2.88 \n",
"111 7.25 1.00 Female No Sat Dinner 1 7.25 \n",
"0 16.99 1.01 Female No Sun Dinner 2 8.49 \n",
".. ... ... ... ... ... ... ... ... \n",
"141 34.30 6.70 Male No Thur Lunch 6 5.72 \n",
"59 48.27 6.73 Male No Sat Dinner 4 12.07 \n",
"23 39.42 7.58 Male No Sat Dinner 4 9.86 \n",
"212 48.33 9.00 Male No Sat Dinner 4 12.08 \n",
"170 50.81 10.00 Male Yes Sat Dinner 3 16.94 \n",
"\n",
" Payer Name CC Number Payment ID last_four Expensive \\\n",
"67 Tiffany Brock 4359488526995267 Sat3455 5267 $ \n",
"236 Matthew Myers 3543676378973965 Sat5032 3965 $$ \n",
"92 Leah Ramirez 3508911676966392 Fri3780 6392 $ \n",
"111 Terri Jones 3559221007826887 Sat4801 6887 $ \n",
"0 Christy Cunningham 3560325168603410 Sun2959 3410 $$ \n",
".. ... ... ... ... ... \n",
"141 Steven Carlson 3526515703718508 Thur1025 8508 $$$ \n",
"59 Brian Ortiz 6596453823950595 Sat8139 0595 $$$ \n",
"23 Lance Peterson 3542584061609808 Sat239 9808 $$$ \n",
"212 Alex Williamson 676218815212 Sat4590 5212 $$$ \n",
"170 Gregory Clark 5473850968388236 Sat1954 8236 $$$ \n",
"\n",
" Tip Quality \n",
"67 Generous \n",
"236 Other \n",
"92 Other \n",
"111 Other \n",
"0 Other \n",
".. ... \n",
"141 Other \n",
"59 Other \n",
"23 Other \n",
"212 Other \n",
"170 Other \n",
"\n",
"[244 rows x 14 columns]"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.sort_values('tip')"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>total_bill</th>\n",
" <th>tip</th>\n",
" <th>sex</th>\n",
" <th>smoker</th>\n",
" <th>day</th>\n",
" <th>time</th>\n",
" <th>size</th>\n",
" <th>price_per_person</th>\n",
" <th>Payer Name</th>\n",
" <th>CC Number</th>\n",
" <th>Payment ID</th>\n",
" <th>last_four</th>\n",
" <th>Expensive</th>\n",
" <th>Tip Quality</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>67</th>\n",
" <td>3.07</td>\n",
" <td>1.00</td>\n",
" <td>Female</td>\n",
" <td>Yes</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>1</td>\n",
" <td>3.07</td>\n",
" <td>Tiffany Brock</td>\n",
" <td>4359488526995267</td>\n",
" <td>Sat3455</td>\n",
" <td>5267</td>\n",
" <td>$</td>\n",
" <td>Generous</td>\n",
" </tr>\n",
" <tr>\n",
" <th>111</th>\n",
" <td>7.25</td>\n",
" <td>1.00</td>\n",
" <td>Female</td>\n",
" <td>No</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>1</td>\n",
" <td>7.25</td>\n",
" <td>Terri Jones</td>\n",
" <td>3559221007826887</td>\n",
" <td>Sat4801</td>\n",
" <td>6887</td>\n",
" <td>$</td>\n",
" <td>Other</td>\n",
" </tr>\n",
" <tr>\n",
" <th>92</th>\n",
" <td>5.75</td>\n",
" <td>1.00</td>\n",
" <td>Female</td>\n",
" <td>Yes</td>\n",
" <td>Fri</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>2.88</td>\n",
" <td>Leah Ramirez</td>\n",
" <td>3508911676966392</td>\n",
" <td>Fri3780</td>\n",
" <td>6392</td>\n",
" <td>$</td>\n",
" <td>Other</td>\n",
" </tr>\n",
" <tr>\n",
" <th>236</th>\n",
" <td>12.60</td>\n",
" <td>1.00</td>\n",
" <td>Male</td>\n",
" <td>Yes</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>6.30</td>\n",
" <td>Matthew Myers</td>\n",
" <td>3543676378973965</td>\n",
" <td>Sat5032</td>\n",
" <td>3965</td>\n",
" <td>$$</td>\n",
" <td>Other</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>16.99</td>\n",
" <td>1.01</td>\n",
" <td>Female</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>8.49</td>\n",
" <td>Christy Cunningham</td>\n",
" <td>3560325168603410</td>\n",
" <td>Sun2959</td>\n",
" <td>3410</td>\n",
" <td>$$</td>\n",
" <td>Other</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>141</th>\n",
" <td>34.30</td>\n",
" <td>6.70</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Thur</td>\n",
" <td>Lunch</td>\n",
" <td>6</td>\n",
" <td>5.72</td>\n",
" <td>Steven Carlson</td>\n",
" <td>3526515703718508</td>\n",
" <td>Thur1025</td>\n",
" <td>8508</td>\n",
" <td>$$$</td>\n",
" <td>Other</td>\n",
" </tr>\n",
" <tr>\n",
" <th>59</th>\n",
" <td>48.27</td>\n",
" <td>6.73</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>4</td>\n",
" <td>12.07</td>\n",
" <td>Brian Ortiz</td>\n",
" <td>6596453823950595</td>\n",
" <td>Sat8139</td>\n",
" <td>0595</td>\n",
" <td>$$$</td>\n",
" <td>Other</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td>39.42</td>\n",
" <td>7.58</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>4</td>\n",
" <td>9.86</td>\n",
" <td>Lance Peterson</td>\n",
" <td>3542584061609808</td>\n",
" <td>Sat239</td>\n",
" <td>9808</td>\n",
" <td>$$$</td>\n",
" <td>Other</td>\n",
" </tr>\n",
" <tr>\n",
" <th>212</th>\n",
" <td>48.33</td>\n",
" <td>9.00</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>4</td>\n",
" <td>12.08</td>\n",
" <td>Alex Williamson</td>\n",
" <td>676218815212</td>\n",
" <td>Sat4590</td>\n",
" <td>5212</td>\n",
" <td>$$$</td>\n",
" <td>Other</td>\n",
" </tr>\n",
" <tr>\n",
" <th>170</th>\n",
" <td>50.81</td>\n",
" <td>10.00</td>\n",
" <td>Male</td>\n",
" <td>Yes</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>3</td>\n",
" <td>16.94</td>\n",
" <td>Gregory Clark</td>\n",
" <td>5473850968388236</td>\n",
" <td>Sat1954</td>\n",
" <td>8236</td>\n",
" <td>$$$</td>\n",
" <td>Other</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>244 rows × 14 columns</p>\n",
"</div>"
],
"text/plain": [
" total_bill tip sex smoker day time size price_per_person \\\n",
"67 3.07 1.00 Female Yes Sat Dinner 1 3.07 \n",
"111 7.25 1.00 Female No Sat Dinner 1 7.25 \n",
"92 5.75 1.00 Female Yes Fri Dinner 2 2.88 \n",
"236 12.60 1.00 Male Yes Sat Dinner 2 6.30 \n",
"0 16.99 1.01 Female No Sun Dinner 2 8.49 \n",
".. ... ... ... ... ... ... ... ... \n",
"141 34.30 6.70 Male No Thur Lunch 6 5.72 \n",
"59 48.27 6.73 Male No Sat Dinner 4 12.07 \n",
"23 39.42 7.58 Male No Sat Dinner 4 9.86 \n",
"212 48.33 9.00 Male No Sat Dinner 4 12.08 \n",
"170 50.81 10.00 Male Yes Sat Dinner 3 16.94 \n",
"\n",
" Payer Name CC Number Payment ID last_four Expensive \\\n",
"67 Tiffany Brock 4359488526995267 Sat3455 5267 $ \n",
"111 Terri Jones 3559221007826887 Sat4801 6887 $ \n",
"92 Leah Ramirez 3508911676966392 Fri3780 6392 $ \n",
"236 Matthew Myers 3543676378973965 Sat5032 3965 $$ \n",
"0 Christy Cunningham 3560325168603410 Sun2959 3410 $$ \n",
".. ... ... ... ... ... \n",
"141 Steven Carlson 3526515703718508 Thur1025 8508 $$$ \n",
"59 Brian Ortiz 6596453823950595 Sat8139 0595 $$$ \n",
"23 Lance Peterson 3542584061609808 Sat239 9808 $$$ \n",
"212 Alex Williamson 676218815212 Sat4590 5212 $$$ \n",
"170 Gregory Clark 5473850968388236 Sat1954 8236 $$$ \n",
"\n",
" Tip Quality \n",
"67 Generous \n",
"111 Other \n",
"92 Other \n",
"236 Other \n",
"0 Other \n",
".. ... \n",
"141 Other \n",
"59 Other \n",
"23 Other \n",
"212 Other \n",
"170 Other \n",
"\n",
"[244 rows x 14 columns]"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Sort by 'tip' first, then break ties using 'size'\n",
"# To reorder columns (rather than rows) afterwards, see:\n",
"# https://stackoverflow.com/questions/13148429/how-to-change-the-order-of-dataframe-columns\n",
"df.sort_values(['tip','size'])"
]
},
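{
"cell_type": "markdown",
"metadata": {},
"source": [
"By default `sort_values()` sorts in ascending order and returns a new DataFrame rather than changing the original. The cell below is a minimal sketch of the `ascending` parameter, with one sort key and with several. It uses a small made-up DataFrame called `demo` (an assumption for illustration, not the tips data)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical toy data, not taken from the tips dataset\n",
"demo = pd.DataFrame({'tip': [1.0, 3.5, 1.0, 2.0],\n",
"                     'size': [2, 3, 1, 2]})\n",
"\n",
"# Largest tips first\n",
"by_tip_desc = demo.sort_values('tip', ascending=False)\n",
"\n",
"# Ascending by tip, then descending by size among rows with equal tips\n",
"by_tip_then_size = demo.sort_values(['tip', 'size'], ascending=[True, False])\n",
"by_tip_then_size"
]
},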
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id='corr'></a>\n",
"### df.corr() for correlation checks\n",
"\n",
"[Wikipedia on Correlation](https://en.wikipedia.org/wiki/Correlation_and_dependence)"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>total_bill</th>\n",
" <th>tip</th>\n",
" <th>size</th>\n",
" <th>price_per_person</th>\n",
" <th>CC Number</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>total_bill</th>\n",
" <td>1.000000</td>\n",
" <td>0.675734</td>\n",
" <td>0.598315</td>\n",
" <td>0.647554</td>\n",
" <td>0.104576</td>\n",
" </tr>\n",
" <tr>\n",
" <th>tip</th>\n",
" <td>0.675734</td>\n",
" <td>1.000000</td>\n",
" <td>0.489299</td>\n",
" <td>0.347405</td>\n",
" <td>0.110857</td>\n",
" </tr>\n",
" <tr>\n",
" <th>size</th>\n",
" <td>0.598315</td>\n",
" <td>0.489299</td>\n",
" <td>1.000000</td>\n",
" <td>-0.175359</td>\n",
" <td>-0.030239</td>\n",
" </tr>\n",
" <tr>\n",
" <th>price_per_person</th>\n",
" <td>0.647554</td>\n",
" <td>0.347405</td>\n",
" <td>-0.175359</td>\n",
" <td>1.000000</td>\n",
" <td>0.135240</td>\n",
" </tr>\n",
" <tr>\n",
" <th>CC Number</th>\n",
" <td>0.104576</td>\n",
" <td>0.110857</td>\n",
" <td>-0.030239</td>\n",
" <td>0.135240</td>\n",
" <td>1.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" total_bill tip size price_per_person CC Number\n",
"total_bill 1.000000 0.675734 0.598315 0.647554 0.104576\n",
"tip 0.675734 1.000000 0.489299 0.347405 0.110857\n",
"size 0.598315 0.489299 1.000000 -0.175359 -0.030239\n",
"price_per_person 0.647554 0.347405 -0.175359 1.000000 0.135240\n",
"CC Number 0.104576 0.110857 -0.030239 0.135240 1.000000"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.corr()"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>total_bill</th>\n",
" <th>tip</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>total_bill</th>\n",
" <td>1.000000</td>\n",
" <td>0.675734</td>\n",
" </tr>\n",
" <tr>\n",
" <th>tip</th>\n",
" <td>0.675734</td>\n",
" <td>1.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" total_bill tip\n",
"total_bill 1.000000 0.675734\n",
"tip 0.675734 1.000000"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[['total_bill','tip']].corr()"
]
},
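{
"cell_type": "markdown",
"metadata": {},
"source": [
"`corr()` computes pairwise Pearson correlations by default. Be aware that newer pandas releases (2.0 and later) should raise an error when `corr()` is called on a DataFrame that still contains text columns, so selecting the numeric columns first is the version-independent approach. The sketch below assumes a small made-up DataFrame named `demo`; it also shows `Series.corr()` for a single pair of columns."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical toy data: 'tip' is exactly 10% of 'total_bill'\n",
"demo = pd.DataFrame({'total_bill': [10.0, 20.0, 30.0, 40.0],\n",
"                     'tip': [1.0, 2.0, 3.0, 4.0],\n",
"                     'day': ['Sun', 'Sat', 'Sun', 'Fri']})\n",
"\n",
"# Keep only the numeric columns before computing the correlation matrix\n",
"numeric_corr = demo.select_dtypes(include='number').corr()\n",
"\n",
"# Correlation between two specific columns, returned as a single number\n",
"bill_tip_corr = demo['total_bill'].corr(demo['tip'])\n",
"numeric_corr"
]
},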
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id='idx'></a>\n",
"### idxmin and idxmax"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>total_bill</th>\n",
" <th>tip</th>\n",
" <th>sex</th>\n",
" <th>smoker</th>\n",
" <th>day</th>\n",
" <th>time</th>\n",
" <th>size</th>\n",
" <th>price_per_person</th>\n",
" <th>Payer Name</th>\n",
" <th>CC Number</th>\n",
" <th>Payment ID</th>\n",
" <th>last_four</th>\n",
" <th>Expensive</th>\n",
" <th>Tip Quality</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>16.99</td>\n",
" <td>1.01</td>\n",
" <td>Female</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>8.49</td>\n",
" <td>Christy Cunningham</td>\n",
" <td>3560325168603410</td>\n",
" <td>Sun2959</td>\n",
" <td>3410</td>\n",
" <td>$$</td>\n",
" <td>Other</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>10.34</td>\n",
" <td>1.66</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>3</td>\n",
" <td>3.45</td>\n",
" <td>Douglas Tucker</td>\n",
" <td>4478071379779230</td>\n",
" <td>Sun4608</td>\n",
" <td>9230</td>\n",
" <td>$$</td>\n",
" <td>Other</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>21.01</td>\n",
" <td>3.50</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>3</td>\n",
" <td>7.00</td>\n",
" <td>Travis Walters</td>\n",
" <td>6011812112971322</td>\n",
" <td>Sun4458</td>\n",
" <td>1322</td>\n",
" <td>$$</td>\n",
" <td>Other</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>23.68</td>\n",
" <td>3.31</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>11.84</td>\n",
" <td>Nathaniel Harris</td>\n",
" <td>4676137647685994</td>\n",
" <td>Sun5260</td>\n",
" <td>5994</td>\n",
" <td>$$</td>\n",
" <td>Other</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>24.59</td>\n",
" <td>3.61</td>\n",
" <td>Female</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>4</td>\n",
" <td>6.15</td>\n",
" <td>Tonya Carter</td>\n",
" <td>4832732618637221</td>\n",
" <td>Sun2251</td>\n",
" <td>7221</td>\n",
" <td>$$</td>\n",
" <td>Other</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" total_bill tip sex smoker day time size price_per_person \\\n",
"0 16.99 1.01 Female No Sun Dinner 2 8.49 \n",
"1 10.34 1.66 Male No Sun Dinner 3 3.45 \n",
"2 21.01 3.50 Male No Sun Dinner 3 7.00 \n",
"3 23.68 3.31 Male No Sun Dinner 2 11.84 \n",
"4 24.59 3.61 Female No Sun Dinner 4 6.15 \n",
"\n",
" Payer Name CC Number Payment ID last_four Expensive \\\n",
"0 Christy Cunningham 3560325168603410 Sun2959 3410 $$ \n",
"1 Douglas Tucker 4478071379779230 Sun4608 9230 $$ \n",
"2 Travis Walters 6011812112971322 Sun4458 1322 $$ \n",
"3 Nathaniel Harris 4676137647685994 Sun5260 5994 $$ \n",
"4 Tonya Carter 4832732618637221 Sun2251 7221 $$ \n",
"\n",
" Tip Quality \n",
"0 Other \n",
"1 Other \n",
"2 Other \n",
"3 Other \n",
"4 Other "
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"50.81"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['total_bill'].max()"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"170"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['total_bill'].idxmax()"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"67"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['total_bill'].idxmin()"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"total_bill 3.07\n",
"tip 1\n",
"sex Female\n",
"smoker Yes\n",
"day Sat\n",
"time Dinner\n",
"size 1\n",
"price_per_person 3.07\n",
"Payer Name Tiffany Brock\n",
"CC Number 4359488526995267\n",
"Payment ID Sat3455\n",
"last_four 5267\n",
"Expensive $\n",
"Tip Quality Generous\n",
"Name: 67, dtype: object"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# idxmin() returns an index label, so look the row up with label-based .loc\n",
"df.loc[67]"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"total_bill 50.81\n",
"tip 10\n",
"sex Male\n",
"smoker Yes\n",
"day Sat\n",
"time Dinner\n",
"size 3\n",
"price_per_person 16.94\n",
"Payer Name Gregory Clark\n",
"CC Number 5473850968388236\n",
"Payment ID Sat1954\n",
"last_four 8236\n",
"Expensive $$$\n",
"Tip Quality Other\n",
"Name: 170, dtype: object"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# idxmax() returns an index label, so look the row up with label-based .loc\n",
"df.loc[170]"
]
},
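{
"cell_type": "markdown",
"metadata": {},
"source": [
"`idxmin()` and `idxmax()` return index labels, not integer positions. With the default 0 to n-1 index the two coincide, but with any other index they do not, so `loc` is the lookup that matches. A minimal sketch on a made-up DataFrame (`demo`, with a hypothetical string index):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical toy data with string labels instead of the default integer index\n",
"demo = pd.DataFrame({'total_bill': [16.99, 3.07, 50.81]},\n",
"                    index=['a', 'b', 'c'])\n",
"\n",
"min_label = demo['total_bill'].idxmin()  # label of the smallest bill\n",
"max_label = demo['total_bill'].idxmax()  # label of the largest bill\n",
"\n",
"# Label-based lookup of both rows\n",
"extremes = demo.loc[[min_label, max_label]]\n",
"extremes"
]
},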
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id='v_c'></a>\n",
"### value_counts\n",
"\n",
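"A quick way to count how many rows fall into each category. It is most useful on categorical columns; on a continuous column such as `total_bill`, most values occur only once or twice, so the counts say little."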
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>total_bill</th>\n",
" <th>tip</th>\n",
" <th>sex</th>\n",
" <th>smoker</th>\n",
" <th>day</th>\n",
" <th>time</th>\n",
" <th>size</th>\n",
" <th>price_per_person</th>\n",
" <th>Payer Name</th>\n",
" <th>CC Number</th>\n",
" <th>Payment ID</th>\n",
" <th>last_four</th>\n",
" <th>Expensive</th>\n",
" <th>Tip Quality</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>16.99</td>\n",
" <td>1.01</td>\n",
" <td>Female</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>8.49</td>\n",
" <td>Christy Cunningham</td>\n",
" <td>3560325168603410</td>\n",
" <td>Sun2959</td>\n",
" <td>3410</td>\n",
" <td>$$</td>\n",
" <td>Other</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>10.34</td>\n",
" <td>1.66</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>3</td>\n",
" <td>3.45</td>\n",
" <td>Douglas Tucker</td>\n",
" <td>4478071379779230</td>\n",
" <td>Sun4608</td>\n",
" <td>9230</td>\n",
" <td>$$</td>\n",
" <td>Other</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>21.01</td>\n",
" <td>3.50</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>3</td>\n",
" <td>7.00</td>\n",
" <td>Travis Walters</td>\n",
" <td>6011812112971322</td>\n",
" <td>Sun4458</td>\n",
" <td>1322</td>\n",
" <td>$$</td>\n",
" <td>Other</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>23.68</td>\n",
" <td>3.31</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>11.84</td>\n",
" <td>Nathaniel Harris</td>\n",
" <td>4676137647685994</td>\n",
" <td>Sun5260</td>\n",
" <td>5994</td>\n",
" <td>$$</td>\n",
" <td>Other</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>24.59</td>\n",
" <td>3.61</td>\n",
" <td>Female</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>4</td>\n",
" <td>6.15</td>\n",
" <td>Tonya Carter</td>\n",
" <td>4832732618637221</td>\n",
" <td>Sun2251</td>\n",
" <td>7221</td>\n",
" <td>$$</td>\n",
" <td>Other</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" total_bill tip sex smoker day time size price_per_person \\\n",
"0 16.99 1.01 Female No Sun Dinner 2 8.49 \n",
"1 10.34 1.66 Male No Sun Dinner 3 3.45 \n",
"2 21.01 3.50 Male No Sun Dinner 3 7.00 \n",
"3 23.68 3.31 Male No Sun Dinner 2 11.84 \n",
"4 24.59 3.61 Female No Sun Dinner 4 6.15 \n",
"\n",
" Payer Name CC Number Payment ID last_four Expensive \\\n",
"0 Christy Cunningham 3560325168603410 Sun2959 3410 $$ \n",
"1 Douglas Tucker 4478071379779230 Sun4608 9230 $$ \n",
"2 Travis Walters 6011812112971322 Sun4458 1322 $$ \n",
"3 Nathaniel Harris 4676137647685994 Sun5260 5994 $$ \n",
"4 Tonya Carter 4832732618637221 Sun2251 7221 $$ \n",
"\n",
" Tip Quality \n",
"0 Other \n",
"1 Other \n",
"2 Other \n",
"3 Other \n",
"4 Other "
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Male 157\n",
"Female 87\n",
"Name: sex, dtype: int64"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['sex'].value_counts()"
]
},
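{
"cell_type": "markdown",
"metadata": {},
"source": [
"`value_counts()` also accepts `normalize=True`, which returns each category's share of the rows instead of a raw count. A short sketch on a made-up Series (the values are hypothetical):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical toy data\n",
"sexes = pd.Series(['Male', 'Female', 'Male', 'Male'], name='sex')\n",
"\n",
"counts = sexes.value_counts()                # raw counts per category\n",
"shares = sexes.value_counts(normalize=True)  # fraction of rows per category\n",
"shares"
]
},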
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id='replace'></a>\n",
"\n",
"### replace\n",
"\n",
"Quickly replace one value with another. `replace()` returns a new Series, so assign the result back to the column to keep the change."
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>total_bill</th>\n",
" <th>tip</th>\n",
" <th>sex</th>\n",
" <th>smoker</th>\n",
" <th>day</th>\n",
" <th>time</th>\n",
" <th>size</th>\n",
" <th>price_per_person</th>\n",
" <th>Payer Name</th>\n",
" <th>CC Number</th>\n",
" <th>Payment ID</th>\n",
" <th>last_four</th>\n",
" <th>Expensive</th>\n",
" <th>Tip Quality</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>16.99</td>\n",
" <td>1.01</td>\n",
" <td>Female</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>8.49</td>\n",
" <td>Christy Cunningham</td>\n",
" <td>3560325168603410</td>\n",
" <td>Sun2959</td>\n",
" <td>3410</td>\n",
" <td>$$</td>\n",
" <td>Other</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>10.34</td>\n",
" <td>1.66</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>3</td>\n",
" <td>3.45</td>\n",
" <td>Douglas Tucker</td>\n",
" <td>4478071379779230</td>\n",
" <td>Sun4608</td>\n",
" <td>9230</td>\n",
" <td>$$</td>\n",
" <td>Other</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>21.01</td>\n",
" <td>3.50</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>3</td>\n",
" <td>7.00</td>\n",
" <td>Travis Walters</td>\n",
" <td>6011812112971322</td>\n",
" <td>Sun4458</td>\n",
" <td>1322</td>\n",
" <td>$$</td>\n",
" <td>Other</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>23.68</td>\n",
" <td>3.31</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>11.84</td>\n",
" <td>Nathaniel Harris</td>\n",
" <td>4676137647685994</td>\n",
" <td>Sun5260</td>\n",
" <td>5994</td>\n",
" <td>$$</td>\n",
" <td>Other</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>24.59</td>\n",
" <td>3.61</td>\n",
" <td>Female</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>4</td>\n",
" <td>6.15</td>\n",
" <td>Tonya Carter</td>\n",
" <td>4832732618637221</td>\n",
" <td>Sun2251</td>\n",
" <td>7221</td>\n",
" <td>$$</td>\n",
" <td>Other</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" total_bill tip sex smoker day time size price_per_person \\\n",
"0 16.99 1.01 Female No Sun Dinner 2 8.49 \n",
"1 10.34 1.66 Male No Sun Dinner 3 3.45 \n",
"2 21.01 3.50 Male No Sun Dinner 3 7.00 \n",
"3 23.68 3.31 Male No Sun Dinner 2 11.84 \n",
"4 24.59 3.61 Female No Sun Dinner 4 6.15 \n",
"\n",
" Payer Name CC Number Payment ID last_four Expensive \\\n",
"0 Christy Cunningham 3560325168603410 Sun2959 3410 $$ \n",
"1 Douglas Tucker 4478071379779230 Sun4608 9230 $$ \n",
"2 Travis Walters 6011812112971322 Sun4458 1322 $$ \n",
"3 Nathaniel Harris 4676137647685994 Sun5260 5994 $$ \n",
"4 Tonya Carter 4832732618637221 Sun2251 7221 $$ \n",
"\n",
" Tip Quality \n",
"0 Other \n",
"1 Other \n",
"2 Other \n",
"3 Other \n",
"4 Other "
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 Ok\n",
"1 Ok\n",
"2 Ok\n",
"3 Ok\n",
"4 Ok\n",
"5 Ok\n",
"6 Ok\n",
"7 Ok\n",
"8 Ok\n",
"9 Ok\n",
"10 Ok\n",
"11 Ok\n",
"12 Ok\n",
"13 Ok\n",
"14 Ok\n",
"15 Ok\n",
"16 Ok\n",
"17 Ok\n",
"18 Ok\n",
"19 Ok\n",
"20 Ok\n",
"21 Ok\n",
"22 Ok\n",
"23 Ok\n",
"24 Ok\n",
"25 Ok\n",
"26 Ok\n",
"27 Ok\n",
"28 Ok\n",
"29 Ok\n",
" ... \n",
"214 Ok\n",
"215 Ok\n",
"216 Ok\n",
"217 Ok\n",
"218 Ok\n",
"219 Ok\n",
"220 Ok\n",
"221 Generous\n",
"222 Ok\n",
"223 Ok\n",
"224 Ok\n",
"225 Ok\n",
"226 Ok\n",
"227 Ok\n",
"228 Ok\n",
"229 Ok\n",
"230 Ok\n",
"231 Ok\n",
"232 Generous\n",
"233 Ok\n",
"234 Ok\n",
"235 Ok\n",
"236 Ok\n",
"237 Ok\n",
"238 Ok\n",
"239 Ok\n",
"240 Ok\n",
"241 Ok\n",
"242 Ok\n",
"243 Ok\n",
"Name: Tip Quality, Length: 244, dtype: object"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['Tip Quality'].replace(to_replace='Other',value='Ok')"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"df['Tip Quality'] = df['Tip Quality'].replace(to_replace='Other',value='Ok')"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>total_bill</th>\n",
" <th>tip</th>\n",
" <th>sex</th>\n",
" <th>smoker</th>\n",
" <th>day</th>\n",
" <th>time</th>\n",
" <th>size</th>\n",
" <th>price_per_person</th>\n",
" <th>Payer Name</th>\n",
" <th>CC Number</th>\n",
" <th>Payment ID</th>\n",
" <th>last_four</th>\n",
" <th>Expensive</th>\n",
" <th>Tip Quality</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>16.99</td>\n",
" <td>1.01</td>\n",
" <td>Female</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>8.49</td>\n",
" <td>Christy Cunningham</td>\n",
" <td>3560325168603410</td>\n",
" <td>Sun2959</td>\n",
" <td>3410</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>10.34</td>\n",
" <td>1.66</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>3</td>\n",
" <td>3.45</td>\n",
" <td>Douglas Tucker</td>\n",
" <td>4478071379779230</td>\n",
" <td>Sun4608</td>\n",
" <td>9230</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>21.01</td>\n",
" <td>3.50</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>3</td>\n",
" <td>7.00</td>\n",
" <td>Travis Walters</td>\n",
" <td>6011812112971322</td>\n",
" <td>Sun4458</td>\n",
" <td>1322</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>23.68</td>\n",
" <td>3.31</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>11.84</td>\n",
" <td>Nathaniel Harris</td>\n",
" <td>4676137647685994</td>\n",
" <td>Sun5260</td>\n",
" <td>5994</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>24.59</td>\n",
" <td>3.61</td>\n",
" <td>Female</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>4</td>\n",
" <td>6.15</td>\n",
" <td>Tonya Carter</td>\n",
" <td>4832732618637221</td>\n",
" <td>Sun2251</td>\n",
" <td>7221</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" total_bill tip sex smoker day time size price_per_person \\\n",
"0 16.99 1.01 Female No Sun Dinner 2 8.49 \n",
"1 10.34 1.66 Male No Sun Dinner 3 3.45 \n",
"2 21.01 3.50 Male No Sun Dinner 3 7.00 \n",
"3 23.68 3.31 Male No Sun Dinner 2 11.84 \n",
"4 24.59 3.61 Female No Sun Dinner 4 6.15 \n",
"\n",
" Payer Name CC Number Payment ID last_four Expensive \\\n",
"0 Christy Cunningham 3560325168603410 Sun2959 3410 $$ \n",
"1 Douglas Tucker 4478071379779230 Sun4608 9230 $$ \n",
"2 Travis Walters 6011812112971322 Sun4458 1322 $$ \n",
"3 Nathaniel Harris 4676137647685994 Sun5260 5994 $$ \n",
"4 Tonya Carter 4832732618637221 Sun2251 7221 $$ \n",
"\n",
" Tip Quality \n",
"0 Ok \n",
"1 Ok \n",
"2 Ok \n",
"3 Ok \n",
"4 Ok "
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id='uni'></a>\n",
    "### unique and nunique"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([2, 3, 4, 1, 6, 5], dtype=int64)"
]
},
"execution_count": 59,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['size'].unique()"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"6"
]
},
"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['size'].nunique()"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['Dinner', 'Lunch'], dtype=object)"
]
},
"execution_count": 57,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['time'].unique()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id='map'></a>\n",
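    "### map\n",
    "\n",
    "`map()` substitutes each value in a Series through a dictionary (or a function). Values with no matching dictionary key become `NaN`. It returns a new Series and leaves `df` unchanged."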
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"my_map = {'Dinner':'D','Lunch':'L'}"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 D\n",
"1 D\n",
"2 D\n",
"3 D\n",
"4 D\n",
"5 D\n",
"6 D\n",
"7 D\n",
"8 D\n",
"9 D\n",
"10 D\n",
"11 D\n",
"12 D\n",
"13 D\n",
"14 D\n",
"15 D\n",
"16 D\n",
"17 D\n",
"18 D\n",
"19 D\n",
"20 D\n",
"21 D\n",
"22 D\n",
"23 D\n",
"24 D\n",
"25 D\n",
"26 D\n",
"27 D\n",
"28 D\n",
"29 D\n",
" ..\n",
"214 D\n",
"215 D\n",
"216 D\n",
"217 D\n",
"218 D\n",
"219 D\n",
"220 L\n",
"221 L\n",
"222 L\n",
"223 L\n",
"224 L\n",
"225 L\n",
"226 L\n",
"227 D\n",
"228 D\n",
"229 D\n",
"230 D\n",
"231 D\n",
"232 D\n",
"233 D\n",
"234 D\n",
"235 D\n",
"236 D\n",
"237 D\n",
"238 D\n",
"239 D\n",
"240 D\n",
"241 D\n",
"242 D\n",
"243 D\n",
"Name: time, Length: 244, dtype: object"
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['time'].map(my_map)"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>total_bill</th>\n",
" <th>tip</th>\n",
" <th>sex</th>\n",
" <th>smoker</th>\n",
" <th>day</th>\n",
" <th>time</th>\n",
" <th>size</th>\n",
" <th>price_per_person</th>\n",
" <th>Payer Name</th>\n",
" <th>CC Number</th>\n",
" <th>Payment ID</th>\n",
" <th>last_four</th>\n",
" <th>Expensive</th>\n",
" <th>Tip Quality</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>16.99</td>\n",
" <td>1.01</td>\n",
" <td>Female</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>8.49</td>\n",
" <td>Christy Cunningham</td>\n",
" <td>3560325168603410</td>\n",
" <td>Sun2959</td>\n",
" <td>3410</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>10.34</td>\n",
" <td>1.66</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>3</td>\n",
" <td>3.45</td>\n",
" <td>Douglas Tucker</td>\n",
" <td>4478071379779230</td>\n",
" <td>Sun4608</td>\n",
" <td>9230</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>21.01</td>\n",
" <td>3.50</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>3</td>\n",
" <td>7.00</td>\n",
" <td>Travis Walters</td>\n",
" <td>6011812112971322</td>\n",
" <td>Sun4458</td>\n",
" <td>1322</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>23.68</td>\n",
" <td>3.31</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>11.84</td>\n",
" <td>Nathaniel Harris</td>\n",
" <td>4676137647685994</td>\n",
" <td>Sun5260</td>\n",
" <td>5994</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>24.59</td>\n",
" <td>3.61</td>\n",
" <td>Female</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>4</td>\n",
" <td>6.15</td>\n",
" <td>Tonya Carter</td>\n",
" <td>4832732618637221</td>\n",
" <td>Sun2251</td>\n",
" <td>7221</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" total_bill tip sex smoker day time size price_per_person \\\n",
"0 16.99 1.01 Female No Sun Dinner 2 8.49 \n",
"1 10.34 1.66 Male No Sun Dinner 3 3.45 \n",
"2 21.01 3.50 Male No Sun Dinner 3 7.00 \n",
"3 23.68 3.31 Male No Sun Dinner 2 11.84 \n",
"4 24.59 3.61 Female No Sun Dinner 4 6.15 \n",
"\n",
" Payer Name CC Number Payment ID last_four Expensive \\\n",
"0 Christy Cunningham 3560325168603410 Sun2959 3410 $$ \n",
"1 Douglas Tucker 4478071379779230 Sun4608 9230 $$ \n",
"2 Travis Walters 6011812112971322 Sun4458 1322 $$ \n",
"3 Nathaniel Harris 4676137647685994 Sun5260 5994 $$ \n",
"4 Tonya Carter 4832732618637221 Sun2251 7221 $$ \n",
"\n",
" Tip Quality \n",
"0 Ok \n",
"1 Ok \n",
"2 Ok \n",
"3 Ok \n",
"4 Ok "
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id='dup'></a>\n",
"## Duplicates\n",
"\n",
"### .duplicated() and .drop_duplicates()"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 False\n",
"1 False\n",
"2 False\n",
"3 False\n",
"4 False\n",
"5 False\n",
"6 False\n",
"7 False\n",
"8 False\n",
"9 False\n",
"10 False\n",
"11 False\n",
"12 False\n",
"13 False\n",
"14 False\n",
"15 False\n",
"16 False\n",
"17 False\n",
"18 False\n",
"19 False\n",
"20 False\n",
"21 False\n",
"22 False\n",
"23 False\n",
"24 False\n",
"25 False\n",
"26 False\n",
"27 False\n",
"28 False\n",
"29 False\n",
" ... \n",
"214 False\n",
"215 False\n",
"216 False\n",
"217 False\n",
"218 False\n",
"219 False\n",
"220 False\n",
"221 False\n",
"222 False\n",
"223 False\n",
"224 False\n",
"225 False\n",
"226 False\n",
"227 False\n",
"228 False\n",
"229 False\n",
"230 False\n",
"231 False\n",
"232 False\n",
"233 False\n",
"234 False\n",
"235 False\n",
"236 False\n",
"237 False\n",
"238 False\n",
"239 False\n",
"240 False\n",
"241 False\n",
"242 False\n",
"243 False\n",
"Length: 244, dtype: bool"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
    "# duplicated() marks a row True when it repeats an earlier row; the first occurrence stays False\n",
"df.duplicated()"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
    "simple_df = pd.DataFrame([1,2,2], index=['a','b','c'])"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>0</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>a</th>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>b</th>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>c</th>\n",
" <td>2</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 0\n",
"a 1\n",
"b 2\n",
"c 2"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"simple_df"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"a False\n",
"b False\n",
"c True\n",
"dtype: bool"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"simple_df.duplicated()"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>0</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>a</th>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>b</th>\n",
" <td>2</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 0\n",
"a 1\n",
"b 2"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"simple_df.drop_duplicates()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id='bet'></a>\n",
"## between\n",
"\n",
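    "Checks, element by element, whether each value lies between two boundaries and returns a Boolean Series.\n",
    "\n",
    "* left: A scalar value that defines the left boundary\n",
    "* right: A scalar value that defines the right boundary\n",
    "* inclusive: Which boundaries to include. Recent pandas versions (1.3 and later) expect one of the strings 'both' (the default), 'neither', 'left', or 'right'; older versions took a Boolean, with True (the default) including both boundaries and False excluding both."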
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 True\n",
"1 True\n",
"2 False\n",
"3 False\n",
"4 False\n",
"5 False\n",
"6 False\n",
"7 False\n",
"8 True\n",
"9 True\n",
"10 True\n",
"11 False\n",
"12 True\n",
"13 True\n",
"14 True\n",
"15 False\n",
"16 True\n",
"17 True\n",
"18 True\n",
"19 False\n",
"20 True\n",
"21 False\n",
"22 True\n",
"23 False\n",
"24 True\n",
"25 True\n",
"26 True\n",
"27 True\n",
"28 False\n",
"29 True\n",
" ... \n",
"214 False\n",
"215 True\n",
"216 False\n",
"217 True\n",
"218 False\n",
"219 False\n",
"220 True\n",
"221 True\n",
"222 False\n",
"223 True\n",
"224 True\n",
"225 True\n",
"226 True\n",
"227 False\n",
"228 True\n",
"229 False\n",
"230 False\n",
"231 True\n",
"232 True\n",
"233 True\n",
"234 True\n",
"235 True\n",
"236 True\n",
"237 False\n",
"238 False\n",
"239 False\n",
"240 False\n",
"241 False\n",
"242 True\n",
"243 True\n",
"Name: total_bill, Length: 244, dtype: bool"
]
},
"execution_count": 64,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
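    "# inclusive='both' (the default) keeps bills of exactly 10 or 20; older pandas versions wrote this as inclusive=True\n",
    "df['total_bill'].between(10,20,inclusive='both')"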
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>total_bill</th>\n",
" <th>tip</th>\n",
" <th>sex</th>\n",
" <th>smoker</th>\n",
" <th>day</th>\n",
" <th>time</th>\n",
" <th>size</th>\n",
" <th>price_per_person</th>\n",
" <th>Payer Name</th>\n",
" <th>CC Number</th>\n",
" <th>Payment ID</th>\n",
" <th>last_four</th>\n",
" <th>Expensive</th>\n",
" <th>Tip Quality</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>16.99</td>\n",
" <td>1.01</td>\n",
" <td>Female</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>8.49</td>\n",
" <td>Christy Cunningham</td>\n",
" <td>3560325168603410</td>\n",
" <td>Sun2959</td>\n",
" <td>3410</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>10.34</td>\n",
" <td>1.66</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>3</td>\n",
" <td>3.45</td>\n",
" <td>Douglas Tucker</td>\n",
" <td>4478071379779230</td>\n",
" <td>Sun4608</td>\n",
" <td>9230</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>15.04</td>\n",
" <td>1.96</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>7.52</td>\n",
" <td>Joseph Mcdonald</td>\n",
" <td>3522866365840377</td>\n",
" <td>Sun6820</td>\n",
" <td>0377</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>14.78</td>\n",
" <td>3.23</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>7.39</td>\n",
" <td>Jerome Abbott</td>\n",
" <td>3532124519049786</td>\n",
" <td>Sun3775</td>\n",
" <td>9786</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>10.27</td>\n",
" <td>1.71</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>5.14</td>\n",
" <td>William Riley</td>\n",
" <td>566287581219</td>\n",
" <td>Sun2546</td>\n",
" <td>1219</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>15.42</td>\n",
" <td>1.57</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>7.71</td>\n",
" <td>Chad Harrington</td>\n",
" <td>577040572932</td>\n",
" <td>Sun1300</td>\n",
" <td>2932</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>18.43</td>\n",
" <td>3.00</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>4</td>\n",
" <td>4.61</td>\n",
" <td>Joshua Jones</td>\n",
" <td>6011163105616890</td>\n",
" <td>Sun2971</td>\n",
" <td>6890</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>14.83</td>\n",
" <td>3.02</td>\n",
" <td>Female</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>7.42</td>\n",
" <td>Vanessa Jones</td>\n",
" <td>30016702287574</td>\n",
" <td>Sun3848</td>\n",
" <td>7574</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>10.33</td>\n",
" <td>1.67</td>\n",
" <td>Female</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>3</td>\n",
" <td>3.44</td>\n",
" <td>Elizabeth Foster</td>\n",
" <td>4240025044626033</td>\n",
" <td>Sun9715</td>\n",
" <td>6033</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>16.29</td>\n",
" <td>3.71</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>3</td>\n",
" <td>5.43</td>\n",
" <td>John Pittman</td>\n",
" <td>6521340257218708</td>\n",
" <td>Sun2998</td>\n",
" <td>8708</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>16.97</td>\n",
" <td>3.50</td>\n",
" <td>Female</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>3</td>\n",
" <td>5.66</td>\n",
" <td>Laura Martinez</td>\n",
" <td>30422275171379</td>\n",
" <td>Sun2789</td>\n",
" <td>1379</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>17.92</td>\n",
" <td>4.08</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>8.96</td>\n",
" <td>Thomas Rice</td>\n",
" <td>4403296224639756</td>\n",
" <td>Sat1709</td>\n",
" <td>9756</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>15.77</td>\n",
" <td>2.23</td>\n",
" <td>Female</td>\n",
" <td>No</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>7.88</td>\n",
" <td>Ashley Shelton</td>\n",
" <td>3524119516293213</td>\n",
" <td>Sat9786</td>\n",
" <td>3213</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>24</th>\n",
" <td>19.82</td>\n",
" <td>3.18</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>9.91</td>\n",
" <td>Christopher Ross</td>\n",
" <td>36739148167928</td>\n",
" <td>Sat6236</td>\n",
" <td>7928</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25</th>\n",
" <td>17.81</td>\n",
" <td>2.34</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>4</td>\n",
" <td>4.45</td>\n",
" <td>Robert Perkins</td>\n",
" <td>30502930499388</td>\n",
" <td>Sat907</td>\n",
" <td>9388</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>26</th>\n",
" <td>13.37</td>\n",
" <td>2.00</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>6.68</td>\n",
" <td>Kyle Avery</td>\n",
" <td>6531339539615499</td>\n",
" <td>Sat6651</td>\n",
" <td>5499</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27</th>\n",
" <td>12.69</td>\n",
" <td>2.00</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>6.34</td>\n",
" <td>Patrick Barber</td>\n",
" <td>30155551880343</td>\n",
" <td>Sat394</td>\n",
" <td>0343</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29</th>\n",
" <td>19.65</td>\n",
" <td>3.00</td>\n",
" <td>Female</td>\n",
" <td>No</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>9.82</td>\n",
" <td>Melinda Murphy</td>\n",
" <td>5489272944576051</td>\n",
" <td>Sat2467</td>\n",
" <td>6051</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>31</th>\n",
" <td>18.35</td>\n",
" <td>2.50</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>4</td>\n",
" <td>4.59</td>\n",
" <td>Danny Santiago</td>\n",
" <td>630415546013</td>\n",
" <td>Sat4947</td>\n",
" <td>6013</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>32</th>\n",
" <td>15.06</td>\n",
" <td>3.00</td>\n",
" <td>Female</td>\n",
" <td>No</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>7.53</td>\n",
" <td>Amanda Wilson</td>\n",
" <td>213186304291560</td>\n",
" <td>Sat1327</td>\n",
" <td>1560</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>34</th>\n",
" <td>17.78</td>\n",
" <td>3.27</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>8.89</td>\n",
" <td>Jacob Castillo</td>\n",
" <td>3551492000704805</td>\n",
" <td>Sat8124</td>\n",
" <td>4805</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>36</th>\n",
" <td>16.31</td>\n",
" <td>2.00</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>3</td>\n",
" <td>5.44</td>\n",
" <td>William Ford</td>\n",
" <td>3527691170179398</td>\n",
" <td>Sat9139</td>\n",
" <td>9398</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>37</th>\n",
" <td>16.93</td>\n",
" <td>3.07</td>\n",
" <td>Female</td>\n",
" <td>No</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>3</td>\n",
" <td>5.64</td>\n",
" <td>Erin Lewis</td>\n",
" <td>5161695527390786</td>\n",
" <td>Sat6406</td>\n",
" <td>0786</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>38</th>\n",
" <td>18.69</td>\n",
" <td>2.31</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>3</td>\n",
" <td>6.23</td>\n",
" <td>Brandon Bradley</td>\n",
" <td>4427601595688633</td>\n",
" <td>Sat4056</td>\n",
" <td>8633</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>40</th>\n",
" <td>16.04</td>\n",
" <td>2.24</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>3</td>\n",
" <td>5.35</td>\n",
" <td>Adam Edwards</td>\n",
" <td>3544447755679420</td>\n",
" <td>Sat8549</td>\n",
" <td>9420</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41</th>\n",
" <td>17.46</td>\n",
" <td>2.54</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>8.73</td>\n",
" <td>David Boyer</td>\n",
" <td>3536678244278149</td>\n",
" <td>Sun9460</td>\n",
" <td>8149</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>42</th>\n",
" <td>13.94</td>\n",
" <td>3.06</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>6.97</td>\n",
" <td>Bryan Brown</td>\n",
" <td>36231182760859</td>\n",
" <td>Sun1699</td>\n",
" <td>0859</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>45</th>\n",
" <td>18.29</td>\n",
" <td>3.00</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>9.14</td>\n",
" <td>Richard Fitzgerald</td>\n",
" <td>375156610762053</td>\n",
" <td>Sun8643</td>\n",
" <td>2053</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49</th>\n",
" <td>18.04</td>\n",
" <td>3.00</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>9.02</td>\n",
" <td>William Roth</td>\n",
" <td>6573923967142503</td>\n",
" <td>Sun9774</td>\n",
" <td>2503</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50</th>\n",
" <td>12.54</td>\n",
" <td>2.50</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>6.27</td>\n",
" <td>Jeremiah Neal</td>\n",
" <td>2225400829691416</td>\n",
" <td>Sun2021</td>\n",
" <td>1416</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>191</th>\n",
" <td>19.81</td>\n",
" <td>4.19</td>\n",
" <td>Female</td>\n",
" <td>Yes</td>\n",
" <td>Thur</td>\n",
" <td>Lunch</td>\n",
" <td>2</td>\n",
" <td>9.90</td>\n",
" <td>Kristy Boyd</td>\n",
" <td>4317015327600068</td>\n",
" <td>Thur967</td>\n",
" <td>0068</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>193</th>\n",
" <td>15.48</td>\n",
" <td>2.02</td>\n",
" <td>Male</td>\n",
" <td>Yes</td>\n",
" <td>Thur</td>\n",
" <td>Lunch</td>\n",
" <td>2</td>\n",
" <td>7.74</td>\n",
" <td>Raymond Sullivan</td>\n",
" <td>180068856139315</td>\n",
" <td>Thur606</td>\n",
" <td>9315</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>194</th>\n",
" <td>16.58</td>\n",
" <td>4.00</td>\n",
" <td>Male</td>\n",
" <td>Yes</td>\n",
" <td>Thur</td>\n",
" <td>Lunch</td>\n",
" <td>2</td>\n",
" <td>8.29</td>\n",
" <td>Benjamin Weber</td>\n",
" <td>676210011505</td>\n",
" <td>Thur9318</td>\n",
" <td>1505</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>196</th>\n",
" <td>10.34</td>\n",
" <td>2.00</td>\n",
" <td>Male</td>\n",
" <td>Yes</td>\n",
" <td>Thur</td>\n",
" <td>Lunch</td>\n",
" <td>2</td>\n",
" <td>5.17</td>\n",
" <td>Eric Martin</td>\n",
" <td>30442491190342</td>\n",
" <td>Thur9862</td>\n",
" <td>0342</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>198</th>\n",
" <td>13.00</td>\n",
" <td>2.00</td>\n",
" <td>Female</td>\n",
" <td>Yes</td>\n",
" <td>Thur</td>\n",
" <td>Lunch</td>\n",
" <td>2</td>\n",
" <td>6.50</td>\n",
" <td>Katherine Bond</td>\n",
" <td>4926725945192</td>\n",
" <td>Thur437</td>\n",
" <td>5192</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>199</th>\n",
" <td>13.51</td>\n",
" <td>2.00</td>\n",
" <td>Male</td>\n",
" <td>Yes</td>\n",
" <td>Thur</td>\n",
" <td>Lunch</td>\n",
" <td>2</td>\n",
" <td>6.76</td>\n",
" <td>Joseph Murphy MD</td>\n",
" <td>6547218923471275</td>\n",
" <td>Thur2428</td>\n",
" <td>1275</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>200</th>\n",
" <td>18.71</td>\n",
" <td>4.00</td>\n",
" <td>Male</td>\n",
" <td>Yes</td>\n",
" <td>Thur</td>\n",
" <td>Lunch</td>\n",
" <td>3</td>\n",
" <td>6.24</td>\n",
" <td>Jason Conrad</td>\n",
" <td>4581233003487</td>\n",
" <td>Thur6048</td>\n",
" <td>3487</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>201</th>\n",
" <td>12.74</td>\n",
" <td>2.01</td>\n",
" <td>Female</td>\n",
" <td>Yes</td>\n",
" <td>Thur</td>\n",
" <td>Lunch</td>\n",
" <td>2</td>\n",
" <td>6.37</td>\n",
" <td>Abigail Parks</td>\n",
" <td>3586645396220590</td>\n",
" <td>Thur2544</td>\n",
" <td>0590</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>202</th>\n",
" <td>13.00</td>\n",
" <td>2.00</td>\n",
" <td>Female</td>\n",
" <td>Yes</td>\n",
" <td>Thur</td>\n",
" <td>Lunch</td>\n",
" <td>2</td>\n",
" <td>6.50</td>\n",
" <td>Ashley Shaw</td>\n",
" <td>180088043008041</td>\n",
" <td>Thur1301</td>\n",
" <td>8041</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>203</th>\n",
" <td>16.40</td>\n",
" <td>2.50</td>\n",
" <td>Female</td>\n",
" <td>Yes</td>\n",
" <td>Thur</td>\n",
" <td>Lunch</td>\n",
" <td>2</td>\n",
" <td>8.20</td>\n",
" <td>Toni Brooks</td>\n",
" <td>3582289985920239</td>\n",
" <td>Thur7770</td>\n",
" <td>0239</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>205</th>\n",
" <td>16.47</td>\n",
" <td>3.23</td>\n",
" <td>Female</td>\n",
" <td>Yes</td>\n",
" <td>Thur</td>\n",
" <td>Lunch</td>\n",
" <td>3</td>\n",
" <td>5.49</td>\n",
" <td>Carly Reyes</td>\n",
" <td>4787787236486</td>\n",
" <td>Thur8084</td>\n",
" <td>6486</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>209</th>\n",
" <td>12.76</td>\n",
" <td>2.23</td>\n",
" <td>Female</td>\n",
" <td>Yes</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>6.38</td>\n",
" <td>Sarah Cunningham</td>\n",
" <td>341876516331163</td>\n",
" <td>Sat1274</td>\n",
" <td>1163</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>213</th>\n",
" <td>13.27</td>\n",
" <td>2.50</td>\n",
" <td>Female</td>\n",
" <td>Yes</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>6.64</td>\n",
" <td>Robin Andersen</td>\n",
" <td>580140531089</td>\n",
" <td>Sat1374</td>\n",
" <td>1089</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>215</th>\n",
" <td>12.90</td>\n",
" <td>1.10</td>\n",
" <td>Female</td>\n",
" <td>Yes</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>6.45</td>\n",
" <td>Jessica Owen</td>\n",
" <td>4726904879471</td>\n",
" <td>Sat6983</td>\n",
" <td>9471</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>217</th>\n",
" <td>11.59</td>\n",
" <td>1.50</td>\n",
" <td>Male</td>\n",
" <td>Yes</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>5.80</td>\n",
" <td>Gary Orr</td>\n",
" <td>30324521283406</td>\n",
" <td>Sat8489</td>\n",
" <td>3406</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>220</th>\n",
" <td>12.16</td>\n",
" <td>2.20</td>\n",
" <td>Male</td>\n",
" <td>Yes</td>\n",
" <td>Fri</td>\n",
" <td>Lunch</td>\n",
" <td>2</td>\n",
" <td>6.08</td>\n",
" <td>Ricky Johnson</td>\n",
" <td>213109508670736</td>\n",
" <td>Fri4607</td>\n",
" <td>0736</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>221</th>\n",
" <td>13.42</td>\n",
" <td>3.48</td>\n",
" <td>Female</td>\n",
" <td>Yes</td>\n",
" <td>Fri</td>\n",
" <td>Lunch</td>\n",
" <td>2</td>\n",
" <td>6.71</td>\n",
" <td>Leslie Kaufman</td>\n",
" <td>379437981958785</td>\n",
" <td>Fri7511</td>\n",
" <td>8785</td>\n",
" <td>$$</td>\n",
" <td>Generous</td>\n",
" </tr>\n",
" <tr>\n",
" <th>223</th>\n",
" <td>15.98</td>\n",
" <td>3.00</td>\n",
" <td>Female</td>\n",
" <td>No</td>\n",
" <td>Fri</td>\n",
" <td>Lunch</td>\n",
" <td>3</td>\n",
" <td>5.33</td>\n",
" <td>Mary Rivera</td>\n",
" <td>5343428579353069</td>\n",
" <td>Fri6014</td>\n",
" <td>3069</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>224</th>\n",
" <td>13.42</td>\n",
" <td>1.58</td>\n",
" <td>Male</td>\n",
" <td>Yes</td>\n",
" <td>Fri</td>\n",
" <td>Lunch</td>\n",
" <td>2</td>\n",
" <td>6.71</td>\n",
" <td>Ronald Vaughn DVM</td>\n",
" <td>341503466406403</td>\n",
" <td>Fri5959</td>\n",
" <td>6403</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>225</th>\n",
" <td>16.27</td>\n",
" <td>2.50</td>\n",
" <td>Female</td>\n",
" <td>Yes</td>\n",
" <td>Fri</td>\n",
" <td>Lunch</td>\n",
" <td>2</td>\n",
" <td>8.14</td>\n",
" <td>Whitney Arnold</td>\n",
" <td>3579111947217428</td>\n",
" <td>Fri6665</td>\n",
" <td>7428</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>226</th>\n",
" <td>10.09</td>\n",
" <td>2.00</td>\n",
" <td>Female</td>\n",
" <td>Yes</td>\n",
" <td>Fri</td>\n",
" <td>Lunch</td>\n",
" <td>2</td>\n",
" <td>5.04</td>\n",
" <td>Ruth Weiss</td>\n",
" <td>5268689490381635</td>\n",
" <td>Fri6359</td>\n",
" <td>1635</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>228</th>\n",
" <td>13.28</td>\n",
" <td>2.72</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>6.64</td>\n",
" <td>Glenn Jones</td>\n",
" <td>502061651712</td>\n",
" <td>Sat2937</td>\n",
" <td>1712</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>231</th>\n",
" <td>15.69</td>\n",
" <td>3.00</td>\n",
" <td>Male</td>\n",
" <td>Yes</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>3</td>\n",
" <td>5.23</td>\n",
" <td>Jason Parks</td>\n",
" <td>4812333796161</td>\n",
" <td>Sat6334</td>\n",
" <td>6161</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>232</th>\n",
" <td>11.61</td>\n",
" <td>3.39</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>5.80</td>\n",
" <td>James Taylor</td>\n",
" <td>6011482917327995</td>\n",
" <td>Sat2124</td>\n",
" <td>7995</td>\n",
" <td>$$</td>\n",
" <td>Generous</td>\n",
" </tr>\n",
" <tr>\n",
" <th>233</th>\n",
" <td>10.77</td>\n",
" <td>1.47</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>5.38</td>\n",
" <td>Paul Novak</td>\n",
" <td>6011698897610858</td>\n",
" <td>Sat1467</td>\n",
" <td>0858</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>234</th>\n",
" <td>15.53</td>\n",
" <td>3.00</td>\n",
" <td>Male</td>\n",
" <td>Yes</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>7.76</td>\n",
" <td>Tracy Douglas</td>\n",
" <td>4097938155941930</td>\n",
" <td>Sat7220</td>\n",
" <td>1930</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>235</th>\n",
" <td>10.07</td>\n",
" <td>1.25</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>5.04</td>\n",
" <td>Sean Gonzalez</td>\n",
" <td>3534021246117605</td>\n",
" <td>Sat4615</td>\n",
" <td>7605</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>236</th>\n",
" <td>12.60</td>\n",
" <td>1.00</td>\n",
" <td>Male</td>\n",
" <td>Yes</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>6.30</td>\n",
" <td>Matthew Myers</td>\n",
" <td>3543676378973965</td>\n",
" <td>Sat5032</td>\n",
" <td>3965</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>242</th>\n",
" <td>17.82</td>\n",
" <td>1.75</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>8.91</td>\n",
" <td>Dennis Dixon</td>\n",
" <td>4375220550950</td>\n",
" <td>Sat17</td>\n",
" <td>0950</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>243</th>\n",
" <td>18.78</td>\n",
" <td>3.00</td>\n",
" <td>Female</td>\n",
" <td>No</td>\n",
" <td>Thur</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>9.39</td>\n",
" <td>Michelle Hardin</td>\n",
" <td>3511451626698139</td>\n",
" <td>Thur672</td>\n",
" <td>8139</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>130 rows × 14 columns</p>\n",
"</div>"
],
"text/plain": [
" total_bill tip sex smoker day time size price_per_person \\\n",
"0 16.99 1.01 Female No Sun Dinner 2 8.49 \n",
"1 10.34 1.66 Male No Sun Dinner 3 3.45 \n",
"8 15.04 1.96 Male No Sun Dinner 2 7.52 \n",
"9 14.78 3.23 Male No Sun Dinner 2 7.39 \n",
"10 10.27 1.71 Male No Sun Dinner 2 5.14 \n",
"12 15.42 1.57 Male No Sun Dinner 2 7.71 \n",
"13 18.43 3.00 Male No Sun Dinner 4 4.61 \n",
"14 14.83 3.02 Female No Sun Dinner 2 7.42 \n",
"16 10.33 1.67 Female No Sun Dinner 3 3.44 \n",
"17 16.29 3.71 Male No Sun Dinner 3 5.43 \n",
"18 16.97 3.50 Female No Sun Dinner 3 5.66 \n",
"20 17.92 4.08 Male No Sat Dinner 2 8.96 \n",
"22 15.77 2.23 Female No Sat Dinner 2 7.88 \n",
"24 19.82 3.18 Male No Sat Dinner 2 9.91 \n",
"25 17.81 2.34 Male No Sat Dinner 4 4.45 \n",
"26 13.37 2.00 Male No Sat Dinner 2 6.68 \n",
"27 12.69 2.00 Male No Sat Dinner 2 6.34 \n",
"29 19.65 3.00 Female No Sat Dinner 2 9.82 \n",
"31 18.35 2.50 Male No Sat Dinner 4 4.59 \n",
"32 15.06 3.00 Female No Sat Dinner 2 7.53 \n",
"34 17.78 3.27 Male No Sat Dinner 2 8.89 \n",
"36 16.31 2.00 Male No Sat Dinner 3 5.44 \n",
"37 16.93 3.07 Female No Sat Dinner 3 5.64 \n",
"38 18.69 2.31 Male No Sat Dinner 3 6.23 \n",
"40 16.04 2.24 Male No Sat Dinner 3 5.35 \n",
"41 17.46 2.54 Male No Sun Dinner 2 8.73 \n",
"42 13.94 3.06 Male No Sun Dinner 2 6.97 \n",
"45 18.29 3.00 Male No Sun Dinner 2 9.14 \n",
"49 18.04 3.00 Male No Sun Dinner 2 9.02 \n",
"50 12.54 2.50 Male No Sun Dinner 2 6.27 \n",
".. ... ... ... ... ... ... ... ... \n",
"191 19.81 4.19 Female Yes Thur Lunch 2 9.90 \n",
"193 15.48 2.02 Male Yes Thur Lunch 2 7.74 \n",
"194 16.58 4.00 Male Yes Thur Lunch 2 8.29 \n",
"196 10.34 2.00 Male Yes Thur Lunch 2 5.17 \n",
"198 13.00 2.00 Female Yes Thur Lunch 2 6.50 \n",
"199 13.51 2.00 Male Yes Thur Lunch 2 6.76 \n",
"200 18.71 4.00 Male Yes Thur Lunch 3 6.24 \n",
"201 12.74 2.01 Female Yes Thur Lunch 2 6.37 \n",
"202 13.00 2.00 Female Yes Thur Lunch 2 6.50 \n",
"203 16.40 2.50 Female Yes Thur Lunch 2 8.20 \n",
"205 16.47 3.23 Female Yes Thur Lunch 3 5.49 \n",
"209 12.76 2.23 Female Yes Sat Dinner 2 6.38 \n",
"213 13.27 2.50 Female Yes Sat Dinner 2 6.64 \n",
"215 12.90 1.10 Female Yes Sat Dinner 2 6.45 \n",
"217 11.59 1.50 Male Yes Sat Dinner 2 5.80 \n",
"220 12.16 2.20 Male Yes Fri Lunch 2 6.08 \n",
"221 13.42 3.48 Female Yes Fri Lunch 2 6.71 \n",
"223 15.98 3.00 Female No Fri Lunch 3 5.33 \n",
"224 13.42 1.58 Male Yes Fri Lunch 2 6.71 \n",
"225 16.27 2.50 Female Yes Fri Lunch 2 8.14 \n",
"226 10.09 2.00 Female Yes Fri Lunch 2 5.04 \n",
"228 13.28 2.72 Male No Sat Dinner 2 6.64 \n",
"231 15.69 3.00 Male Yes Sat Dinner 3 5.23 \n",
"232 11.61 3.39 Male No Sat Dinner 2 5.80 \n",
"233 10.77 1.47 Male No Sat Dinner 2 5.38 \n",
"234 15.53 3.00 Male Yes Sat Dinner 2 7.76 \n",
"235 10.07 1.25 Male No Sat Dinner 2 5.04 \n",
"236 12.60 1.00 Male Yes Sat Dinner 2 6.30 \n",
"242 17.82 1.75 Male No Sat Dinner 2 8.91 \n",
"243 18.78 3.00 Female No Thur Dinner 2 9.39 \n",
"\n",
" Payer Name CC Number Payment ID last_four Expensive \\\n",
"0 Christy Cunningham 3560325168603410 Sun2959 3410 $$ \n",
"1 Douglas Tucker 4478071379779230 Sun4608 9230 $$ \n",
"8 Joseph Mcdonald 3522866365840377 Sun6820 0377 $$ \n",
"9 Jerome Abbott 3532124519049786 Sun3775 9786 $$ \n",
"10 William Riley 566287581219 Sun2546 1219 $$ \n",
"12 Chad Harrington 577040572932 Sun1300 2932 $$ \n",
"13 Joshua Jones 6011163105616890 Sun2971 6890 $$ \n",
"14 Vanessa Jones 30016702287574 Sun3848 7574 $$ \n",
"16 Elizabeth Foster 4240025044626033 Sun9715 6033 $$ \n",
"17 John Pittman 6521340257218708 Sun2998 8708 $$ \n",
"18 Laura Martinez 30422275171379 Sun2789 1379 $$ \n",
"20 Thomas Rice 4403296224639756 Sat1709 9756 $$ \n",
"22 Ashley Shelton 3524119516293213 Sat9786 3213 $$ \n",
"24 Christopher Ross 36739148167928 Sat6236 7928 $$ \n",
"25 Robert Perkins 30502930499388 Sat907 9388 $$ \n",
"26 Kyle Avery 6531339539615499 Sat6651 5499 $$ \n",
"27 Patrick Barber 30155551880343 Sat394 0343 $$ \n",
"29 Melinda Murphy 5489272944576051 Sat2467 6051 $$ \n",
"31 Danny Santiago 630415546013 Sat4947 6013 $$ \n",
"32 Amanda Wilson 213186304291560 Sat1327 1560 $$ \n",
"34 Jacob Castillo 3551492000704805 Sat8124 4805 $$ \n",
"36 William Ford 3527691170179398 Sat9139 9398 $$ \n",
"37 Erin Lewis 5161695527390786 Sat6406 0786 $$ \n",
"38 Brandon Bradley 4427601595688633 Sat4056 8633 $$ \n",
"40 Adam Edwards 3544447755679420 Sat8549 9420 $$ \n",
"41 David Boyer 3536678244278149 Sun9460 8149 $$ \n",
"42 Bryan Brown 36231182760859 Sun1699 0859 $$ \n",
"45 Richard Fitzgerald 375156610762053 Sun8643 2053 $$ \n",
"49 William Roth 6573923967142503 Sun9774 2503 $$ \n",
"50 Jeremiah Neal 2225400829691416 Sun2021 1416 $$ \n",
".. ... ... ... ... ... \n",
"191 Kristy Boyd 4317015327600068 Thur967 0068 $$ \n",
"193 Raymond Sullivan 180068856139315 Thur606 9315 $$ \n",
"194 Benjamin Weber 676210011505 Thur9318 1505 $$ \n",
"196 Eric Martin 30442491190342 Thur9862 0342 $$ \n",
"198 Katherine Bond 4926725945192 Thur437 5192 $$ \n",
"199 Joseph Murphy MD 6547218923471275 Thur2428 1275 $$ \n",
"200 Jason Conrad 4581233003487 Thur6048 3487 $$ \n",
"201 Abigail Parks 3586645396220590 Thur2544 0590 $$ \n",
"202 Ashley Shaw 180088043008041 Thur1301 8041 $$ \n",
"203 Toni Brooks 3582289985920239 Thur7770 0239 $$ \n",
"205 Carly Reyes 4787787236486 Thur8084 6486 $$ \n",
"209 Sarah Cunningham 341876516331163 Sat1274 1163 $$ \n",
"213 Robin Andersen 580140531089 Sat1374 1089 $$ \n",
"215 Jessica Owen 4726904879471 Sat6983 9471 $$ \n",
"217 Gary Orr 30324521283406 Sat8489 3406 $$ \n",
"220 Ricky Johnson 213109508670736 Fri4607 0736 $$ \n",
"221 Leslie Kaufman 379437981958785 Fri7511 8785 $$ \n",
"223 Mary Rivera 5343428579353069 Fri6014 3069 $$ \n",
"224 Ronald Vaughn DVM 341503466406403 Fri5959 6403 $$ \n",
"225 Whitney Arnold 3579111947217428 Fri6665 7428 $$ \n",
"226 Ruth Weiss 5268689490381635 Fri6359 1635 $$ \n",
"228 Glenn Jones 502061651712 Sat2937 1712 $$ \n",
"231 Jason Parks 4812333796161 Sat6334 6161 $$ \n",
"232 James Taylor 6011482917327995 Sat2124 7995 $$ \n",
"233 Paul Novak 6011698897610858 Sat1467 0858 $$ \n",
"234 Tracy Douglas 4097938155941930 Sat7220 1930 $$ \n",
"235 Sean Gonzalez 3534021246117605 Sat4615 7605 $$ \n",
"236 Matthew Myers 3543676378973965 Sat5032 3965 $$ \n",
"242 Dennis Dixon 4375220550950 Sat17 0950 $$ \n",
"243 Michelle Hardin 3511451626698139 Thur672 8139 $$ \n",
"\n",
" Tip Quality \n",
"0 Ok \n",
"1 Ok \n",
"8 Ok \n",
"9 Ok \n",
"10 Ok \n",
"12 Ok \n",
"13 Ok \n",
"14 Ok \n",
"16 Ok \n",
"17 Ok \n",
"18 Ok \n",
"20 Ok \n",
"22 Ok \n",
"24 Ok \n",
"25 Ok \n",
"26 Ok \n",
"27 Ok \n",
"29 Ok \n",
"31 Ok \n",
"32 Ok \n",
"34 Ok \n",
"36 Ok \n",
"37 Ok \n",
"38 Ok \n",
"40 Ok \n",
"41 Ok \n",
"42 Ok \n",
"45 Ok \n",
"49 Ok \n",
"50 Ok \n",
".. ... \n",
"191 Ok \n",
"193 Ok \n",
"194 Ok \n",
"196 Ok \n",
"198 Ok \n",
"199 Ok \n",
"200 Ok \n",
"201 Ok \n",
"202 Ok \n",
"203 Ok \n",
"205 Ok \n",
"209 Ok \n",
"213 Ok \n",
"215 Ok \n",
"217 Ok \n",
"220 Ok \n",
"221 Generous \n",
"223 Ok \n",
"224 Ok \n",
"225 Ok \n",
"226 Ok \n",
"228 Ok \n",
"231 Ok \n",
"232 Generous \n",
"233 Ok \n",
"234 Ok \n",
"235 Ok \n",
"236 Ok \n",
"242 Ok \n",
"243 Ok \n",
"\n",
"[130 rows x 14 columns]"
]
},
"execution_count": 65,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Both bounds are inclusive by default. On pandas >= 1.3, pass inclusive='both' rather than inclusive=True.\n",
"df[df['total_bill'].between(10,20)]"
]
},
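{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id='sample'></a>\n",
"## sample\n",
"\n",
"`sample()` returns randomly chosen rows: pass `n` for a fixed number of rows or `frac` for a fraction of the DataFrame. The rows differ on every run unless you set `random_state`."
]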
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>total_bill</th>\n",
" <th>tip</th>\n",
" <th>sex</th>\n",
" <th>smoker</th>\n",
" <th>day</th>\n",
" <th>time</th>\n",
" <th>size</th>\n",
" <th>price_per_person</th>\n",
" <th>Payer Name</th>\n",
" <th>CC Number</th>\n",
" <th>Payment ID</th>\n",
" <th>last_four</th>\n",
" <th>Expensive</th>\n",
" <th>Tip Quality</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>216</th>\n",
" <td>28.15</td>\n",
" <td>3.00</td>\n",
" <td>Male</td>\n",
" <td>Yes</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>5</td>\n",
" <td>5.63</td>\n",
" <td>Shawn Barnett PhD</td>\n",
" <td>4590982568244</td>\n",
" <td>Sat7320</td>\n",
" <td>8244</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>136</th>\n",
" <td>10.33</td>\n",
" <td>2.00</td>\n",
" <td>Female</td>\n",
" <td>No</td>\n",
" <td>Thur</td>\n",
" <td>Lunch</td>\n",
" <td>2</td>\n",
" <td>5.16</td>\n",
" <td>Donna Kelly</td>\n",
" <td>180048553626376</td>\n",
" <td>Thur1393</td>\n",
" <td>6376</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>18.43</td>\n",
" <td>3.00</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>4</td>\n",
" <td>4.61</td>\n",
" <td>Joshua Jones</td>\n",
" <td>6011163105616890</td>\n",
" <td>Sun2971</td>\n",
" <td>6890</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>146</th>\n",
" <td>18.64</td>\n",
" <td>1.36</td>\n",
" <td>Female</td>\n",
" <td>No</td>\n",
" <td>Thur</td>\n",
" <td>Lunch</td>\n",
" <td>3</td>\n",
" <td>6.21</td>\n",
" <td>Kelly Estrada</td>\n",
" <td>60463302327</td>\n",
" <td>Thur3941</td>\n",
" <td>2327</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>56</th>\n",
" <td>38.01</td>\n",
" <td>3.00</td>\n",
" <td>Male</td>\n",
" <td>Yes</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>4</td>\n",
" <td>9.50</td>\n",
" <td>James Christensen DDS</td>\n",
" <td>349793629453226</td>\n",
" <td>Sat8903</td>\n",
" <td>3226</td>\n",
" <td>$$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" total_bill tip sex smoker day time size price_per_person \\\n",
"216 28.15 3.00 Male Yes Sat Dinner 5 5.63 \n",
"136 10.33 2.00 Female No Thur Lunch 2 5.16 \n",
"13 18.43 3.00 Male No Sun Dinner 4 4.61 \n",
"146 18.64 1.36 Female No Thur Lunch 3 6.21 \n",
"56 38.01 3.00 Male Yes Sat Dinner 4 9.50 \n",
"\n",
" Payer Name CC Number Payment ID last_four Expensive \\\n",
"216 Shawn Barnett PhD 4590982568244 Sat7320 8244 $$ \n",
"136 Donna Kelly 180048553626376 Thur1393 6376 $$ \n",
"13 Joshua Jones 6011163105616890 Sun2971 6890 $$ \n",
"146 Kelly Estrada 60463302327 Thur3941 2327 $$ \n",
"56 James Christensen DDS 349793629453226 Sat8903 3226 $$$ \n",
"\n",
" Tip Quality \n",
"216 Ok \n",
"136 Ok \n",
"13 Ok \n",
"146 Ok \n",
"56 Ok "
]
},
"execution_count": 68,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.sample(5)"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>total_bill</th>\n",
" <th>tip</th>\n",
" <th>sex</th>\n",
" <th>smoker</th>\n",
" <th>day</th>\n",
" <th>time</th>\n",
" <th>size</th>\n",
" <th>price_per_person</th>\n",
" <th>Payer Name</th>\n",
" <th>CC Number</th>\n",
" <th>Payment ID</th>\n",
" <th>last_four</th>\n",
" <th>Expensive</th>\n",
" <th>Tip Quality</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>73</th>\n",
" <td>25.28</td>\n",
" <td>5.00</td>\n",
" <td>Female</td>\n",
" <td>Yes</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>12.64</td>\n",
" <td>Julie Holmes</td>\n",
" <td>5418689346409571</td>\n",
" <td>Sat6065</td>\n",
" <td>9571</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>141</th>\n",
" <td>34.30</td>\n",
" <td>6.70</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Thur</td>\n",
" <td>Lunch</td>\n",
" <td>6</td>\n",
" <td>5.72</td>\n",
" <td>Steven Carlson</td>\n",
" <td>3526515703718508</td>\n",
" <td>Thur1025</td>\n",
" <td>8508</td>\n",
" <td>$$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>239</th>\n",
" <td>29.03</td>\n",
" <td>5.92</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>3</td>\n",
" <td>9.68</td>\n",
" <td>Michael Avila</td>\n",
" <td>5296068606052842</td>\n",
" <td>Sat2657</td>\n",
" <td>2842</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>237</th>\n",
" <td>32.83</td>\n",
" <td>1.17</td>\n",
" <td>Male</td>\n",
" <td>Yes</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>16.42</td>\n",
" <td>Thomas Brown</td>\n",
" <td>4284722681265508</td>\n",
" <td>Sat2929</td>\n",
" <td>5508</td>\n",
" <td>$$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>69</th>\n",
" <td>15.01</td>\n",
" <td>2.09</td>\n",
" <td>Male</td>\n",
" <td>Yes</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>7.50</td>\n",
" <td>Adam Hall</td>\n",
" <td>4700924377057571</td>\n",
" <td>Sat855</td>\n",
" <td>7571</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>108</th>\n",
" <td>18.24</td>\n",
" <td>3.76</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>9.12</td>\n",
" <td>Steven Grant</td>\n",
" <td>4112810433473856</td>\n",
" <td>Sat6376</td>\n",
" <td>3856</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>85</th>\n",
" <td>34.83</td>\n",
" <td>5.17</td>\n",
" <td>Female</td>\n",
" <td>No</td>\n",
" <td>Thur</td>\n",
" <td>Lunch</td>\n",
" <td>4</td>\n",
" <td>8.71</td>\n",
" <td>Shawna Cook</td>\n",
" <td>6011787464177340</td>\n",
" <td>Thur7972</td>\n",
" <td>7340</td>\n",
" <td>$$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>156</th>\n",
" <td>48.17</td>\n",
" <td>5.00</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>6</td>\n",
" <td>8.03</td>\n",
" <td>Ryan Gonzales</td>\n",
" <td>3523151482063321</td>\n",
" <td>Sun7518</td>\n",
" <td>3321</td>\n",
" <td>$$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>196</th>\n",
" <td>10.34</td>\n",
" <td>2.00</td>\n",
" <td>Male</td>\n",
" <td>Yes</td>\n",
" <td>Thur</td>\n",
" <td>Lunch</td>\n",
" <td>2</td>\n",
" <td>5.17</td>\n",
" <td>Eric Martin</td>\n",
" <td>30442491190342</td>\n",
" <td>Thur9862</td>\n",
" <td>0342</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41</th>\n",
" <td>17.46</td>\n",
" <td>2.54</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>8.73</td>\n",
" <td>David Boyer</td>\n",
" <td>3536678244278149</td>\n",
" <td>Sun9460</td>\n",
" <td>8149</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>236</th>\n",
" <td>12.60</td>\n",
" <td>1.00</td>\n",
" <td>Male</td>\n",
" <td>Yes</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>6.30</td>\n",
" <td>Matthew Myers</td>\n",
" <td>3543676378973965</td>\n",
" <td>Sat5032</td>\n",
" <td>3965</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>225</th>\n",
" <td>16.27</td>\n",
" <td>2.50</td>\n",
" <td>Female</td>\n",
" <td>Yes</td>\n",
" <td>Fri</td>\n",
" <td>Lunch</td>\n",
" <td>2</td>\n",
" <td>8.14</td>\n",
" <td>Whitney Arnold</td>\n",
" <td>3579111947217428</td>\n",
" <td>Fri6665</td>\n",
" <td>7428</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>61</th>\n",
" <td>13.81</td>\n",
" <td>2.00</td>\n",
" <td>Male</td>\n",
" <td>Yes</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>6.90</td>\n",
" <td>Ryan Hernandez</td>\n",
" <td>4766834726806</td>\n",
" <td>Sat3030</td>\n",
" <td>6806</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>203</th>\n",
" <td>16.40</td>\n",
" <td>2.50</td>\n",
" <td>Female</td>\n",
" <td>Yes</td>\n",
" <td>Thur</td>\n",
" <td>Lunch</td>\n",
" <td>2</td>\n",
" <td>8.20</td>\n",
" <td>Toni Brooks</td>\n",
" <td>3582289985920239</td>\n",
" <td>Thur7770</td>\n",
" <td>0239</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>25.29</td>\n",
" <td>4.71</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>4</td>\n",
" <td>6.32</td>\n",
" <td>Erik Smith</td>\n",
" <td>213140353657882</td>\n",
" <td>Sun9679</td>\n",
" <td>7882</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>220</th>\n",
" <td>12.16</td>\n",
" <td>2.20</td>\n",
" <td>Male</td>\n",
" <td>Yes</td>\n",
" <td>Fri</td>\n",
" <td>Lunch</td>\n",
" <td>2</td>\n",
" <td>6.08</td>\n",
" <td>Ricky Johnson</td>\n",
" <td>213109508670736</td>\n",
" <td>Fri4607</td>\n",
" <td>0736</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>119</th>\n",
" <td>24.08</td>\n",
" <td>2.92</td>\n",
" <td>Female</td>\n",
" <td>No</td>\n",
" <td>Thur</td>\n",
" <td>Lunch</td>\n",
" <td>4</td>\n",
" <td>6.02</td>\n",
" <td>Melanie Jordan</td>\n",
" <td>676212062720</td>\n",
" <td>Thur8063</td>\n",
" <td>2720</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>96</th>\n",
" <td>27.28</td>\n",
" <td>4.00</td>\n",
" <td>Male</td>\n",
" <td>Yes</td>\n",
" <td>Fri</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>13.64</td>\n",
" <td>Eric Carter</td>\n",
" <td>4563054452787961</td>\n",
" <td>Fri3159</td>\n",
" <td>7961</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>159</th>\n",
" <td>16.49</td>\n",
" <td>2.00</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>4</td>\n",
" <td>4.12</td>\n",
" <td>Christopher Soto</td>\n",
" <td>30501814271434</td>\n",
" <td>Sun1781</td>\n",
" <td>1434</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>26</th>\n",
" <td>13.37</td>\n",
" <td>2.00</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>6.68</td>\n",
" <td>Kyle Avery</td>\n",
" <td>6531339539615499</td>\n",
" <td>Sat6651</td>\n",
" <td>5499</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>129</th>\n",
" <td>22.82</td>\n",
" <td>2.18</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Thur</td>\n",
" <td>Lunch</td>\n",
" <td>3</td>\n",
" <td>7.61</td>\n",
" <td>Raymond Torres</td>\n",
" <td>4855776744024</td>\n",
" <td>Thur9424</td>\n",
" <td>4024</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21</th>\n",
" <td>20.29</td>\n",
" <td>2.75</td>\n",
" <td>Female</td>\n",
" <td>No</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>10.14</td>\n",
" <td>Natalie Gardner</td>\n",
" <td>5448125351489749</td>\n",
" <td>Sat9618</td>\n",
" <td>9749</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>94</th>\n",
" <td>22.75</td>\n",
" <td>3.25</td>\n",
" <td>Female</td>\n",
" <td>No</td>\n",
" <td>Fri</td>\n",
" <td>Dinner</td>\n",
" <td>2</td>\n",
" <td>11.38</td>\n",
" <td>Jamie Garza</td>\n",
" <td>676318332068</td>\n",
" <td>Fri2318</td>\n",
" <td>2068</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>39</th>\n",
" <td>31.27</td>\n",
" <td>5.00</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>3</td>\n",
" <td>10.42</td>\n",
" <td>Mr. Brandon Berry</td>\n",
" <td>6011525851069856</td>\n",
" <td>Sat6373</td>\n",
" <td>9856</td>\n",
" <td>$$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" total_bill tip sex smoker day time size price_per_person \\\n",
"73 25.28 5.00 Female Yes Sat Dinner 2 12.64 \n",
"141 34.30 6.70 Male No Thur Lunch 6 5.72 \n",
"239 29.03 5.92 Male No Sat Dinner 3 9.68 \n",
"237 32.83 1.17 Male Yes Sat Dinner 2 16.42 \n",
"69 15.01 2.09 Male Yes Sat Dinner 2 7.50 \n",
"108 18.24 3.76 Male No Sat Dinner 2 9.12 \n",
"85 34.83 5.17 Female No Thur Lunch 4 8.71 \n",
"156 48.17 5.00 Male No Sun Dinner 6 8.03 \n",
"196 10.34 2.00 Male Yes Thur Lunch 2 5.17 \n",
"41 17.46 2.54 Male No Sun Dinner 2 8.73 \n",
"236 12.60 1.00 Male Yes Sat Dinner 2 6.30 \n",
"225 16.27 2.50 Female Yes Fri Lunch 2 8.14 \n",
"61 13.81 2.00 Male Yes Sat Dinner 2 6.90 \n",
"203 16.40 2.50 Female Yes Thur Lunch 2 8.20 \n",
"5 25.29 4.71 Male No Sun Dinner 4 6.32 \n",
"220 12.16 2.20 Male Yes Fri Lunch 2 6.08 \n",
"119 24.08 2.92 Female No Thur Lunch 4 6.02 \n",
"96 27.28 4.00 Male Yes Fri Dinner 2 13.64 \n",
"159 16.49 2.00 Male No Sun Dinner 4 4.12 \n",
"26 13.37 2.00 Male No Sat Dinner 2 6.68 \n",
"129 22.82 2.18 Male No Thur Lunch 3 7.61 \n",
"21 20.29 2.75 Female No Sat Dinner 2 10.14 \n",
"94 22.75 3.25 Female No Fri Dinner 2 11.38 \n",
"39 31.27 5.00 Male No Sat Dinner 3 10.42 \n",
"\n",
" Payer Name CC Number Payment ID last_four Expensive \\\n",
"73 Julie Holmes 5418689346409571 Sat6065 9571 $$ \n",
"141 Steven Carlson 3526515703718508 Thur1025 8508 $$$ \n",
"239 Michael Avila 5296068606052842 Sat2657 2842 $$ \n",
"237 Thomas Brown 4284722681265508 Sat2929 5508 $$$ \n",
"69 Adam Hall 4700924377057571 Sat855 7571 $$ \n",
"108 Steven Grant 4112810433473856 Sat6376 3856 $$ \n",
"85 Shawna Cook 6011787464177340 Thur7972 7340 $$$ \n",
"156 Ryan Gonzales 3523151482063321 Sun7518 3321 $$$ \n",
"196 Eric Martin 30442491190342 Thur9862 0342 $$ \n",
"41 David Boyer 3536678244278149 Sun9460 8149 $$ \n",
"236 Matthew Myers 3543676378973965 Sat5032 3965 $$ \n",
"225 Whitney Arnold 3579111947217428 Fri6665 7428 $$ \n",
"61 Ryan Hernandez 4766834726806 Sat3030 6806 $$ \n",
"203 Toni Brooks 3582289985920239 Thur7770 0239 $$ \n",
"5 Erik Smith 213140353657882 Sun9679 7882 $$ \n",
"220 Ricky Johnson 213109508670736 Fri4607 0736 $$ \n",
"119 Melanie Jordan 676212062720 Thur8063 2720 $$ \n",
"96 Eric Carter 4563054452787961 Fri3159 7961 $$ \n",
"159 Christopher Soto 30501814271434 Sun1781 1434 $$ \n",
"26 Kyle Avery 6531339539615499 Sat6651 5499 $$ \n",
"129 Raymond Torres 4855776744024 Thur9424 4024 $$ \n",
"21 Natalie Gardner 5448125351489749 Sat9618 9749 $$ \n",
"94 Jamie Garza 676318332068 Fri2318 2068 $$ \n",
"39 Mr. Brandon Berry 6011525851069856 Sat6373 9856 $$$ \n",
"\n",
" Tip Quality \n",
"73 Ok \n",
"141 Ok \n",
"239 Ok \n",
"237 Ok \n",
"69 Ok \n",
"108 Ok \n",
"85 Ok \n",
"156 Ok \n",
"196 Ok \n",
"41 Ok \n",
"236 Ok \n",
"225 Ok \n",
"61 Ok \n",
"203 Ok \n",
"5 Ok \n",
"220 Ok \n",
"119 Ok \n",
"96 Ok \n",
"159 Ok \n",
"26 Ok \n",
"129 Ok \n",
"21 Ok \n",
"94 Ok \n",
"39 Ok "
]
},
"execution_count": 69,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.sample(frac=0.1)"
]
},
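{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id='n'></a>\n",
"## nlargest and nsmallest\n",
"\n",
"`nlargest(n, column)` returns the `n` rows with the largest values in `column`, sorted in descending order. `nsmallest` takes the same arguments and returns the rows with the smallest values."
]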
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>total_bill</th>\n",
" <th>tip</th>\n",
" <th>sex</th>\n",
" <th>smoker</th>\n",
" <th>day</th>\n",
" <th>time</th>\n",
" <th>size</th>\n",
" <th>price_per_person</th>\n",
" <th>Payer Name</th>\n",
" <th>CC Number</th>\n",
" <th>Payment ID</th>\n",
" <th>last_four</th>\n",
" <th>Expensive</th>\n",
" <th>Tip Quality</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>170</th>\n",
" <td>50.81</td>\n",
" <td>10.00</td>\n",
" <td>Male</td>\n",
" <td>Yes</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>3</td>\n",
" <td>16.94</td>\n",
" <td>Gregory Clark</td>\n",
" <td>5473850968388236</td>\n",
" <td>Sat1954</td>\n",
" <td>8236</td>\n",
" <td>$$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>212</th>\n",
" <td>48.33</td>\n",
" <td>9.00</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>4</td>\n",
" <td>12.08</td>\n",
" <td>Alex Williamson</td>\n",
" <td>676218815212</td>\n",
" <td>Sat4590</td>\n",
" <td>5212</td>\n",
" <td>$$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td>39.42</td>\n",
" <td>7.58</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>4</td>\n",
" <td>9.86</td>\n",
" <td>Lance Peterson</td>\n",
" <td>3542584061609808</td>\n",
" <td>Sat239</td>\n",
" <td>9808</td>\n",
" <td>$$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>59</th>\n",
" <td>48.27</td>\n",
" <td>6.73</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>4</td>\n",
" <td>12.07</td>\n",
" <td>Brian Ortiz</td>\n",
" <td>6596453823950595</td>\n",
" <td>Sat8139</td>\n",
" <td>0595</td>\n",
" <td>$$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>141</th>\n",
" <td>34.30</td>\n",
" <td>6.70</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Thur</td>\n",
" <td>Lunch</td>\n",
" <td>6</td>\n",
" <td>5.72</td>\n",
" <td>Steven Carlson</td>\n",
" <td>3526515703718508</td>\n",
" <td>Thur1025</td>\n",
" <td>8508</td>\n",
" <td>$$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>183</th>\n",
" <td>23.17</td>\n",
" <td>6.50</td>\n",
" <td>Male</td>\n",
" <td>Yes</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>4</td>\n",
" <td>5.79</td>\n",
" <td>Dr. Michael James</td>\n",
" <td>4718501859162</td>\n",
" <td>Sun6059</td>\n",
" <td>9162</td>\n",
" <td>$$</td>\n",
" <td>Generous</td>\n",
" </tr>\n",
" <tr>\n",
" <th>214</th>\n",
" <td>28.17</td>\n",
" <td>6.50</td>\n",
" <td>Female</td>\n",
" <td>Yes</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>3</td>\n",
" <td>9.39</td>\n",
" <td>Marissa Jackson</td>\n",
" <td>4922302538691962</td>\n",
" <td>Sat3374</td>\n",
" <td>1962</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>47</th>\n",
" <td>32.40</td>\n",
" <td>6.00</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sun</td>\n",
" <td>Dinner</td>\n",
" <td>4</td>\n",
" <td>8.10</td>\n",
" <td>James Barnes</td>\n",
" <td>3552002592874186</td>\n",
" <td>Sun9677</td>\n",
" <td>4186</td>\n",
" <td>$$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>239</th>\n",
" <td>29.03</td>\n",
" <td>5.92</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Sat</td>\n",
" <td>Dinner</td>\n",
" <td>3</td>\n",
" <td>9.68</td>\n",
" <td>Michael Avila</td>\n",
" <td>5296068606052842</td>\n",
" <td>Sat2657</td>\n",
" <td>2842</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" <tr>\n",
" <th>88</th>\n",
" <td>24.71</td>\n",
" <td>5.85</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>Thur</td>\n",
" <td>Lunch</td>\n",
" <td>2</td>\n",
" <td>12.36</td>\n",
" <td>Roger Taylor</td>\n",
" <td>4410248629955</td>\n",
" <td>Thur9003</td>\n",
" <td>9955</td>\n",
" <td>$$</td>\n",
" <td>Ok</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" total_bill tip sex smoker day time size price_per_person \\\n",
"170 50.81 10.00 Male Yes Sat Dinner 3 16.94 \n",
"212 48.33 9.00 Male No Sat Dinner 4 12.08 \n",
"23 39.42 7.58 Male No Sat Dinner 4 9.86 \n",
"59 48.27 6.73 Male No Sat Dinner 4 12.07 \n",
"141 34.30 6.70 Male No Thur Lunch 6 5.72 \n",
"183 23.17 6.50 Male Yes Sun Dinner 4 5.79 \n",
"214 28.17 6.50 Female Yes Sat Dinner 3 9.39 \n",
"47 32.40 6.00 Male No Sun Dinner 4 8.10 \n",
"239 29.03 5.92 Male No Sat Dinner 3 9.68 \n",
"88 24.71 5.85 Male No Thur Lunch 2 12.36 \n",
"\n",
" Payer Name CC Number Payment ID last_four Expensive \\\n",
"170 Gregory Clark 5473850968388236 Sat1954 8236 $$$ \n",
"212 Alex Williamson 676218815212 Sat4590 5212 $$$ \n",
"23 Lance Peterson 3542584061609808 Sat239 9808 $$$ \n",
"59 Brian Ortiz 6596453823950595 Sat8139 0595 $$$ \n",
"141 Steven Carlson 3526515703718508 Thur1025 8508 $$$ \n",
"183 Dr. Michael James 4718501859162 Sun6059 9162 $$ \n",
"214 Marissa Jackson 4922302538691962 Sat3374 1962 $$ \n",
"47 James Barnes 3552002592874186 Sun9677 4186 $$$ \n",
"239 Michael Avila 5296068606052842 Sat2657 2842 $$ \n",
"88 Roger Taylor 4410248629955 Thur9003 9955 $$ \n",
"\n",
" Tip Quality \n",
"170 Ok \n",
"212 Ok \n",
"23 Ok \n",
"59 Ok \n",
"141 Ok \n",
"183 Generous \n",
"214 Ok \n",
"47 Ok \n",
"239 Ok \n",
"88 Ok "
]
},
"execution_count": 71,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.nlargest(10,'tip')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"----"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 1
}