{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "___\n", "\n", "\n", "___\n", "
Copyright by Pierian Data Inc.
\n", "
For more information, visit us at www.pieriandata.com
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Useful Methods\n", "\n", "Let's cover some useful methods and functions built in to pandas. This is actually just a small sampling of the functions and methods available in Pandas, but they are some of the most commonly used.\n", "The [documentation](https://pandas.pydata.org/pandas-docs/stable/reference/index.html) is a great resource to continue exploring more methods and functions (we will introduce more further along in the course).\n", "Here is a list of functions and methods we'll cover here (click on one to jump to that section in this notebook.):\n", "\n", "* [apply() method](#apply_method)\n", "* [apply() with a function](#apply_function)\n", "* [apply() with a lambda expression](#apply_lambda)\n", "* [apply() on multiple columns](#apply_multiple)\n", "* [describe()](#describe)\n", "* [sort_values()](#sort)\n", "* [corr()](#corr)\n", "* [idxmin and idxmax](#idx)\n", "* [value_counts](#v_c)\n", "* [replace](#replace)\n", "* [unique and nunique](#uni)\n", "* [map](#map)\n", "* [duplicated and drop_duplicates](#dup)\n", "* [between](#bet)\n", "* [sample](#sample)\n", "* [nlargest](#n)\n", "\n", "Make sure to view the video lessons to get the full explanation!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "## The .apply() method\n", "\n", "Here we will learn about a very useful method known as **apply** on a DataFrame. This allows us to apply and broadcast custom functions on a DataFrame column" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df = pd.read_csv('tips.csv')" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " total_bill tip sex smoker day time size price_per_person \\\n", "0 16.99 1.01 Female No Sun Dinner 2 8.49 \n", "1 10.34 1.66 Male No Sun Dinner 3 3.45 \n", "2 21.01 3.50 Male No Sun Dinner 3 7.00 \n", "3 23.68 3.31 Male No Sun Dinner 2 11.84 \n", "4 24.59 3.61 Female No Sun Dinner 4 6.15 \n", "\n", " Payer Name CC Number Payment ID \n", "0 Christy Cunningham 3560325168603410 Sun2959 \n", "1 Douglas Tucker 4478071379779230 Sun4608 \n", "2 Travis Walters 6011812112971322 Sun4458 \n", "3 Nathaniel Harris 4676137647685994 Sun5260 \n", "4 Tonya Carter 4832732618637221 Sun2251 " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### apply with a function" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 244 entries, 0 to 243\n", "Data columns (total 11 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 total_bill 244 non-null float64\n", " 1 tip 244 non-null float64\n", " 2 sex 244 non-null object \n", " 3 smoker 244 non-null object \n", " 4 day 244 non-null object \n", " 5 time 244 non-null object \n", " 6 size 244 non-null int64 \n", " 7 price_per_person 244 non-null float64\n", " 8 Payer Name 244 non-null object \n", " 9 CC Number 244 non-null int64 \n", " 10 Payment ID 244 non-null object \n", "dtypes: float64(3), int64(2), object(6)\n", "memory usage: 21.1+ KB\n" ] } ], "source": [ "df.info()" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def last_four(num):\n", " return str(num)[-4:]" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3560325168603410" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['CC 
Number'][0]" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'3410'" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "last_four(3560325168603410)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df['last_four'] = df['CC Number'].apply(last_four)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " total_bill tip sex smoker day time size price_per_person \\\n", "0 16.99 1.01 Female No Sun Dinner 2 8.49 \n", "1 10.34 1.66 Male No Sun Dinner 3 3.45 \n", "2 21.01 3.50 Male No Sun Dinner 3 7.00 \n", "3 23.68 3.31 Male No Sun Dinner 2 11.84 \n", "4 24.59 3.61 Female No Sun Dinner 4 6.15 \n", "\n", " Payer Name CC Number Payment ID last_four \n", "0 Christy Cunningham 3560325168603410 Sun2959 3410 \n", "1 Douglas Tucker 4478071379779230 Sun4608 9230 \n", "2 Travis Walters 6011812112971322 Sun4458 1322 \n", "3 Nathaniel Harris 4676137647685994 Sun5260 5994 \n", "4 Tonya Carter 4832732618637221 Sun2251 7221 " ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Using .apply() with more complex functions" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "19.78594262295082" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['total_bill'].mean()" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def yelp(price):\n", " if price < 10:\n", " return '$'\n", " elif price >= 10 and price < 30:\n", " return '$$'\n", " else:\n", " return '$$$'" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df['Expensive'] = df['total_bill'].apply(yelp)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### apply with lambda" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def simple(num):\n", " return num*2" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(num)>" ] 
}, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lambda num: num*2" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 3.0582\n", "1 1.8612\n", "2 3.7818\n", "3 4.2624\n", "4 4.4262\n", " ... \n", "239 5.2254\n", "240 4.8924\n", "241 4.0806\n", "242 3.2076\n", "243 3.3804\n", "Name: total_bill, Length: 244, dtype: float64" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['total_bill'].apply(lambda bill:bill*0.18)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## apply that uses multiple columns\n", "\n", "Note, there are several ways to do this:\n", "\n", "https://stackoverflow.com/questions/19914937/applying-function-with-multiple-arguments-to-create-a-new-pandas-column" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " total_bill tip sex smoker day time size price_per_person \\\n", "0 16.99 1.01 Female No Sun Dinner 2 8.49 \n", "1 10.34 1.66 Male No Sun Dinner 3 3.45 \n", "2 21.01 3.50 Male No Sun Dinner 3 7.00 \n", "3 23.68 3.31 Male No Sun Dinner 2 11.84 \n", "4 24.59 3.61 Female No Sun Dinner 4 6.15 \n", "\n", " Payer Name CC Number Payment ID last_four Expensive \n", "0 Christy Cunningham 3560325168603410 Sun2959 3410 $$ \n", "1 Douglas Tucker 4478071379779230 Sun4608 9230 $$ \n", "2 Travis Walters 6011812112971322 Sun4458 1322 $$ \n", "3 Nathaniel Harris 4676137647685994 Sun5260 5994 $$ \n", "4 Tonya Carter 4832732618637221 Sun2251 7221 $$ " ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def quality(total_bill,tip):\n", " if tip/total_bill > 0.25:\n", " return \"Generous\"\n", " else:\n", " return \"Other\"" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df['Tip Quality'] = df[['total_bill','tip']].apply(lambda df: quality(df['total_bill'],df['tip']),axis=1)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " total_bill tip sex smoker day time size price_per_person \\\n", "0 16.99 1.01 Female No Sun Dinner 2 8.49 \n", "1 10.34 1.66 Male No Sun Dinner 3 3.45 \n", "2 21.01 3.50 Male No Sun Dinner 3 7.00 \n", "3 23.68 3.31 Male No Sun Dinner 2 11.84 \n", "4 24.59 3.61 Female No Sun Dinner 4 6.15 \n", "\n", " Payer Name CC Number Payment ID last_four Expensive \\\n", "0 Christy Cunningham 3560325168603410 Sun2959 3410 $$ \n", "1 Douglas Tucker 4478071379779230 Sun4608 9230 $$ \n", "2 Travis Walters 6011812112971322 Sun4458 1322 $$ \n", "3 Nathaniel Harris 4676137647685994 Sun5260 5994 $$ \n", "4 Tonya Carter 4832732618637221 Sun2251 7221 $$ \n", "\n", " Tip Quality \n", "0 Other \n", "1 Other \n", "2 Other \n", "3 Other \n", "4 Other " ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df['Tip Quality'] = np.vectorize(quality)(df['total_bill'], df['tip'])" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " total_bill tip sex smoker day time size price_per_person \\\n", "0 16.99 1.01 Female No Sun Dinner 2 8.49 \n", "1 10.34 1.66 Male No Sun Dinner 3 3.45 \n", "2 21.01 3.50 Male No Sun Dinner 3 7.00 \n", "3 23.68 3.31 Male No Sun Dinner 2 11.84 \n", "4 24.59 3.61 Female No Sun Dinner 4 6.15 \n", "\n", " Payer Name CC Number Payment ID last_four Expensive \\\n", "0 Christy Cunningham 3560325168603410 Sun2959 3410 $$ \n", "1 Douglas Tucker 4478071379779230 Sun4608 9230 $$ \n", "2 Travis Walters 6011812112971322 Sun4458 1322 $$ \n", "3 Nathaniel Harris 4676137647685994 Sun5260 5994 $$ \n", "4 Tonya Carter 4832732618637221 Sun2251 7221 $$ \n", "\n", " Tip Quality \n", "0 Other \n", "1 Other \n", "2 Other \n", "3 Other \n", "4 Other " ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So, which one is faster?" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import timeit \n", " \n", "# code snippet to be executed only once \n", "setup = '''\n", "import numpy as np\n", "import pandas as pd\n", "df = pd.read_csv('tips.csv')\n", "def quality(total_bill,tip):\n", " if tip/total_bill > 0.25:\n", " return \"Generous\"\n", " else:\n", " return \"Other\"\n", "'''\n", " \n", "# code snippet whose execution time is to be measured \n", "stmt_one = ''' \n", "df['Tip Quality'] = df[['total_bill','tip']].apply(lambda df: quality(df['total_bill'],df['tip']),axis=1)\n", "'''\n", "\n", "stmt_two = '''\n", "df['Tip Quality'] = np.vectorize(quality)(df['total_bill'], df['tip'])\n", "'''\n", " " ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5.0198852999999986" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "timeit.timeit(setup = setup, \n", " stmt = stmt_one, \n", " number = 1000) " ] 
}, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.21840849999999534" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Total seconds for 1000 runs of np.vectorize\n", "timeit.timeit(setup=setup,\n", "              stmt=stmt_two,\n", "              number=1000)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Wow! **np.vectorize()** is much faster here: about 0.22 seconds versus 5.02 seconds for 1,000 runs. Note that **np.vectorize()** is a convenience wrapper that still calls the Python function once per element; the speedup comes mostly from avoiding the per-row overhead of **apply** with `axis=1`. Keep it in mind for the future.\n", "\n", "Full Details:\n", "https://docs.scipy.org/doc/numpy/reference/generated/numpy.vectorize.html" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### df.describe for statistical summaries" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " total_bill tip size price_per_person CC Number\n", "count 244.000000 244.000000 244.000000 244.000000 2.440000e+02\n", "mean 19.785943 2.998279 2.569672 7.888197 2.563496e+15\n", "std 8.902412 1.383638 0.951100 2.914234 2.369340e+15\n", "min 3.070000 1.000000 1.000000 2.880000 6.040679e+10\n", "25% 13.347500 2.000000 2.000000 5.800000 3.040731e+13\n", "50% 17.795000 2.900000 2.000000 7.255000 3.525318e+15\n", "75% 24.127500 3.562500 3.000000 9.390000 4.553675e+15\n", "max 50.810000 10.000000 6.000000 20.270000 6.596454e+15" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.describe()" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " count mean std min \\\n", "total_bill 244.0 1.978594e+01 8.902412e+00 3.070000e+00 \n", "tip 244.0 2.998279e+00 1.383638e+00 1.000000e+00 \n", "size 244.0 2.569672e+00 9.510998e-01 1.000000e+00 \n", "price_per_person 244.0 7.888197e+00 2.914234e+00 2.880000e+00 \n", "CC Number 244.0 2.563496e+15 2.369340e+15 6.040679e+10 \n", "\n", " 25% 50% 75% max \n", "total_bill 1.334750e+01 1.779500e+01 2.412750e+01 5.081000e+01 \n", "tip 2.000000e+00 2.900000e+00 3.562500e+00 1.000000e+01 \n", "size 2.000000e+00 2.000000e+00 3.000000e+00 6.000000e+00 \n", "price_per_person 5.800000e+00 7.255000e+00 9.390000e+00 2.027000e+01 \n", "CC Number 3.040731e+13 3.525318e+15 4.553675e+15 6.596454e+15 " ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.describe().transpose()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### sort_values()" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " total_bill tip sex smoker day time size price_per_person \\\n", "67 3.07 1.00 Female Yes Sat Dinner 1 3.07 \n", "236 12.60 1.00 Male Yes Sat Dinner 2 6.30 \n", "92 5.75 1.00 Female Yes Fri Dinner 2 2.88 \n", "111 7.25 1.00 Female No Sat Dinner 1 7.25 \n", "0 16.99 1.01 Female No Sun Dinner 2 8.49 \n", ".. ... ... ... ... ... ... ... ... \n", "141 34.30 6.70 Male No Thur Lunch 6 5.72 \n", "59 48.27 6.73 Male No Sat Dinner 4 12.07 \n", "23 39.42 7.58 Male No Sat Dinner 4 9.86 \n", "212 48.33 9.00 Male No Sat Dinner 4 12.08 \n", "170 50.81 10.00 Male Yes Sat Dinner 3 16.94 \n", "\n", " Payer Name CC Number Payment ID last_four Expensive \\\n", "67 Tiffany Brock 4359488526995267 Sat3455 5267 $ \n", "236 Matthew Myers 3543676378973965 Sat5032 3965 $$ \n", "92 Leah Ramirez 3508911676966392 Fri3780 6392 $ \n", "111 Terri Jones 3559221007826887 Sat4801 6887 $ \n", "0 Christy Cunningham 3560325168603410 Sun2959 3410 $$ \n", ".. ... ... ... ... ... \n", "141 Steven Carlson 3526515703718508 Thur1025 8508 $$$ \n", "59 Brian Ortiz 6596453823950595 Sat8139 0595 $$$ \n", "23 Lance Peterson 3542584061609808 Sat239 9808 $$$ \n", "212 Alex Williamson 676218815212 Sat4590 5212 $$$ \n", "170 Gregory Clark 5473850968388236 Sat1954 8236 $$$ \n", "\n", " Tip Quality \n", "67 Generous \n", "236 Other \n", "92 Other \n", "111 Other \n", "0 Other \n", ".. ... \n", "141 Other \n", "59 Other \n", "23 Other \n", "212 Other \n", "170 Other \n", "\n", "[244 rows x 14 columns]" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.sort_values('tip')" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " total_bill tip sex smoker day time size price_per_person \\\n", "67 3.07 1.00 Female Yes Sat Dinner 1 3.07 \n", "111 7.25 1.00 Female No Sat Dinner 1 7.25 \n", "92 5.75 1.00 Female Yes Fri Dinner 2 2.88 \n", "236 12.60 1.00 Male Yes Sat Dinner 2 6.30 \n", "0 16.99 1.01 Female No Sun Dinner 2 8.49 \n", ".. ... ... ... ... ... ... ... ... \n", "141 34.30 6.70 Male No Thur Lunch 6 5.72 \n", "59 48.27 6.73 Male No Sat Dinner 4 12.07 \n", "23 39.42 7.58 Male No Sat Dinner 4 9.86 \n", "212 48.33 9.00 Male No Sat Dinner 4 12.08 \n", "170 50.81 10.00 Male Yes Sat Dinner 3 16.94 \n", "\n", " Payer Name CC Number Payment ID last_four Expensive \\\n", "67 Tiffany Brock 4359488526995267 Sat3455 5267 $ \n", "111 Terri Jones 3559221007826887 Sat4801 6887 $ \n", "92 Leah Ramirez 3508911676966392 Fri3780 6392 $ \n", "236 Matthew Myers 3543676378973965 Sat5032 3965 $$ \n", "0 Christy Cunningham 3560325168603410 Sun2959 3410 $$ \n", ".. ... ... ... ... ... \n", "141 Steven Carlson 3526515703718508 Thur1025 8508 $$$ \n", "59 Brian Ortiz 6596453823950595 Sat8139 0595 $$$ \n", "23 Lance Peterson 3542584061609808 Sat239 9808 $$$ \n", "212 Alex Williamson 676218815212 Sat4590 5212 $$$ \n", "170 Gregory Clark 5473850968388236 Sat1954 8236 $$$ \n", "\n", " Tip Quality \n", "67 Generous \n", "111 Other \n", "92 Other \n", "236 Other \n", "0 Other \n", ".. ... 
\n", "141 Other \n", "59 Other \n", "23 Other \n", "212 Other \n", "170 Other \n", "\n", "[244 rows x 14 columns]" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Helpful if you want to reorder after a sort\n", "# https://stackoverflow.com/questions/13148429/how-to-change-the-order-of-dataframe-columns\n", "df.sort_values(['tip','size'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## df.corr() for correlation checks\n", "\n", "[Wikipedia on Correlation](https://en.wikipedia.org/wiki/Correlation_and_dependence)" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " total_bill tip size price_per_person CC Number\n", "total_bill 1.000000 0.675734 0.598315 0.647554 0.104576\n", "tip 0.675734 1.000000 0.489299 0.347405 0.110857\n", "size 0.598315 0.489299 1.000000 -0.175359 -0.030239\n", "price_per_person 0.647554 0.347405 -0.175359 1.000000 0.135240\n", "CC Number 0.104576 0.110857 -0.030239 0.135240 1.000000" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.corr()" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " total_bill tip\n", "total_bill 1.000000 0.675734\n", "tip 0.675734 1.000000" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[['total_bill','tip']].corr()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### idxmin and idxmax" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " total_bill tip sex smoker day time size price_per_person \\\n", "0 16.99 1.01 Female No Sun Dinner 2 8.49 \n", "1 10.34 1.66 Male No Sun Dinner 3 3.45 \n", "2 21.01 3.50 Male No Sun Dinner 3 7.00 \n", "3 23.68 3.31 Male No Sun Dinner 2 11.84 \n", "4 24.59 3.61 Female No Sun Dinner 4 6.15 \n", "\n", " Payer Name CC Number Payment ID last_four Expensive \\\n", "0 Christy Cunningham 3560325168603410 Sun2959 3410 $$ \n", "1 Douglas Tucker 4478071379779230 Sun4608 9230 $$ \n", "2 Travis Walters 6011812112971322 Sun4458 1322 $$ \n", "3 Nathaniel Harris 4676137647685994 Sun5260 5994 $$ \n", "4 Tonya Carter 4832732618637221 Sun2251 7221 $$ \n", "\n", " Tip Quality \n", "0 Other \n", "1 Other \n", "2 Other \n", "3 Other \n", "4 Other " ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "50.81" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['total_bill'].max()" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "170" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['total_bill'].idxmax()" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "67" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['total_bill'].idxmin()" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "total_bill 3.07\n", "tip 1\n", "sex Female\n", "smoker Yes\n", "day Sat\n", "time Dinner\n", "size 1\n", "price_per_person 3.07\n", "Payer Name Tiffany Brock\n", "CC Number 4359488526995267\n", "Payment ID Sat3455\n", "last_four 5267\n", "Expensive $\n", "Tip Quality Generous\n", "Name: 67, dtype: object" ] }, 
"execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.iloc[67]" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "total_bill 50.81\n", "tip 10\n", "sex Male\n", "smoker Yes\n", "day Sat\n", "time Dinner\n", "size 3\n", "price_per_person 16.94\n", "Payer Name Gregory Clark\n", "CC Number 5473850968388236\n", "Payment ID Sat1954\n", "last_four 8236\n", "Expensive $$$\n", "Tip Quality Other\n", "Name: 170, dtype: object" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.iloc[170]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### value_counts\n", "\n", "Nice method to quickly get a count per category. Only makes sense on categorical columns." ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " total_bill tip sex smoker day time size price_per_person \\\n", "0 16.99 1.01 Female No Sun Dinner 2 8.49 \n", "1 10.34 1.66 Male No Sun Dinner 3 3.45 \n", "2 21.01 3.50 Male No Sun Dinner 3 7.00 \n", "3 23.68 3.31 Male No Sun Dinner 2 11.84 \n", "4 24.59 3.61 Female No Sun Dinner 4 6.15 \n", "\n", " Payer Name CC Number Payment ID last_four Expensive \\\n", "0 Christy Cunningham 3560325168603410 Sun2959 3410 $$ \n", "1 Douglas Tucker 4478071379779230 Sun4608 9230 $$ \n", "2 Travis Walters 6011812112971322 Sun4458 1322 $$ \n", "3 Nathaniel Harris 4676137647685994 Sun5260 5994 $$ \n", "4 Tonya Carter 4832732618637221 Sun2251 7221 $$ \n", "\n", " Tip Quality \n", "0 Other \n", "1 Other \n", "2 Other \n", "3 Other \n", "4 Other " ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Male 157\n", "Female 87\n", "Name: sex, dtype: int64" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['sex'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "### replace\n", "\n", "Quickly replace values with another one." ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " total_bill tip sex smoker day time size price_per_person \\\n", "0 16.99 1.01 Female No Sun Dinner 2 8.49 \n", "1 10.34 1.66 Male No Sun Dinner 3 3.45 \n", "2 21.01 3.50 Male No Sun Dinner 3 7.00 \n", "3 23.68 3.31 Male No Sun Dinner 2 11.84 \n", "4 24.59 3.61 Female No Sun Dinner 4 6.15 \n", "\n", " Payer Name CC Number Payment ID last_four Expensive \\\n", "0 Christy Cunningham 3560325168603410 Sun2959 3410 $$ \n", "1 Douglas Tucker 4478071379779230 Sun4608 9230 $$ \n", "2 Travis Walters 6011812112971322 Sun4458 1322 $$ \n", "3 Nathaniel Harris 4676137647685994 Sun5260 5994 $$ \n", "4 Tonya Carter 4832732618637221 Sun2251 7221 $$ \n", "\n", " Tip Quality \n", "0 Other \n", "1 Other \n", "2 Other \n", "3 Other \n", "4 Other " ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 Ok\n", "1 Ok\n", "2 Ok\n", "3 Ok\n", "4 Ok\n", "5 Ok\n", "6 Ok\n", "7 Ok\n", "8 Ok\n", "9 Ok\n", "10 Ok\n", "11 Ok\n", "12 Ok\n", "13 Ok\n", "14 Ok\n", "15 Ok\n", "16 Ok\n", "17 Ok\n", "18 Ok\n", "19 Ok\n", "20 Ok\n", "21 Ok\n", "22 Ok\n", "23 Ok\n", "24 Ok\n", "25 Ok\n", "26 Ok\n", "27 Ok\n", "28 Ok\n", "29 Ok\n", " ... 
\n", "214 Ok\n", "215 Ok\n", "216 Ok\n", "217 Ok\n", "218 Ok\n", "219 Ok\n", "220 Ok\n", "221 Generous\n", "222 Ok\n", "223 Ok\n", "224 Ok\n", "225 Ok\n", "226 Ok\n", "227 Ok\n", "228 Ok\n", "229 Ok\n", "230 Ok\n", "231 Ok\n", "232 Generous\n", "233 Ok\n", "234 Ok\n", "235 Ok\n", "236 Ok\n", "237 Ok\n", "238 Ok\n", "239 Ok\n", "240 Ok\n", "241 Ok\n", "242 Ok\n", "243 Ok\n", "Name: Tip Quality, Length: 244, dtype: object" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['Tip Quality'].replace(to_replace='Other',value='Ok')" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df['Tip Quality'] = df['Tip Quality'].replace(to_replace='Other',value='Ok')" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " total_bill tip sex smoker day time size price_per_person \\\n", "0 16.99 1.01 Female No Sun Dinner 2 8.49 \n", "1 10.34 1.66 Male No Sun Dinner 3 3.45 \n", "2 21.01 3.50 Male No Sun Dinner 3 7.00 \n", "3 23.68 3.31 Male No Sun Dinner 2 11.84 \n", "4 24.59 3.61 Female No Sun Dinner 4 6.15 \n", "\n", " Payer Name CC Number Payment ID last_four Expensive \\\n", "0 Christy Cunningham 3560325168603410 Sun2959 3410 $$ \n", "1 Douglas Tucker 4478071379779230 Sun4608 9230 $$ \n", "2 Travis Walters 6011812112971322 Sun4458 1322 $$ \n", "3 Nathaniel Harris 4676137647685994 Sun5260 5994 $$ \n", "4 Tonya Carter 4832732618637221 Sun2251 7221 $$ \n", "\n", " Tip Quality \n", "0 Ok \n", "1 Ok \n", "2 Ok \n", "3 Ok \n", "4 Ok " ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### unique" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([2, 3, 4, 1, 6, 5], dtype=int64)" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['size'].unique()" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "6" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['size'].nunique()" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['Dinner', 'Lunch'], dtype=object)" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['time'].unique()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### map" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "collapsed": true }, "outputs": [], "source": [ "my_map = {'Dinner':'D','Lunch':'L'}" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { 
"data": { "text/plain": [ "0 D\n", "1 D\n", "2 D\n", "3 D\n", "4 D\n", "5 D\n", "6 D\n", "7 D\n", "8 D\n", "9 D\n", "10 D\n", "11 D\n", "12 D\n", "13 D\n", "14 D\n", "15 D\n", "16 D\n", "17 D\n", "18 D\n", "19 D\n", "20 D\n", "21 D\n", "22 D\n", "23 D\n", "24 D\n", "25 D\n", "26 D\n", "27 D\n", "28 D\n", "29 D\n", " ..\n", "214 D\n", "215 D\n", "216 D\n", "217 D\n", "218 D\n", "219 D\n", "220 L\n", "221 L\n", "222 L\n", "223 L\n", "224 L\n", "225 L\n", "226 L\n", "227 D\n", "228 D\n", "229 D\n", "230 D\n", "231 D\n", "232 D\n", "233 D\n", "234 D\n", "235 D\n", "236 D\n", "237 D\n", "238 D\n", "239 D\n", "240 D\n", "241 D\n", "242 D\n", "243 D\n", "Name: time, Length: 244, dtype: object" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['time'].map(my_map)" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " total_bill tip sex smoker day time size price_per_person \\\n", "0 16.99 1.01 Female No Sun Dinner 2 8.49 \n", "1 10.34 1.66 Male No Sun Dinner 3 3.45 \n", "2 21.01 3.50 Male No Sun Dinner 3 7.00 \n", "3 23.68 3.31 Male No Sun Dinner 2 11.84 \n", "4 24.59 3.61 Female No Sun Dinner 4 6.15 \n", "\n", " Payer Name CC Number Payment ID last_four Expensive \\\n", "0 Christy Cunningham 3560325168603410 Sun2959 3410 $$ \n", "1 Douglas Tucker 4478071379779230 Sun4608 9230 $$ \n", "2 Travis Walters 6011812112971322 Sun4458 1322 $$ \n", "3 Nathaniel Harris 4676137647685994 Sun5260 5994 $$ \n", "4 Tonya Carter 4832732618637221 Sun2251 7221 $$ \n", "\n", " Tip Quality \n", "0 Ok \n", "1 Ok \n", "2 Ok \n", "3 Ok \n", "4 Ok " ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Duplicates\n", "\n", "### .duplicated() and .drop_duplicates()" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 False\n", "1 False\n", "2 False\n", "3 False\n", "4 False\n", "5 False\n", "6 False\n", "7 False\n", "8 False\n", "9 False\n", "10 False\n", "11 False\n", "12 False\n", "13 False\n", "14 False\n", "15 False\n", "16 False\n", "17 False\n", "18 False\n", "19 False\n", "20 False\n", "21 False\n", "22 False\n", "23 False\n", "24 False\n", "25 False\n", "26 False\n", "27 False\n", "28 False\n", "29 False\n", " ... 
\n", "214 False\n", "215 False\n", "216 False\n", "217 False\n", "218 False\n", "219 False\n", "220 False\n", "221 False\n", "222 False\n", "223 False\n", "224 False\n", "225 False\n", "226 False\n", "227 False\n", "228 False\n", "229 False\n", "230 False\n", "231 False\n", "232 False\n", "233 False\n", "234 False\n", "235 False\n", "236 False\n", "237 False\n", "238 False\n", "239 False\n", "240 False\n", "241 False\n", "242 False\n", "243 False\n", "Length: 244, dtype: bool" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Returns True for every row that repeats an earlier row;\n", "# the first occurrence of each row is marked False\n", "df.duplicated()" ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "collapsed": true }, "outputs": [], "source": [ "simple_df = pd.DataFrame([1,2,2],['a','b','c'])" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " 0\n", "a 1\n", "b 2\n", "c 2" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "simple_df" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "a False\n", "b False\n", "c True\n", "dtype: bool" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "simple_df.duplicated()" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " 0\n", "a 1\n", "b 2" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "simple_df.drop_duplicates()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## between\n", "\n", "left: A scalar value that defines the left boundary\n", "right: A scalar value that defines the right boundary\n", "inclusive: A Boolean value which is True by default. If False, it excludes the two passed arguments while checking." ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 True\n", "1 True\n", "2 False\n", "3 False\n", "4 False\n", "5 False\n", "6 False\n", "7 False\n", "8 True\n", "9 True\n", "10 True\n", "11 False\n", "12 True\n", "13 True\n", "14 True\n", "15 False\n", "16 True\n", "17 True\n", "18 True\n", "19 False\n", "20 True\n", "21 False\n", "22 True\n", "23 False\n", "24 True\n", "25 True\n", "26 True\n", "27 True\n", "28 False\n", "29 True\n", " ... \n", "214 False\n", "215 True\n", "216 False\n", "217 True\n", "218 False\n", "219 False\n", "220 True\n", "221 True\n", "222 False\n", "223 True\n", "224 True\n", "225 True\n", "226 True\n", "227 False\n", "228 True\n", "229 False\n", "230 False\n", "231 True\n", "232 True\n", "233 True\n", "234 True\n", "235 True\n", "236 True\n", "237 False\n", "238 False\n", "239 False\n", "240 False\n", "241 False\n", "242 True\n", "243 True\n", "Name: total_bill, Length: 244, dtype: bool" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['total_bill'].between(10,20,inclusive=True)" ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", 
" \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
total_billtipsexsmokerdaytimesizeprice_per_personPayer NameCC NumberPayment IDlast_fourExpensiveTip Quality
016.991.01FemaleNoSunDinner28.49Christy Cunningham3560325168603410Sun29593410$$Ok
110.341.66MaleNoSunDinner33.45Douglas Tucker4478071379779230Sun46089230$$Ok
815.041.96MaleNoSunDinner27.52Joseph Mcdonald3522866365840377Sun68200377$$Ok
914.783.23MaleNoSunDinner27.39Jerome Abbott3532124519049786Sun37759786$$Ok
1010.271.71MaleNoSunDinner25.14William Riley566287581219Sun25461219$$Ok
1215.421.57MaleNoSunDinner27.71Chad Harrington577040572932Sun13002932$$Ok
1318.433.00MaleNoSunDinner44.61Joshua Jones6011163105616890Sun29716890$$Ok
1414.833.02FemaleNoSunDinner27.42Vanessa Jones30016702287574Sun38487574$$Ok
1610.331.67FemaleNoSunDinner33.44Elizabeth Foster4240025044626033Sun97156033$$Ok
1716.293.71MaleNoSunDinner35.43John Pittman6521340257218708Sun29988708$$Ok
1816.973.50FemaleNoSunDinner35.66Laura Martinez30422275171379Sun27891379$$Ok
2017.924.08MaleNoSatDinner28.96Thomas Rice4403296224639756Sat17099756$$Ok
2215.772.23FemaleNoSatDinner27.88Ashley Shelton3524119516293213Sat97863213$$Ok
2419.823.18MaleNoSatDinner29.91Christopher Ross36739148167928Sat62367928$$Ok
2517.812.34MaleNoSatDinner44.45Robert Perkins30502930499388Sat9079388$$Ok
2613.372.00MaleNoSatDinner26.68Kyle Avery6531339539615499Sat66515499$$Ok
2712.692.00MaleNoSatDinner26.34Patrick Barber30155551880343Sat3940343$$Ok
2919.653.00FemaleNoSatDinner29.82Melinda Murphy5489272944576051Sat24676051$$Ok
3118.352.50MaleNoSatDinner44.59Danny Santiago630415546013Sat49476013$$Ok
3215.063.00FemaleNoSatDinner27.53Amanda Wilson213186304291560Sat13271560$$Ok
3417.783.27MaleNoSatDinner28.89Jacob Castillo3551492000704805Sat81244805$$Ok
3616.312.00MaleNoSatDinner35.44William Ford3527691170179398Sat91399398$$Ok
3716.933.07FemaleNoSatDinner35.64Erin Lewis5161695527390786Sat64060786$$Ok
3818.692.31MaleNoSatDinner36.23Brandon Bradley4427601595688633Sat40568633$$Ok
4016.042.24MaleNoSatDinner35.35Adam Edwards3544447755679420Sat85499420$$Ok
4117.462.54MaleNoSunDinner28.73David Boyer3536678244278149Sun94608149$$Ok
4213.943.06MaleNoSunDinner26.97Bryan Brown36231182760859Sun16990859$$Ok
4518.293.00MaleNoSunDinner29.14Richard Fitzgerald375156610762053Sun86432053$$Ok
4918.043.00MaleNoSunDinner29.02William Roth6573923967142503Sun97742503$$Ok
5012.542.50MaleNoSunDinner26.27Jeremiah Neal2225400829691416Sun20211416$$Ok
.............................................
19119.814.19FemaleYesThurLunch29.90Kristy Boyd4317015327600068Thur9670068$$Ok
19315.482.02MaleYesThurLunch27.74Raymond Sullivan180068856139315Thur6069315$$Ok
19416.584.00MaleYesThurLunch28.29Benjamin Weber676210011505Thur93181505$$Ok
19610.342.00MaleYesThurLunch25.17Eric Martin30442491190342Thur98620342$$Ok
19813.002.00FemaleYesThurLunch26.50Katherine Bond4926725945192Thur4375192$$Ok
19913.512.00MaleYesThurLunch26.76Joseph Murphy MD6547218923471275Thur24281275$$Ok
20018.714.00MaleYesThurLunch36.24Jason Conrad4581233003487Thur60483487$$Ok
20112.742.01FemaleYesThurLunch26.37Abigail Parks3586645396220590Thur25440590$$Ok
20213.002.00FemaleYesThurLunch26.50Ashley Shaw180088043008041Thur13018041$$Ok
20316.402.50FemaleYesThurLunch28.20Toni Brooks3582289985920239Thur77700239$$Ok
20516.473.23FemaleYesThurLunch35.49Carly Reyes4787787236486Thur80846486$$Ok
20912.762.23FemaleYesSatDinner26.38Sarah Cunningham341876516331163Sat12741163$$Ok
21313.272.50FemaleYesSatDinner26.64Robin Andersen580140531089Sat13741089$$Ok
21512.901.10FemaleYesSatDinner26.45Jessica Owen4726904879471Sat69839471$$Ok
21711.591.50MaleYesSatDinner25.80Gary Orr30324521283406Sat84893406$$Ok
22012.162.20MaleYesFriLunch26.08Ricky Johnson213109508670736Fri46070736$$Ok
22113.423.48FemaleYesFriLunch26.71Leslie Kaufman379437981958785Fri75118785$$Generous
22315.983.00FemaleNoFriLunch35.33Mary Rivera5343428579353069Fri60143069$$Ok
22413.421.58MaleYesFriLunch26.71Ronald Vaughn DVM341503466406403Fri59596403$$Ok
22516.272.50FemaleYesFriLunch28.14Whitney Arnold3579111947217428Fri66657428$$Ok
22610.092.00FemaleYesFriLunch25.04Ruth Weiss5268689490381635Fri63591635$$Ok
22813.282.72MaleNoSatDinner26.64Glenn Jones502061651712Sat29371712$$Ok
23115.693.00MaleYesSatDinner35.23Jason Parks4812333796161Sat63346161$$Ok
23211.613.39MaleNoSatDinner25.80James Taylor6011482917327995Sat21247995$$Generous
23310.771.47MaleNoSatDinner25.38Paul Novak6011698897610858Sat14670858$$Ok
23415.533.00MaleYesSatDinner27.76Tracy Douglas4097938155941930Sat72201930$$Ok
23510.071.25MaleNoSatDinner25.04Sean Gonzalez3534021246117605Sat46157605$$Ok
23612.601.00MaleYesSatDinner26.30Matthew Myers3543676378973965Sat50323965$$Ok
24217.821.75MaleNoSatDinner28.91Dennis Dixon4375220550950Sat170950$$Ok
24318.783.00FemaleNoThurDinner29.39Michelle Hardin3511451626698139Thur6728139$$Ok
\n", "

130 rows × 14 columns

\n", "
" ], "text/plain": [ " total_bill tip sex smoker day time size price_per_person \\\n", "0 16.99 1.01 Female No Sun Dinner 2 8.49 \n", "1 10.34 1.66 Male No Sun Dinner 3 3.45 \n", "8 15.04 1.96 Male No Sun Dinner 2 7.52 \n", "9 14.78 3.23 Male No Sun Dinner 2 7.39 \n", "10 10.27 1.71 Male No Sun Dinner 2 5.14 \n", "12 15.42 1.57 Male No Sun Dinner 2 7.71 \n", "13 18.43 3.00 Male No Sun Dinner 4 4.61 \n", "14 14.83 3.02 Female No Sun Dinner 2 7.42 \n", "16 10.33 1.67 Female No Sun Dinner 3 3.44 \n", "17 16.29 3.71 Male No Sun Dinner 3 5.43 \n", "18 16.97 3.50 Female No Sun Dinner 3 5.66 \n", "20 17.92 4.08 Male No Sat Dinner 2 8.96 \n", "22 15.77 2.23 Female No Sat Dinner 2 7.88 \n", "24 19.82 3.18 Male No Sat Dinner 2 9.91 \n", "25 17.81 2.34 Male No Sat Dinner 4 4.45 \n", "26 13.37 2.00 Male No Sat Dinner 2 6.68 \n", "27 12.69 2.00 Male No Sat Dinner 2 6.34 \n", "29 19.65 3.00 Female No Sat Dinner 2 9.82 \n", "31 18.35 2.50 Male No Sat Dinner 4 4.59 \n", "32 15.06 3.00 Female No Sat Dinner 2 7.53 \n", "34 17.78 3.27 Male No Sat Dinner 2 8.89 \n", "36 16.31 2.00 Male No Sat Dinner 3 5.44 \n", "37 16.93 3.07 Female No Sat Dinner 3 5.64 \n", "38 18.69 2.31 Male No Sat Dinner 3 6.23 \n", "40 16.04 2.24 Male No Sat Dinner 3 5.35 \n", "41 17.46 2.54 Male No Sun Dinner 2 8.73 \n", "42 13.94 3.06 Male No Sun Dinner 2 6.97 \n", "45 18.29 3.00 Male No Sun Dinner 2 9.14 \n", "49 18.04 3.00 Male No Sun Dinner 2 9.02 \n", "50 12.54 2.50 Male No Sun Dinner 2 6.27 \n", ".. ... ... ... ... ... ... ... ... 
\n", "191 19.81 4.19 Female Yes Thur Lunch 2 9.90 \n", "193 15.48 2.02 Male Yes Thur Lunch 2 7.74 \n", "194 16.58 4.00 Male Yes Thur Lunch 2 8.29 \n", "196 10.34 2.00 Male Yes Thur Lunch 2 5.17 \n", "198 13.00 2.00 Female Yes Thur Lunch 2 6.50 \n", "199 13.51 2.00 Male Yes Thur Lunch 2 6.76 \n", "200 18.71 4.00 Male Yes Thur Lunch 3 6.24 \n", "201 12.74 2.01 Female Yes Thur Lunch 2 6.37 \n", "202 13.00 2.00 Female Yes Thur Lunch 2 6.50 \n", "203 16.40 2.50 Female Yes Thur Lunch 2 8.20 \n", "205 16.47 3.23 Female Yes Thur Lunch 3 5.49 \n", "209 12.76 2.23 Female Yes Sat Dinner 2 6.38 \n", "213 13.27 2.50 Female Yes Sat Dinner 2 6.64 \n", "215 12.90 1.10 Female Yes Sat Dinner 2 6.45 \n", "217 11.59 1.50 Male Yes Sat Dinner 2 5.80 \n", "220 12.16 2.20 Male Yes Fri Lunch 2 6.08 \n", "221 13.42 3.48 Female Yes Fri Lunch 2 6.71 \n", "223 15.98 3.00 Female No Fri Lunch 3 5.33 \n", "224 13.42 1.58 Male Yes Fri Lunch 2 6.71 \n", "225 16.27 2.50 Female Yes Fri Lunch 2 8.14 \n", "226 10.09 2.00 Female Yes Fri Lunch 2 5.04 \n", "228 13.28 2.72 Male No Sat Dinner 2 6.64 \n", "231 15.69 3.00 Male Yes Sat Dinner 3 5.23 \n", "232 11.61 3.39 Male No Sat Dinner 2 5.80 \n", "233 10.77 1.47 Male No Sat Dinner 2 5.38 \n", "234 15.53 3.00 Male Yes Sat Dinner 2 7.76 \n", "235 10.07 1.25 Male No Sat Dinner 2 5.04 \n", "236 12.60 1.00 Male Yes Sat Dinner 2 6.30 \n", "242 17.82 1.75 Male No Sat Dinner 2 8.91 \n", "243 18.78 3.00 Female No Thur Dinner 2 9.39 \n", "\n", " Payer Name CC Number Payment ID last_four Expensive \\\n", "0 Christy Cunningham 3560325168603410 Sun2959 3410 $$ \n", "1 Douglas Tucker 4478071379779230 Sun4608 9230 $$ \n", "8 Joseph Mcdonald 3522866365840377 Sun6820 0377 $$ \n", "9 Jerome Abbott 3532124519049786 Sun3775 9786 $$ \n", "10 William Riley 566287581219 Sun2546 1219 $$ \n", "12 Chad Harrington 577040572932 Sun1300 2932 $$ \n", "13 Joshua Jones 6011163105616890 Sun2971 6890 $$ \n", "14 Vanessa Jones 30016702287574 Sun3848 7574 $$ \n", "16 Elizabeth Foster 
4240025044626033 Sun9715 6033 $$ \n", "17 John Pittman 6521340257218708 Sun2998 8708 $$ \n", "18 Laura Martinez 30422275171379 Sun2789 1379 $$ \n", "20 Thomas Rice 4403296224639756 Sat1709 9756 $$ \n", "22 Ashley Shelton 3524119516293213 Sat9786 3213 $$ \n", "24 Christopher Ross 36739148167928 Sat6236 7928 $$ \n", "25 Robert Perkins 30502930499388 Sat907 9388 $$ \n", "26 Kyle Avery 6531339539615499 Sat6651 5499 $$ \n", "27 Patrick Barber 30155551880343 Sat394 0343 $$ \n", "29 Melinda Murphy 5489272944576051 Sat2467 6051 $$ \n", "31 Danny Santiago 630415546013 Sat4947 6013 $$ \n", "32 Amanda Wilson 213186304291560 Sat1327 1560 $$ \n", "34 Jacob Castillo 3551492000704805 Sat8124 4805 $$ \n", "36 William Ford 3527691170179398 Sat9139 9398 $$ \n", "37 Erin Lewis 5161695527390786 Sat6406 0786 $$ \n", "38 Brandon Bradley 4427601595688633 Sat4056 8633 $$ \n", "40 Adam Edwards 3544447755679420 Sat8549 9420 $$ \n", "41 David Boyer 3536678244278149 Sun9460 8149 $$ \n", "42 Bryan Brown 36231182760859 Sun1699 0859 $$ \n", "45 Richard Fitzgerald 375156610762053 Sun8643 2053 $$ \n", "49 William Roth 6573923967142503 Sun9774 2503 $$ \n", "50 Jeremiah Neal 2225400829691416 Sun2021 1416 $$ \n", ".. ... ... ... ... ... 
\n", "191 Kristy Boyd 4317015327600068 Thur967 0068 $$ \n", "193 Raymond Sullivan 180068856139315 Thur606 9315 $$ \n", "194 Benjamin Weber 676210011505 Thur9318 1505 $$ \n", "196 Eric Martin 30442491190342 Thur9862 0342 $$ \n", "198 Katherine Bond 4926725945192 Thur437 5192 $$ \n", "199 Joseph Murphy MD 6547218923471275 Thur2428 1275 $$ \n", "200 Jason Conrad 4581233003487 Thur6048 3487 $$ \n", "201 Abigail Parks 3586645396220590 Thur2544 0590 $$ \n", "202 Ashley Shaw 180088043008041 Thur1301 8041 $$ \n", "203 Toni Brooks 3582289985920239 Thur7770 0239 $$ \n", "205 Carly Reyes 4787787236486 Thur8084 6486 $$ \n", "209 Sarah Cunningham 341876516331163 Sat1274 1163 $$ \n", "213 Robin Andersen 580140531089 Sat1374 1089 $$ \n", "215 Jessica Owen 4726904879471 Sat6983 9471 $$ \n", "217 Gary Orr 30324521283406 Sat8489 3406 $$ \n", "220 Ricky Johnson 213109508670736 Fri4607 0736 $$ \n", "221 Leslie Kaufman 379437981958785 Fri7511 8785 $$ \n", "223 Mary Rivera 5343428579353069 Fri6014 3069 $$ \n", "224 Ronald Vaughn DVM 341503466406403 Fri5959 6403 $$ \n", "225 Whitney Arnold 3579111947217428 Fri6665 7428 $$ \n", "226 Ruth Weiss 5268689490381635 Fri6359 1635 $$ \n", "228 Glenn Jones 502061651712 Sat2937 1712 $$ \n", "231 Jason Parks 4812333796161 Sat6334 6161 $$ \n", "232 James Taylor 6011482917327995 Sat2124 7995 $$ \n", "233 Paul Novak 6011698897610858 Sat1467 0858 $$ \n", "234 Tracy Douglas 4097938155941930 Sat7220 1930 $$ \n", "235 Sean Gonzalez 3534021246117605 Sat4615 7605 $$ \n", "236 Matthew Myers 3543676378973965 Sat5032 3965 $$ \n", "242 Dennis Dixon 4375220550950 Sat17 0950 $$ \n", "243 Michelle Hardin 3511451626698139 Thur672 8139 $$ \n", "\n", " Tip Quality \n", "0 Ok \n", "1 Ok \n", "8 Ok \n", "9 Ok \n", "10 Ok \n", "12 Ok \n", "13 Ok \n", "14 Ok \n", "16 Ok \n", "17 Ok \n", "18 Ok \n", "20 Ok \n", "22 Ok \n", "24 Ok \n", "25 Ok \n", "26 Ok \n", "27 Ok \n", "29 Ok \n", "31 Ok \n", "32 Ok \n", "34 Ok \n", "36 Ok \n", "37 Ok \n", "38 Ok \n", "40 Ok \n", "41 Ok 
\n", "42 Ok \n", "45 Ok \n", "49 Ok \n", "50 Ok \n", ".. ... \n", "191 Ok \n", "193 Ok \n", "194 Ok \n", "196 Ok \n", "198 Ok \n", "199 Ok \n", "200 Ok \n", "201 Ok \n", "202 Ok \n", "203 Ok \n", "205 Ok \n", "209 Ok \n", "213 Ok \n", "215 Ok \n", "217 Ok \n", "220 Ok \n", "221 Generous \n", "223 Ok \n", "224 Ok \n", "225 Ok \n", "226 Ok \n", "228 Ok \n", "231 Ok \n", "232 Generous \n", "233 Ok \n", "234 Ok \n", "235 Ok \n", "236 Ok \n", "242 Ok \n", "243 Ok \n", "\n", "[130 rows x 14 columns]" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['total_bill'].between(10,20,inclusive='both')]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## sample\n", "\n", "`sample()` returns randomly selected rows: pass `n` for a fixed number of rows or `frac` for a fraction of the DataFrame." ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
total_billtipsexsmokerdaytimesizeprice_per_personPayer NameCC NumberPayment IDlast_fourExpensiveTip Quality
21628.153.00MaleYesSatDinner55.63Shawn Barnett PhD4590982568244Sat73208244$$Ok
13610.332.00FemaleNoThurLunch25.16Donna Kelly180048553626376Thur13936376$$Ok
1318.433.00MaleNoSunDinner44.61Joshua Jones6011163105616890Sun29716890$$Ok
14618.641.36FemaleNoThurLunch36.21Kelly Estrada60463302327Thur39412327$$Ok
5638.013.00MaleYesSatDinner49.50James Christensen DDS349793629453226Sat89033226$$$Ok
\n", "
" ], "text/plain": [ " total_bill tip sex smoker day time size price_per_person \\\n", "216 28.15 3.00 Male Yes Sat Dinner 5 5.63 \n", "136 10.33 2.00 Female No Thur Lunch 2 5.16 \n", "13 18.43 3.00 Male No Sun Dinner 4 4.61 \n", "146 18.64 1.36 Female No Thur Lunch 3 6.21 \n", "56 38.01 3.00 Male Yes Sat Dinner 4 9.50 \n", "\n", " Payer Name CC Number Payment ID last_four Expensive \\\n", "216 Shawn Barnett PhD 4590982568244 Sat7320 8244 $$ \n", "136 Donna Kelly 180048553626376 Thur1393 6376 $$ \n", "13 Joshua Jones 6011163105616890 Sun2971 6890 $$ \n", "146 Kelly Estrada 60463302327 Thur3941 2327 $$ \n", "56 James Christensen DDS 349793629453226 Sat8903 3226 $$$ \n", "\n", " Tip Quality \n", "216 Ok \n", "136 Ok \n", "13 Ok \n", "146 Ok \n", "56 Ok " ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.sample(5)" ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
total_billtipsexsmokerdaytimesizeprice_per_personPayer NameCC NumberPayment IDlast_fourExpensiveTip Quality
7325.285.00FemaleYesSatDinner212.64Julie Holmes5418689346409571Sat60659571$$Ok
14134.306.70MaleNoThurLunch65.72Steven Carlson3526515703718508Thur10258508$$$Ok
23929.035.92MaleNoSatDinner39.68Michael Avila5296068606052842Sat26572842$$Ok
23732.831.17MaleYesSatDinner216.42Thomas Brown4284722681265508Sat29295508$$$Ok
6915.012.09MaleYesSatDinner27.50Adam Hall4700924377057571Sat8557571$$Ok
10818.243.76MaleNoSatDinner29.12Steven Grant4112810433473856Sat63763856$$Ok
8534.835.17FemaleNoThurLunch48.71Shawna Cook6011787464177340Thur79727340$$$Ok
15648.175.00MaleNoSunDinner68.03Ryan Gonzales3523151482063321Sun75183321$$$Ok
19610.342.00MaleYesThurLunch25.17Eric Martin30442491190342Thur98620342$$Ok
4117.462.54MaleNoSunDinner28.73David Boyer3536678244278149Sun94608149$$Ok
23612.601.00MaleYesSatDinner26.30Matthew Myers3543676378973965Sat50323965$$Ok
22516.272.50FemaleYesFriLunch28.14Whitney Arnold3579111947217428Fri66657428$$Ok
6113.812.00MaleYesSatDinner26.90Ryan Hernandez4766834726806Sat30306806$$Ok
20316.402.50FemaleYesThurLunch28.20Toni Brooks3582289985920239Thur77700239$$Ok
525.294.71MaleNoSunDinner46.32Erik Smith213140353657882Sun96797882$$Ok
22012.162.20MaleYesFriLunch26.08Ricky Johnson213109508670736Fri46070736$$Ok
11924.082.92FemaleNoThurLunch46.02Melanie Jordan676212062720Thur80632720$$Ok
9627.284.00MaleYesFriDinner213.64Eric Carter4563054452787961Fri31597961$$Ok
15916.492.00MaleNoSunDinner44.12Christopher Soto30501814271434Sun17811434$$Ok
2613.372.00MaleNoSatDinner26.68Kyle Avery6531339539615499Sat66515499$$Ok
12922.822.18MaleNoThurLunch37.61Raymond Torres4855776744024Thur94244024$$Ok
2120.292.75FemaleNoSatDinner210.14Natalie Gardner5448125351489749Sat96189749$$Ok
9422.753.25FemaleNoFriDinner211.38Jamie Garza676318332068Fri23182068$$Ok
3931.275.00MaleNoSatDinner310.42Mr. Brandon Berry6011525851069856Sat63739856$$$Ok
\n", "
" ], "text/plain": [ " total_bill tip sex smoker day time size price_per_person \\\n", "73 25.28 5.00 Female Yes Sat Dinner 2 12.64 \n", "141 34.30 6.70 Male No Thur Lunch 6 5.72 \n", "239 29.03 5.92 Male No Sat Dinner 3 9.68 \n", "237 32.83 1.17 Male Yes Sat Dinner 2 16.42 \n", "69 15.01 2.09 Male Yes Sat Dinner 2 7.50 \n", "108 18.24 3.76 Male No Sat Dinner 2 9.12 \n", "85 34.83 5.17 Female No Thur Lunch 4 8.71 \n", "156 48.17 5.00 Male No Sun Dinner 6 8.03 \n", "196 10.34 2.00 Male Yes Thur Lunch 2 5.17 \n", "41 17.46 2.54 Male No Sun Dinner 2 8.73 \n", "236 12.60 1.00 Male Yes Sat Dinner 2 6.30 \n", "225 16.27 2.50 Female Yes Fri Lunch 2 8.14 \n", "61 13.81 2.00 Male Yes Sat Dinner 2 6.90 \n", "203 16.40 2.50 Female Yes Thur Lunch 2 8.20 \n", "5 25.29 4.71 Male No Sun Dinner 4 6.32 \n", "220 12.16 2.20 Male Yes Fri Lunch 2 6.08 \n", "119 24.08 2.92 Female No Thur Lunch 4 6.02 \n", "96 27.28 4.00 Male Yes Fri Dinner 2 13.64 \n", "159 16.49 2.00 Male No Sun Dinner 4 4.12 \n", "26 13.37 2.00 Male No Sat Dinner 2 6.68 \n", "129 22.82 2.18 Male No Thur Lunch 3 7.61 \n", "21 20.29 2.75 Female No Sat Dinner 2 10.14 \n", "94 22.75 3.25 Female No Fri Dinner 2 11.38 \n", "39 31.27 5.00 Male No Sat Dinner 3 10.42 \n", "\n", " Payer Name CC Number Payment ID last_four Expensive \\\n", "73 Julie Holmes 5418689346409571 Sat6065 9571 $$ \n", "141 Steven Carlson 3526515703718508 Thur1025 8508 $$$ \n", "239 Michael Avila 5296068606052842 Sat2657 2842 $$ \n", "237 Thomas Brown 4284722681265508 Sat2929 5508 $$$ \n", "69 Adam Hall 4700924377057571 Sat855 7571 $$ \n", "108 Steven Grant 4112810433473856 Sat6376 3856 $$ \n", "85 Shawna Cook 6011787464177340 Thur7972 7340 $$$ \n", "156 Ryan Gonzales 3523151482063321 Sun7518 3321 $$$ \n", "196 Eric Martin 30442491190342 Thur9862 0342 $$ \n", "41 David Boyer 3536678244278149 Sun9460 8149 $$ \n", "236 Matthew Myers 3543676378973965 Sat5032 3965 $$ \n", "225 Whitney Arnold 3579111947217428 Fri6665 7428 $$ \n", "61 Ryan Hernandez 
4766834726806 Sat3030 6806 $$ \n", "203 Toni Brooks 3582289985920239 Thur7770 0239 $$ \n", "5 Erik Smith 213140353657882 Sun9679 7882 $$ \n", "220 Ricky Johnson 213109508670736 Fri4607 0736 $$ \n", "119 Melanie Jordan 676212062720 Thur8063 2720 $$ \n", "96 Eric Carter 4563054452787961 Fri3159 7961 $$ \n", "159 Christopher Soto 30501814271434 Sun1781 1434 $$ \n", "26 Kyle Avery 6531339539615499 Sat6651 5499 $$ \n", "129 Raymond Torres 4855776744024 Thur9424 4024 $$ \n", "21 Natalie Gardner 5448125351489749 Sat9618 9749 $$ \n", "94 Jamie Garza 676318332068 Fri2318 2068 $$ \n", "39 Mr. Brandon Berry 6011525851069856 Sat6373 9856 $$$ \n", "\n", " Tip Quality \n", "73 Ok \n", "141 Ok \n", "239 Ok \n", "237 Ok \n", "69 Ok \n", "108 Ok \n", "85 Ok \n", "156 Ok \n", "196 Ok \n", "41 Ok \n", "236 Ok \n", "225 Ok \n", "61 Ok \n", "203 Ok \n", "5 Ok \n", "220 Ok \n", "119 Ok \n", "96 Ok \n", "159 Ok \n", "26 Ok \n", "129 Ok \n", "21 Ok \n", "94 Ok \n", "39 Ok " ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.sample(frac=0.1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## nlargest and nsmallest\n", "\n", "`nlargest(n, column)` returns the `n` rows with the largest values in `column`; `nsmallest` does the same for the smallest values." ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
total_billtipsexsmokerdaytimesizeprice_per_personPayer NameCC NumberPayment IDlast_fourExpensiveTip Quality
17050.8110.00MaleYesSatDinner316.94Gregory Clark5473850968388236Sat19548236$$$Ok
21248.339.00MaleNoSatDinner412.08Alex Williamson676218815212Sat45905212$$$Ok
2339.427.58MaleNoSatDinner49.86Lance Peterson3542584061609808Sat2399808$$$Ok
5948.276.73MaleNoSatDinner412.07Brian Ortiz6596453823950595Sat81390595$$$Ok
14134.306.70MaleNoThurLunch65.72Steven Carlson3526515703718508Thur10258508$$$Ok
18323.176.50MaleYesSunDinner45.79Dr. Michael James4718501859162Sun60599162$$Generous
21428.176.50FemaleYesSatDinner39.39Marissa Jackson4922302538691962Sat33741962$$Ok
4732.406.00MaleNoSunDinner48.10James Barnes3552002592874186Sun96774186$$$Ok
23929.035.92MaleNoSatDinner39.68Michael Avila5296068606052842Sat26572842$$Ok
8824.715.85MaleNoThurLunch212.36Roger Taylor4410248629955Thur90039955$$Ok
\n", "
" ], "text/plain": [ " total_bill tip sex smoker day time size price_per_person \\\n", "170 50.81 10.00 Male Yes Sat Dinner 3 16.94 \n", "212 48.33 9.00 Male No Sat Dinner 4 12.08 \n", "23 39.42 7.58 Male No Sat Dinner 4 9.86 \n", "59 48.27 6.73 Male No Sat Dinner 4 12.07 \n", "141 34.30 6.70 Male No Thur Lunch 6 5.72 \n", "183 23.17 6.50 Male Yes Sun Dinner 4 5.79 \n", "214 28.17 6.50 Female Yes Sat Dinner 3 9.39 \n", "47 32.40 6.00 Male No Sun Dinner 4 8.10 \n", "239 29.03 5.92 Male No Sat Dinner 3 9.68 \n", "88 24.71 5.85 Male No Thur Lunch 2 12.36 \n", "\n", " Payer Name CC Number Payment ID last_four Expensive \\\n", "170 Gregory Clark 5473850968388236 Sat1954 8236 $$$ \n", "212 Alex Williamson 676218815212 Sat4590 5212 $$$ \n", "23 Lance Peterson 3542584061609808 Sat239 9808 $$$ \n", "59 Brian Ortiz 6596453823950595 Sat8139 0595 $$$ \n", "141 Steven Carlson 3526515703718508 Thur1025 8508 $$$ \n", "183 Dr. Michael James 4718501859162 Sun6059 9162 $$ \n", "214 Marissa Jackson 4922302538691962 Sat3374 1962 $$ \n", "47 James Barnes 3552002592874186 Sun9677 4186 $$$ \n", "239 Michael Avila 5296068606052842 Sat2657 2842 $$ \n", "88 Roger Taylor 4410248629955 Thur9003 9955 $$ \n", "\n", " Tip Quality \n", "170 Ok \n", "212 Ok \n", "23 Ok \n", "59 Ok \n", "141 Ok \n", "183 Generous \n", "214 Ok \n", "47 Ok \n", "239 Ok \n", "88 Ok " ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.nlargest(10,'tip')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.6" } }, "nbformat": 4, "nbformat_minor": 1 }