{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "___\n", "\n", "\n", "___\n", "
Copyright by Pierian Data Inc.
\n", "
For more information, visit us at www.pieriandata.com
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Pandas Project Exercise - Solutions\n", "\n", "# The Data\n", "\n", "This data set contains booking information for a city hotel and a resort hotel, and includes information such as when the booking was made, length of stay, the number of adults, children, and/or babies, and the number of available parking spaces, among other things.\n", "\n", "All personally identifying information has been removed from the data.\n", "\n", "Acknowledgements\n", "The data is originally from the article Hotel Booking Demand Datasets, written by Nuno Antonio, Ana Almeida, and Luis Nunes for Data in Brief, Volume 22, February 2019.\n", "\n", "\n", "----------------------------\n", "\n", "#### NOTE: Names, Emails, Phone Numbers, and Credit Card numbers in the data are synthetic and not real information from people. The hotel data is real from the publication listed above.\n", "\n", "##
Data Column Reference
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
VariableTypeDescriptionSource/Engineering
ADRNumericAverage Daily Rate as defined by [5]BO, BL and TR / Calculated by dividing the sum of all lodging transactions by the total number of staying nights
AdultsIntegerNumber of adultsBO and BL
AgentCategoricalID of the travel agency that made the bookingaBO and BL
ArrivalDateDayOfMonthIntegerDay of the month of the arrival dateBO and BL
ArrivalDateMonthCategoricalMonth of arrival date with 12 categories: “January” to “December”BO and BL
ArrivalDateWeekNumberIntegerWeek number of the arrival dateBO and BL
ArrivalDateYearIntegerYear of arrival dateBO and BL
AssignedRoomTypeCategoricalCode for the type of room assigned to the booking. Sometimes the assigned room type differs from the reserved room type due to hotel operation reasons (e.g. overbooking) or by customer request. Code is presented instead of designation for anonymity reasonsBO and BL
BabiesIntegerNumber of babiesBO and BL
BookingChangesIntegerNumber of changes/amendments made to the booking from the moment the booking was entered on the PMS until the moment of check-in or cancellationBO and BL/Calculated by adding the number of unique iterations that change some of the booking attributes, namely: persons, arrival date, nights, reserved room type or meal
ChildrenIntegerNumber of childrenBO and BL/Sum of both payable and non-payable children
CompanyCategoricalID of the company/entity that made the booking or responsible for paying the booking. ID is presented instead of designation for anonymity reasonsBO and BL.
CountryCategoricalCountry of origin. Categories are represented in the ISO 3155–3:2013 format [6]BO, BL and NT




CustomerTypeCategoricalType of booking, assuming one of four categories:BO and BL
Contract - when the booking has an allotment or other type of contract associated to it;
Group – when the booking is associated to a group;
Transient – when the booking is not part of a group or contract, and is not associated to other transient booking;
Transient-party – when the booking is transient, but is associated to at least other transient booking
DaysInWaitingListIntegerNumber of days the booking was in the waiting list before it was confirmed to the customerBO/Calculated by subtracting the date the booking was confirmed to the customer from the date the booking entered on the PMS




DepositTypeCategoricalIndication on if the customer made a deposit to guarantee the booking. This variable can assume three categories:BO and TR/Value calculated based on the payments identified for the booking in the transaction (TR) table before the booking׳s arrival or cancellation date.
No Deposit – no deposit was made;
In case no payments were found the value is “No Deposit”.
If the payment was equal or exceeded the total cost of stay, the value is set as “Non Refund”.
Non Refund – a deposit was made in the value of the total stay cost;
Otherwise the value is set as “Refundable”
Refundable – a deposit was made with a value under the total cost of stay.
DistributionChannelCategoricalBooking distribution channel. The term “TA” means “Travel Agents” and “TO” means “Tour Operators”BO, BL and DC
IsCanceledCategoricalValue indicating if the booking was canceled (1) or not (0)BO
IsRepeatedGuestCategoricalValue indicating if the booking name was from a repeated guest (1) or not (0)BO, BL and C/ Variable created by verifying if a profile was associated with the booking customer. If so, and if the customer profile creation date was prior to the creation date for the booking on the PMS database it was assumed the booking was from a repeated guest
LeadTimeIntegerNumber of days that elapsed between the entering date of the booking into the PMS and the arrival dateBO and BL/ Subtraction of the entering date from the arrival date
MarketSegmentCategoricalMarket segment designation. In categories, the term “TA” means “Travel Agents” and “TO” means “Tour Operators”BO, BL and MS




MealCategoricalType of meal booked. Categories are presented in standard hospitality meal packages:BO, BL and ML
Undefined/SC – no meal package;
BB – Bed & Breakfast;
HB – Half board (breakfast and one other meal – usually dinner);
FB – Full board (breakfast, lunch and dinner)
PreviousBookingsNotCanceledIntegerNumber of previous bookings not cancelled by the customer prior to the current bookingBO and BL / In case there was no customer profile associated with the booking, the value is set to 0. Otherwise, the value is the number of bookings with the same customer profile created before the current booking and not canceled.
PreviousCancellationsIntegerNumber of previous bookings that were cancelled by the customer prior to the current bookingBO and BL/ In case there was no customer profile associated with the booking, the value is set to 0. Otherwise, the value is the number of bookings with the same customer profile created before the current booking and canceled.
RequiredCardParkingSpacesIntegerNumber of car parking spaces required by the customerBO and BL




ReservationStatusCategoricalReservation last status, assuming one of three categories:BO
Canceled – booking was canceled by the customer;
Check-Out – customer has checked in but already departed;
No-Show – customer did not check-in and did inform the hotel of the reason why
ReservationStatusDateDateDate at which the last status was set. This variable can be used in conjunction with the ReservationStatus to understand when was the booking canceled or when did the customer checked-out of the hotelBO
ReservedRoomTypeCategoricalCode of room type reserved. Code is presented instead of designation for anonymity reasonsBO and BL
StaysInWeekendNightsIntegerNumber of weekend nights (Saturday or Sunday) the guest stayed or booked to stay at the hotelBO and BL/ Calculated by counting the number of weekend nights from the total number of nights
StaysInWeekNightsIntegerNumber of week nights (Monday to Friday) the guest stayed or booked to stay at the hotelBO and BL/Calculated by counting the number of week nights from the total number of nights
TotalOfSpecialRequestsIntegerNumber of special requests made by the customer (e.g. twin bed or high floor)BO and BL/Sum of all special requests
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "-----------" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# TASKS\n", "\n", "** Complete the tasks shown in bold below. The expected output is shown in a cell below. Be careful not to run the cell above the expected output, as it will clear the expected output. Try your best to solve these in one line of pandas code (not every single question can be solved in one line, but many can be!) Refer to solutions notebook and video to view possible solutions. NOTE: Many tasks have multiple correct solution methods!**" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "hotels = pd.read_csv(\"hotel_booking_data.csv\")" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
hotelis_canceledlead_timearrival_date_yeararrival_date_montharrival_date_week_numberarrival_date_day_of_monthstays_in_weekend_nightsstays_in_week_nightsadults...customer_typeadrrequired_car_parking_spacestotal_of_special_requestsreservation_statusreservation_status_datenameemailphone-numbercredit_card
0Resort Hotel03422015July271002...Transient0.000Check-Out2015-07-01Ernest BarnesErnest.Barnes31@outlook.com669-792-1661************4322
1Resort Hotel07372015July271002...Transient0.000Check-Out2015-07-01Andrea BakerAndrea_Baker94@aol.com858-637-6955************9157
2Resort Hotel072015July271011...Transient75.000Check-Out2015-07-02Rebecca ParkerRebecca_Parker@comcast.net652-885-2745************3734
3Resort Hotel0132015July271011...Transient75.000Check-Out2015-07-02Laura MurrayLaura_M@gmail.com364-656-8427************5677
4Resort Hotel0142015July271022...Transient98.001Check-Out2015-07-03Linda HinesLHines@verizon.com713-226-5883************5498
\n", "

5 rows × 36 columns

\n", "
" ], "text/plain": [ " hotel is_canceled lead_time arrival_date_year arrival_date_month \\\n", "0 Resort Hotel 0 342 2015 July \n", "1 Resort Hotel 0 737 2015 July \n", "2 Resort Hotel 0 7 2015 July \n", "3 Resort Hotel 0 13 2015 July \n", "4 Resort Hotel 0 14 2015 July \n", "\n", " arrival_date_week_number arrival_date_day_of_month \\\n", "0 27 1 \n", "1 27 1 \n", "2 27 1 \n", "3 27 1 \n", "4 27 1 \n", "\n", " stays_in_weekend_nights stays_in_week_nights adults ... customer_type \\\n", "0 0 0 2 ... Transient \n", "1 0 0 2 ... Transient \n", "2 0 1 1 ... Transient \n", "3 0 1 1 ... Transient \n", "4 0 2 2 ... Transient \n", "\n", " adr required_car_parking_spaces total_of_special_requests \\\n", "0 0.0 0 0 \n", "1 0.0 0 0 \n", "2 75.0 0 0 \n", "3 75.0 0 0 \n", "4 98.0 0 1 \n", "\n", " reservation_status reservation_status_date name \\\n", "0 Check-Out 2015-07-01 Ernest Barnes \n", "1 Check-Out 2015-07-01 Andrea Baker \n", "2 Check-Out 2015-07-02 Rebecca Parker \n", "3 Check-Out 2015-07-02 Laura Murray \n", "4 Check-Out 2015-07-03 Linda Hines \n", "\n", " email phone-number credit_card \n", "0 Ernest.Barnes31@outlook.com 669-792-1661 ************4322 \n", "1 Andrea_Baker94@aol.com 858-637-6955 ************9157 \n", "2 Rebecca_Parker@comcast.net 652-885-2745 ************3734 \n", "3 Laura_M@gmail.com 364-656-8427 ************5677 \n", "4 LHines@verizon.com 713-226-5883 ************5498 \n", "\n", "[5 rows x 36 columns]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hotels.head()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 119390 entries, 0 to 119389\n", "Data columns (total 36 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 hotel 119390 non-null object \n", " 1 is_canceled 119390 non-null int64 \n", " 2 lead_time 119390 non-null int64 \n", " 3 arrival_date_year 119390 non-null int64 \n", " 4 arrival_date_month 119390 non-null object \n", " 5 arrival_date_week_number 119390 non-null int64 \n", " 6 arrival_date_day_of_month 119390 non-null int64 \n", " 7 stays_in_weekend_nights 119390 non-null int64 \n", " 8 stays_in_week_nights 119390 non-null int64 \n", " 9 adults 119390 non-null int64 \n", " 10 children 119386 non-null float64\n", " 11 babies 119390 non-null int64 \n", " 12 meal 119390 non-null object \n", " 13 country 118902 non-null object \n", " 14 market_segment 119390 non-null object \n", " 15 distribution_channel 119390 non-null object \n", " 16 is_repeated_guest 119390 non-null int64 \n", " 17 previous_cancellations 119390 non-null int64 \n", " 18 previous_bookings_not_canceled 119390 non-null int64 \n", " 19 reserved_room_type 119390 non-null object \n", " 20 assigned_room_type 119390 non-null object \n", " 21 booking_changes 119390 non-null int64 \n", " 22 deposit_type 119390 non-null object \n", " 23 agent 103050 non-null float64\n", " 24 company 6797 non-null float64\n", " 25 days_in_waiting_list 119390 non-null int64 \n", " 26 customer_type 119390 non-null object \n", " 27 adr 119390 non-null float64\n", " 28 required_car_parking_spaces 119390 non-null int64 \n", " 29 total_of_special_requests 119390 non-null int64 \n", " 30 reservation_status 119390 non-null object \n", " 31 reservation_status_date 119390 non-null object \n", " 32 name 119390 non-null object \n", " 33 email 119390 non-null object \n", " 34 phone-number 119390 non-null object \n", " 35 credit_card 119390 non-null object \n", "dtypes: float64(4), int64(16), object(16)\n", "memory usage: 32.8+ MB\n" ] } ], "source": [ "hotels.info()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "**TASK: How many rows are there?**" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# CODE HERE" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "119390" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(hotels) #hotels.info()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**TASK: Is there any missing data? If so, which column has the most missing data?**" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# CODE HERE" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "hotel 0\n", "is_canceled 0\n", "lead_time 0\n", "arrival_date_year 0\n", "arrival_date_month 0\n", "arrival_date_week_number 0\n", "arrival_date_day_of_month 0\n", "stays_in_weekend_nights 0\n", "stays_in_week_nights 0\n", "adults 0\n", "children 4\n", "babies 0\n", "meal 0\n", "country 488\n", "market_segment 0\n", "distribution_channel 0\n", "is_repeated_guest 0\n", "previous_cancellations 0\n", "previous_bookings_not_canceled 0\n", "reserved_room_type 0\n", "assigned_room_type 0\n", "booking_changes 0\n", "deposit_type 0\n", "agent 16340\n", "company 112593\n", "days_in_waiting_list 0\n", "customer_type 0\n", "adr 0\n", "required_car_parking_spaces 0\n", "total_of_special_requests 0\n", "reservation_status 0\n", "reservation_status_date 0\n", "name 0\n", "email 0\n", "phone-number 0\n", "credit_card 0\n", "dtype: int64" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# hotels.isna().sum() #works as well\n", "hotels.isnull().sum()" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Yes, missing data, company column missing: 112593 rows.\n" ] } ], "source": [ "print(f\"Yes, missing data, company column missing: {hotels['company'].isna().sum()} rows.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**TASK: Drop the \"company\" column from the dataset.**" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# CODE HERE" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": true }, "outputs": [], "source": [ "hotels = hotels.drop('company',axis=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**TASK: What are the top 5 most common country codes in the dataset?**" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# CODE HERE" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "PRT 48590\n", "GBR 12129\n", "FRA 10415\n", "ESP 8568\n", "DEU 7287\n", "Name: country, dtype: int64" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hotels['country'].value_counts()[:5]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**TASK: What is the name of the person who paid the highest ADR (average daily rate)? How much was their ADR?**" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# CODE HERE" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "adr 5400\n", "name Daniel Walter\n", "Name: 48515, dtype: object" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hotels.sort_values('adr',ascending=False)[['adr','name']].iloc[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**TASK: The adr is the average daily rate for a person's stay at the hotel. What is the mean adr across all the hotel stays in the dataset?**" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# CODE HERE" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "101.83" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "round(hotels['adr'].mean(),2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**TASK: What is the average (mean) number of nights for a stay across the entire data set? Feel free to round this to 2 decimal points.**" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# CODE HERE" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3.43" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hotels['total_stay_days'] = hotels['stays_in_week_nights'] + hotels['stays_in_weekend_nights']\n", "\n", "round(hotels['total_stay_days'].mean(),2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**TASK: What is the average total cost for a stay in the dataset? Not *average daily cost*, but *total* stay cost. Feel free to round this to 2 decimal points.**" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# CODE HERE" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "357.85" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hotels['total_paid'] = hotels['adr'] * hotels['total_stay_days']\n", "round(hotels['total_paid'].mean(),2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**TASK: What are the names and emails of people who made 5 \"Special Requests\"?**" ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# CODE HERE" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nameemail
7860Amanda HarperAmanda.H66@yahoo.com
11125Laura SandersSanders_Laura@hotmail.com
14596Tommy OrtizTommy_O@hotmail.com
14921Gilbert MillerMiller.Gilbert@aol.com
14922Timothy TorresTTorres@protonmail.com
24630Jennifer WeaverJennifer_W@aol.com
27288Crystal HortonCrystal.H@mail.com
27477Brittney BurkeBurke_Brittney16@att.com
29906Cynthia CabreraCabrera.Cynthia@xfinity.com
29949Sarah FloydSarah_F@gmail.com
32267Michelle VillaMichelle.Villa@aol.com
39027Nichole HebertHebert.Nichole@gmail.com
39129Lindsey MckenzieLindsey.Mckenzie@att.com
39525Ashley EdwardsEdwards.Ashley@yahoo.com
70114Christopher TorresTorres.Christopher@gmail.com
78819Mrs. Tara Sullivan DVMMrs..DVM@xfinity.com
78820Michaela BrownMichaelaBrown@att.com
78822Kurt Maldonado MDKMD15@xfinity.com
97072Jason RichardsonJason.R@zoho.com
97099Terri HurleyTHurley@xfinity.com
97261Mrs. Caitlin WebbMrs._W@comcast.net
98410Holly ArroyoArroyo_Holly@mail.com
98674Denise CampbellDenise_C@gmail.com
99887Michael SmithMichael.S42@aol.com
99888Dr. Trevor SellersDr._S@aol.com
101569Kayla MurphyKayla.Murphy@yahoo.com
102061Taylor MartinezTaylor.Martinez@hotmail.com
109511Charles WilsonCharles_Wilson@yahoo.com
109590Tyler AllisonTyler.A@protonmail.com
110082Matthew BaileyMatthew_Bailey@aol.com
110083Charlotte AcevedoCharlotte_A@verizon.com
111909Darrell BrennanBrennan_Darrell51@hotmail.com
111911Melinda JensenMelindaJensen@zoho.com
113915Terry ArnoldArnold.Terry@zoho.com
114770Mary NguyenNguyen.Mary@protonmail.com
114909Lindsay CuevasLindsay.Cuevas40@mail.com
116455Cynthia HernandezCynthiaHernandez@xfinity.com
116457Angela HawkinsAngela_H@gmail.com
118817Sue LawsonSue.L52@comcast.net
119161Alyssa RichardsAlyssa_Richards@aol.com
\n", "
" ], "text/plain": [ " name email\n", "7860 Amanda Harper Amanda.H66@yahoo.com\n", "11125 Laura Sanders Sanders_Laura@hotmail.com\n", "14596 Tommy Ortiz Tommy_O@hotmail.com\n", "14921 Gilbert Miller Miller.Gilbert@aol.com\n", "14922 Timothy Torres TTorres@protonmail.com\n", "24630 Jennifer Weaver Jennifer_W@aol.com\n", "27288 Crystal Horton Crystal.H@mail.com\n", "27477 Brittney Burke Burke_Brittney16@att.com\n", "29906 Cynthia Cabrera Cabrera.Cynthia@xfinity.com\n", "29949 Sarah Floyd Sarah_F@gmail.com\n", "32267 Michelle Villa Michelle.Villa@aol.com\n", "39027 Nichole Hebert Hebert.Nichole@gmail.com\n", "39129 Lindsey Mckenzie Lindsey.Mckenzie@att.com\n", "39525 Ashley Edwards Edwards.Ashley@yahoo.com\n", "70114 Christopher Torres Torres.Christopher@gmail.com\n", "78819 Mrs. Tara Sullivan DVM Mrs..DVM@xfinity.com\n", "78820 Michaela Brown MichaelaBrown@att.com\n", "78822 Kurt Maldonado MD KMD15@xfinity.com\n", "97072 Jason Richardson Jason.R@zoho.com\n", "97099 Terri Hurley THurley@xfinity.com\n", "97261 Mrs. Caitlin Webb Mrs._W@comcast.net\n", "98410 Holly Arroyo Arroyo_Holly@mail.com\n", "98674 Denise Campbell Denise_C@gmail.com\n", "99887 Michael Smith Michael.S42@aol.com\n", "99888 Dr. Trevor Sellers Dr._S@aol.com\n", "101569 Kayla Murphy Kayla.Murphy@yahoo.com\n", "102061 Taylor Martinez Taylor.Martinez@hotmail.com\n", "109511 Charles Wilson Charles_Wilson@yahoo.com\n", "109590 Tyler Allison Tyler.A@protonmail.com\n", "110082 Matthew Bailey Matthew_Bailey@aol.com\n", "110083 Charlotte Acevedo Charlotte_A@verizon.com\n", "111909 Darrell Brennan Brennan_Darrell51@hotmail.com\n", "111911 Melinda Jensen MelindaJensen@zoho.com\n", "113915 Terry Arnold Arnold.Terry@zoho.com\n", "114770 Mary Nguyen Nguyen.Mary@protonmail.com\n", "114909 Lindsay Cuevas Lindsay.Cuevas40@mail.com\n", "116455 Cynthia Hernandez CynthiaHernandez@xfinity.com\n", "116457 Angela Hawkins Angela_H@gmail.com\n", "118817 Sue Lawson Sue.L52@comcast.net\n", "119161 Alyssa Richards Alyssa_Richards@aol.com" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hotels[hotels['total_of_special_requests'] == 5][['name','email']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**TASK: What percentage of hotel stays were classified as \"repeat guests\"? (Do not base this off the name of the person, but instead of the is_repeated_guest column)**" ] }, { "cell_type": "code", "execution_count": 57, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#CODE HERE" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3.19" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# You can sum booleans, False gets treated as zero, True as one\n", "round(100 * sum(hotels['is_repeated_guest'] == 1) / len(hotels),2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**TASK: What are the top 5 most common last name in the dataset? Bonus: Can you figure this out in one line of pandas code? (For simplicity treat the a title such as MD as a last name, for example Caroline Conley MD can be said to have the last name MD)**" ] }, { "cell_type": "code", "execution_count": 69, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#CODE HERE" ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Smith 2510\n", "Johnson 1998\n", "Williams 1628\n", "Jones 1441\n", "Brown 1433\n", "Name: name, dtype: int64" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hotels['name'].apply(lambda name: name.split()[1]).value_counts()[:5]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**TASK: What are the names of the people who had booked the most number children and babies for their stay? (Don't worry if they canceled, only consider number of people reported at the time of their reservation)**" ] }, { "cell_type": "code", "execution_count": 78, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#CODE HERE" ] }, { "cell_type": "code", "execution_count": 81, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nameadultstotal_kidsbabieschildren
328Jamie Ramirez210.0010.0
46619Nicholas Parker210.0100.0
78656Marc Robinson19.090.0
\n", "
" ], "text/plain": [ " name adults total_kids babies children\n", "328 Jamie Ramirez 2 10.0 0 10.0\n", "46619 Nicholas Parker 2 10.0 10 0.0\n", "78656 Marc Robinson 1 9.0 9 0.0" ] }, "execution_count": 81, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hotels['total_kids'] = hotels['babies'] + hotels['children']\n", "\n", "hotels.sort_values('total_kids',ascending=False)[['name','adults','total_kids','babies','children']][:3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**TASK: What are the top 3 most common area code in the phone numbers? (Area code is first 3 digits)**" ] }, { "cell_type": "code", "execution_count": 82, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#CODE HERE" ] }, { "cell_type": "code", "execution_count": 89, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Code - Total Count\n" ] }, { "data": { "text/plain": [ "799 168\n", "185 167\n", "541 166\n", "Name: phone-number, dtype: int64" ] }, "execution_count": 89, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print('Code - Total Count')\n", "hotels['phone-number'].apply(lambda num:num[:3]).value_counts()[:3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**TASK: How many arrivals took place between the 1st and the 15th of the month (inclusive of 1 and 15) ? Bonus: Can you do this in one line of pandas code?**" ] }, { "cell_type": "code", "execution_count": 90, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#CODE HERE" ] }, { "cell_type": "code", "execution_count": 95, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "58152" ] }, "execution_count": 95, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hotels['arrival_date_day_of_month'].apply(lambda day: day in range(1,16)).sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "**HARD BONUS TASK: Create a table for counts for each day of the week that people arrived. (E.g. 5000 arrivals were on a Monday, 3000 were on a Tuesday, etc..)**" ] }, { "cell_type": "code", "execution_count": 105, "metadata": {}, "outputs": [], "source": [ "# CODE HERE" ] }, { "cell_type": "code", "execution_count": 106, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "code", "execution_count": 107, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def convert(day,month,year):\n", " return f'{day}-{month}-{year}'" ] }, { "cell_type": "code", "execution_count": 109, "metadata": {}, "outputs": [], "source": [ "hotels['date'] = np.vectorize(convert)(hotels['arrival_date_day_of_month'],\n", " hotels['arrival_date_month'],\n", " hotels['arrival_date_year'])" ] }, { "cell_type": "code", "execution_count": 112, "metadata": {}, "outputs": [], "source": [ "hotels['date'] = pd.to_datetime(hotels['date'])" ] }, { "cell_type": "code", "execution_count": 115, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Friday 19631\n", "Thursday 19254\n", "Monday 18171\n", "Saturday 18055\n", "Wednesday 16139\n", "Sunday 14141\n", "Tuesday 13999\n", "Name: date, dtype: int64" ] }, "execution_count": 115, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.dt.day_name.html#pandas.Series.dt.day_name\n", "hotels['date'].dt.day_name().value_counts()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 1 }