Question:
You will work with a restaurant dataset containing the customer ID, the time when the customer came (either for lunch or for dinner) and the cost of the first, second and third course. Every course consists of a meal and some drinks. The food has a fixed price (see the menu below), while the cost of the drinks is a random amount added on top (let's just say the restaurant has a very wide range of drinks).
The restaurant has different types of clients. The business clients are more likely to have the more expensive dishes. The restaurant is located next to a fitness center and attracts some of those healthier folks; they usually go for soups or salads and hardly ever take desserts. Across the street there is a retirement center; its residents usually take a three-course menu, often ending with a nice piece of pie. The other customers are one-time customers who are passing by for a quick main dish.
The end goal is to look in the data for signs of these customers. Can we find which ones are likely which type? Once we have this we can determine their likelihood of visiting, what dishes they usually order, etc. The restaurant asked us to create some simulations using this data. The menu:
- Starters: Soup $3, Tomato-Mozzarella $15, Oysters $20
- Mains: Salad $9, Spaghetti $20, Steak $25, Lobster $40
- Desserts: Ice cream $15, Pie $10
Q1 : Data Collection
What is the best line of code to import the data? Read in the data from the CSV files, understand the data structure, and verify the quality of the data. For this exploratory work, listing the values of categorical data and plotting numerical data can help.
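As an illustration of the loading and checking step, here is a minimal sketch using pandas. It assumes the files are comma-separated with a header row (the question does not state the delimiter), and it reads the four sample rows of part1.csv from an in-memory string so that it runs without the file.

```python
import io

import pandas as pd

# Stand-in for part1.csv: the four sample rows shown in the question.
# Assumption: the real file is comma-separated with this header row.
sample_csv = io.StringIO(
    "CLIENT_ID,TIME,FIRST_COURSE,SECOND_COURSE,THIRD_COURSE\n"
    "ID063527,LUNCH,0,22.31475048,10.10608095\n"
    "ID951225,DINNER,0,28.77958653,0\n"
    "ID655745,LUNCH,0,43.52903184,10.91499516\n"
    "ID381194,DINNER,3.427771934,23.04601705,16.27553216\n"
)

# With the real file this would be: df = pd.read_csv("part1.csv")
df = pd.read_csv(sample_csv)

# Structure: column types and summary statistics of the numeric columns
print(df.dtypes)
print(df.describe())

# Quality: missing values, duplicated customer IDs, categorical values
n_missing = int(df.isna().sum().sum())
n_duplicate_ids = int(df["CLIENT_ID"].duplicated().sum())
time_values = sorted(df["TIME"].unique())
print(n_missing, n_duplicate_ids, time_values)
```

For the plotting part, `df.hist()` on the three course columns is one quick option if matplotlib is installed.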
Q2 : Data Preparation
Create the columns of data (extract the "features") needed for your later modeling. For each course, create 2 columns with the split costs of food and drinks, i.e. you will create a total of 6 columns. Determine the split by assuming that the cost of every course is the cost of a single dish plus the cost of drinks, and that the cost of drinks is always less than $5 (the smallest price difference between two dishes of the same course). For example, if the cost of the first course is $5.50, the customer took a $3 soup and drinks for $2.50; if the cost was $20, the customer took the oysters and no drinks.
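The splitting rule can be sketched in plain Python as follows. Because drinks always cost less than $5 and dishes within a course differ by at least $5, the dish is the most expensive one whose price does not exceed the course total. The names `MENU` and `split_cost` are hypothetical helpers, and treating a total of 0 as "course not ordered" is an assumption based on the sample rows.

```python
# Menu prices from the question, keyed by the course column they apply to.
MENU = {
    "FIRST_COURSE": {"Soup": 3, "Tomato-Mozzarella": 15, "Oysters": 20},
    "SECOND_COURSE": {"Salad": 9, "Spaghetti": 20, "Steak": 25, "Lobster": 40},
    "THIRD_COURSE": {"Ice cream": 15, "Pie": 10},
}

MAX_DRINKS = 5  # drinks are assumed to cost strictly less than this


def split_cost(total, prices):
    """Split one course total into (dish, food_cost, drinks_cost)."""
    if total == 0:
        # Assumption: a total of 0 means the course was skipped.
        return None, 0.0, 0.0
    # Dishes the total can pay for; the most expensive one is the dish taken.
    affordable = [(price, dish) for dish, price in prices.items() if price <= total]
    if not affordable:
        raise ValueError(f"total {total} is below the cheapest dish")
    price, dish = max(affordable)
    drinks = total - price
    if drinks >= MAX_DRINKS:
        raise ValueError(f"total {total} breaks the drinks < ${MAX_DRINKS} assumption")
    return dish, float(price), drinks


# The two examples from the text
print(split_cost(5.5, MENU["FIRST_COURSE"]))  # soup plus $2.50 of drinks
print(split_cost(20, MENU["FIRST_COURSE"]))   # oysters, no drinks
```

Applying `split_cost` to each of the three course columns yields the six food and drinks columns.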
Q3 : Modeling
Use k-means to cluster the data on the cost of food only. You can assume there are 4 clusters. Don't use time or any other information; just cluster on the 3 columns for food cost. Print out the specific characteristics of each group. Can you figure out which is the healthy group, the retirement group, the business group and the one-time customers? Add these labels to your data. Display the data in a scatter plot.
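A minimal sketch of the clustering step with scikit-learn's `KMeans` is shown below. The `food` array is hypothetical toy data built from menu prices for illustration only; in the assignment it would be the three food-cost columns created in Q2. The choices `n_init=10` and `random_state=0` are assumptions made for reproducibility.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical food-only costs (first, second, third course), made up from
# menu prices for illustration; not taken from the real dataset.
food = np.array([
    [20, 40, 15], [15, 40, 15], [20, 40, 15],  # expensive three-course orders
    [3, 9, 0], [0, 9, 0], [3, 9, 0],           # soup and/or salad, no dessert
    [3, 20, 10], [3, 25, 10], [3, 20, 10],     # three courses ending with pie
    [0, 20, 0], [0, 25, 0], [0, 20, 0],        # main dish only
], dtype=float)

# Four clusters, as the question allows us to assume
km = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = km.fit_predict(food)

# Characteristics per group: size and mean food cost per course
for k in range(4):
    members = food[labels == k]
    print(k, len(members), members.mean(axis=0).round(2))
```

The cluster numbers themselves are arbitrary, so the mapping to Healthy, Retirement, Business and Onetime has to be read off the per-group means (for example, the group with no first and third course is the likely one-time group). For the scatter plot, two of the food columns can be plotted against each other with the points colored by `labels`.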
Q4 : Evaluation
In part3.csv you can find the ID and the actual type. Compare the labels to the actual client type.
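One way to carry out this comparison is a simple accuracy figure plus a count of (actual, predicted) pairs. In the sketch below, `actual` holds the first four rows of part3.csv, while `predicted` is a hypothetical set of labels invented for illustration (one of them deliberately wrong); it is not a result from the real data.

```python
from collections import Counter

# Actual types: the first four rows of part3.csv as shown in the question
actual = {
    "ID063527": "Business",
    "ID951225": "Onetime",
    "ID655745": "Business",
    "ID381194": "Retirement",
}

# Hypothetical cluster-derived labels, invented for this example
predicted = {
    "ID063527": "Business",
    "ID951225": "Onetime",
    "ID655745": "Business",
    "ID381194": "Business",
}

# Compare only customers present in both tables
common_ids = sorted(set(actual) & set(predicted))
confusion = Counter((actual[i], predicted[i]) for i in common_ids)
accuracy = sum(actual[i] == predicted[i] for i in common_ids) / len(common_ids)

print(f"accuracy: {accuracy:.2f}")
for (true_type, pred_type), n in sorted(confusion.items()):
    print(f"actual={true_type:<10} predicted={pred_type:<10} n={n}")
```

The off-diagonal pairs in `confusion` show which client types the clustering tends to mix up.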
Q5 : Simulation
- Plot the distribution of clients.
- Determine the likelihood for each type of client to order a certain course.
- Determine the probability of a certain type of customer ordering a certain dish.
- What's the percentage of clients not ordering a drink with each of their courses?
- How would the revenue change if you can influence every other Healthy customer to spend/behave like a Onetime customer instead?
- By how much does revenue go up if the price of the spaghetti dish increases by 10%?
Add one "research question" yourself like this and answer it.
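Two of these questions can be sketched on the four sample customers (part1.csv rows joined with their types from part3.csv). The sketch assumes that a course cost of 0 means the course was not ordered, and that demand for spaghetti does not change when its price rises; both are assumptions, not statements from the question.

```python
from collections import defaultdict

# (client_type, first, second, third): sample rows of part1.csv joined with part3.csv
rows = [
    ("Business", 0.0, 22.31475048, 10.10608095),
    ("Onetime", 0.0, 28.77958653, 0.0),
    ("Business", 0.0, 43.52903184, 10.91499516),
    ("Retirement", 3.427771934, 23.04601705, 16.27553216),
]

# Likelihood of each client type ordering each course
counts = defaultdict(int)
ordered = defaultdict(lambda: [0, 0, 0])
for ctype, *courses in rows:
    counts[ctype] += 1
    for i, cost in enumerate(courses):
        if cost > 0:  # assumption: 0 means the course was skipped
            ordered[ctype][i] += 1
course_likelihood = {t: [n / counts[t] for n in ordered[t]] for t in counts}

# Revenue effect of a 10% spaghetti price increase, demand held constant
SPAGHETTI, NEXT_MAIN = 20, 25  # spaghetti price and the next dearer main (steak)
spaghetti_orders = sum(1 for _, _, second, _ in rows if SPAGHETTI <= second < NEXT_MAIN)
extra_revenue = spaghetti_orders * SPAGHETTI * 0.10
total_revenue = sum(sum(r[1:]) for r in rows)
pct_increase = 100 * extra_revenue / total_revenue

print(course_likelihood)
print(spaghetti_orders, extra_revenue, round(pct_increase, 2))
```

A second-course total between $20 and $25 identifies spaghetti because drinks add less than $5. The same pattern extends to per-dish probabilities by grouping on the dish name instead of on whether the course was ordered.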
TABLE part1.csv Columns:
CLIENT_ID  TIME    FIRST_COURSE  SECOND_COURSE  THIRD_COURSE
ID063527   LUNCH   0             22.31475048    10.10608095
ID951225   DINNER  0             28.77958653    0
ID655745   LUNCH   0             43.52903184    10.91499516
ID381194   DINNER  3.427771934   23.04601705    16.27553216
TABLE part3.csv Columns:
CLIENT_ID CLIENT_TYPE
ID063527 Business
ID951225 Onetime
ID655745 Business
ID381194 Retirement
ID660862 Onetime
ID575620 Onetime
ID024460 Business
ID790244 Healthy