Question:
The data is available at:
https://github.com/Mcompetitions/M5-methods
https://drive.google.com/drive/folders/1wxz-TAfVE7uKGCjh405eCb2Q_pG3kAm9
File 1: "calendar.csv"
Contains information about the dates on which the products are sold.
- date: The date in a "y-m-d" format.
- wm_yr_wk: The id of the week the date belongs to.
- weekday: The type of the day (Saturday, Sunday, ..., Friday).
- wday: The id of the weekday, starting from Saturday.
- month: The month of the date.
- year: The year of the date.
- event_name_1: If the date includes an event, the name of this event.
- event_type_1: If the date includes an event, the type of this event.
- event_name_2: If the date includes a second event, the name of this event.
- event_type_2: If the date includes a second event, the type of this event.
- snap_CA, snap_TX, and snap_WI: A binary variable (0 or 1) indicating whether the stores of CA, TX, or WI allow SNAP purchases on the examined date. 1 indicates that SNAP purchases are allowed.
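As a rough illustration of the column layout described above, the following sketch reads a few calendar rows with pandas. The rows are made up for illustration (they are not copied from the real file), and `load_calendar` is a hypothetical helper name, not part of the assignment.

```python
import io
import pandas as pd

# Made-up rows in the documented calendar.csv layout (illustration only).
toy_calendar = io.StringIO(
    "date,wm_yr_wk,weekday,wday,month,year,event_name_1,event_type_1,"
    "event_name_2,event_type_2,snap_CA,snap_TX,snap_WI\n"
    "2011-01-29,11101,Saturday,1,1,2011,,,,,0,0,0\n"
    "2011-01-30,11101,Sunday,2,1,2011,,,,,0,0,0\n"
    "2011-02-01,11101,Tuesday,4,2,2011,,,,,1,1,0\n"
    "2011-02-06,11102,Sunday,2,2,2011,SuperBowl,Sporting,,,1,1,1\n"
)

def load_calendar(source):
    """Read calendar data and parse the `date` column as datetimes."""
    return pd.read_csv(source, parse_dates=["date"])

# With the real data this would be load_calendar("calendar.csv").
calendar = load_calendar(toy_calendar)

# Dates without an event have missing values in the event_* columns.
n_event_days = calendar["event_name_1"].notna().sum()

# Number of days on which each state allows SNAP purchases (toy rows only).
snap_days = calendar[["snap_CA", "snap_TX", "snap_WI"]].sum()
print(n_event_days, snap_days.to_dict())
```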
About SNAP:
"The United States federal government provides a nutrition assistance benefit called the Supplemental Nutrition Assistance Program (SNAP). SNAP provides low-income families and individuals with an Electronic Benefits Transfer debit card to purchase food products. In many states, the monetary benefits are dispersed to people across 10 days of the month and on each of these days 1/10 of the people will receive the benefit on their card."
File 2: "sell_prices.csv"
Contains information about the price of the products sold per store and date.
- store_id: The id of the store where the product is sold.
- item_id: The id of the product.
- wm_yr_wk: The id of the week.
- sell_price: The price of the product for the given week/store. The price is provided per week (average across seven days). If not available, this means that the product was not sold during the examined week. Note that although prices are constant on a weekly basis, they may change over time (in both the training and test sets).
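Because prices are keyed by week (wm_yr_wk) rather than by date, relating them to daily data requires a join through the calendar. The sketch below shows one way this might be done with pandas. The rows, store ids, and item ids are made up for illustration.

```python
import pandas as pd

# Made-up rows for illustration only; ids and prices are hypothetical.
calendar = pd.DataFrame({
    "date": pd.to_datetime(["2011-01-29", "2011-01-30", "2011-02-05"]),
    "wm_yr_wk": [11101, 11101, 11102],
})
sell_prices = pd.DataFrame({
    "store_id": ["CA_1", "CA_1"],
    "item_id": ["ITEM_A", "ITEM_A"],
    "wm_yr_wk": [11101, 11102],
    "sell_price": [2.00, 2.24],
})

# Each weekly price row fans out to every calendar date in that week.
# Weeks with no price row (product not sold) simply produce no daily rows.
daily_prices = sell_prices.merge(
    calendar[["wm_yr_wk", "date"]], on="wm_yr_wk", how="left"
)
print(daily_prices)
```

On the full data this join multiplies the number of price rows by roughly seven, so it may be preferable to aggregate prices first when only averages are needed.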
File 3: "sales_train.csv"
Contains the historical daily unit sales data per product and store.
- item_id: The id of the product.
- dept_id: The id of the department the product belongs to.
- cat_id: The id of the category the product belongs to.
- store_id: The id of the store where the product is sold.
- state_id: The State where the store is located.
- d_1, d_2, ..., d_i, ... d_1941: The number of units sold at day i, starting from 2011-01-29.
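The d_1 ... d_1941 columns store one day per column, which is awkward for grouping by date. A minimal sketch of reshaping to one row per product, store, and day follows, assuming only the columns listed above. The rows and ids are made up, and if the real file carries further non-day columns they would need to be added to `id_cols`.

```python
import pandas as pd

# Made-up wide-format rows (illustration only); the real file has d_1 ... d_1941.
sales_wide = pd.DataFrame({
    "item_id": ["ITEM_A", "ITEM_B"],
    "dept_id": ["DEPT_1", "DEPT_2"],
    "cat_id": ["CAT_1", "CAT_2"],
    "store_id": ["STORE_1", "STORE_1"],
    "state_id": ["CA", "CA"],
    "d_1": [3, 0],
    "d_2": [1, 4],
})

id_cols = ["item_id", "dept_id", "cat_id", "store_id", "state_id"]
day_cols = [c for c in sales_wide.columns if c.startswith("d_")]

# Wide to long: one row per (product, store, day).
sales_long = sales_wide.melt(
    id_vars=id_cols, value_vars=day_cols, var_name="d", value_name="units"
)

# d_i is day i counted from 2011-01-29, so d_1 maps to the start date itself.
day_index = sales_long["d"].str.replace("d_", "", regex=False).astype(int)
sales_long["date"] = pd.Timestamp("2011-01-29") + pd.to_timedelta(day_index - 1, unit="D")
print(sales_long)
```

The resulting `date` column can then be joined to calendar.csv to pick up the weekday, month, and event columns.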
Submission
Task A - Exploratory Data Analysis - please include the code that answers the following questions in your notebook submission in Task B (50 points):
- Which state has the highest sales? (5 points)
- Which department has the highest sales? (5 points)
- Which department has the highest number of products? (5 points)
- Which department has the highest mean price? (5 points)
- Which is the best-performing store? (5 points)
- Which month had the highest sales? (5 points)
- Which weekday do people prefer to do grocery shopping in different states? (3 answers = 15 points)
- Which holiday or event recorded the highest sales? (5 points)
Step by Step Solution
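One possible way to approach the Task A questions is sketched below with pandas group-bys. This is an illustration under stated assumptions, not a reference answer: the rows are made up, "highest sales" is read as total units sold (revenue would be another reasonable reading), only event_name_1 is considered, and the sales rows are assumed to be in long format with the calendar columns already joined in. The helper name `top_by_total` is hypothetical.

```python
import pandas as pd

# Made-up long-format rows (one per item, store, and day), illustration only.
sales = pd.DataFrame({
    "item_id":  ["A", "A", "B", "B", "C", "C"],
    "dept_id":  ["D1", "D1", "D1", "D1", "D2", "D2"],
    "store_id": ["CA_1", "TX_1", "CA_1", "TX_1", "CA_1", "TX_1"],
    "state_id": ["CA", "TX", "CA", "TX", "CA", "TX"],
    "weekday":  ["Saturday", "Sunday", "Saturday", "Sunday", "Sunday", "Saturday"],
    "month":    [1, 1, 2, 2, 2, 1],
    "event_name_1": [None, "SuperBowl", None, "SuperBowl", None, None],
    "units":    [5, 2, 3, 1, 4, 2],
})
prices = pd.DataFrame({"item_id": ["A", "B", "C"], "sell_price": [1.0, 3.0, 5.0]})

def top_by_total(df, key, value="units"):
    """Return the group label with the largest summed `value`."""
    return df.groupby(key)[value].sum().idxmax()

top_state = top_by_total(sales, "state_id")
top_dept = top_by_total(sales, "dept_id")
top_store = top_by_total(sales, "store_id")
top_month = top_by_total(sales, "month")
# groupby drops rows whose key is missing, so days without an event are excluded.
top_event = top_by_total(sales, "event_name_1")

# Department with the most distinct products.
dept_most_items = sales.groupby("dept_id")["item_id"].nunique().idxmax()

# Department with the highest mean price: attach dept_id to each price row first.
item_dept = sales[["item_id", "dept_id"]].drop_duplicates()
dept_top_price = (
    prices.merge(item_dept, on="item_id")
    .groupby("dept_id")["sell_price"].mean()
    .idxmax()
)

# Preferred weekday per state: total units per (state, weekday),
# then keep the row with the largest total within each state.
totals = sales.groupby(["state_id", "weekday"], as_index=False)["units"].sum()
best_rows = totals.loc[totals.groupby("state_id")["units"].idxmax()]
top_weekday = best_rows.set_index("state_id")["weekday"]

print(top_state, top_dept, dept_most_items, dept_top_price,
      top_store, top_month, top_weekday.to_dict(), top_event)
```

Summing by calendar month pools the same month across years. If the question is meant to identify a specific month of a specific year, grouping by both year and month would be the alternative.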
