Question:
Do the following in Python 3, preferably in a Jupyter Notebook.
Link to datasets:
https://mega.nz/#!J7xXALAC!FSct4RSgr-95ra0Yq7iz9-dNmT1CByefrHcEvtCG1gg
Once you have your csv files on your computer, do the following five (5) things.
1 Import each of the csv files you downloaded into a pandas DataFrame.
(a) Provide the code you used to do this. (Just the code for reading one of the files will suffice to show how you did it. Correctly, of course.)
(b) Print out the column names of your item DataFrame and print the first four (4) records in it.
(c) Describe the data types of the columns in each of your DataFrames. Hint: Provide a count of each data type in the columns.
Include your commented code for each of the above.
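As a rough sketch of task 1, the snippet below reads a CSV into a DataFrame, prints the column names and first four records, and counts the data types. The CSV text and the column names (`acctno`, `qty`, `price`, `deptdescr`) are invented stand-ins, since the real files are not shown here; with the downloaded data, the call would presumably be something like `pd.read_csv("item.csv")`.

```python
import io
import pandas as pd

# Made-up stand-in for one downloaded file. With the real data this would be
# a path, e.g. pd.read_csv("item.csv"); the column names below are assumptions.
fake_item_csv = io.StringIO(
    "acctno,qty,price,deptdescr\n"
    "A1,1,19.99,Kitchen\n"
    "A1,2,5.50,Garden\n"
    "B2,1,120.00,Kitchen\n"
    "C3,3,7.25,Bath\n"
    "C3,1,42.00,Garden\n"
)

# (a) Read the csv into a DataFrame.
item = pd.read_csv(fake_item_csv)

# (b) Column names and the first four records.
print(item.columns.tolist())
first_four = item.head(4)
print(first_four)

# (c) Count how many columns have each data type.
dtype_counts = item.dtypes.value_counts()
print(dtype_counts)
```

The same three calls (`columns`, `head(4)`, `dtypes.value_counts()`) can be repeated for the customer and mail DataFrames.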
2 Write each of your pandas DataFrames to a local SQLite DB named xyz.db. Include only data for active buyers in these tables. Verify that you have written the tables to your SQLite DB correctly. (Commented code, of course.) Name your tables item, customer, and mail. SQLite3 should be included with your Anaconda distribution. You might want to download a copy from http://www.sqlite.org that you can use from the command prompt. SQLite3 is a serverless RDB that requires no configuration to use. You'll see from the website that SQLite is ubiquitous.
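One possible shape for task 2 is sketched below: filter to active buyers, write the tables with `DataFrame.to_sql`, then read them back to verify. The DataFrames, the `buyer_status` column, and the value `"ACTIVE"` are assumptions for illustration; the real column names and codes have to be taken from the data. The sketch uses an in-memory database so it leaves no file behind; replacing `":memory:"` with `"xyz.db"` would write to disk.

```python
import sqlite3
import pandas as pd

# Tiny made-up tables; column names and status codes are assumptions.
customer = pd.DataFrame({
    "acctno": ["A1", "B2", "C3"],
    "buyer_status": ["ACTIVE", "LAPSED", "ACTIVE"],
})
item = pd.DataFrame({
    "acctno": ["A1", "B2", "C3", "C3"],
    "totamt": [19.99, 120.0, 7.25, 42.0],
})

# Keep only active buyers, then keep only their rows in the item table.
active = customer[customer["buyer_status"] == "ACTIVE"]
item_active = item[item["acctno"].isin(active["acctno"])]

# ":memory:" keeps the sketch side-effect free; use "xyz.db" for a file.
conn = sqlite3.connect(":memory:")
active.to_sql("customer", conn, if_exists="replace", index=False)
item_active.to_sql("item", conn, if_exists="replace", index=False)

# Verify: list the tables, count rows, and read one table back.
tables = pd.read_sql_query(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name", conn
)
n_item = int(pd.read_sql_query("SELECT COUNT(*) AS n FROM item", conn)["n"].iloc[0])
customer_back = pd.read_sql_query("SELECT * FROM customer", conn)
print(tables)
print(n_item, len(customer_back))

conn.close()
```

The mail table would follow the same pattern. Comparing row counts before and after the round trip is a simple check that nothing was lost.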
3 Now, using the same data that you imported in 2, above, create a new DataFrame called custSum that you also write as a table to xyz.db, and that has the following characteristics. This DataFrame should have one row per customer record.
(a) Include on each customer's record a binary, Y or N, indicator of whether the customer is a 'heavy buyer,' where the definition of a 'heavy buyer' is a customer whose YTD purchasing in 2009 is greater than 80% of the 2009 YTD purchasing of all customers who are active buyers. Verify your coding of this new variable.
(b) Add to each customer's record whether the customer has any form of each of the following credit cards: AMEX, Discover, VISA, and Mastercard. For each type of card, indicate whether a customer has one with a "Y" or an "N". Document your creation of these codes by showing how they are related to the code values in the data.
(c) Add to each customer's record their estimated HH income, and the genders of adults "1" and "2."
(d) Add to each customer's record their ZIP code and ZIP+4 code.
(e) Be sure to include the account number on each customer's record.
(f) Provide a count of the number of records in this DataFrame.
(g) Write custSum as a SQL table to xyz.db.
(h) Verify that you have written this DataFrame to xyz.db correctly.
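The heavy-buyer and credit-card indicators in task 3 might be coded along the lines below. This is a sketch under two loud assumptions: first, that "greater than 80% of the 2009 YTD purchasing of all active buyers" is read as "above the 80th percentile", which is only one possible interpretation; second, that the data has card columns such as `amex_prem` and `amex_reg` coded `"Y"` for "has card". Those column names and codes are hypothetical and must be replaced by whatever the real data uses.

```python
import numpy as np
import pandas as pd

# Made-up active-buyer records; every column name and code is an assumption.
customer = pd.DataFrame({
    "acctno": ["A1", "B2", "C3", "D4", "E5"],
    "ytd_sales_2009": [100.0, 250.0, 40.0, 900.0, 310.0],
    "amex_prem": ["Y", "U", "U", "Y", "U"],
    "amex_reg": ["U", "Y", "U", "U", "U"],
})

# Assumed reading of the definition: heavy buyer = above the 80th percentile.
cutoff = customer["ytd_sales_2009"].quantile(0.80)

custSum = customer[["acctno"]].copy()
custSum["heavy_buyer"] = np.where(customer["ytd_sales_2009"] > cutoff, "Y", "N")

# A customer "has AMEX" if any of the AMEX-type columns is coded "Y".
has_amex = (customer[["amex_prem", "amex_reg"]] == "Y").any(axis=1)
custSum["amex"] = np.where(has_amex, "Y", "N")

# Verification: the share of heavy buyers should be close to 20% on real data,
# and a crosstab documents how the new code relates to the source codes.
print(custSum["heavy_buyer"].value_counts())
print(pd.crosstab(customer["amex_prem"], custSum["amex"]))
print(len(custSum))
```

The Discover, VISA, and Mastercard indicators would repeat the `has_amex` pattern with their own source columns, and the finished DataFrame can be written with `to_sql` as in task 2.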
4 Create a new pandas DataFrame of data that will be used for target marketing and write it out to a headered csv file.
(a) This DataFrame should have one row per customer. The customers included should be active buyers or lapsed buyers.
(b) The row for each customer should include the customer's account identifier, and an indicator variable (Y/N, or 1/0) for each product category the customer has made at least one purchase in.
(c) Include for each customer their buyer status, and the total dollar amount of the purchases they have made from XYZ using all data available for him or her.
(d) Write your DataFrame to a csv file that has a header record of the column names, and also store it in a shelve database.
(e) Verify that the files you wrote your customer DataFrame to were written correctly. (Commented code, of course.)
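A minimal sketch of task 4 follows, assuming hypothetical columns `buyer_status`, `deptdescr` (product category), and `totamt` (purchase amount), and assuming the status codes are `"ACTIVE"` and `"LAPSED"`. It builds 1/0 category indicators with `pd.crosstab`, adds total spend, and round-trips the result through a csv file and a `shelve` database in a temporary directory.

```python
import os
import shelve
import tempfile
import pandas as pd

# Made-up data; column names and status codes are assumptions.
customer = pd.DataFrame({
    "acctno": ["A1", "B2", "C3"],
    "buyer_status": ["ACTIVE", "LAPSED", "INACTIVE"],
})
item = pd.DataFrame({
    "acctno": ["A1", "A1", "B2", "C3"],
    "deptdescr": ["Kitchen", "Garden", "Kitchen", "Bath"],
    "totamt": [19.99, 11.0, 120.0, 7.25],
})

# (a) Keep active or lapsed buyers only.
keep = customer[customer["buyer_status"].isin(["ACTIVE", "LAPSED"])]
item_keep = item[item["acctno"].isin(keep["acctno"])]

# (b) One 1/0 indicator column per product category.
flags = (pd.crosstab(item_keep["acctno"], item_keep["deptdescr"]) > 0).astype(int)
flags = flags.add_prefix("bought_")
flag_cols = list(flags.columns)

# (c) Total dollars spent per customer.
spend = item_keep.groupby("acctno")["totamt"].sum().rename("total_spend")

target = (
    keep.merge(flags.reset_index(), on="acctno", how="left")
        .merge(spend.reset_index(), on="acctno", how="left")
)
# Customers with no purchases get 0 rather than NaN.
target[flag_cols] = target[flag_cols].fillna(0).astype(int)
target["total_spend"] = target["total_spend"].fillna(0.0)

# (d) and (e) Write to csv and shelve, then read both back to verify.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "target.csv")
    target.to_csv(csv_path, index=False)  # header row is written by default
    csv_back = pd.read_csv(csv_path)

    shelf_path = os.path.join(tmp, "target_shelf")
    with shelve.open(shelf_path) as db:
        db["target"] = target
    with shelve.open(shelf_path) as db:
        shelf_back = db["target"]

print(target)
print(csv_back.shape, shelf_back.equals(target))
```

In a real notebook the files would go to a permanent location instead of a temporary directory; the temporary directory here only keeps the sketch from leaving files behind.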
5 Report the three (3) most frequently purchased product categories by the gender of "adult 1" using only the data for the active customers. For each category, report the total spend in dollars and the total number of products ('items') purchased. Finally, report the number of adults in each gender category. (Be sure to comment your code.)
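Task 5 could be approached with a merge followed by a grouped aggregation, as sketched below. The sketch assumes "most frequently purchased" means the largest number of purchase rows, which is one reading among several (quantity could also be used), and the column names `adult1_gender`, `deptdescr`, `qty`, and `totamt` are invented for illustration.

```python
import pandas as pd

# Made-up data; column names and codes are assumptions.
customer = pd.DataFrame({
    "acctno": ["A1", "B2", "C3", "D4"],
    "buyer_status": ["ACTIVE", "ACTIVE", "ACTIVE", "LAPSED"],
    "adult1_gender": ["F", "M", "F", "M"],
})
item = pd.DataFrame({
    "acctno": ["A1", "A1", "A1", "B2", "B2", "C3", "C3", "D4"],
    "deptdescr": ["Kitchen", "Garden", "Kitchen", "Tools",
                  "Tools", "Kitchen", "Bath", "Tools"],
    "qty": [1, 2, 1, 1, 3, 2, 1, 5],
    "totamt": [20.0, 11.0, 15.0, 60.0, 90.0, 30.0, 7.0, 200.0],
})

# Active customers only, with their purchases attached.
active = customer[customer["buyer_status"] == "ACTIVE"]
merged = item.merge(active[["acctno", "adult1_gender"]], on="acctno", how="inner")

# Per gender and category: purchase count, items bought, dollars spent.
summary = (
    merged.groupby(["adult1_gender", "deptdescr"])
          .agg(purchases=("acctno", "size"),
               items=("qty", "sum"),
               spend=("totamt", "sum"))
          .reset_index()
)

# Top three categories per gender, ranked by purchase count.
top3 = (
    summary.sort_values(["adult1_gender", "purchases"], ascending=[True, False])
           .groupby("adult1_gender")
           .head(3)
)
print(top3)

# Number of active "adult 1" records in each gender category.
gender_counts = active["adult1_gender"].value_counts()
print(gender_counts)
```

On real data, ties at the third position would need an explicit tie-breaking rule, for example ranking by spend as a second sort key.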
