Question: Analysis of an E-commerce Dataset

Analysis of an E-commerce Dataset

We have been provided with a combined e-commerce dataset. In this dataset, each user can post a rating and review for the products they purchased. Other users can then evaluate that rating and review by expressing their trust or distrust.

The dataset includes a wealth of information for each user: their profile, ID, gender, city of birth, product ratings (on a scale of 1-5), reviews, and the prices of the products they purchased. For each product rating, we also have the product name, ID, price, and category, the rating score, the timestamp of the rating and review, and the average helpfulness of the rating given by others (on a scale of 1-5).

The dataset comes from several data sources, and all the data has been merged into a single CSV file named 'A Combined E-commerce Dataset.csv'. The structure of this dataset is represented in the header shown below.

| userId | gender | rating | review | item | category | helpfulness | timestamp | item_id | item_price | user_city |
|----|----|----|----|----|----|----|----|----|----|----|

Description of Fields

userId - the user's id
gender - the user's gender
rating - the user's rating towards the item
review - the user's review towards the item
item - the item's name
category - the category of the item
helpfulness - the average helpfulness of this rating
timestamp - the timestamp when the rating is created
item_id - the item's id
item_price - the item's price
user_city - the city of user's birth

Note that a user may rate multiple items, and an item may receive ratings and reviews from multiple users. The "helpfulness" is an average value based on all the helpfulness values given by others.

There are four questions to explore with the data, as shown below.
Q1. Remove missing data

Please remove the following records in the csv file:
- gender/rating/helpfulness is missing
- review is 'none'

Display the DataFrame, count the number of Null values in each column, and print the length of the data before and after removing the missing data.
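A minimal sketch of the Q1 cleaning step in pandas. The helper name `remove_missing` and the toy rows are assumptions made for illustration; with the real data, the DataFrame would come from `pd.read_csv('A Combined E-commerce Dataset.csv')` instead.

```python
import pandas as pd

def remove_missing(df):
    """Drop rows with a missing gender, rating or helpfulness,
    and rows whose review is the literal string 'none'."""
    cleaned = df.dropna(subset=["gender", "rating", "helpfulness"])
    cleaned = cleaned[cleaned["review"] != "none"]
    return cleaned

# Hypothetical toy rows standing in for the real CSV.
demo = pd.DataFrame({
    "gender": ["F", None, "M", "M"],
    "rating": [5.0, 4.0, None, 3.0],
    "helpfulness": [4.0, 3.0, 4.0, 4.0],
    "review": ["great", "ok", "fine", "none"],
})

print(demo.isnull().sum())            # Null values in each column
print("Length before:", len(demo))
cleaned = remove_missing(demo)
print("Length after:", len(cleaned))
```

This assumes missing values are read in as NaN/None; if the file encodes them differently (for example as empty strings), they would need to be converted first.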
Q2. Descriptive statistics

With the cleaned data in Q1, please provide the data summarization as below:

Q2.1 total number of unique users, unique reviews, unique items, and unique categories
Q2.2 descriptive statistics, e.g., the total number, mean, std, min and max regarding all rating records
Q2.3 descriptive statistics, e.g., mean, std, max, and min of the number of items rated by different genders
Q2.4 descriptive statistics, e.g., mean, std, max, and min of the number of ratings received by each item
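As a rough sketch, Q2.1, Q2.2 and Q2.4 could be approached with `nunique`, `describe` and `groupby`. The toy DataFrame is hypothetical and stands in for the cleaned data from Q1; Q2.3 is left out because its exact grouping is open to interpretation.

```python
import pandas as pd

# Hypothetical toy rows standing in for the cleaned data from Q1.
demo = pd.DataFrame({
    "userId": [1, 1, 2, 2, 3],
    "review": ["good", "bad", "good", "fine", "ok"],
    "item": ["A", "B", "A", "C", "A"],
    "category": ["Books", "Games", "Books", "Games", "Books"],
    "rating": [5, 2, 4, 3, 4],
})

# Q2.1: number of unique users, reviews, items and categories.
unique_counts = demo[["userId", "review", "item", "category"]].nunique()
print(unique_counts)

# Q2.2: count, mean, std, min and max over all rating records.
rating_stats = demo["rating"].describe()
print(rating_stats)

# Q2.4: number of ratings each item received, then summary statistics.
ratings_per_item = demo.groupby("item")["rating"].count()
item_stats = ratings_per_item.describe()
print(item_stats)
```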
Q3. Plotting and Analysis

Please try to explore the correlation between gender/helpfulness/category and ratings; for instance, do female/male users tend to provide higher ratings than male/female users?

Hint: you may use the boxplot function to plot figures for comparison.

(Challenge) You may need to select the most suitable graphic forms for ease of presentation. Most importantly, for each figure or subfigure, please summarise what each plot shows (i.e. observations and explanations). Finally, you may need to provide an overall summary of the data.
Q4. Detect and remove outliers

We may define outlier users, reviews and items with three rules (if a record meets one of the rules, it is regarded as an outlier):
- reviews of which the helpfulness is no more than 2
- users who rate fewer than 7 items
- items that receive fewer than 11 ratings

Please remove the corresponding records in the csv file that involve outlier users, reviews and items. After that, print the length of the data.

Can you just answer Question 4, regarding the outliers?
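One possible way to approach Q4 in pandas is sketched below. The helper name `remove_outliers`, its parameter names and the toy rows are assumptions for illustration. The question does not say whether the three rules are evaluated on the original data or one after another; this sketch applies them in sequence, so the per-user and per-item counts are computed on the rows that survived the previous rule.

```python
import pandas as pd

def remove_outliers(df, max_bad_helpfulness=2, min_user_items=7,
                    min_item_ratings=11):
    """Apply the three Q4 rules in sequence (an assumed ordering)."""
    # Rule 1: drop reviews whose helpfulness is no more than the threshold.
    kept = df[df["helpfulness"] > max_bad_helpfulness]

    # Rule 2: drop users who rated fewer than min_user_items distinct items.
    items_per_user = kept.groupby("userId")["item_id"].transform("nunique")
    kept = kept[items_per_user >= min_user_items]

    # Rule 3: drop items that received fewer than min_item_ratings ratings.
    ratings_per_item = kept.groupby("item_id")["rating"].transform("count")
    kept = kept[ratings_per_item >= min_item_ratings]
    return kept

# Hypothetical toy rows; thresholds are lowered so the effect is visible.
demo = pd.DataFrame({
    "userId": [1, 1, 1, 2, 2, 3],
    "item_id": [10, 11, 12, 10, 11, 10],
    "rating": [5, 4, 3, 4, 2, 1],
    "helpfulness": [4, 3, 4, 4, 4, 1],
})

cleaned = remove_outliers(demo, max_bad_helpfulness=2,
                          min_user_items=2, min_item_ratings=2)
print("Length after removing outliers:", len(cleaned))
```

On the real data, the function would be called with its defaults (2, 7 and 11) on the cleaned DataFrame from Q1. Because rule 3 can leave some users with fewer than 7 items again, the result depends on the order chosen here; if the rules are meant to be checked independently on the original data, the counts should be computed before any filtering.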
