Question: Analysis of an E-commerce Dataset

We have been provided with a combined e-commerce dataset. In this dataset, each user can post a rating and a review for the products they purchased. Other users can then evaluate that rating and review by expressing their trust or distrust.

The dataset includes a wealth of information for each user: their profile, ID, gender, city of birth, product ratings (on a scale of [...]), reviews, and the prices of the products they purchased. For each product rating, we also have the product name, ID, price, and category, the rating score, the timestamp of the rating and review, and the average helpfulness of the rating given by others (on a scale of [...]).

The dataset comes from several data sources, and all the data have been merged into a single CSV file named 'A Combined Ecommerce Dataset.csv'. The structure of this dataset is represented in the header shown below.

userId, gender, rating, review, item, category, helpfulness, timestamp, itemid, itemprice, usercity

Description of Fields

userId: the user's id
gender: the user's gender
rating: the user's rating towards the item
review: the user's review towards the item
item: the item's name
category: the category of the item
helpfulness: the average helpfulness of this rating
timestamp: the timestamp when the rating is created
itemid: the item's id
itemprice: the item's price
usercity: the city of the user's birth

Note that a user may rate multiple items, and an item may receive ratings and reviews from multiple users. The "helpfulness" is an average value based on all the helpfulness values given by others.

There are four questions to explore with the data, as shown below.

Q1. Remove missing data

Please remove the following records from the CSV file:
- gender/rating/helpfulness is missing
- review is 'none'

Display the DataFrame, count the number of null values in each column, and print the length of the data before and after removing the missing data.
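
The missing-data step can be sketched with pandas as follows. This is only a minimal sketch: the small `demo` frame and the helper name `remove_missing` are made up for illustration, and the column names are taken from the header as it appears in the question (the real CSV may spell them differently).

```python
import pandas as pd


def remove_missing(df):
    """Drop rows with a missing gender, rating or helpfulness, or a 'none' review."""
    cleaned = df.dropna(subset=["gender", "rating", "helpfulness"])
    cleaned = cleaned[cleaned["review"] != "none"]
    return cleaned


# Hypothetical stand-in for the real data, only to show the steps.
demo = pd.DataFrame({
    "gender": ["F", None, "M", "M"],
    "rating": [5.0, 4.0, None, 3.0],
    "helpfulness": [4.0, 3.0, 2.0, 4.0],
    "review": ["great", "ok", "fine", "none"],
})

print(demo)                  # display the DataFrame
print(demo.isnull().sum())   # number of null values in each column
print("before:", len(demo))
cleaned = remove_missing(demo)
print("after:", len(cleaned))
```

With the real file, `demo` would be replaced by the result of `pd.read_csv(...)` on the CSV named in the question.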

Q2. Descriptive statistics

With the cleaned data from Q1, please provide the data summarization as below:
- Q2.1 total number of unique users, unique reviews, unique items, and unique categories
- Q2.2 descriptive statistics, e.g., the total number, mean, std, min, and max regarding all rating records
- Q2.3 descriptive statistics, e.g., mean, std, max, and min of the number of items rated by different genders
- Q2.4 descriptive statistics, e.g., mean, std, max, and min of the number of ratings received by each item
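
A possible pandas sketch of these summaries is shown here. The `demo` frame is invented for illustration, and the reading of "number of items rated by different genders" as a per-gender count of rating records is an assumption; other readings (for example, items per user, grouped by gender) are equally plausible.

```python
import pandas as pd

# Hypothetical cleaned data, only to show the calls.
demo = pd.DataFrame({
    "userId": [1, 1, 2, 3, 3, 3],
    "gender": ["F", "F", "M", "M", "M", "M"],
    "rating": [5, 4, 3, 4, 2, 1],
    "review": ["a", "b", "c", "d", "e", "f"],
    "item": ["x", "y", "x", "x", "y", "z"],
    "category": ["c1", "c2", "c1", "c1", "c2", "c2"],
})

# Unique users, reviews, items and categories.
unique_counts = demo[["userId", "review", "item", "category"]].nunique()

# Count, mean, std, min and max over all rating records.
rating_stats = demo["rating"].describe()

# Number of items rated per gender (assumed reading), then its statistics.
items_per_gender = demo.groupby("gender")["item"].count()
gender_stats = items_per_gender.describe()

# Number of ratings received by each item, then its statistics.
ratings_per_item = demo.groupby("item")["rating"].count()
item_stats = ratings_per_item.describe()

print(unique_counts)
print(rating_stats)
print(gender_stats)
print(item_stats)
```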

Q3. Plotting and Analysis

Please try to explore the correlation between gender/helpfulness/category and ratings; for instance, do female/male users tend to provide higher ratings than male/female users?

Hint: you may use the boxplot function to plot figures for comparison (Challenge).

You may need to select the most suitable graphic forms for ease of presentation. Most importantly, for each figure or subfigure, please summarise what each plot shows (i.e., observations and explanations). Finally, you may need to provide an overall summary of the data.
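
Following the boxplot hint, one way to compare ratings across gender, helpfulness, and category is sketched here. It assumes matplotlib is installed alongside pandas; the `demo` data are invented, and the `Agg` backend is chosen only so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical cleaned data, only to show the plotting calls.
demo = pd.DataFrame({
    "gender": ["F", "F", "M", "M", "M", "F"],
    "helpfulness": [3, 4, 3, 4, 4, 3],
    "category": ["Books", "Games", "Books", "Books", "Games", "Games"],
    "rating": [5, 4, 3, 4, 2, 1],
})

group_columns = ["gender", "helpfulness", "category"]
fig, axes = plt.subplots(1, len(group_columns), figsize=(15, 4))

# One boxplot of rating per grouping column, side by side.
for ax, col in zip(axes, group_columns):
    demo.boxplot(column="rating", by=col, ax=ax)
    ax.set_title(f"rating by {col}")
    ax.set_xlabel(col)
    ax.set_ylabel("rating")

fig.suptitle("")  # remove the automatic "Boxplot grouped by ..." title
fig.tight_layout()

# Median rating per gender, a numeric check on what the first panel shows.
median_by_gender = demo.groupby("gender")["rating"].median()
print(median_by_gender)
```

In a notebook the figure is displayed inline; in a script, `fig.savefig(...)` or `plt.show()` would be needed. The written observations for each panel still have to come from looking at the real data.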

Q4. Detect and remove outliers

We may define outlier users, reviews, and items with three rules (if a record meets one of the rules, it is regarded as an outlier):
- reviews of which the helpfulness is no more than [...]
- users who rate less than [...] items
- items that receive less than [...] ratings

Please remove the corresponding records in the CSV file that involve outlier users, reviews, and items. After that, print the length of the data.

Can you just answer the question regarding the outliers?
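
For the outlier question, a minimal pandas sketch is given here. The three thresholds are missing from the question as posted, so they are exposed as parameters with hypothetical names (`max_outlier_helpfulness`, `min_items_per_user`, `min_ratings_per_item`); the values in the demo call are placeholders, not the assignment's values. Applying the rules one after another, with counts recomputed on the already filtered data, is also an assumption; the question does not say whether the rules are applied in sequence or all at once.

```python
import pandas as pd


def remove_outliers(df, max_outlier_helpfulness, min_items_per_user,
                    min_ratings_per_item):
    """Apply the three outlier rules in order and return the remaining rows."""
    # Rule 1: drop reviews whose helpfulness is no more than the threshold.
    kept = df[df["helpfulness"] > max_outlier_helpfulness]

    # Rule 2: drop users who rated fewer than the required number of items.
    items_per_user = kept.groupby("userId")["itemid"].transform("count")
    kept = kept[items_per_user >= min_items_per_user]

    # Rule 3: drop items that received fewer than the required number of ratings.
    ratings_per_item = kept.groupby("itemid")["rating"].transform("count")
    kept = kept[ratings_per_item >= min_ratings_per_item]
    return kept


# Hypothetical data and placeholder thresholds, only to show the steps.
demo = pd.DataFrame({
    "userId": [1, 1, 1, 2, 2, 3],
    "itemid": [10, 11, 12, 10, 11, 10],
    "rating": [5, 4, 3, 4, 2, 1],
    "helpfulness": [4, 4, 1, 3, 3, 4],
})

cleaned = remove_outliers(demo, max_outlier_helpfulness=2,
                          min_items_per_user=2, min_ratings_per_item=2)
print(len(cleaned))  # length of the data after removing outliers
```

On the real data, the function would be called on the cleaned DataFrame from the missing-data step, with the thresholds stated in the original assignment. Note that removing items under the third rule can leave some users below the user threshold again; whether to repeat the filtering until nothing changes is a design choice the question leaves open.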
