Question:
1) Import pandas and matplotlib.
2) Open our dataset on Blackboard, games.csv. It covers many games and their popularity and ratings. We want to use it for regression on the field average_rating, to predict the rating of a new game.
3) Have a good look at the data and show some statistics (head, info, describe). Notice anything odd about the data? Does it need any cleaning? If so, clean it.
4) Draw a histogram of the average_rating field. Anything odd about it? If so, fix it and re-draw the histogram.
5) Several fields are not very useful in regression, so drop them: id, type, name and bayes_average_rating.
6) Now we have all numeric fields, right?
7) Split into test set and train set (20% test)
8) Now divide each set into X (input) and Y (output) so you have a total of 4 datasets.
9) Let's scale the X data using min-max scaling or standardization; you can scale the Ys too if you want.
10) Run linear regression on the data, using MSE or RMSE as the measure of quality. I got an MSE of 2.07.
11) Run a random forest regressor with at least 100 estimators. I got an MSE of 1.46. Do you get better results with more estimators? Try it out.
I need answers for steps 10 and 11.
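One possible way to approach steps 10 and 11 is sketched below with scikit-learn. It is only a sketch under assumptions: the function names (prepare, evaluate, run_steps_10_and_11) are made up for illustration, dropna() stands in for whatever cleaning steps 3 and 4 turn up, standardization is picked over min-max scaling, and random_state=42 is an arbitrary seed. Your MSE values will depend on the cleaning and the split, so they need not match 2.07 and 1.46.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def prepare(df, target="average_rating"):
    """Steps 5-9: drop unused fields, split 80/20, scale the X data."""
    drop_cols = ["id", "type", "name", "bayes_average_rating"]
    # Assumption: dropping rows with missing values is enough cleaning here.
    df = df.drop(columns=[c for c in drop_cols if c in df.columns]).dropna()
    X = df.drop(columns=[target])
    y = df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    # Fit the scaler on the training set only, then apply it to both sets.
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)
    return X_train, X_test, y_train, y_test


def evaluate(model, X_train, X_test, y_train, y_test):
    """Fit the model and return (MSE, RMSE) on the test set."""
    model.fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))
    return mse, np.sqrt(mse)


def run_steps_10_and_11(df, n_estimators_list=(100, 200, 500)):
    """Step 10: linear regression. Step 11: random forests of several sizes."""
    splits = prepare(df)
    results = {"linear": evaluate(LinearRegression(), *splits)}
    for n in n_estimators_list:
        rf = RandomForestRegressor(n_estimators=n, random_state=42)
        results[f"rf_{n}"] = evaluate(rf, *splits)
    return results
```

To try it, call run_steps_10_and_11(pd.read_csv("games.csv")) and compare the MSE of each entry. Listing several forest sizes in n_estimators_list lets you check directly whether more estimators help on your split.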
