Question:

  • The code for each Part of this lab should be self-contained; that is, each of Parts 1, 2, and 3 should contain all the code it needs and should not rely on code from another Part in order to run.
  • All parts of the lab should be done using Python, sklearn, pandas, numpy, and matplotlib.

Part 1 - Creating and evaluating a random forest model

In this part of the lab, you should:

  • read in the data;
  • verify that all the data is numeric and that there are no missing values;
  • split the data into training and validation sets (don't worry about creating a final test set);
  • create a random forest model using the data;
  • evaluate the model on both the training and validation sets using MAE and % error.
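The Part 1 steps above could be sketched roughly as follows. The lab's dataset is not given here, so this sketch generates a synthetic stand-in with make_regression; the column names, the 80/20 split, and the random_state values are all assumptions. "% error" is also not defined in the question, so it is taken here to mean MAE as a percentage of the mean target value, which is only one possible reading.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# ASSUMPTION: synthetic stand-in data. In the real lab, read the provided
# file instead (for example with pd.read_csv) and use its target column.
features, target = make_regression(
    n_samples=500, n_features=8, n_informative=5, noise=10.0, random_state=0
)
df = pd.DataFrame(features, columns=[f"feature_{i}" for i in range(features.shape[1])])
df["target"] = target + 1000.0  # shift so the mean target is far from zero

# Verify that every column is numeric and that nothing is missing.
all_numeric = all(pd.api.types.is_numeric_dtype(dtype) for dtype in df.dtypes)
n_missing = int(df.isna().sum().sum())
print(f"all numeric: {all_numeric}, missing values: {n_missing}")

# Split into training and validation sets (no separate test set).
X = df.drop(columns="target")
y = df["target"]
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit a random forest with default hyper-parameters.
model = RandomForestRegressor(random_state=0)
model.fit(X_train, y_train)

# Evaluate with MAE on both sets.
train_mae = mean_absolute_error(y_train, model.predict(X_train))
val_mae = mean_absolute_error(y_val, model.predict(X_val))

# ASSUMPTION: "% error" = MAE as a percentage of the mean target value.
train_pct_error = 100.0 * train_mae / y_train.mean()
val_pct_error = 100.0 * val_mae / y_val.mean()

print(f"train MAE: {train_mae:.2f} ({train_pct_error:.2f}% of mean target)")
print(f"val   MAE: {val_mae:.2f} ({val_pct_error:.2f}% of mean target)")
```

The training MAE will normally be noticeably lower than the validation MAE, because each tree has already seen most of the training rows.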

Part 2 - Exploring the n_estimators hyper-parameter

In this part of the lab you should:

  • use a for loop to create a random forest model for each value of n_estimators from 1 to 30;
  • evaluate each model on both the training and validation sets using MAE;
  • visualize the results by creating a plot of n_estimators vs MAE for both the training and validation sets.
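A minimal, self-contained sketch of the Part 2 loop is shown below. As before, the data is a synthetic stand-in (an assumption, since the lab's dataset is not given here), and the split ratio and random_state values are illustrative. The sketch picks the n_estimators value with the lowest validation MAE, which is one reasonable selection rule rather than the only one.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# ASSUMPTION: synthetic stand-in data; replace with the lab's dataset.
features, target = make_regression(
    n_samples=500, n_features=8, n_informative=5, noise=10.0, random_state=0
)
X = pd.DataFrame(features, columns=[f"feature_{i}" for i in range(features.shape[1])])
y = pd.Series(target + 1000.0, name="target")
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit one forest per n_estimators value and record both MAEs.
n_values = list(range(1, 31))
train_maes = []
val_maes = []
for n in n_values:
    model = RandomForestRegressor(n_estimators=n, random_state=0)
    model.fit(X_train, y_train)
    train_maes.append(mean_absolute_error(y_train, model.predict(X_train)))
    val_maes.append(mean_absolute_error(y_val, model.predict(X_val)))

# Selection rule used in this sketch: lowest validation MAE.
best_n = n_values[int(np.argmin(val_maes))]
print(f"best n_estimators: {best_n}, validation MAE: {min(val_maes):.2f}")

# Plot n_estimators vs MAE for both sets.
fig, ax = plt.subplots()
ax.plot(n_values, train_maes, marker="o", label="training MAE")
ax.plot(n_values, val_maes, marker="o", label="validation MAE")
ax.set_xlabel("n_estimators")
ax.set_ylabel("MAE")
ax.set_title("n_estimators vs MAE")
ax.legend()
# Call plt.show() (or fig.savefig with a path of your choice) to view the plot.
```

The choice is made on the validation curve rather than the training curve because the training MAE mostly reflects how well the forest memorizes rows it has already seen.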

After that you should answer the following questions:

  • Which value of n_estimators gives the best results?
  • Explain how you decided that this value for n_estimators gave the best results;
  • Why is the plot you created above not smooth?
  • Was the result here better than the result of Part 1? What % better or worse was it?
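For the "% better or worse" question, one common way to express the comparison is the relative change in validation MAE with respect to the earlier result. The helper name pct_improvement and the MAE values below are hypothetical and chosen only to illustrate the arithmetic; substitute the numbers your own runs produce.

```python
def pct_improvement(baseline_mae: float, new_mae: float) -> float:
    """Percentage by which new_mae improves on baseline_mae.

    Positive means the new model is better (lower MAE); negative means worse.
    """
    return 100.0 * (baseline_mae - new_mae) / baseline_mae


# Hypothetical values for illustration only, not results from the lab data.
print(pct_improvement(4.0, 3.5))  # 12.5  -> 12.5% better
print(pct_improvement(4.0, 5.0))  # -25.0 -> 25% worse
```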

Part 3 - Exploring the max_features hyper-parameter

In this part of the lab you should:

  • use a for loop to create a random forest model for each value of max_features from 1 to the total number of features in the data;
  • for each model, use the value for n_estimators as determined in Part 2;
  • evaluate each model on both the training and validation sets using MAE;
  • visualize the results by creating a plot of max_features vs MAE for both the training and validation sets.
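The Part 3 loop could be sketched along the same lines. Because each Part must run on its own, the n_estimators value found in Part 2 is written in as a constant; the value 20 below is only a placeholder, not a result. The synthetic data, split ratio, and random_state values are likewise assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# PLACEHOLDER: substitute the n_estimators value you chose in Part 2.
BEST_N_ESTIMATORS = 20

# ASSUMPTION: synthetic stand-in data; replace with the lab's dataset.
features, target = make_regression(
    n_samples=500, n_features=8, n_informative=5, noise=10.0, random_state=0
)
X = pd.DataFrame(features, columns=[f"feature_{i}" for i in range(features.shape[1])])
y = pd.Series(target + 1000.0, name="target")
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Try every max_features value from 1 up to the number of features.
n_features = X_train.shape[1]
feature_counts = list(range(1, n_features + 1))
train_maes = []
val_maes = []
for k in feature_counts:
    model = RandomForestRegressor(
        n_estimators=BEST_N_ESTIMATORS, max_features=k, random_state=0
    )
    model.fit(X_train, y_train)
    train_maes.append(mean_absolute_error(y_train, model.predict(X_train)))
    val_maes.append(mean_absolute_error(y_val, model.predict(X_val)))

# Selection rule used in this sketch: lowest validation MAE.
best_k = feature_counts[int(np.argmin(val_maes))]
print(f"best max_features: {best_k}, validation MAE: {min(val_maes):.2f}")

# Plot max_features vs MAE for both sets.
fig, ax = plt.subplots()
ax.plot(feature_counts, train_maes, marker="o", label="training MAE")
ax.plot(feature_counts, val_maes, marker="o", label="validation MAE")
ax.set_xlabel("max_features")
ax.set_ylabel("MAE")
ax.set_title("max_features vs MAE")
ax.legend()
# Call plt.show() (or fig.savefig with a path of your choice) to view the plot.
```

Passing an integer to max_features sets how many features each split may consider, so the largest value in the loop lets every split see all features.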

After that you should answer the following questions:

  • Which value of max_features gives the best results?
  • Explain how you decided that this value for max_features gave the best results;
  • Was the result here better than the result of Part 2? What % better or worse was it?
