Question: Read the online documentation on decision trees and random forests in scikit-learn to find out how to use decision trees and random forests.

Read the online documentation on decision trees and random forests in scikit-learn to find out how to use them. Notice that a classifier is trained by calling its fit method, and that decision trees are trained with CART, a more sophisticated evolution of the ID3 algorithm covered in class.
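As a rough illustration of the fit method mentioned above, the sketch below trains a decision tree and a random forest and asks each for a few predictions. It assumes that scikit-learn's built-in breast cancer data (load_breast_cancer) stands in for the course data, and it trains on the full data set purely for brevity.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Assumption: the built-in data set stands in for the Homework 0 data.
X, y = load_breast_cancer(return_X_y=True)

# Training is a single call to fit(features, labels).
tree = DecisionTreeClassifier()
tree.fit(X, y)

forest = RandomForestClassifier()
forest.fit(X, y)

# A fitted classifier predicts one label per input row.
tree_predictions = tree.predict(X[:5])
forest_predictions = forest.predict(X[:5])
print(tree_predictions, forest_predictions)
```

No seed is set here, so the random forest may differ slightly from run to run.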
On random seeds: Many functions in scikit-learn, including models as well as utilities, use randomization. For ease of grading, we will fix a random seed for questions 1, 2, and 3 so as to make behavior deterministic. We will use a random seed of 10. This can generally be done by passing random_state=10 to the function; please consult the documentation if unsure. For cross-validation methods, though, you will likely need to set the cv argument instead. You can do this by setting cv=KFold(n_splits=, random_state=10, shuffle=True).
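The seeding convention above might look like the following sketch. The value of n_splits is not given in the text, so N_SPLITS = 5 is a hypothetical choice made only for illustration; the data set is again assumed to be scikit-learn's built-in breast cancer data.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

SEED = 10     # the seed fixed by the assignment
N_SPLITS = 5  # hypothetical: the assignment text leaves n_splits unspecified

# Assumption: the built-in data set stands in for the Homework 0 data.
X, y = load_breast_cancer(return_X_y=True)

# Models take the seed through random_state.
model = DecisionTreeClassifier(random_state=SEED)

# Cross-validation takes the seed through the splitter passed as cv.
cv = KFold(n_splits=N_SPLITS, shuffle=True, random_state=SEED)

scores = cross_val_score(model, X, y, cv=cv)
scores_again = cross_val_score(model, X, y, cv=cv)
print(scores)
```

With both seeds fixed, the two calls to cross_val_score should return identical fold scores.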
1. [10 points] Use the breast cancer data set from Homework 0 to create a training set. Recall that the label is 0 if the patient's data indicates a malignant cancer and 1 otherwise. Compute the base rate of malignant cancer occurrence over the entire data set. In other words, what would be your best guess for the probability that a single example is malignant, using only the labels in the training set? This question is very simple, so try not to overthink it.
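One way to approach this is sketched below: the base rate is simply the fraction of labels equal to 0. The sketch assumes the Homework 0 data is scikit-learn's built-in breast cancer data set; if the course data differs, the counts will differ too.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

# Assumption: the built-in data set stands in for the Homework 0 data.
X, y = load_breast_cancer(return_X_y=True)

# Label 0 marks a malignant example, so count the zeros.
malignant_count = int(np.sum(y == 0))

# Base rate = malignant examples / all examples.
base_rate = malignant_count / len(y)
print(f"{malignant_count} of {len(y)} examples are malignant; base rate = {base_rate:.4f}")
```

No features are needed: the estimate depends only on the label counts.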
