Question:

Simulate a binary classification dataset with a single feature as a mixture of normal distributions using R. (Hint: generate two data frames, each holding the random numbers and a class label, and combine them.) The normal distribution parameters passed to rnorm, given as (mean, standard deviation), should be (5, 2) and (-5, 2) for the two samples; you can choose an appropriate number of samples. Induce a binary decision tree using rpart and obtain the threshold value for the feature in the first split. How does this value compare to the empirical distribution of the feature? How many nodes does this tree have? What are the entropy and Gini impurity at each node? Repeat with normal distributions of (1, 2) and (-1, 2). How many nodes does this tree have? Why? Prune this tree (using the prune function from rpart) with a complexity parameter of 0.1. Describe the resulting tree.
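One way to set up the experiment is sketched below. It is not a worked answer, only a starting point under stated assumptions: 500 samples per class, the seed, the class labels "pos" and "neg", and the helper names simulate_mixture and node_impurities are all illustrative choices that do not come from the question. Entropy is computed in bits (log base 2), and the per-node class counts are read from the yval2 matrix that rpart stores for classification trees.

```r
library(rpart)

set.seed(42)  # arbitrary seed, used only so the run is reproducible

# Hypothetical helper: one data frame per class, then stacked together.
# Class "pos" is drawn from N(mu, sd) and class "neg" from N(-mu, sd).
simulate_mixture <- function(mu, sd = 2, n = 500) {
  d <- rbind(
    data.frame(x = rnorm(n, mean = mu, sd = sd), class = "pos"),
    data.frame(x = rnorm(n, mean = -mu, sd = sd), class = "neg")
  )
  d$class <- factor(d$class)
  d
}

# Hypothetical helper: entropy (bits) and Gini impurity for every node.
# For a two-class tree, columns 2:3 of yval2 hold the class counts.
node_impurities <- function(fit) {
  counts <- fit$frame$yval2[, 2:3, drop = FALSE]
  p <- counts / rowSums(counts)
  data.frame(
    node = rownames(fit$frame),
    n = fit$frame$n,
    entropy = -rowSums(ifelse(p > 0, p * log2(p), 0)),
    gini = 1 - rowSums(p^2)
  )
}

# Well-separated classes: means 5 and -5, standard deviation 2.
d_far <- simulate_mixture(5)
fit_far <- rpart(class ~ x, data = d_far, method = "class")
threshold_far <- fit_far$splits[1, "index"]  # threshold of the root split
imp_far <- node_impurities(fit_far)

# Overlapping classes: means 1 and -1, standard deviation 2.
d_near <- simulate_mixture(1)
fit_near <- rpart(class ~ x, data = d_near, method = "class")
imp_near <- node_impurities(fit_near)

# Prune the second tree at complexity parameter 0.1.
fit_pruned <- prune(fit_near, cp = 0.1)

print(threshold_far)
print(summary(d_far$x))        # compare the threshold with the feature's distribution
print(imp_far)
print(nrow(fit_near$frame))    # node count before pruning
print(nrow(fit_pruned$frame))  # node count after pruning
```

Printing fit_far, fit_near, and fit_pruned directly shows the split rules at each node, which helps when describing the trees. The exact thresholds and node counts depend on the seed and sample size chosen here.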
