Question: The question below uses the famous iris data set that contains measurements for 50 flowers from each of 3 species of iris.
The data can be accessed by loading the RData object called TutWkData.RData. The objective for this question is to use the Sepal features to distinguish the Virginica iris species from the other two species. The binary response variable is called Virginica, and the two Sepal predictor variables, both measured in centimeters, are Sepal.Length and Sepal.Width.

Plot the feature space, using a different symbol to distinguish Virginica flowers from the other species.

Fit a logistic regression model and overlay the resulting decision boundary on the plot. How many points will be misclassified? Hint: make sure your X variables are plotted on the correct axes.

Using the tree library with the default stopping criteria, fit a classification tree to these data. Plot the partitioning of the feature space and compare it with the logistic regression model.

The tree package uses a different metric from the one we have covered for a categorical response. Calculate the Shannon entropy for the full data, i.e. at the root node.

Use rpart to fit a full, unconstrained tree to these data, and then use cost complexity pruning to prune the tree. Show the resulting tree structure and calculate the reduction in the Shannon entropy metric for the pruned tree.

According to your pruned model, what is the probability that a flower is a Virginica iris if it has a Sepal length of … cm and a Sepal width of … cm?

From these results, would you say that Virginica iris flowers have a different Sepal width from the other species?
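For the logistic regression part, the decision boundary is the set of points where the fitted probability of Virginica equals 0.5, which is where the linear predictor (the log-odds) is zero. The sketch below is in Python rather than R and only shows that arithmetic. The coefficient names `b0`, `b_len`, `b_wid` and all helper functions are hypothetical; it assumes Sepal.Length on the horizontal axis, Sepal.Width on the vertical axis and a 0.5 classification threshold. In R the coefficient values would come from your own fitted `glm(..., family = binomial)` object via `coef()`.

```python
import math

# Hypothetical coefficient names: b0 (intercept), b_len (Sepal.Length), b_wid (Sepal.Width).
# Their values would come from your own fitted logistic regression; none are assumed here.


def linear_predictor(b0, b_len, b_wid, sepal_length, sepal_width):
    """Log-odds of Virginica under the logistic model."""
    return b0 + b_len * sepal_length + b_wid * sepal_width


def predicted_prob(b0, b_len, b_wid, sepal_length, sepal_width):
    """P(Virginica), computed in a form that avoids overflow in exp()."""
    eta = linear_predictor(b0, b_len, b_wid, sepal_length, sepal_width)
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def decision_boundary_width(b0, b_len, b_wid, sepal_length):
    """Sepal.Width on the p = 0.5 boundary for a given Sepal.Length.

    The boundary is where the log-odds are zero:
    b0 + b_len * L + b_wid * W = 0  =>  W = -(b0 + b_len * L) / b_wid  (needs b_wid != 0).
    """
    return -(b0 + b_len * sepal_length) / b_wid


def count_misclassified(b0, b_len, b_wid, lengths, widths, is_virginica):
    """Number of points on the wrong side of the 0.5 boundary."""
    errors = 0
    for length, width, label in zip(lengths, widths, is_virginica):
        # Predict Virginica when the log-odds are positive, i.e. p > 0.5.
        predicted = linear_predictor(b0, b_len, b_wid, length, width) > 0
        if predicted != bool(label):
            errors += 1
    return errors
```

Because the boundary is linear in the two features, it is a straight line on the plot with intercept -b0/b_wid and slope -b_len/b_wid, which in R can be overlaid with `abline()`. Thresholding on the log-odds rather than the probability gives the same classification while avoiding rounding issues very close to 0.5.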
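The `tree` package's default splitting criterion for a categorical response is deviance rather than an impurity index such as Gini or entropy. A minimal Python sketch of the node deviance, -2 Σ n_k ln(n_k / n), applied to the root node is shown below; with 50 flowers per species, the root holds 50 Virginica and 100 other flowers. The function names are illustrative only and are not part of the `tree` package.

```python
import math


def node_deviance(counts):
    """Deviance of one tree node: -2 * sum_k n_k * ln(n_k / n).

    counts: class counts in the node. Empty classes contribute 0,
    since n_k * ln(p_k) -> 0 as n_k -> 0.
    """
    n = sum(counts)
    return -2.0 * sum(c * math.log(c / n) for c in counts if c > 0)


def tree_deviance(leaves):
    """Total deviance of a tree: the sum of its terminal-node deviances."""
    return sum(node_deviance(leaf) for leaf in leaves)


# Root node of the Virginica problem: 50 Virginica vs 100 other flowers.
root = [50, 100]
print(round(node_deviance(root), 2))  # 190.95
```

Since the natural log is used, the node deviance equals 2n times the entropy measured in nats, so at the root 2 × 150 × 0.6365 ≈ 190.95. The two metrics therefore rank splits similarly but are on different scales.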
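The root-node Shannon entropy only needs the class proportions: with 50 Virginica and 100 other flowers, p = 1/3 and 2/3. The sketch below computes it, assuming base-2 logarithms (pass `base=math.e` if your course defines entropy in nats), and also computes the reduction for a fitted tree as the root entropy minus the size-weighted entropy of the leaves. The helper names are hypothetical, and the leaf counts must be read off your own pruned rpart tree; note that rpart splits on Gini by default unless `parms = list(split = "information")` is given.

```python
import math


def shannon_entropy(counts, base=2.0):
    """H = -sum_k p_k * log(p_k) for the class counts of one node."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n, base) for c in counts if c > 0)


def entropy_reduction(root_counts, leaf_counts, base=2.0):
    """Root entropy minus the size-weighted average entropy of the leaves.

    leaf_counts: one list of class counts per terminal node; together
    they should sum to root_counts, since each flower lands in one leaf.
    """
    n = sum(root_counts)
    weighted = sum(sum(leaf) / n * shannon_entropy(leaf, base) for leaf in leaf_counts)
    return shannon_entropy(root_counts, base) - weighted


# Root node: 50 Virginica vs 100 other flowers (setosa + versicolor).
root = [50, 100]
print(round(shannon_entropy(root), 4))  # 0.9183
```

A split into perfectly pure leaves such as [50, 0] and [0, 100] would give a reduction equal to the full 0.9183 bits; a pruned tree whose leaves still mix classes gives a smaller reduction.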
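For the pruning step, cost complexity pruning penalises the training risk of a subtree by its size. In the usual textbook notation, with R(T) the risk (for example the misclassification cost) summed over the terminal nodes of subtree T, |T| the number of terminal nodes and α ≥ 0 a tuning parameter, the criterion is the first line below. The rpart vignette describes its `cp` parameter as a version of α scaled by the risk R(T_1) of the tree with no splits, which gives the second line; treat this as a summary to check against the rpart documentation rather than a definitive statement of its internals.

```latex
\begin{aligned}
R_\alpha(T) &= R(T) + \alpha\,|T|, \\
R_{cp}(T)   &= R(T) + cp \cdot |T| \cdot R(T_1).
\end{aligned}
```

Larger α (or cp) favours smaller subtrees. A common choice is the cp value with the lowest cross-validated error (`xerror`) in the table printed by `printcp()`, which is then passed to `prune()`.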
