Question: The question below uses the famous iris data set that contains measurements for 5 0 flow - ers from each of 3 species of iris.

The question below uses the famous iris data set that contains measurements for

50

flow

-

ers from each of

3

species of iris. The data can be accessed by loading the RData object called

TutWk

8

Data.RData

.

The objective for this question is to use the Sepal features to distinguish the Virginica iris species from the other two species. The binary response variable is called Virginica and the two Sepal predictor variables, all in centimeters, are Sepal.Length and Sepal.Width.

2.1 .

Plot the feature space using a different symbol to differentiate Virginica flowers from the other species.

2.2 .

Fit a logistic regression model and overlay the resulting decision boundary on the plot. How many points will be misclassified? Hint: make sure your X variables are plotted on the correct axes.

2.3 .

Using the tree library with the default stopping criteria, fit a classification tree to these data. Plot the partitioning of the feature space and compare to the logistic regression model.

2.4 .

The tree package uses a different metric to what we have covered for a categorical response. Calculate the Shannon

-

entropy for the full data i

.

.

at the root node.

2.5 .

Use rpart to fit a full, unconstrained tree to these data and then use cost complexity pruning to prune the tree. Show the resulting tree structure and calculate the reduction in the Shannon

-

entropy metric for the pruned tree.

2.6 .

According to your pruned model: what is the probability that a flower is a Virginica iris if it has a Sepal length of

7

cm and a Sepal width of

3

? 2.7 .

From these results, would you say that Virginica iris flowers have a different Sepal width to the other species?

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer

Step: 1 Unlock blur-text-image

Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock

Step: 3 Unlock

Students Have Also Explored These Related Programming Questions!

Fisher's (1936) iris data set provides the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 150 flowers (50 flow- ers from each of the...

Jupiter Notebook We have covered some of the limitations of single layer neural networks in class, but they are still powerful learning systems that provide a good way to begin learning about how to...

plots should be graphed using ggplot2 package. Graphs in base R won't work (1) Data set iris in R is a famous data set with the measurements in centimeters of the variables sepal length and width and...

Please help fix the code and help answer the challenge question. Thank you. Understanding the problem For this problem set we will be working with on with a popular data set used in biological...

Object 1 WebAssign Welcome, jeremy.guillory@grandcanyon (log out) Friday, November 18 2016 03:11 AM MST Home My Assignments Grades Communication Calendar My eBooks My Ebooks Introduction to the...

use R for coding The following information is used in the next two questions. The iris data in the datasets package is a famous data set that gives the measurements (in centimeters) of the variables...

Chart Assignment: After reviewing Ch. 10 of aplia in the text, you will put your knowledge to work. You will be constructing charts based on the information below. You must use the information found...

HBSP Product Number TCG 333 THE CRIMSON PRESS CURRICULUM CENTER THE CRIMSON GROUP, INC. Note on Strategic DecisionMaking in Healthcare Organizations During the next several decades, healthcare...

2.25 Blossom widths A data set (available at the books website) analyzed by the famous statistician R. A. Fisher consisted of measurements of different varieties of iris blossoms. Histograms...

Use conditional statement conditions: 1. Display variable 'a' if odd or even number. a. Hint: mod function follows the convention that mod(a,0) returns a, by dividing by 2, the remainder is either 1...

Suppose Minnesota Machines (MM) is trying to price an export order from Russia. Payment is due nine months after shipping. Given the risks involved, MM would like to factor its receivable without...

If you are told $ 1 will buy 2 5 korean won, then you were given a ( n ) indirect quote, direct quote, cross - rate and none of the options.

Which one of the following categories of expenses usually remains the same during the period of restoration after a property loss? Available answer options Select only one option A Ordinary payroll B...