Question: Given Python code to train a shallow decision tree for the classification of flowers. The input features are the flowers sepal length, sepal width, petal

Given Python code to train a shallow decision tree for the classification of flowers. The input features are the flowers sepal length, sepal width, petal length, and petal width, and the label corresponds to the species, i.e. iris setosa, iris ver- sicolor, or iris virginica. The data is provided in the file iris.csv.1 The code computes the training accuracy, i.e. the accuracy obtained when classifying all instances from the training dataset that was used to build the classifier in the first place.

(a) Add a few lines of code to compute the accuracy in a 10-fold cross-validation set up. Use functions or methods from sklearn for k-fold cross-validation instead of implementing your own. This will save you time and help you to get more familiar with sklearn. Include your code in your hard copy homework solution, either handwritten (its just a few lines of code) or a printout. You shouldnt make changes to the code that was provided, so there is no need to include any other code in your homework solution than the few lines that you added. I will not run your code for this homework problem, so you do not need to upload your code.

Given Python code to train a shallow decision tree for the classification

Below is the .csv file:

of flowers. The input features are the flowers sepal length, sepal width,

(b) What is the training accuracy? What is the accuracy obtained using 10-fold cross- validation? Briefly comment on which one is the lowest, and why that does (or does not) agree with your expectations.

import pandas as pd from sklearn import tree from sklearn import metrics \# Read the dataset into a dataframe and map the labels to numbers df = pd.read_csv('iris.csv') map_to_int ={ setosa':0, 'versicolor':1, 'virginica':2\} df["label"] = df["species"].replace(map_to_int) print(df) \# Separate the input features from the label features = list(df.columns [:4]) X = df[features] y=df["label"] \# Train a decision tree and compute its training accuracy clf = tree.DecisionTreeclassifier(max_depth=2, criterion='entropy') clf.fit(X, y) print(metrics.accuracy_score(y, clf.predict(X)))

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer

Step: 1 Unlock blur-text-image

Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock

Step: 3 Unlock

Students Have Also Explored These Related Databases Questions!

Please complete the following in R 5. 3.5 4.7 1.5 5.0 19 Sepal Length Sepal. Width Petal. Length Petal. Width Species 0.2 setosa * 2 3.0 0.2 setosa 18 3 3.2 0.2 setosa 4.6 0.2 setosa # 5 3.6 1.4 0.2...

Please help code the following in python. Thank you. Link To the Iris Flower DataSet: https://en.wikipedia.org/wiki/Iris_flower_data_set Understanding the problem For this problem set we will be...

Please help fix the code and help answer the challenge question. Thank you. Understanding the problem For this problem set we will be working with on with a popular data set used in biological...

Need R Code for the Following: Iris Flower Species Classification In Built Dataset in R Objective: The primary objective of this assignment is to explore classification techniques using the 'Iris'...

Please help code the following inn python. Thank you All code must be executable from scratch. Code should be written properly (i.e. put code in functions as needed, declare variables as needed, and...

ECO987Q3 1. (10) Briey discuss the following statements (keep your answers short and concise): (a) The consumption-based capital asset pricing model is inconsistent with high volatility of stock...

Menu DS1000 - Assignment 3.p.. X + Create Sign in X All tools Edit Convert E-Sign Find text or tools Q Al Assistant Part 1 - Written Answer (Be sure to show all your work) Looking for key takeaways?...

(1). Please report the accuracy of the prediction report (2) choose the class with the highest confidence score (3) Please report the errors (scatter plot) (4) do not use scikit-learn library Given...

Question No 0 5 : Iris dataset has 5 0 samples for each of three different species of Iris flower ( total number of samples is 1 5 0 ) . For each data sample, you have sepal length, sepal width,...

1 : [ 5 , 4 , 3 , 2 ] 2 : [ 8 , 9 , 3 , 7 ] distance = sqrt ( ( 5 - 8 ) ^ ( 2 ) + ( 4 - 9 ) ^ ( 2 ) + ( 3 - 3 ) ^ ( 2 ) + ( 2 - 7 ) ^ ( 2 ) ) which equals 7 . 6 8 Write a function to calculate this...

You buy a bond on Thursday April 6, 2023. The trade will settle the following Monday April 10. The bond is a treasury bond with a maturity of July 1 2030 and a coupon rate of 3.75%. Yields on similar...

Using the P/E ratio approach to valuation, calculate the value of a share of stock under the following conditions: The investors required rate of return is 12 percent, The expected level of...

Question 5 2 pts True / False ? If controls around the credit approval / granting process are not operating effectively, the allowance for uncollectible accounts may need to be decreased. True fater

CT Corp Comprehensive Question Canadian Tire Corporation, Limited (Canadian Tire) is a family of companies that includes a retail segment and a financial services division, among others. The retail...

2. What is the rule about off-duty conduct in regard to employee discipline? Background On March 30, 2006, David Gates was arrested and charged with felony Conspiracy to Distribute Schedule II...

1. Which party has the burden of proof in this case? Why? Background On March 30, 2006, David Gates was arrested and charged with felony Conspiracy to Distribute Schedule II Narcotics and Felony...

4. Indicate the comparative advantages and disadvantages of a disciplinary price list (see Exhibit 12.3 on pg. 531) of disciplinary prerogatives in the labor agreement and a onesentence contractual...