Question: Given Python code to train a shallow decision tree for the classification of flowers. The input features are the flowers sepal length, sepal width, petal

Given Python code to train a shallow decision tree for the classification of flowers. The input features are the flowers sepal length, sepal width, petal length, and petal width, and the label corresponds to the species, i.e. iris setosa, iris ver- sicolor, or iris virginica. The data is provided in the file iris.csv.1 The code computes the training accuracy, i.e. the accuracy obtained when classifying all instances from the training dataset that was used to build the classifier in the first place.

(a) Add a few lines of code to compute the accuracy in a 10-fold cross-validation set up. Use functions or methods from sklearn for k-fold cross-validation instead of implementing your own. This will save you time and help you to get more familiar with sklearn. Include your code in your hard copy homework solution, either handwritten (its just a few lines of code) or a printout. You shouldnt make changes to the code that was provided, so there is no need to include any other code in your homework solution than the few lines that you added. I will not run your code for this homework problem, so you do not need to upload your code.

Given Python code to train a shallow decision tree for the classification

Below is the .csv file:

of flowers. The input features are the flowers sepal length, sepal width,

(b) What is the training accuracy? What is the accuracy obtained using 10-fold cross- validation? Briefly comment on which one is the lowest, and why that does (or does not) agree with your expectations.

import pandas as pd from sklearn import tree from sklearn import metrics \# Read the dataset into a dataframe and map the labels to numbers df = pd.read_csv('iris.csv') map_to_int ={ setosa':0, 'versicolor':1, 'virginica':2\} df["label"] = df["species"].replace(map_to_int) print(df) \# Separate the input features from the label features = list(df.columns [:4]) X = df[features] y=df["label"] \# Train a decision tree and compute its training accuracy clf = tree.DecisionTreeclassifier(max_depth=2, criterion='entropy') clf.fit(X, y) print(metrics.accuracy_score(y, clf.predict(X)))

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Databases Questions!