Question: Why does this comment still show? "One or more test cases in this cell did not pass."

Why does this comment still show?

One or more test cases in this cell did not pass.
Instructor hints:
1. "For Problem 2, Part A, look at the shape of X_train."
2. "For Problem 2, Part A, look at label_names."
3. "For Problem 2, Part A, look at the shape of y_train."

Which code is correct? I have tried 5 times...


Problem 2: Decision Tree, post-pruning and cost complexity parameter using sklearn 0.22 [10 points, Peer Review]

We will use a pre-processed natural language dataset in the CSV file "spamdata.csv" to classify emails as spam or not. Each row contains the word frequency for 54 words plus statistics on the longest "run" of capital letters.

Word frequency is given by:

$$f_i = \frac{m_i}{N}$$

Where $f_i$ is the frequency for word $i$, $m_i$ is the number of times word $i$ appears in the email, and $N$ is the total number of words in the email.
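As a small illustration of this definition (count of a word divided by the total number of words in the email), here is a minimal sketch. The function name `word_frequencies`, the whitespace tokenization, and the sample email are assumptions made for illustration; the assignment's CSV already contains precomputed frequencies, so nothing like this is needed to solve Part A.

```python
from collections import Counter


def word_frequencies(text, vocabulary):
    """Return {word: count / total_words} for each word in `vocabulary`."""
    tokens = text.lower().split()  # naive whitespace tokenization (assumption)
    total = len(tokens)
    if total == 0:
        return {word: 0.0 for word in vocabulary}
    counts = Counter(tokens)
    # Counter returns 0 for words that never appear in the email.
    return {word: counts[word] / total for word in vocabulary}


# Made-up email text, six words long.
email = "free money free offer click now"
freqs = word_frequencies(email, ["free", "money", "meeting"])
print(freqs)
```

Here "free" appears twice among six words, so its frequency is 2/6, and a word that never appears gets 0.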

We will use decision trees to classify the emails.

Part A [5 points]: Complete the function get_spam_dataset to read in values from the dataset and split the data into train and test sets.

Student's answer(Top)

import numpy as np

# Load the spam dataset using the get_spam_dataset function
def get_spam_dataset(filepath="data/spamdata.csv", test_split=0.1):
    data = np.genfromtxt(filepath, delimiter=',', dtype=float, skip_header=1)
    np.random.shuffle(data)

    n = len(data)
    split_idx = int(n * (1 - test_split))

    X_train = data[:split_idx, 1:-1]  # Features are columns 1 to -2
    X_test = data[split_idx:, 1:-1]
    y_train = data[:split_idx, -1]   # Target variable is last column
    y_test = data[split_idx:, -1]
    feature_names = list(np.genfromtxt(filepath, delimiter=',', dtype=str, max_rows=1)[1:-1])

    return X_train, X_test, y_train, y_test, feature_names

# Load the dataset into five variables: x_train, x_test, y_train, y_test, Label_names
x_train, x_test, y_train, y_test, Label_names = get_spam_dataset(filepath="data/spamdata.csv", test_split=0.1)

# Print out the shape of the loaded datasets
print("x_train shape:", x_train.shape)
print("x_test shape:", x_test.shape)
print("y_train shape:", y_train.shape)
print("y_test shape:", y_test.shape)
print("Feature Names:", Label_names)

Student's answer(Top)

# TO-DO: import the data set into five variables: X_train, X_test, y_train, y_test, label_names
# Uncomment and edit the line below to complete this task.

test_split = 0.1 # default test_split; change it if you'd like; ensure that this variable is used as an argument to your function

# your code here
X_train, X_test, y_train, y_test, feature_names = get_spam_dataset(filepath="data/spamdata.csv", test_split=test_split)

# X_train, X_test, y_train, y_test, label_names = np.arange(5)

Grade cell: cell-d0ee21615c2bf06e
Score: 0.0 / 22.73 (Top)

# tests X_train, X_test, y_train, y_test, and label_names
Hidden Tests Redacted

One or more test cases in this cell did not pass.
Instructor hints:
1. "For Problem 2, Part A, look at the shape of X_train."
2. "For Problem 2, Part A, look at label_names."
3. "For Problem 2, Part A, look at the shape of y_train."
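
Two things in the submitted code line up with these hints and are worth checking. First, the TO-DO asks for a variable named label_names, but the graded cell assigns the fifth return value to feature_names, so label_names is never set there. Second, the slice 1:-1 drops column 0 as well as the label column; if column 0 of spamdata.csv is a feature (an assumption, since the file layout is not shown on this page), X_train and the name list come out one column short. The hidden tests are not visible, so the cause of the y_train hint cannot be confirmed from here. The sketch below is one possible version under those assumptions, run against a tiny synthetic CSV instead of the real file; the `seed` parameter and the toy file are hypothetical additions for reproducibility.

```python
import os
import tempfile

import numpy as np


def get_spam_dataset(filepath, test_split=0.1, seed=None):
    """Load a CSV whose last column is the label, then shuffle and split it.

    Assumes the first row is a header and every column except the last
    is a feature (both are assumptions about spamdata.csv).
    """
    with open(filepath) as f:
        header = f.readline().strip().split(",")
    data = np.genfromtxt(filepath, delimiter=",", dtype=float, skip_header=1)

    rng = np.random.default_rng(seed)
    rng.shuffle(data)  # shuffles whole rows in place, keeping X and y aligned

    n_test = int(round(len(data) * test_split))
    n_train = len(data) - n_test

    X_train, X_test = data[:n_train, :-1], data[n_train:, :-1]
    y_train, y_test = data[:n_train, -1], data[n_train:, -1]
    label_names = header[:-1]  # one name per feature column
    return X_train, X_test, y_train, y_test, label_names


# Tiny synthetic file: 3 feature columns plus a 0/1 label column, 20 rows.
rows = ["f0,f1,f2,label"] + [f"{i},{i * 2},{i * 3},{i % 2}" for i in range(20)]
path = os.path.join(tempfile.mkdtemp(), "toy.csv")
with open(path, "w") as f:
    f.write("\n".join(rows) + "\n")

test_split = 0.1
# Note the variable names: they match the ones the TO-DO comment asks for.
X_train, X_test, y_train, y_test, label_names = get_spam_dataset(
    path, test_split=test_split, seed=0
)

print("X_train shape:", X_train.shape)
print("y_train shape:", y_train.shape)
print("label_names:", label_names)
```

A quick sanity check after loading is that len(label_names) equals X_train.shape[1] and that X_train and y_train have the same number of rows.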

Step by Step Solution

The question seems incomplete as you've mentioned parts of the problem instructions and your code, but there's a lack of context on the specific issue yo...
