Question: a)Implement an algorithm that creates a CART from performance data as presented in the lecture. The CART must be implemented by yourself, you must not

a)Implement an algorithm that creates a CART from performance data as presented in the lecture. The CART must be implemented by yourself, you must not take a prefabricated Python implementation. The format of the sample data and the representation of the CART are described below. If two split options are equally good, we use the alphabetic ordering of feature names as tie breaker. N.B: Don't use any additional python library except pandas. a)Implement an algorithm that creates a CART from performance data as presented The code template given is as follows: import pandas as pd cart = { "name":"X", "mean":456, "split_by_feature": "aes", "error_of_split": 0.0, "successor_left": { "name":"XL", "mean":1234, "split_by_feature": None, "error_of_split":None, "successor_left":None, "successor_right":None }, "successor_right":{ "name":"XR", "mean":258, "split_by_feature": None,"error_of_split":None, "successor_left":None, "successor_right":None } }

features = ["secompress", "encryption", "aes", "blowfish", "algorithm", "rar", "zip", "signature", "timestamp", "segmentation", "onehundredmb", "onegb"]

def get_cart(sample_set_csv): df = pd.read_csv(sample_set_csv) //TODO: Change the return and code logic here return cart And this is my code which needs to be fixed or you can change it: import pandas as pd

cart = { "name":"X", "mean":456, "split_by_feature": None, "error_of_split": 0.0, "successor_left": None, "successor_right": None } features = ["secompress", "encryption", "aes", "blowfish", "algorithm", "rar", "zip", "signature", "timestamp", "segmentation", "onehundredmb", "onegb"] def mean(numbers): return float(sum(numbers)) / max(len(numbers), 1)

def squared_error_loss(numbers, mean_value): return sum((n-mean_value)**2 for n in numbers)

def get_cart(sample_set_csv): df = pd.read_csv(sample_set_csv) return build_cart(df,features)

def build_cart(df,features, current_node_name="X"): current_mean = df["performance"].mean() current_error = ((df["performance"] - current_mean) ** 2).sum() if len(features) == 0: return { "name":current_node_name, "mean":current_mean, "split_by_feature": None, "error_of_split":current_error, "successor_left":None,"successor_right":None} else: best_feature = None best_error = 999.0 best_left_subset = None best_right_subset = None for feature in features: for value in df[feature].unique(): left_subset = df[df[feature] == value] right_subset = df[df[feature] != value] if len(left_subset) == 0 or len(right_subset) == 0: continue error = sum_squared_error_loss(left_subset, right_subset) if best_feature == None or error

def sum_squared_error_loss(left, right): left_mean = left["performance"].mean() right_mean = right["performance"].mean() left_loss = ((left["performance"] - left_mean) ** 2).sum() right_loss = ((right["performance"] - right_mean) ** 2).sum() return left_loss + right_loss

# Unit Test for the task test_cart = {'name': 'X', 'mean': 763.2, 'split_by_feature': 'segmentation', 'error_of_split': 6.0, 'successor_left': {'name': 'XL', 'mean': 772.0, 'split_by_feature': 'onegb', 'error_of_split': 0.0, 'successor_left': {'name': 'XLL', 'mean': 770.0, 'split_by_feature': None, 'error_of_split': None, 'successor_left': None, 'successor_right': None }, 'successor_right': {'name': 'XLR', 'mean': 773.0, 'split_by_feature': None, 'error_of_split': None, 'successor_left': None, 'successor_right': None } }, 'successor_right': {'name': 'XR', 'mean': 750.0, 'split_by_feature': None, 'error_of_split': None, 'successor_left': None, 'successor_right': None} }

if get_cart("Performance_01.csv") == test_cart: print("passed") else: print("failed")

Note that the data in Performance_01.csv is: Id,secompress,encryption,aes,blowfish,algorithm,rar,zip,signature,timestamp,segmentation, onehundredmb,onegb,performance 0,1,0,0,0,1,1,0,0,0,0,0,0,750 1,1,0,0,0,1,1,0,0,0,1,1,0,773 2,1,0,0,0,1,1,0,0,0,1,0,1,770 3,1,0,0,0,1,1,0,0,1,0,0,0,750 4,1,0,0,0,1,1,0,0,1,1,1,0,773 I provided the code with a unit test. Don't submit an answer if the unit test gives you "Failed" or I will downvote the answer and send email to chegg because this is the third time I asked the same question and no code is working. And don't tell me write a comment since there is no place to write comment here in chegg.

CART Data Structure For this assignment sheet we use the following internal datastructure for a CART. We represent a CART as a python dict with exactly the entries as shown in the example below. This example CART has three nodes. The root node "X", and the two child nodes "XL" and "XR". As you can see the child nodes only have a name and a mean but all other fields are set to None. A parent node also has a name and a mean but additionally a feature by which the split is performed, the error of the split and two sucessors. It is important to use exaclty these names for the features: features = ["secompress", "encryption", "aes", "blowfish", "algorithm", "rar", "zip", "signature", "timestamp", "segmentation", "onehundredmb", "onegb"]

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer

Step: 1 Unlock blur-text-image

Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock

Step: 3 Unlock

Students Have Also Explored These Related Databases Questions!

Implement an algorithm that creates a CART from performance data as presented in the lecture. The CART must be implemented by yourself, you must not take a prefabricated Python implementation. The...

Your task is to analyze a non-functional property, performance, of a fictional tool SECompress, a configurable command-line tool for compressing data. In addition, the data can be encrypted, signed,...

Hi, I need help with this assignment, I have seen the previous question but they are not answered well enough. I need proper implementation as required, I am providing all the required information,...

Hi, Kindly help me implement this get-cart function. Make sure it's correct, as I am going to give you data as well. Here is the data: 1...

Read carefully please, I need an expert for this. Here is the Task1: Here is what I have implemented: Code: # Write your helper functions here, if needed def mean(numbers): return float(sum(numbers))...

Hi, Kindly help me implement this get-cart function. Make sure it's correct, as I am going to give you data as well. Data File Waiting for the Answer as soon as possible In this assignment, we...

Computer Organization and Networks Practicals 2021/22 October 9, 2021 Computer Organization and Networks Practicals 2021/22 b68495714b Contents Contents 0 Introduction 3 0.1 Registration . . . . . ....

Solving Two-stage Robust Optimization Problems by A Constraint-and-Column Generation Method Bo Zeng Department of Industrial and Management Systems Engineering University of South Florida, Email:...

Lesson 12 Quiz (Show/Explain all Work) IST 230 Relations on Sets, Databases 1. Let A = {0, 1, 2, 3, 4, 5, 6, 7, 8} and B = {1, 2, 3, 4, 5, 6, 7, 8}. Now let R be a binary relation R from A to B such...

"According to a report from The World Bank- Malaysia Economic Monitor, the working class in Malaysia wastes about 1 million hours daily stuck in traffic congestions. Traffic jams are quite a synonym...

Discuss one of the goals of Performance Evaluation Systems and the significance of its role in decentralization?

Is true concerning the prepayment risk of mortgage backed securities

Smart Strike Company manufactures and sells soccer balls for teams of children in elementary and high school. Smart Strike's best-selling lines are the practice ball line (durable soccer balls for...

=+Are you interested in working on global teams?

=+Do you want to work from home?

=+ What skills and competencies will enable someone