Question 1. [70 marks] This question is designed to help you understand the working of a decision tree (DT). You will find a dataset at this link (https://drive.google.com/file/d/12dFVkfPbQXQkVN8ESa7dhZoO3Mnsgwbe/view?usp=sharing), containing information used to classify penguins into 3 species. You need to implement a classification decision tree (DT) from scratch. [You are not permitted to use any third-party library's function for the classifier, e.g. scikit. You may, however, use built-in functions for auxiliary tasks like train/test split, etc.]

With respect to the cost function used to find splits, everywhere in this question you have to implement one of the following, based on your roll number:

- Odd roll numbers: Gini index
- Even roll numbers: entropy

The implementation includes the following tasks:

1. Perform pre-processing and visualization of the dataset. Perform categorical encoding wherever applicable and split the data into train and test sets. [7 marks]
2. Implement the cost function as per your roll number (details above). [5 marks]
3. In order for the decision tree to work successfully, continuous variables need to be converted to categorical variables first. To do this, you need to implement a decision function that makes this split. Let us call it cont_to_cat(). The details of the function are the following: [10 marks]
   a. Assume that the continuous variables are independent of each other, i.e. given 2 continuous variables A and B, the split of A does not in any way affect the split you will perform on B.
   b. Each continuous variable should be split into only 2 categories, and the optimal split is the one that divides the samples best, based on the value of the function you have been allotted (as per your roll number).
4. After step 3, all the attributes will have categorical values, so you can now go ahead and implement the training function. This includes implementing the following helper functions: [25 marks]
   a. Get the attribute that leads to the best split.
   b. Make that split.
   c. Repeat these steps for the newly created split.
5. The DT should also include the following properties in the train function:
   a. There should be a defined max depth, i.e. a depth beyond which the tree isn't allowed to grow. [5 marks]
   b. The algorithm should self-identify when no information gain is being achieved, i.e. the model has plateaued in its training and shouldn't grow further. [5 marks]
6. Write a function which is responsible for classification (i.e. at test time). [10 marks]
7. Find the accuracy you get on the test data (overall and class-wise). [3 marks]

Note: Implementing the wrong cost function will attract a loss of credit in the above question.
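Tasks 2 and 3 can be sketched as follows. This is a minimal illustration, not a reference solution: the helper name weighted_impurity, the choice of midpoints between consecutive distinct values as candidate thresholds, and the 0/1 encoding of the two categories are all assumptions, since the question only fixes the name cont_to_cat() and the two cost functions.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini index of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def weighted_impurity(groups, impurity):
    """Size-weighted average impurity of the groups (hypothetical helper)."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * impurity(g) for g in groups if g)


def cont_to_cat(values, labels, impurity=gini):
    """Binarise one continuous column, independently of all other columns.

    Candidate thresholds are midpoints between consecutive distinct values
    (an assumption). Returns (threshold, codes) where codes[i] is 0 when
    values[i] <= threshold and 1 otherwise.
    """
    best_t, best_cost = None, float("inf")
    distinct = sorted(set(values))
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        cost = weighted_impurity([left, right], impurity)
        if cost < best_cost:
            best_t, best_cost = t, cost
    if best_t is None:  # constant column: nothing to split on
        return None, [0] * len(values)
    return best_t, [int(v > best_t) for v in values]
```

Passing impurity=entropy instead of the default switches the cost function, so the same cont_to_cat() sketch covers both roll-number cases.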
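Tasks 4 and 5 (best attribute, split, recursion, max depth, and stopping when there is no information gain) might look like the sketch below. It uses entropy; a Gini-based version would swap the impurity function. The data layout (each row a dict of categorical values), the nested-dict tree format with "leaf", "attr", "children" and "default" keys, and the 1e-12 gain tolerance are assumptions made for illustration only.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (base 2) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def split_rows(rows, labels, attr):
    """Group rows and labels by the value each row takes on attr."""
    groups = {}
    for row, y in zip(rows, labels):
        sub_rows, sub_labels = groups.setdefault(row[attr], ([], []))
        sub_rows.append(row)
        sub_labels.append(y)
    return groups


def information_gain(rows, labels, attr):
    """Parent entropy minus the size-weighted entropy of the children."""
    groups = split_rows(rows, labels, attr)
    children = sum(len(ys) / len(labels) * entropy(ys) for _, ys in groups.values())
    return entropy(labels) - children


def train(rows, labels, attrs, max_depth, depth=0):
    """Recursively grow a tree over categorical attributes."""
    majority = Counter(labels).most_common(1)[0][0]
    # Stop at the depth limit, when no attributes remain, or when the node is pure.
    if depth >= max_depth or not attrs or len(set(labels)) == 1:
        return {"leaf": majority}
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    # Stop when even the best attribute yields no information gain.
    if information_gain(rows, labels, best) <= 1e-12:
        return {"leaf": majority}
    remaining = [a for a in attrs if a != best]
    children = {
        value: train(sub_rows, sub_labels, remaining, max_depth, depth + 1)
        for value, (sub_rows, sub_labels) in split_rows(rows, labels, best).items()
    }
    # "default" is the fallback prediction for values unseen at this node.
    return {"attr": best, "children": children, "default": majority}
```

Dropping the chosen attribute from the remaining list is a design choice: once a categorical attribute has been split on, every row in a child shares the same value for it, so splitting on it again could not yield any gain.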
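For tasks 6 and 7, classification walks the tree from the root to a leaf, and accuracy is counted overall and per class. The sketch below assumes a tree stored as nested dicts, where internal nodes have "attr", "children" and "default" keys and leaves have a "leaf" key; that format, the function names, and the tiny hand-written tree in the usage example are hypothetical, not part of the question.

```python
from collections import defaultdict


def predict(tree, row):
    """Follow the row's attribute values down the tree and return a class."""
    node = tree
    while "leaf" not in node:
        child = node["children"].get(row[node["attr"]])
        if child is None:  # value not seen at this node during training
            return node["default"]
        node = child
    return node["leaf"]


def accuracy_report(tree, rows, labels):
    """Return (overall_accuracy, {class: accuracy on that class})."""
    correct, total = defaultdict(int), defaultdict(int)
    for row, y in zip(rows, labels):
        total[y] += 1
        correct[y] += int(predict(tree, row) == y)
    overall = sum(correct.values()) / len(labels)
    per_class = {c: correct[c] / total[c] for c in total}
    return overall, per_class


# Usage with a hand-written tree (illustrative data only).
example_tree = {
    "attr": "bill",
    "children": {0: {"leaf": "adelie"}, 1: {"leaf": "gentoo"}},
    "default": "adelie",
}
test_rows = [{"bill": 0}, {"bill": 1}, {"bill": 1}, {"bill": 2}]
test_labels = ["adelie", "gentoo", "adelie", "gentoo"]
overall, per_class = accuracy_report(example_tree, test_rows, test_labels)
```

Class-wise accuracy here means the fraction of test samples of each true class that were predicted correctly, which is one common reading of "class-wise" in the question.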
