Question

The data file: diamonds_clean.rds

Tasks

1. Read the diamonds_clean.rds data into a tibble and display it.

2. Split the data into training and testing data sets, where the testing data set consists of 20% of the total data.

3. Create an initial classification decision tree model, but don't specify any values for the hyperparameters except for mode.

4. Fit the model to predict the Clarity variable using all independent variables and view it.

5. Plot the fitted decision tree.

6. Use the model to get predictions for the test dataset.

7. Display a confusion matrix for this model.

8. Display the variables for this model sorted by importance.

9. Create a new decision tree model, but this time set the hyperparameters manually. (Larger values for tree_depth and smaller values for min_n and cost_complexity should help your model.)

10. Fit the new model using two of the most important variables from the previous model.

11. Get a new set of predictions for the testing data using the new model.

12. Display a confusion matrix for the new model. Did your accuracy improve? If not, try adjusting the hyperparameters again.
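The twelve tasks above can be sketched in R with tidymodels roughly as follows. This is a minimal sketch under stated assumptions, not a definitive solution: it assumes diamonds_clean.rds sits in the working directory, that Clarity can be treated as a factor, and that the tidymodels, rpart.plot, and vip packages are installed. The column names Carat and Price in task 10 are hypothetical placeholders, since the question does not list the columns; substitute the two top variables reported in task 8. The seed and the hyperparameter values are arbitrary starting points.

```r
library(tidymodels)  # attaches rsample, parsnip, yardstick, dplyr, tibble
library(rpart.plot)  # plots rpart trees
library(vip)         # variable importance plots

# 1. Read the data into a tibble and display it
diamonds <- as_tibble(readRDS("diamonds_clean.rds"))
diamonds

# Assumption: the outcome must be a factor for a classification model
diamonds <- diamonds %>% mutate(Clarity = as.factor(Clarity))

# 2. Split into 80% training and 20% testing data
set.seed(123)  # arbitrary seed, only for reproducibility
diamonds_split <- initial_split(diamonds, prop = 0.8)
diamonds_train <- training(diamonds_split)
diamonds_test  <- testing(diamonds_split)

# 3. Initial model: only the mode is set (the default engine is rpart)
tree_spec <- decision_tree(mode = "classification")

# 4. Fit on all predictors and view the fitted model
tree_fit <- tree_spec %>% fit(Clarity ~ ., data = diamonds_train)
tree_fit

# 5. Plot the underlying rpart tree
rpart.plot(tree_fit$fit, roundint = FALSE)

# 6. Predictions for the test data
tree_preds <- predict(tree_fit, new_data = diamonds_test) %>%
  bind_cols(diamonds_test)

# 7. Confusion matrix (and accuracy, for the comparison in task 12)
conf_mat(tree_preds, truth = Clarity, estimate = .pred_class)
accuracy(tree_preds, truth = Clarity, estimate = .pred_class)

# 8. Variables sorted by importance
vip(tree_fit$fit)
sort(tree_fit$fit$variable.importance, decreasing = TRUE)

# 9. New model with manually chosen hyperparameters (starting values only)
tree_spec2 <- decision_tree(
  mode            = "classification",
  tree_depth      = 20,
  min_n           = 5,
  cost_complexity = 0.001
)

# 10. Fit using two important variables
#     (Carat and Price are hypothetical names; use the top two from task 8)
tree_fit2 <- tree_spec2 %>% fit(Clarity ~ Carat + Price, data = diamonds_train)

# 11. New predictions for the test data
tree_preds2 <- predict(tree_fit2, new_data = diamonds_test) %>%
  bind_cols(diamonds_test)

# 12. Confusion matrix and accuracy for the new model
conf_mat(tree_preds2, truth = Clarity, estimate = .pred_class)
accuracy(tree_preds2, truth = Clarity, estimate = .pred_class)
```

Comparing the two accuracy() results answers the question in task 12. If the second model is not more accurate, one option is to raise tree_depth or lower min_n and cost_complexity further and refit.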
