Question: Do the following tasks, in exact sequence, using the HW4_DataA data.

B-1. [5 marks] Read and display the data given in HW4_DataA. Describe both the numeric and categorical attributes. Refer to the table for the data description.

B-2. [… marks each] Do the necessary preprocessing. Specifically, do the following:
a. Normalize the numeric attributes using the min-max normalization scheme.
b. Perform ordinal label encoding for the ordinal attribute "education level". Use a dictionary for the ordinal encoding. The order, starting from the lowest, is: High School, Associate's, Bachelor's, Master's, Doctoral.
c. Perform one-hot encoding for the categorical attributes "gender" and "marital status".
d. For the "occupation" feature, encode "student" to […] and all other choices to […]. Do not forget to convert the type to integer.
e. Perform label encoding for the class "loan status".

B-3. [… marks each]
a. Split the dataset into training and testing sets using the train_test_split function, with […] for training and […] for testing, using random state […].
b. Build a decision tree classifier for predicting the class label. Fit the classifier using the training dataset. Set random state to […], criterion to entropy, and splitter to best.
c. Draw the decision tree using scikit-learn (sklearn).
d. Test the classifier on the testing dataset, and print the confusion matrix and the classification metrics (accuracy, sensitivity/recall, precision) of the decision tree classifier.

B-4. [… marks each] Using the same dataset split as in B-3a:
a. Build a random forest classifier for predicting the class label with […] trees. Fit the classifier using the training set. Set criterion to entropy and random state to […].
b. Draw the trees using scikit-learn (sklearn).
c. Test the classifier on the testing dataset, and print the confusion matrix and the classification metrics (accuracy, sensitivity/recall, precision) of the random forest classifier.

B-5. [… marks] Calculate the information gain (IG) for the class variable "loan status" given the feature "education level" as a root node.
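The information gain asked for above is the entropy of the class labels minus the weighted entropy that remains after splitting on the feature. A minimal sketch follows, assuming a multiway split (one branch per education level). The variable names and the six toy rows are invented for illustration and are not taken from HW4_DataA.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after a multiway split."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Toy data, invented for illustration only (not the homework data).
education_level = ["High School", "High School", "Bachelor's",
                   "Bachelor's", "Master's", "Master's"]
loan_status = ["Denied", "Denied", "Approved", "Denied", "Approved", "Approved"]

ig = information_gain(education_level, loan_status)
print(f"IG(loan_status | education_level) = {ig:.3f} bits")
```

Note that a scikit-learn decision tree splits an ordinally encoded feature with binary thresholds, so the gain it reports at the root can differ from this multiway figure.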
B-6. [… marks] From the decision tree built in B-3, write a classification rule using the normalized values first, then convert it back to the original values.

B-7. [… marks] Write two association rules for "gender" and "education level". Which rule has the highest accuracy? Write the corresponding support and accuracy.

B-8. [… marks] Repeat parts b, c, and d of B-3 using the Naive Bayes (GaussianNB) classifier.

B-9. [… marks] Compare the performance of Naive Bayes against the decision tree and random forest classifiers using the confusion matrix. Based on the comparison, which one is the best to use with the given dataset?

Please solve all the parts.
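The preprocessing, training, and comparison steps in the question can be sketched roughly as follows. This is not a solution for the real data: the toy rows, the column names, the 0 to 4 ordinal codes, the 1/0 occupation coding, the 70/30 split, random state 0, the 5 trees, and the stratified split are all assumptions made here, because the question's actual values are missing or depend on HW4_DataA.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

# Toy rows invented for illustration; the real data comes from HW4_DataA.
df = pd.DataFrame({
    "age": [22, 35, 47, 51, 29, 63, 41, 38, 26, 55, 33, 44],
    "income": [18, 52, 75, 90, 40, 61, 58, 47, 21, 83, 39, 66],
    "education_level": ["High School", "Bachelor's", "Master's", "Doctoral",
                        "Associate's", "Bachelor's", "Master's", "High School",
                        "Associate's", "Doctoral", "Bachelor's", "Master's"],
    "gender": ["F", "M", "M", "F", "F", "M", "F", "M", "M", "F", "F", "M"],
    "marital_status": ["Single", "Married", "Married", "Married", "Single", "Married",
                       "Single", "Married", "Single", "Married", "Single", "Married"],
    "occupation": ["Student", "Engineer", "Doctor", "Lawyer", "Student", "Teacher",
                   "Engineer", "Clerk", "Student", "Doctor", "Clerk", "Teacher"],
    "loan_status": ["Denied", "Approved", "Approved", "Approved", "Denied", "Approved",
                    "Approved", "Denied", "Denied", "Approved", "Denied", "Approved"],
})

# a. Min-max normalization of the numeric attributes (assumed column names).
numeric_cols = ["age", "income"]
df[numeric_cols] = MinMaxScaler().fit_transform(df[numeric_cols])

# b. Ordinal encoding with a dictionary, lowest level first; starting at 0 is assumed.
education_order = {"High School": 0, "Associate's": 1, "Bachelor's": 2,
                   "Master's": 3, "Doctoral": 4}
df["education_level"] = df["education_level"].map(education_order)

# c. One-hot encoding of the nominal attributes.
df = pd.get_dummies(df, columns=["gender", "marital_status"], dtype=int)

# d. Binary occupation flag as integer; 1 for students and 0 otherwise is assumed.
df["occupation"] = (df["occupation"] == "Student").astype(int)

# e. Label-encode the class (alphabetical: Approved -> 0, Denied -> 1).
df["loan_status"] = LabelEncoder().fit_transform(df["loan_status"])

X = df.drop(columns="loan_status")
y = df["loan_status"]

# Assumed 70/30 split and random state; stratify only keeps both classes
# present in this tiny toy sample and is not requested by the question.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

models = {
    "Decision tree": DecisionTreeClassifier(criterion="entropy", splitter="best",
                                            random_state=0),
    "Random forest": RandomForestClassifier(n_estimators=5, criterion="entropy",
                                            random_state=0),
    "Naive Bayes": GaussianNB(),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = confusion_matrix(y_test, y_pred, labels=[0, 1])
    print(name)
    print(results[name])
    # Precision and recall are reported for the class encoded as 1.
    print("accuracy: ", accuracy_score(y_test, y_pred))
    print("recall:   ", recall_score(y_test, y_pred, zero_division=0))
    print("precision:", precision_score(y_test, y_pred, zero_division=0))
```

For the drawing steps, `sklearn.tree.plot_tree` can be called on the fitted decision tree, and on each member of the random forest's `estimators_` list. Comparing the three confusion matrices side by side is then one reasonable basis for choosing among the classifiers.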
