Question: Do the following tasks (in exact sequence) using the HW4_DataA data

Do the following tasks (in exact sequence) using the "HW4_DataA" data:

B-1. [5 marks]: Read and display the data given in HW4_DataA. Describe both the numeric and categorical attributes. Refer to Table 8.5.2 for the data description.
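A minimal sketch of the read, display, and describe step, assuming pandas. The small DataFrame below is a hypothetical stand-in for HW4_DataA (the real file would be loaded with `pd.read_excel` or `pd.read_csv`), and the numeric column names `age` and `income` are assumptions, not taken from Table 8.5.2.

```python
import pandas as pd

# Hypothetical stand-in for HW4_DataA; replace with pd.read_excel(...) or
# pd.read_csv(...) on the real file.
df = pd.DataFrame({
    "age": [22, 35, 47, 29],                  # assumed numeric attribute
    "income": [18000, 52000, 88000, 40000],   # assumed numeric attribute
    "gender": ["F", "M", "F", "M"],
    "education_level": ["High School", "Bachelor's", "Doctoral", "Master's"],
    "loan_status": ["No", "Yes", "Yes", "No"],
})

# Display the first rows and the column types.
print(df.head())
print(df.dtypes)

# Numeric attributes: count, mean, std, min, quartiles, max.
numeric_summary = df.select_dtypes(include="number").describe()
print(numeric_summary)

# Categorical attributes: count, number of unique values, mode, mode frequency.
categorical_summary = df.select_dtypes(exclude="number").describe()
print(categorical_summary)
```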

B-2. [12.5 marks: 2.5 each]: Do the necessary pre-processing. Specifically, do the following:

a. Normalize the numeric attributes using the min-max normalization scheme.

b. Perform ordinal (label) encoding for the ordinal attribute (education_level). Use a dictionary for the ordinal encoding. The order is as follows, starting from the lowest: {High School, Associate's, Bachelor's, Master's, Doctoral}.

c. Perform one-hot encoding for the categorical attributes (gender and marital status).

d. For the occupation feature, encode student to 0 and all other choices to 1 (do not forget to convert the type to integer).

e. Perform label encoding for the class (loan_status).
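The five pre-processing steps can be sketched as follows, assuming pandas and scikit-learn. The toy rows, the numeric column `age`, the column name `marital_status`, and the category values are hypothetical; only the education order comes from the question.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

# Hypothetical rows standing in for HW4_DataA.
df = pd.DataFrame({
    "age": [20, 30, 40, 60],  # assumed numeric attribute
    "education_level": ["High School", "Master's", "Associate's", "Doctoral"],
    "gender": ["F", "M", "M", "F"],
    "marital_status": ["single", "married", "single", "married"],
    "occupation": ["student", "engineer", "student", "teacher"],
    "loan_status": ["No", "Yes", "No", "Yes"],
})

# a. Min-max normalization: x' = (x - min) / (max - min).
numeric_cols = ["age"]
df[numeric_cols] = MinMaxScaler().fit_transform(df[numeric_cols])

# b. Ordinal encoding with a dictionary, lowest level first.
education_order = {
    "High School": 0,
    "Associate's": 1,
    "Bachelor's": 2,
    "Master's": 3,
    "Doctoral": 4,
}
df["education_level"] = df["education_level"].map(education_order)

# c. One-hot encoding of the nominal attributes.
df = pd.get_dummies(df, columns=["gender", "marital_status"], dtype=int)

# d. Binary occupation: student -> 0, anything else -> 1, stored as integer.
df["occupation"] = (df["occupation"] != "student").astype(int)

# e. Label encoding of the class (classes are sorted, so "No" -> 0, "Yes" -> 1).
df["loan_status"] = LabelEncoder().fit_transform(df["loan_status"])

print(df)
```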

B-3. [10 marks: 2.5 each]:

a. Split the dataset into training and testing sets using the train_test_split function with 75% for training and 25% for testing, using random state = 10.

b. Build a decision tree classifier for predicting the class label. Fit the classifier using the training dataset. Set random state to 100, criterion to entropy, and splitter to best.

c. Draw the decision tree using scikit-learn (sklearn).

d. Test the classifier on the testing data set, and print the confusion matrix and classification metrics (Accuracy, Sensitivity (Recall), Precision) of the decision tree classifier.
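A sketch of the split, fit, draw, and test sequence, assuming scikit-learn. The data here is synthetic (`make_classification`), used only so the example runs without HW4_DataA, and the feature names are hypothetical. The tree is rendered as text with `export_text` to avoid a plotting dependency; `sklearn.tree.plot_tree` would produce the graphical version.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for the pre-processed features X and encoded class y.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

# a. 75% training, 25% testing, random_state=10.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=10
)

# b. Decision tree with the settings from the question.
tree = DecisionTreeClassifier(criterion="entropy", splitter="best", random_state=100)
tree.fit(X_train, y_train)

# c. Text rendering of the fitted tree.
print(export_text(tree, feature_names=feature_names))

# d. Confusion matrix and metrics on the held-out test set.
y_pred = tree.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)
print("Accuracy: ", accuracy_score(y_test, y_pred))
print("Recall:   ", recall_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
```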

B-4. [17.5 marks: 2.5 each]: Using the same dataset split in B-3.a:

a. Build a Random Forest classifier for predicting the class label with & trees. Fit the classifier using the training set. Set criterion to entropy and random_state to 62.

b. Draw the trees using scikit-learn (sklearn).

c. Test the classifier on the testing data set, and print the confusion matrix and classification metrics (Accuracy, Sensitivity (Recall), Precision) of the Random Forest classifier.
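A random forest version of the same workflow might look like the sketch below, assuming scikit-learn and synthetic data. The number of trees is not legible in the question, so `N_TREES = 3` is purely a placeholder. Each member tree is available through `estimators_` and is printed as text here; `sklearn.tree.plot_tree` could be called on each one for a figure.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.tree import export_text

N_TREES = 3  # placeholder: the tree count is garbled in the question

# Synthetic stand-in for the pre-processed data, split as in B-3.a.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=10
)

# a. Random forest with entropy criterion and random_state=62.
forest = RandomForestClassifier(
    n_estimators=N_TREES, criterion="entropy", random_state=62
)
forest.fit(X_train, y_train)

# b. Render every tree in the ensemble.
for i, member in enumerate(forest.estimators_):
    print(f"--- tree {i} ---")
    print(export_text(member))

# c. Confusion matrix and metrics on the test set.
y_pred = forest.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)
print("Accuracy: ", accuracy_score(y_test, y_pred))
print("Recall:   ", recall_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
```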

B-5. [10 marks]: Calculate the Information Gain for the class variable "loan_status" given the feature "education_level" as a root node.
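Information gain for a root split is the entropy of the class before the split minus the weighted average entropy of the class within each feature value. A small sketch with hypothetical rows (not the HW4_DataA values) and helper names chosen for illustration:

```python
import numpy as np
import pandas as pd


def entropy(labels: pd.Series) -> float:
    """Shannon entropy (base 2) of a label column."""
    p = labels.value_counts(normalize=True)
    return float(-(p * np.log2(p)).sum())


def information_gain(df: pd.DataFrame, feature: str, target: str) -> float:
    """Entropy of target minus the weighted entropy after splitting on feature."""
    parent = entropy(df[target])
    weights = df[feature].value_counts(normalize=True)
    children = sum(
        w * entropy(df.loc[df[feature] == value, target])
        for value, w in weights.items()
    )
    return parent - children


# Hypothetical toy data: 4 rows per education level.
toy = pd.DataFrame({
    "education_level": ["High School"] * 4 + ["Bachelor's"] * 4,
    "loan_status": ["No", "No", "No", "Yes", "Yes", "Yes", "Yes", "No"],
})

ig = information_gain(toy, "education_level", "loan_status")
print(round(ig, 4))
```

On this toy table the class is split 4/4 (entropy 1 bit) and each education level is split 3/1, so the gain is 1 minus the entropy of a 0.75/0.25 distribution.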

B-6. [10 marks]: From the decision tree built in B-3, write a classification rule using the normalized values first, then convert it back to the original values.
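Converting a rule back to original units means inverting min-max normalization: since x' = (x - min) / (max - min), the original value is x = x' * (max - min) + min. The sketch below applies that to one threshold; the attribute `age`, its range, and the threshold 0.25 are hypothetical, not read from the actual tree.

```python
def denormalize(x_norm: float, col_min: float, col_max: float) -> float:
    """Invert min-max normalization for a single value."""
    return x_norm * (col_max - col_min) + col_min


# Hypothetical rule from a tree trained on normalized data:
#   IF age <= 0.25 THEN loan_status = No
# Assume age ranged from 20 to 60 before normalization.
threshold_original = denormalize(0.25, col_min=20, col_max=60)
print(f"IF age <= {threshold_original} THEN loan_status = No")
```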

B-7. [10 marks]: Write two association rules for "gender -> education level". Which rule has the highest accuracy? Write the corresponding support and accuracy.
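For a rule A -> B, support is the fraction of all rows that satisfy both A and B, and rule accuracy (often called confidence) is the fraction of rows satisfying A that also satisfy B. A sketch on hypothetical rows, with `rule_stats` as an illustrative helper name:

```python
import pandas as pd

# Hypothetical rows, not the HW4_DataA values.
toy = pd.DataFrame({
    "gender": ["F", "F", "F", "F", "M", "M", "M", "M"],
    "education_level": [
        "Bachelor's", "Bachelor's", "Bachelor's", "Master's",
        "Master's", "Master's", "Bachelor's", "Bachelor's",
    ],
})


def rule_stats(df, antecedent, consequent):
    """Return (support, accuracy) for the rule antecedent -> consequent.

    Each argument is a (column, value) pair.
    """
    a_col, a_val = antecedent
    c_col, c_val = consequent
    has_a = df[a_col] == a_val
    has_both = has_a & (df[c_col] == c_val)
    support = has_both.sum() / len(df)
    accuracy = has_both.sum() / has_a.sum()
    return float(support), float(accuracy)


rule_1 = rule_stats(toy, ("gender", "F"), ("education_level", "Bachelor's"))
rule_2 = rule_stats(toy, ("gender", "M"), ("education_level", "Master's"))
print("F -> Bachelor's: support=%.3f accuracy=%.3f" % rule_1)
print("M -> Master's:   support=%.3f accuracy=%.3f" % rule_2)
```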

B-8. [10 marks]: Repeat parts b, c, and d in B-3 using the Naïve Bayes GaussianNB classifier.
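The fit and test steps with `GaussianNB` follow the same pattern as the decision tree. A sketch assuming scikit-learn and synthetic data in place of HW4_DataA:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the pre-processed data, split as in B-3.a.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=10
)

# Fit Gaussian Naive Bayes on the training set.
nb = GaussianNB()
nb.fit(X_train, y_train)

# Confusion matrix and metrics on the test set.
y_pred = nb.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)
print("Accuracy: ", accuracy_score(y_test, y_pred))
print("Recall:   ", recall_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
```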

B-9. [5 marks]: Compare the performance of the Naïve Bayes against the built decision tree and random forest classifiers using the confusion matrix. Based on the comparison, which one is the best to use with the given data set?

Please solve all the parts.

