Question: In this problem, using Python the healthcareTrain.csv and healthcareTest.csv . Youll use the Value Distance Metric (VDM) to find the distance between symbolic feature values

In this problem, using Python the healthcareTrain.csv and healthcareTest.csv . Youll use the Value Distance Metric (VDM) to find the distance

between symbolic feature values Northeast, Midwest, South, and West, and further use this information in KNN algorithm to predict pdc-80-flag. region: US Census Region (1 Northeast, 2 Midwest, 3 South, 4 West)

1. (10 points) Find all the relevant conditional probabilities for finding VDM for symbolic variable region and report your results in a table.

2. (10 points) Use results in part 1 to find the distance between symbolic feature values Northeast, Midwest, South, and West using VDM equation.

Report the distances in a table.

3. (10 points) Use this variable (region) in conjunction with the variables of problem 1 and regenerate your model, for k = 75 to 105 with a step

size of 2. Report the mean accuracy rate. Compare this mean with mean accuracy rate from previous problem. Has it increased for decreased?

RegionTrain: 0 2 1 2 2 2 3 3 4 4 5 3 6 3 7 3 8 3 9 4 10 3 11 3 12 2 13 2 14 3 15 3 16 4 17 1 18 3 19 1 

RegionTest:

0 3 1 3 2 3 3 3 4 3 5 3 6 3 7 3 8 3 9 3 10 3 11 4 12 4 13 4 14 4 15 4 16 4 17 4 18 3 19 4 

Trainpdc80:

0 0 1 1 2 0 3 0 4 1 5 1 6 1 7 1 8 0 9 0 10 0 11 1 12 0 13 1 14 1 15 1 16 1 17 1 18 0 19 0 

Testpdc80:

0 0 1 1 2 0 3 0 4 1 5 1 6 1 7 1 8 0 9 0 10 0 11 1 12 0 13 1 14 1 15 1 16 1 17 1 18 0 19 0

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Accounting Questions!