Question: In this problem, use Python with the healthcareTrain.csv and healthcareTest.csv files. You'll use the Value Distance Metric (VDM) to find the distance between the symbolic feature values Northeast, Midwest, South, and West, and then use this information in a KNN algorithm to predict pdc-80-flag.
region: US Census Region (1 Northeast, 2 Midwest, 3 South, 4 West)
1. (10 points) Find all the conditional probabilities needed to compute the VDM for the symbolic variable region, and report your results in a table.
2. (10 points) Use the results from part 1 to find the distances between the symbolic feature values Northeast, Midwest, South, and West using the VDM equation. Report the distances in a table.
3. (10 points) Use this variable (region) together with the variables of problem 1 and regenerate your model for k = 75 to 105 with a step size of 2. Report the mean accuracy rate. Compare this mean with the mean accuracy rate from the previous problem. Has it increased or decreased?
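Parts 1 and 2 can be sketched as follows. This is a minimal illustration, not the expected solution: it uses only the 20 sample rows listed under RegionTrain and Trainpdc80 rather than the full healthcareTrain.csv, the helper names `conditional_probs` and `vdm` are hypothetical, and the exponent `q = 1` is an assumption (some courses define VDM with `q = 2`).

```python
from collections import Counter, defaultdict
from itertools import combinations

# Sample rows copied from the RegionTrain and Trainpdc80 listings (not the full CSV).
region_train = [2, 2, 2, 3, 4, 3, 3, 3, 3, 4, 3, 3, 2, 2, 3, 3, 4, 1, 3, 1]
train_pdc80 = [0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0]
REGION_NAMES = {1: "Northeast", 2: "Midwest", 3: "South", 4: "West"}


def conditional_probs(values, labels):
    """Estimate P(label | value) from co-occurrence counts."""
    counts = defaultdict(Counter)
    for value, label in zip(values, labels):
        counts[value][label] += 1
    classes = sorted(set(labels))
    return {
        value: {c: counts[value][c] / sum(counts[value].values()) for c in classes}
        for value in sorted(counts)
    }


def vdm(probs, a, b, q=1):
    """VDM between symbolic values a and b: sum over classes of |P(c|a) - P(c|b)|^q.

    q=1 is an assumption; pass q=2 if that is the course's definition.
    """
    return sum(abs(probs[a][c] - probs[b][c]) ** q for c in probs[a])


# Part 1: conditional probability table.
probs = conditional_probs(region_train, train_pdc80)
for value, p in probs.items():
    print(f"{REGION_NAMES[value]:<10} P(0|region)={p[0]:.3f}  P(1|region)={p[1]:.3f}")

# Part 2: pairwise distance table.
distances = {(a, b): vdm(probs, a, b) for a, b in combinations(sorted(probs), 2)}
for (a, b), d in distances.items():
    print(f"d({REGION_NAMES[a]}, {REGION_NAMES[b]}) = {d:.3f}")
```

With the full training file, the same two functions would be applied to the complete region and pdc-80-flag columns instead of the 20-row sample.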
Index  RegionTrain  RegionTest  Trainpdc80  Testpdc80
0      2            3           0           0
1      2            3           1           1
2      2            3           0           0
3      3            3           0           0
4      4            3           1           1
5      3            3           1           1
6      3            3           1           1
7      3            3           1           1
8      3            3           0           0
9      4            3           0           0
10     3            3           0           0
11     3            4           1           1
12     2            4           0           0
13     2            4           1           1
14     3            4           1           1
15     3            4           1           1
16     4            4           1           1
17     1            4           1           1
18     3            3           0           0
19     1            4           0           0
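For part 3, one possible shape of the evaluation loop is sketched below. Several things here are assumptions rather than part of the question: the numeric variables from problem 1 are not shown, so each record is represented as a hypothetical `(numeric_features, region_code)` tuple with numeric features already scaled; the numeric and symbolic parts are combined in a Euclidean-style sum; and `region_dist` is a nested lookup of pairwise VDM values for region. The function names are illustrative only.

```python
import math
from collections import Counter


def mixed_distance(x, y, region_dist):
    """Combine squared numeric differences with the squared VDM for region.

    x and y are (numeric_features, region_code) tuples (assumed layout).
    """
    numeric_part = sum((a - b) ** 2 for a, b in zip(x[0], y[0]))
    symbolic_part = region_dist[x[1]][y[1]] ** 2
    return math.sqrt(numeric_part + symbolic_part)


def knn_predict(train_X, train_y, query, k, region_dist):
    """Majority vote among the k training records closest to query."""
    order = sorted(
        range(len(train_X)),
        key=lambda i: mixed_distance(train_X[i], query, region_dist),
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def mean_accuracy(train_X, train_y, test_X, test_y, region_dist,
                  ks=range(75, 106, 2)):
    """Average test accuracy over k = 75, 77, ..., 105."""
    accuracies = []
    for k in ks:
        preds = [knn_predict(train_X, train_y, q, k, region_dist) for q in test_X]
        correct = sum(p == t for p, t in zip(preds, test_y))
        accuracies.append(correct / len(test_y))
    return sum(accuracies) / len(accuracies)
```

Because every k in the range is odd and pdc-80-flag is binary, the majority vote cannot tie. The resulting mean can then be compared directly with the mean accuracy obtained without region in the previous problem.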
