Question: Please fill the missing values in the balance column with a group mean. There are two different groups of columns in the dataset. To check whether to use the group mean of a column with categorical data, study the group means of balance for that column; if there are big differences between the group means, the column is a good candidate. To check whether to use the group mean of a column with numerical data, examine the density plots we have created and, based on their patterns, select the column whose distribution is similar to that of the balance column. If the column eventually selected has numerical data, create bins first, then use the bin mean to fill the missing values. Please make this decision by yourself after checking all candidates, and explain your decision.
| region | age | income | long_Month | longten | internet | balance | class |
| 2 | 44 | 64 | 3.7 | 37.45 | 0 | 2.014903021 | 1 |
| 3 | 33 | 136 | 4.4 | 42 | 0 | 2.724579503 | 1 |
| 3 | 52 | 116 | 18.15 | 1300.6 | 0 | 3.409496184 | 0 |
| 2 | 39 | 78 | 11.8 | 487.4 | 0 | 2.602689685 | 0 |
| 3 | 22 | 19 | 10.9 | 504.5 | 1 | 2.1690537 | 1 |
| 2 | 35 | 76 | 6.05 | 239.55 | 1 | 3.146305132 | 0 |
| 3 | 59 | 166 | 9.75 | 449.05 | 0 | 2.48490665 | 0 |
| 1 | 41 | 72 | 24.15 | 1659.7 | 0 | 2.803360381 | 0 |
| 2 | 33 | 125 | 4.85 | 17.25 | 1 | NaN | 1 |
| 3 | 35 | 80 | 7.1 | 47.45 | 0 | 3.16758253 | 0 |
| 1 | 38 | 37 | 8.55 | 308.7 | 0 | 3.731699451 | 0 |
| 1 | 38 | 75 | 5.1 | 146.25 | 1 | 2.420368129 | 0 |
| 3 | 57 | 162 | 16.15 | 946.9 | 0 | 3.401197382 | 0 |
| 1 | 29 | 77 | 6.7 | 140.95 | 1 | 3.188416617 | 1 |
| 3 | 30 | 16 | 3.75 | 25.65 | 1 | NaN | 1 |
| 1 | 52 | 120 | 20.7 | 1391.05 | 0 | 3.091042453 | 0 |
| 3 | 33 | 101 | 5.3 | 253.35 | 1 | 3.286534473 | 0 |
| 3 | 48 | 67 | 15.05 | 810.45 | 0 | 3.305053521 | 0 |
| 3 | 43 | 36 | 12.5 | 153.75 | 0 | 2.890371758 | 0 |
| 2 | 21 | 33 | 2.2 | 2.2 | 0 | 3.701301974 | 0 |
| 2 | 40 | 37 | 8.25 | 399.15 | 1 | 3.33220451 | 0 |
| 1 | 37 | 36 | 10.6 | 582.6 | 1 | 2.90416508 | 0 |
| 1 | 53 | 155 | 21 | 1519.2 | 1 | 3.526360525 | 0 |
| 1 | 50 | 140 | 6.5 | 247.55 | 0 | 3.555348061 | 0 |
| 1 | 27 | 55 | 4.8 | 54.1 | 1 | NaN | 0 |
| 2 | 46 | 163 | 33.9 | 1947.95 | 1 | 2.621038824 | 0 |
| 3 | 35 | 52 | 4.25 | 82.7 | 1 | NaN | 1 |
| 2 | 60 | 211 | 21.15 | 1228.7 | 1 | 3.993602992 | 0 |
| 1 | 57 | 186 | 9.8 | 428.25 | 0 | 3.583518938 | 0 |
| 1 | 41 | 39 | 6.55 | 67.8 | 0 | 2.983153491 | 1 |
| 2 | 57 | 22 | 41.75 | 3043.05 | 0 | 2.931193752 | 0 |
| 3 | 41 | 30 | 2.5 | 31.25 | 0 | 3.984343667 | 0 |
| 2 | 28 | 29 | 4.25 | 78 | 0 | 2.876385516 | 0 |
| 1 | 43 | 76 | 14.7 | 897.05 | 0 | 2.397895273 | 0 |
| 1 | 41 | 74 | 14.5 | 963.3 | 1 | 2.833213344 | 0 |
| 1 | 51 | 63 | 12.85 | 585.6 | 1 | 2.656756907 | 0 |
| 3 | 41 | 36 | 7.75 | 361 | 1 | 2.917770732 | 1 |
| 3 | 34 | 33 | 2.95 | 18.9 | 0 | NaN | 1 |
| 1 | 36 | 29 | 3.25 | 16.8 | 1 | 2.740840024 | 1 |
| 2 | 34 | 27 | 6.3 | 150.9 | 1 | NaN | 1 |
| 1 | 52 | 49 | 24.75 | 1349.05 | 0 | 3.102342009 | 0 |
| 3 | 22 | 24 | 7.8 | 63 | 1 | 3.113515309 | 0 |
| 1 | 26 | 26 | 4.85 | 33.7 | 0 | NaN | 0 |
| 1 | 27 | 47 | 6.25 | 330.4 | 0 | 2.656756907 | 0 |
| 1 | 62 | 27 | 15.5 | 967.1 | 0 | 3.540959324 | 0 |
| 2 | 52 | 30 | 10.4 | 447.85 | 0 | 2.944438979 | 0 |
| 2 | 40 | 127 | 19.7 | 909.9 | 1 | 2.691243083 | 0 |
| 2 | 39 | 137 | 12.7 | 676.9 | 0 | 2.442347035 | 1 |
| 2 | 50 | 80 | 28.8 | 1558.1 | 0 | 3.056356895 | 0 |
| 3 | 55 | 30 | 10.25 | 395.85 | 0 | 2.756840365 | 0 |
| 2 | 51 | 438 | 29 | 1815.4 | 1 | 3.208825489 | 0 |
| 2 | 39 | 79 | 15.25 | 653.7 | 1 | 3.305053521 | 1 |
| 3 | 47 | 63 | 9.05 | 373.65 | 0 | 1.909542505 | 0 |
| 1 | 67 | 51 | 57.05 | 4168.25 | 0 | 2.957511061 | 0 |
| 3 | 43 | 61 | 3.15 | 7.9 | 1 | 2.788092909 | 1 |
| 1 | 57 | 22 | 8.55 | 381.5 | 0 | 2.983153491 | 0 |
| 2 | 48 | 91 | 24.5 | 1531.9 | 0 | 2.525728644 | 0 |
| 1 | 68 | 244 | 30.25 | 2186.2 | 0 | 3.425889994 | 0 |
| 1 | 42 | 80 | 7.85 | 196.65 | 1 | 2.351375257 | 0 |
| 3 | 34 | 83 | 8.9 | 379.55 | 1 | 1.749199855 | 0 |
| 2 | 31 | 21 | 1.65 | 3.2 | 0 | NaN | 1 |
| 3 | 48 | 24 | 11.85 | 230.7 | 0 | 3.555348061 | 1 |
| 2 | 53 | 351 | 5.5 | 185.3 | 0 | 2.957511061 | 0 |
| 2 | 52 | 169 | 14.65 | 618.15 | 1 | 2.656756907 | 0 |
| 2 | 54 | 50 | 17.25 | 1067.8 | 1 | 4.471638793 | 1 |
| 2 | 35 | 161 | 3.4 | 10.35 | 1 | NaN | 1 |
| 1 | 47 | 212 | 7.45 | 320.9 | 0 | 3.208825489 | 0 |
| 3 | 61 | 53 | 12.25 | 631.7 | 1 | 2.1690537 | 0 |
| 3 | 33 | 73 | 5.85 | 216.6 | 1 | 3.102342009 | 0 |
| 3 | 20 | 17 | 3.75 | 12.3 | 0 | NaN | 1 |
| 2 | 33 | 23 | 10.2 | 196.3 | 1 | 2.674148649 | 1 |
| 3 | 36 | 107 | 11.55 | 491.55 | 0 | 3.068052935 | 1 |
| 2 | 25 | 21 | 8.65 | 426.6 | 0 | 2.197224577 | 1 |
| 3 | 58 | 83 | 19.3 | 1323.2 | 0 | 3.124565145 | 0 |
| 1 | 20 | 17 | 3.05 | 40.3 | 1 | 2.63905733 | 0 |
| 3 | 25 | 76 | 24.05 | 1536.55 | 0 | 3.056356895 | 0 |
| 2 | 24 | 19 | 4 | 46 | 0 | 3.305053521 | 1 |
| 1 | 61 | 41 | 9.6 | 353.55 | 0 | 2.251291799 | 0 |
| 3 | 39 | 105 | 6.6 | 159.7 | 0 | 3.617651945 | 0 |
| 1 | 54 | 31 | 5.85 | 97 | 1 | NaN | 0 |
| 2 | 40 | 41 | 25.25 | 1725.1 | 0 | 3.496507561 | 0 |
| 3 | 50 | 102 | 12.6 | 760.3 | 1 | 2.525728644 | 0 |
| 2 | 22 | 25 | 12.05 | 666 | 1 | 1.871802177 | 0 |
| 1 | 42 | 68 | 17.3 | 997.85 | 0 | 2.957511061 | 0 |
| 1 | 55 | 79 | 13.8 | 668.65 | 0 | 3.238678452 | 0 |
| 3 | 31 | 28 | 3.45 | 13.3 | 1 | 3.157000421 | 1 |
| 2 | 48 | 64 | 14.8 | 708.45 | 0 | 2.833213344 | 0 |
| 2 | 36 | 38 | 10.15 | 538.65 | 0 | 2.876385516 | 0 |
| 3 | 23 | 37 | 5.35 | 144.05 | 1 | 2.197224577 | 0 |
| 2 | 64 | 98 | 26.15 | 1805.1 | 0 | 2.420368129 | 0 |
| 1 | 52 | 195 | 23.15 | 1420.95 | 0 | 3.583518938 | 1 |
| 2 | 35 | 47 | 5.2 | 46.9 | 0 | NaN | 1 |
| 1 | 47 | 65 | 12.4 | 744.45 | 0 | 3.597312261 | 0 |
| 1 | 50 | 150 | 19.3 | 930.8 | 1 | 3.926911618 | 1 |
| 2 | 39 | 106 | 9.05 | 568.15 | 0 | 2.505525937 | 0 |
| 2 | 43 | 33 | 4.25 | 30 | 0 | 2.788092909 | 0 |
| 3 | 25 | 38 | 13.9 | 255 | 0 | 1.32175584 | 0 |
| 3 | 32 | 125 | 3.8 | 31.75 | 1 | 2.944438979 | 1 |
| 1 | 37 | 145 | 8.6 | 241.45 | 1 | 2.876385516 | 0 |
| 1 | 44 | 99 | 22.05 | 841.55 | 0 | 2.277267285 | 0 |
| 1 | 34 | 22 | 4.85 | 75.45 | 1 | 3.548179572 | 0 |
| 2 | 60 | 31 | 32.25 | 2349.25 | 0 | 3.511545439 | 0 |
| 1 | 36 | 25 | 1.9 | 3.6 | 0 | 1.609437912 | 0 |
| 3 | 25 | 57 | 20.25 | 923.95 | 1 | 2.277267285 | 0 |
| 2 | 51 | 41 | 12.1 | 492.8 | 1 | 2.277267285 | 0 |
| 3 | 25 | 20 | 8.15 | 209.35 | 1 | NaN | 1 |
| 3 | 43 | 101 | 13.65 | 817.65 | 0 | 3.032546247 | 0 |
| 1 | 37 | 56 | 10.25 | 681.95 | 0 | 2.374905755 | 0 |
| 2 | 27 | 22 | 17.2 | 893.3 | 0 | 2.583997552 | 0 |
| 1 | 37 | 108 | 21.8 | 1292 | 1 | 2.079441542 | 0 |
| 3 | 48 | 115 | 12.8 | 785.7 | 1 | 2.847812143 | 0 |
| 3 | 54 | 53 | 8.1 | 225.4 | 1 | 4.034240638 | 0 |
| 3 | 53 | 242 | 8.95 | 396.35 | 0 | 3.349904087 | 0 |
| 1 | 24 | 20 | 3.35 | 7.55 | 1 | 2.224623552 | 0 |
| 2 | 47 | 123 | 2.5 | 9.25 | 1 | 2.621038824 | 0 |
| 3 | 44 | 47 | 7.05 | 182.3 | 0 | 2.674148649 | 0 |
| 3 | 37 | 48 | 12.9 | 669.75 | 0 | 2.1690537 | 0 |
| 2 | 64 | 13 | 14.6 | 921.7 | 0 | 2.756840365 | 0 |
| 2 | 52 | 78 | 18.6 | 1179.05 | 0 | 0 | |
| 3 | 26 | 34 | 6.5 | 198.7 | 0 | 2.772588722 | 0 |
| 1 | 46 | 81 | 17.45 | 1067 | 1 | 2.327277706 | 0 |
| 1 | 45 | 86 | 45.05 | 3235.9 | 1 | 3.555348061 | 0 |
| 3 | 27 | 26 | 12.2 | 477.5 | 1 | 2.224623552 | 1 |
| 3 | 34 | 51 | 4.7 | 61.5 | 0 | 2.803360381 | 0 |
| 2 | 56 | 57 | 6.2 | 135.2 | 1 | 2.351375257 | 0 |
| 1 | 48 | 41 | 4.35 | 59.15 | 0 | 2.079441542 | 0 |
| 2 | 46 | 168 | 14.2 | 928.8 | 1 | 3.657130756 | 0 |
| 2 | 33 | 29 | 4.65 | 35.45 | 1 | NaN | 1 |
| 3 | 58 | 167 | 24.3 | 1642.35 | 0 | 2.847812143 | 0 |
| 3 | 24 | 29 | 4.45 | 43.7 | 1 | NaN | 1 |
| 1 | 55 | 104 | 15 | 960.95 | 0 | 3.669951444 | 0 |
| 3 | 22 | 46 | 9.35 | 311.85 | 1 | 1.791759469 | 1 |
| 3 | 26 | 46 | 11 | 350.05 | 1 | 1.791759469 | 0 |
| 1 | 33 | 51 | 2.9 | 12.05 | 0 | NaN | 1 |
| 2 | 42 | 65 | 14.2 | 1007.5 | 1 | 3.449987546 | 0 |
| 2 | 33 | 48 | 3.15 | 39.75 | 0 | 3.417726684 | 0 |
| 3 | 34 | 46 | 7.1 | 125.05 | 1 | NaN | 0 |
| 2 | 63 | 43 | 14.85 | 808.95 | 1 | 2.397895273 | 0 |
| 3 | 53 | 110 | 17.45 | 1249.8 | 0 | 2.691243083 | 0 |
| 1 | 39 | 26 | 14.45 | 349.9 | 1 | 3.135494216 | 1 |
| 3 | 28 | 34 | 15 | 934.05 | 0 | 2.862200881 | 0 |
| 2 | 54 | 21 | 10.6 | 565.55 | 0 | 2.833213344 | 0 |
| 3 | 33 | 38 | 5.05 | 48.6 | 1 | 2.525728644 | 0 |
| 1 | 21 | 19 | 7.9 | 98.35 | 1 | 1.609437912 | 1 |
| 1 | 39 | 40 | 36.25 | 2553.7 | 0 | 3.650658241 | 0 |
| 3 | 69 | 58 | 6.35 | 118.05 | 1 | 2.463853241 | 0 |
| 2 | 65 | 128 | 21.2 | 1325.05 | 0 | 2.674148649 | 0 |
| 3 | 66 | 460 | 89.4 | 6353.9 | 1 | 2.931193752 | 0 |
region, internet, and class are categorical data, and the rest are numerical data.
I am not too sure how to approach this question. Can someone explain this in detail?
This needs to be done in Python.
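One possible way to approach this with pandas is sketched below. It is only a rough outline under some assumptions: the DataFrame holds ten rows copied from the table above rather than the full dataset, the helper names `fill_with_group_mean` and `fill_with_bin_mean` are made up for illustration, and `class` and `longten` are used purely as example candidates, not as the answer to which column is best. That decision still has to come from comparing the group means and the density plots on the full data.

```python
import numpy as np
import pandas as pd

# Ten rows copied from the table above (three have a missing balance).
# With the real data, load the full file instead of typing rows by hand.
df = pd.DataFrame({
    "region":   [2, 3, 3, 2, 3, 2, 2, 3, 1, 1],
    "age":      [44, 33, 52, 39, 22, 35, 33, 30, 27, 41],
    "income":   [64, 136, 116, 78, 19, 76, 125, 16, 55, 72],
    "longten":  [37.45, 42.0, 1300.6, 487.4, 504.5,
                 239.55, 17.25, 25.65, 54.1, 1659.7],
    "internet": [0, 0, 0, 0, 1, 1, 1, 1, 1, 0],
    "balance":  [2.014903021, 2.724579503, 3.409496184, 2.602689685,
                 2.1690537, 3.146305132, np.nan, np.nan, np.nan,
                 2.803360381],
    "class":    [1, 1, 0, 0, 1, 0, 1, 1, 0, 0],
})

# Step 1: for each categorical column, compare the group means of balance.
# A large gap between groups suggests the column is a good candidate.
for col in ["region", "internet", "class"]:
    means = df.groupby(col)["balance"].mean()
    spread = means.max() - means.min()
    print(col, means.round(3).to_dict(), "spread:", round(spread, 3))

# Step 2: for numerical columns, compare density plots with balance,
# e.g. df["longten"].plot.kde() (needs matplotlib and scipy; not run here).


def fill_with_group_mean(frame, target, by):
    """Fill NaNs in `target` with the mean of `target` in each `by` group."""
    group_mean = frame.groupby(by)[target].transform("mean")
    return frame[target].fillna(group_mean)


def fill_with_bin_mean(frame, target, by, q=4):
    """Cut numerical column `by` into `q` quantile bins, then fill NaNs in
    `target` with the mean of `target` inside each bin."""
    bins = pd.qcut(frame[by], q=q, labels=False, duplicates="drop")
    bin_mean = frame.groupby(bins)[target].transform("mean")
    # If a bin has no observed balance at all, fall back to the overall mean.
    return frame[target].fillna(bin_mean).fillna(frame[target].mean())


# Step 3: apply whichever option the checks above support.
by_class = fill_with_group_mean(df, "balance", "class")
by_bins = fill_with_bin_mean(df, "balance", "longten", q=2)
print(pd.DataFrame({"by_class": by_class, "by_bins": by_bins}))
```

`pd.qcut` gives bins with roughly equal row counts; `pd.cut` with fixed edges is an alternative if equal-width bins make more sense for the chosen column. Only the missing entries are changed, so the observed balance values stay as they are.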
