Problem 4: 30 points
You will use the data frame, patient, for this problem.
> patient
  ID GLUC TGL HDL LDL HRT MAMM SMOKE
1  A   88  NA  32  99   Y
This problem concerns missing-data imputation. The patient data frame has many missing values in both its numeric and its character variables.
Write a function named impute that replaces missing values with meaningful ones. For a numeric variable of the given data frame, replace each missing value with the median of that variable's observed values. For a character variable, replace each missing value with the variable's most frequent value (its mode).
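The two replacement rules can be tried on single vectors before writing the full function. This is a minimal sketch in base R; the vectors num and chr are hypothetical toy data, not values from the patient data frame, and breaking ties by taking the first maximum is an assumption the problem does not address.

```r
# Hypothetical toy vectors; these are NOT the patient data.
num <- c(10, NA, 30, 20)
chr <- c("yes", NA, "no", "yes")

# Numeric rule: replace NA with the median of the observed values.
num[is.na(num)] <- median(num, na.rm = TRUE)

# Character rule: replace NA with the most frequent observed value.
# table() ignores NA by default; which.max() returns the first maximum,
# so a tie goes to the value that sorts first (an assumption).
counts <- table(chr)
chr[is.na(chr)] <- names(counts)[which.max(counts)]

print(num)
print(chr)
```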
The impute function takes two arguments:
dat : the name of the data frame.
varlist : a character vector containing the names of the variables to impute. Its default value is NULL. When varlist is NULL (no variable names are supplied), the function imputes every variable in the data frame.
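One possible way to put the two arguments together is sketched below in base R. It is not presented as the expected answer. The check for unknown names, the handling of all-missing columns, and the tie-breaking rule are assumptions that the problem does not specify. The data frame toy is hypothetical: it borrows three column names from patient but its values are made up.

```r
impute <- function(dat, varlist = NULL) {
  # NULL means "impute every variable of the data frame".
  if (is.null(varlist)) varlist <- names(dat)

  # Added safeguard (an assumption, not required by the problem).
  unknown <- setdiff(varlist, names(dat))
  if (length(unknown) > 0) {
    stop("Unknown variable(s): ", paste(unknown, collapse = ", "))
  }

  # Keep only the requested columns; drop = FALSE keeps a data frame
  # even when a single column is requested.
  out <- dat[, varlist, drop = FALSE]

  for (v in varlist) {
    x <- out[[v]]
    miss <- is.na(x)
    # Skip columns with nothing to fill, or with no observed value to fill from.
    if (!any(miss) || all(miss)) next
    if (is.numeric(x)) {
      out[[v]][miss] <- median(x, na.rm = TRUE)
    } else {
      counts <- table(x)  # NA is excluded by default
      out[[v]][miss] <- names(counts)[which.max(counts)]  # first maximum wins ties
    }
  }
  out
}

# Hypothetical toy data; NOT the patient data frame.
toy <- data.frame(
  ID   = c("A", "B", "C", "D"),
  GLUC = c(88, NA, 110, 90),
  HRT  = c("Y", NA, "N", "Y")
)

print(impute(dat = toy))
print(impute(dat = toy, varlist = "HRT"))
```

The function works on a copy, so the data frame passed in is left unchanged, and it returns only the columns named in varlist.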
For example, here are some sample results that are based on the impute function:
> impute (dat=patient)
   ID GLUC TGL  HDL LDL HRT MAMM SMOKE
1   A   88 180 32.0  99   Y  yes  ever
2   B   90 150 60.0 165   Y   no never
3   C  110 180 62.5 120   N  yes never
4   D   90 200 65.0 165   Y  yes never
5   E   90 210 62.5 150   Y  yes never
6   F   88 180 32.0 210   Y  yes  ever
7   G  120 164 62.5 165   Y  yes never
8   H  110 170 70.0 188   Y  yes  ever
9   I   90 190 62.5 190   N   no never
10  J   90 180 75.0 165   Y  yes never
> impute (dat=patient, varlist=c("ID", "GLUC", "TGL", "HDL"))
   ID GLUC TGL  HDL
1   A   88 180 32.0
2   B   90 150 60.0
3   C  110 180 62.5
4   D   90 200 65.0
5   E   90 210 62.5
6   F   88 180 32.0
7   G  120 164 62.5
8   H  110 170 70.0
9   I   90 190 62.5
10  J   90 180 75.0
> impute (dat=patient, varlist=c("LDL", "HRT", "MAMM"))
   LDL HRT MAMM
1   99   Y  yes
2  165   Y   no
3  120   N  yes
4  165   Y  yes
5  150   Y  yes
6  210   Y  yes
7  165   Y  yes
8  188   Y  yes
9  190   N   no
10 165   Y  yes
> impute (dat=patient, varlist="HRT")
   HRT
1    Y
2    Y
3    N
4    Y
5    Y
6    Y
7    Y
8    Y
9    N
10   Y
