Missing data can sometimes be predictive, in the sense that rows with missing data differ systematically from rows without missing data. Check this with the file S02_32.xlsx, which contains blood pressures for 1000 (fictional) people, along with variables that may be related to blood pressure. These other variables have a number of missing values, presumably because the people didn’t want to report certain information.
a. For each of these other variables, find the mean and standard deviation of blood pressure for all people without a missing value for that variable and for all people with a missing value for that variable. Can you conclude that the presence or absence of data for any of these other variables has anything to do with blood pressure?
b. Some analysts suggest filling in missing data for a variable with the mean of the non-missing values for that variable. Do this for the missing values in the blood pressure data set. In general, do you think this is a valid way of filling in missing data? Why or why not?
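
The two parts above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the intended spreadsheet solution: the column names `BloodPressure` and `Income`, the function names, and the toy rows are all hypothetical, since the actual layout of S02_32.xlsx is not described here. Missing values are represented as `None`.

```python
from statistics import mean, stdev

def bp_summary_by_missingness(rows, predictor, target="BloodPressure"):
    """Part a: summarize the target separately for rows where the
    predictor is present and rows where it is missing (None)."""
    groups = {"present": [], "missing": []}
    for row in rows:
        key = "missing" if row[predictor] is None else "present"
        groups[key].append(row[target])
    summary = {}
    for key, values in groups.items():
        summary[key] = {
            "n": len(values),
            "mean": mean(values) if values else None,
            # The sample standard deviation needs at least two values.
            "stdev": stdev(values) if len(values) > 1 else None,
        }
    return summary

def impute_with_mean(values):
    """Part b: replace each None with the mean of the non-missing values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to average")
    fill = mean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical toy data, NOT taken from S02_32.xlsx.
toy_rows = [
    {"BloodPressure": 120, "Income": 40},
    {"BloodPressure": 135, "Income": None},
    {"BloodPressure": 118, "Income": 60},
    {"BloodPressure": 141, "Income": None},
    {"BloodPressure": 126, "Income": 50},
]

summary = bp_summary_by_missingness(toy_rows, "Income")
income = [row["Income"] for row in toy_rows]
filled_income = impute_with_mean(income)
observed_income = [v for v in income if v is not None]

print(summary)
print(filled_income)
# Compare the spread before and after filling in the mean.
print(stdev(observed_income), stdev(filled_income))
```

On the toy data, the filled-in column has the same mean as the observed values but a smaller standard deviation, because every imputed value sits exactly at the mean. That effect is worth keeping in mind when judging whether mean imputation is reasonable for a given variable.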

  • Created April 01, 2015