The file S02_35.xlsx contains data from a survey of 500 randomly selected (fictional) households.
a. Create a table of correlations between the last five variables (First Income to Debt). On the sheet with these correlations, enter a “cutoff” correlation such as 0.5 in a blank cell. Then use conditional formatting to color green all correlations in the table at least as large as this cutoff, but don’t color the 1s on the diagonal. The coloring should change automatically as you change the cutoff. This is always a good idea for highlighting the “large” correlations in any correlations table.
b. When you create the table of correlations, you are warned about the missing values for Second Income. Do some investigation to see how StatTools deals with missing values when it calculates correlations. There are two basic possibilities (both of which are options in some software packages). First, it could delete all rows that have a missing value for any variable and then calculate all of the correlations from the remaining data. Second, when it calculates the correlation for any pair of variables, it could (temporarily) delete only the rows that have missing data for either of these two variables and then calculate the correlation from what remains. Why would you prefer the second option? How does StatTools do it?
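The ideas in parts a and b can also be sketched outside Excel. The following Python example uses pandas and is only an illustration under stated assumptions: the toy data values and the helper name highlight_mask are invented for this sketch, and nothing here shows how StatTools itself behaves. It contrasts the two missing-value options from part b and then marks the off-diagonal correlations that meet a cutoff, as in part a.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the column names mimic the exercise, the values are invented.
df = pd.DataFrame({
    "First Income": [40.0, 55.0, 62.0, 48.0, 75.0, 51.0],
    "Second Income": [np.nan, 30.0, np.nan, 22.0, 41.0, 28.0],
    "Debt": [5.0, 9.0, 12.0, 4.0, 15.0, 7.0],
})

# Option 1 (listwise deletion): drop every row with any missing value,
# then compute all correlations from the rows that remain.
listwise = df.dropna().corr()

# Option 2 (pairwise deletion): DataFrame.corr excludes missing values
# separately for each pair of columns.
pairwise = df.corr()

# Rows actually used for the First Income / Debt correlation under each option.
n_listwise = len(df.dropna())
n_pairwise_first_debt = len(df[["First Income", "Debt"]].dropna())


def highlight_mask(corr, cutoff):
    """Hypothetical helper: True where a correlation is >= cutoff, never on the diagonal."""
    mask = corr.to_numpy() >= cutoff
    np.fill_diagonal(mask, False)  # leave the 1s on the diagonal unmarked
    return pd.DataFrame(mask, index=corr.index, columns=corr.columns)


# Changing the cutoff and recomputing the mask plays the role of the
# automatically updating conditional format in the spreadsheet.
mask = highlight_mask(pairwise, cutoff=0.5)
```

In this toy data only Second Income has missing values, so the two options agree on every correlation that involves Second Income but differ for First Income and Debt, where the pairwise option keeps all six rows and the listwise option keeps only four.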