# Question

The file S02_35.xlsx contains data from a survey of 500 randomly selected (fictional) households.

a. Create a table of correlations between the last five variables (First Income to Debt). On the sheet with these correlations, enter a “cutoff” correlation such as 0.5 in a blank cell. Then use conditional formatting to color green all correlations in the table at least as large as this cutoff, but don’t color the 1s on the diagonal. The coloring should change automatically as you change the cutoff. This is a useful way to highlight the “large” correlations in any correlation table.
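
The highlighting rule in part (a) has two conditions: a cell is colored only if its value is at least the cutoff *and* it is off the diagonal. The sketch below makes that logic explicit in plain Python rather than StatTools. The helper names (`pearson`, `correlation_table`, `cells_to_color`) and the three toy columns are hypothetical illustrations, not the contents of S02_35.xlsx.

```python
import math
from itertools import product

def pearson(x, y):
    """Pearson correlation of two equal-length lists with no missing values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_table(columns):
    """Full square table keyed by (row variable, column variable)."""
    return {(r, c): pearson(columns[r], columns[c])
            for r, c in product(columns, repeat=2)}

def cells_to_color(table, cutoff):
    """Cells the conditional format would color green:
    at least as large as the cutoff, excluding the 1s on the diagonal."""
    return {(r, c) for (r, c), v in table.items() if r != c and v >= cutoff}

# Hypothetical toy data, NOT the contents of S02_35.xlsx.
data = {
    "Income": [40, 55, 62, 70, 85],
    "Debt":   [10, 14, 15, 20, 22],
    "Kids":   [3, 1, 2, 0, 2],
}

table = correlation_table(data)
print(sorted(cells_to_color(table, 0.5)))   # → [('Debt', 'Income'), ('Income', 'Debt')]
print(sorted(cells_to_color(table, 0.99)))  # → []
```

Because the cutoff is a parameter rather than a hard-coded number, changing it changes which cells qualify, which is the behavior the exercise asks for. In the worksheet, the analogous approach would be a formula-based conditional formatting rule that compares each cell with the cutoff cell through an absolute reference and also checks that the cell's row and column positions differ.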

b. When you create the table of correlations, you are warned about the missing values for Second Income. Do some investigation to see how StatTools deals with missing values and correlations. There are two basic possibilities (and both of these are options in some software packages). First, it could delete all rows that have a missing value for any variable and then calculate all of the correlations based on the remaining data. Second, when it creates the correlation for any pair of variables, it could (temporarily) delete only the rows that have missing data for either of these two variables and then calculate the correlation on what remains for these two variables. Why would you prefer the second option? How does StatTools do it?
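
The two possibilities in part (b) are usually called listwise deletion (drop any row with a missing value anywhere) and pairwise deletion (drop rows only for the pair being correlated). The sketch below contrasts them on a small hypothetical data set in which `None` marks a missing Second Income; the function and column names are illustrative assumptions, and the code says nothing about how StatTools itself behaves.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length lists with no missing values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def listwise(columns):
    """Option 1: drop every row missing ANY variable, then correlate all pairs.
    Returns {(a, b): (r, n)} where n is the number of rows used."""
    names = list(columns)
    n_rows = len(columns[names[0]])
    keep = [i for i in range(n_rows)
            if all(columns[c][i] is not None for c in names)]
    return {(a, b): (pearson([columns[a][i] for i in keep],
                             [columns[b][i] for i in keep]), len(keep))
            for a, b in combinations(names, 2)}

def pairwise(columns):
    """Option 2: for each pair, drop only rows missing either of THOSE two values."""
    result = {}
    for a, b in combinations(columns, 2):
        pairs = [(u, v) for u, v in zip(columns[a], columns[b])
                 if u is not None and v is not None]
        xs, ys = [u for u, _ in pairs], [v for _, v in pairs]
        result[(a, b)] = (pearson(xs, ys), len(pairs))
    return result

# Hypothetical toy data; None marks a missing Second Income.
data = {
    "FirstIncome":  [40, 55, 62, 70, 85, 50],
    "SecondIncome": [None, 30, None, 45, 60, 20],
    "Debt":         [10, 14, 15, 20, 22, 12],
}

lw, pw = listwise(data), pairwise(data)
for (a, b), (r, n) in pw.items():
    print(f"{a}-{b}: pairwise n={n}, listwise n={lw[(a, b)][1]}")
```

Here the FirstIncome-Debt correlation uses all 6 rows under pairwise deletion but only 4 under listwise deletion, because listwise deletion also throws away rows whose only gap is in Second Income. That is the main argument for the second option: each correlation uses all the data available for its own pair. One trade-off worth knowing is that pairwise correlations rest on different subsets of rows, so the resulting table is not guaranteed to be internally consistent in the way a single-sample table is.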
