# Question

We have indicated that if you have two categorical variables and you want to check whether they are related, the best method is to create a crosstabs, possibly with the counts expressed as percentages. But suppose both categorical variables have only two categories and these variables are coded as dummy 0–1 variables. Then there is nothing to prevent you from finding the correlation between them with the same Equation (3.2) from this section. However, if we let C(i,j) be the count of observations where the first variable has value i and the second variable has value j, there are only four joint counts that can have any bearing on the relationship between the two variables: C(0,0), C(0,1), C(1,0), and C(1,1). Let C1(1) be the count of 1s for the first variable and let C2(1) be the count of 1s for the second variable. Then it is clear that C1(1) = C(1,0) + C(1,1) and C2(1) = C(0,1) + C(1,1), so C1(1) and C2(1) are determined by the joint counts. It can be shown algebraically that the correlation between the two 0–1 variables is

$$
\frac{n\,C(1,1) - C_1(1)\,C_2(1)}{\sqrt{C_1(1)\,\bigl(n - C_1(1)\bigr)}\;\sqrt{C_2(1)\,\bigl(n - C_2(1)\bigr)}}
$$

where n denotes the total number of observations, n = C(0,0) + C(0,1) + C(1,0) + C(1,1).

To illustrate this, the file S03_32.xlsx contains two 0–1 variables. Create a crosstabs to find the required counts, and use the above formula to calculate the correlation. Then use StatTools to find the correlation in the usual way. Do your two results match?
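As a rough check of the idea, the comparison can be sketched in Python instead of StatTools. The data below are made up for illustration and are not the contents of S03_32.xlsx; the function names are hypothetical, and `n` (the total number of observations) is a symbol introduced here. The sketch builds the four joint counts, applies the count-based expression for the correlation of two 0–1 variables, and compares it with the correlation computed in the usual way from deviations about the means.

```python
from collections import Counter
from math import sqrt

# Made-up 0-1 data for illustration only; NOT the values in S03_32.xlsx.
x = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
y = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]


def crosstab_counts(x, y):
    """Return the four joint counts C(i,j) as a dict keyed by (i, j)."""
    counts = Counter(zip(x, y))
    return {(i, j): counts[(i, j)] for i in (0, 1) for j in (0, 1)}


def correlation_from_counts(c):
    """Correlation of two 0-1 variables computed from the joint counts alone."""
    n = sum(c.values())            # total number of observations
    c1 = c[(1, 0)] + c[(1, 1)]     # C1(1): count of 1s in the first variable
    c2 = c[(0, 1)] + c[(1, 1)]     # C2(1): count of 1s in the second variable
    return (n * c[(1, 1)] - c1 * c2) / sqrt(c1 * (n - c1) * c2 * (n - c2))


def pearson(x, y):
    """Correlation computed the usual way, from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


counts = crosstab_counts(x, y)
r_counts = correlation_from_counts(counts)
r_usual = pearson(x, y)
print("joint counts:", counts)
print("from counts:", r_counts)
print("usual way:  ", r_usual)
```

The two values should agree up to floating-point rounding. Note that the count-based expression is undefined when either variable is constant (all 0s or all 1s), because the denominator is then zero.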


## Answer to relevant Questions

Solve problem 1 with pivot tables and create corresponding pivot charts. Express the counts as percentage of row. What do these percentages indicate about this particular data set? Then repeat, expressing the counts as ...

Solve problem 8 with pivot tables and create corresponding pivot charts. However, find only means and standard deviations, not medians.

Using the Elecmart Sales file from this section, experiment with slicers as follows. a. Create a pivot table that shows the average of TotalCost, broken down by Region in the row area and Time in the column area. Then insert ...

The file S03_53.xlsx lists campaign contributions, by number of contributors and contribution amount, by state (including Washington DC) for the four leading contenders in the 2008 presidential race. Create a scatter plot ...

The file S03_15.xlsx contains monthly data on the various components of the Consumer Price Index. a. Create differences for each of the variables. You can do this quickly with StatTools, using the Difference item in the Data ...
