We have indicated that if you have two categorical variables and you want to check whether they are related, the best method is to create a crosstabs, possibly with the counts expressed as percentages. But suppose both categorical variables have only two categories and both are coded as dummy 0–1 variables. Then nothing prevents you from finding the correlation between them with Equation (3.2) from this section. However, if we let C(i,j) be the count of observations where the first variable has value i and the second variable has value j, only four joint counts can have any bearing on the relationship between the two variables: C(0,0), C(0,1), C(1,0), and C(1,1). Let C1(1) be the count of 1s for the first variable and let C2(1) be the count of 1s for the second variable. Then C1(1) = C(1,0) + C(1,1) and C2(1) = C(0,1) + C(1,1), so C1(1) and C2(1) are determined by the joint counts. It can be shown algebraically that the correlation between the two 0–1 variables is

$$\frac{n\,C(1,1) - C_1(1)\,C_2(1)}{\sqrt{C_1(1)\,[n - C_1(1)]}\;\sqrt{C_2(1)\,[n - C_2(1)]}}$$

where n = C(0,0) + C(0,1) + C(1,0) + C(1,1) is the total number of observations.

To illustrate this, the file S03_32.xlsx contains two 0–1 variables. Create a crosstabs to find the required counts, and use the above formula to calculate the correlation. Then use StatTools to find the correlation in the usual way. Do your two results match?
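The comparison the exercise asks for can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lists `x` and `y` are invented data, not the contents of S03_32.xlsx, and the helper names (`joint_counts`, `correlation_from_counts`, `pearson`) are hypothetical. The count-based version takes n to be the total number of observations and computes [n·C(1,1) − C1(1)·C2(1)] divided by the product of the square roots of C1(1)[n − C1(1)] and C2(1)[n − C2(1)]; the other version applies the usual correlation formula to the raw 0–1 values.

```python
import math

# Hypothetical 0-1 data, invented for illustration
# (NOT the contents of S03_32.xlsx).
x = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
y = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]


def joint_counts(x, y):
    """Return the crosstabs counts C(i,j) as a dict keyed by (i, j)."""
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for xi, yi in zip(x, y):
        counts[(xi, yi)] += 1
    return counts


def correlation_from_counts(counts):
    """Correlation of two 0-1 variables computed from the joint counts only."""
    n = sum(counts.values())              # total number of observations
    c1 = counts[(1, 0)] + counts[(1, 1)]  # C1(1): count of 1s, first variable
    c2 = counts[(0, 1)] + counts[(1, 1)]  # C2(1): count of 1s, second variable
    numerator = n * counts[(1, 1)] - c1 * c2
    denominator = math.sqrt(c1 * (n - c1)) * math.sqrt(c2 * (n - c2))
    return numerator / denominator


def pearson(x, y):
    """Usual correlation, computed directly from the raw values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


counts = joint_counts(x, y)
r_counts = correlation_from_counts(counts)
r_usual = pearson(x, y)

print(counts)
print(round(r_counts, 6), round(r_usual, 6))
```

The two printed correlations should agree up to floating-point rounding. Note that the count-based formula is undefined when either variable is all 0s or all 1s, because the denominator is then zero; the same is true of the usual correlation.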

  • Created April 01, 2015