
Question:

Module 11 discussed the idea of measurement equivalence in the context of cross-cultural research. CFA work has been very important in determining the measurement equivalence of instruments across cultures. Nye, Roberts, Saucier, and Zhou (2008) used CFA methods to test the equivalence of the Goldberg Adjective Checklist measure of the Big Five personality traits across Chinese, Greek, and American respondents. It is useful to consider their research to see how CFA methods can be used to determine whether a test structure holds up across various samples. As mentioned before, this equivalence is necessary for using the instrument across cultures and for comparing results obtained in one language with findings obtained in another. In their research, Nye et al. used the software program LISREL to estimate a factor structure for the instrument in each of the three language samples, using a process called multi-group confirmatory factor analysis.

As the first step in their analysis, they estimated a CFA model on one American sample (they had two separate American samples for cross-validation purposes) for each of the personality traits (i.e., five separate models). A single-factor model was rejected for each of the five personality traits, suggesting that more complex models were needed. Next, they used exploratory factor analysis with an oblique rotation to get ideas for more complex models. Note that this is the reverse of the typical process, in which an exploratory model is used prior to a confirmatory approach; the interplay between exploratory and confirmatory methods is complex and, as this example shows, can go in a variety of directions. Their EFAs identified two factors for each dimension, with the factors largely defined as a negative factor (i.e., words written to tap the negative end of the trait continuum) and a positive factor.

Next, a CFA representing the two-factor solution identified in the EFAs was fit to each scale. Each of these two-factor models (except Neuroticism, which fit satisfactorily the first time) was refined based on modification indices. For some scales, a few items needed cross-loadings (i.e., even though an item loaded primarily on one factor, it still had a significant loading on the other factor). In other cases, the error terms of a few items needed to be correlated. For example, on the Intellect scale, the items "intellectual" and "unintellectual" loaded on separate factors (positive and negative, respectively), but the modification index suggested that the error terms for the two items should be correlated.
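To make these modeling steps concrete, here is a minimal sketch in Python of the kind of analysis described above, using the factor_analyzer package for the EFA with an oblique rotation and semopy for the two-factor CFA (the original authors used LISREL, so this is an illustration of the technique, not their code). The data file and most item names are hypothetical stand-ins; only "intellectual" and "unintellectual" come from the example in the text.

import pandas as pd
import semopy
from factor_analyzer import FactorAnalyzer

# Hypothetical item-response data for the Intellect scale; the column
# names used below (other than intellectual/unintellectual) are illustrative.
df = pd.read_csv("intellect_items.csv")

# Step 1: after a single-factor CFA is rejected, run an EFA with an
# oblique rotation to suggest a more complex structure.
efa = FactorAnalyzer(n_factors=2, rotation="oblimin")
efa.fit(df)
print(efa.loadings_)  # which items define the positive vs. negative factor

# Step 2: fit a CFA representing the two-factor solution. Here "reflective"
# is given a cross-loading on both factors, and the error terms of
# "intellectual" and "unintellectual" are allowed to correlate (~~) --
# the two kinds of tweaks that modification indices suggested in the article.
spec = """
Positive =~ intellectual + insightful + reflective
Negative =~ unintellectual + shallow + reflective
intellectual ~~ unintellectual
"""
cfa = semopy.Model(spec)
cfa.fit(df)
print(semopy.calc_stats(cfa))  # chi-square, CFI, RMSEA, and other fit indices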

In the next stage, they tested three types of measurement equivalence across the two American samples. Testing across two American samples allowed the researchers to determine whether the factor structure identified in one sample fits well in the other, thus guarding against capitalizing on chance. They tested configural equivalence first, which asks whether both samples have the same number of factors and the same pattern of loadings. Their analysis showed no differences for any of the scales in terms of the number of factors and the direction of loadings. Next, they tested metric equivalence and scalar equivalence, which test whether the factor loading estimates are equal across samples and whether the factor means are equivalent across samples, respectively. Their analyses demonstrated equivalence at all of these steps, suggesting that the CFA models they developed for each of the five scales were robust and fit the American samples well.

Next, they tested the equivalence of the American model with the Greek and Chinese samples, respectively (note that they did not test the equivalence of the Greek sample directly against the Chinese sample). Across all comparisons, configural equivalence held, suggesting that the factor structure was the same for these comparisons. The metric and scalar analyses, however, found significant differences across samples: there were factor mean differences, and there were differences in the magnitudes of factor loadings across language translations. If these differences in loadings were not accounted for, misleading conclusions could have been reached about mean differences across samples.

The Nye et al. (2008) article follows the traditional approach to testing the equivalence of scales across languages and cultures. The authors detail each step of their analysis, so it is easy to follow and worth reading in its entirety. At the end of their model-fitting analyses, they find differences between their baseline American sample and their Greek and Chinese samples. These statistical differences raise some interesting issues that we ask you to pursue in the questions below.
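The statistical machinery behind these sequential tests is a series of nested model comparisons: each stage (configural, then metric, then scalar) adds equality constraints, and a chi-square difference test checks whether the added constraints significantly worsen fit. The Python sketch below shows only that comparison logic; the chi-square and degrees-of-freedom values are made-up placeholders standing in for output from LISREL or any other SEM package.

from scipy import stats

# Placeholder fit statistics for three nested invariance models; real
# values would come from multi-group CFA output (e.g., LISREL).
models = {
    "configural": {"chi2": 210.4, "df": 96},   # same factors/loading pattern
    "metric":     {"chi2": 223.1, "df": 104},  # loadings constrained equal
    "scalar":     {"chi2": 251.8, "df": 112},  # factor means/intercepts also equal
}

# A significant chi-square increase means the added equality constraints
# do not hold, i.e., that level of equivalence is rejected.
names = list(models)
for looser, stricter in zip(names, names[1:]):
    d_chi2 = models[stricter]["chi2"] - models[looser]["chi2"]
    d_df = models[stricter]["df"] - models[looser]["df"]
    p = stats.chi2.sf(d_chi2, d_df)
    verdict = "holds" if p > .05 else "rejected"
    print(f"{looser} -> {stricter}: dchi2={d_chi2:.1f}, ddf={d_df}, p={p:.4f} ({verdict})")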

Questions 

1. What is the role of exploratory factor analysis when used alongside confirmatory factor analysis? 

2. Which of these three types of equivalence (configural, metric, and scalar) is most important? 

3. How does the CFA approach to testing equivalence compare to other methods of testing cross-cultural equivalence mentioned in Module 11? 

4. How do you untangle whether differences are due to poor translations versus true cross-cultural differences? 

5. How can you make sure that your results estimated in one sample will generalize to other samples? 

6. How can you test generalizability across cultures if you have small samples for at least one culture?


Related book: Shultz, K. S., Whitney, D., & Zickar, M. J., Measurement Theory in Action (3rd ed.). ISBN 9780367192181.
