Questions and Answers of Measurement Theory In Action
What are the major purposes of the different forms of intra-individual differences in interpreting test data?
Can you provide examples of the uses of both inter-individual differences and intra-individual differences?
Are descriptive statistics or inferential statistics used more in applied psychological measurement?
Why are we more likely to use estimation rather than statistical significance testing in applied psychological measurement?
How do descriptive statistics and standardized scores allow us to interpret a set of test scores? Why?
What are the advantages of using a scatterplot in addition to the Pearson product moment correlation?
What does a 95% confidence interval of the mean tell us? How about a 99% confidence interval for an individual score?
What is the difference between scaling and classification?
What is the difference between psychometrics and psychological scaling?
Why do you think it is so difficult to scale more than one dimension (i.e., people, stimuli, and responses) at once?
Why is it important to know the level of measurement of our data before we begin the scaling process?
How would we scale multiple dimensions at one time?
Why do measures that claim to assess the same construct sometimes appear so vastly different from one another?
To what extent is it likely that two different measures of the same construct that employ the use of distinct item formats will provide similar results?
What practical constraints often play a large role in the determination of test specifications?
If you were assigned to develop a new measure for a personality construct, what sources might you seek to better inform yourself about the construct prior to defining the construct?
When developing a measure intended to assess a facet of personality, when would you develop items that use frame-of-reference tags? When would you develop items that assess behavior across contexts?
Why is reliability important in psychological and educational testing?
In your own words, explain the concept of a true score.
What are some of the major assumptions of classical test theory?
What are some of the limitations of classical test theory?
How is reliability defined in terms of classical test theory?
Under what conditions might we want a very high reliability coefficient?
Under what conditions might we accept a low reliability coefficient for a psychological measure?
Sheila was frustrated. Although she was happy with both the topic and the constructs she had chosen to examine in her senior honors thesis, she had hit several roadblocks in determining what measures
Why are longer tests generally more reliable than shorter tests? What conditions must be met for this to be true?
Validity is a unified construct. In what ways do the various approaches to examining content validity provide validity evidence?
Would it be more appropriate to adopt a content validation approach to examine a final exam in a personality psychology course or to examine a measure of conscientiousness? Explain.
The content approach to test validation has historically relied heavily on expert judgment, though newer approaches argue that any competent individual might be capable of providing validity evidence
Would content validity alone provide sufficient evidence for validity for (a) An employment exam? (b) An extraversion inventory? (c) A test to determine the need for major surgery? In each case,
Does face validity establish content validity? Explain your answer.
What could a student do if he or she thought a classroom exam was not content valid?
What could an instructor do if a student asserted that a classroom exam was not content valid?
Consider a test or inventory of your choosing. If you wanted to examine the content validity of this measure, how would you go about choosing raters to provide judgments regarding the content
Imagine the case in which 14 SMEs were asked to provide CVR ratings for a five-item test. Compute the CVR for each of the items based on the ratings shown in Table 7.1. (Table 7.1, SME ratings for Items 1 through 5, is not reproduced here.)
Is quantifying content validity through the use of the CVI, CVF, or other similar method necessary to establishing content validity? Explain.
Given that 14 SMEs were used to provide the ratings in question 10, which items do you feel have received a CVR so low that you would recommend deleting the item? Justify your response.
How does the criterion-related approach to test validation help provide evidence of the accuracy of the conclusions and inferences drawn from test scores?
What are the differences among predictive, concurrent, and postdictive criterion-related validation designs?
What concerns might you have in using a concurrent or postdictive criterion-related validation design?
The various criterion-related validity research designs might not be equally appropriate for a given situation. For each of the following criterion-related validity designs, provide an example
What factors would you consider to ensure that you have an appropriate criterion?
What factors might attenuate an observed correlation between test scores and criterion scores? Explain.
What might inflate an observed correlation between test scores and criterion scores? Explain.
For each of the following, explain how the correction formula provides a more accurate estimate of the true relationship between the predictor and the criterion: a. Correction for unreliability in
Although it is empirically possible to correct for attenuation due to unreliability in a predictor, this is a violation of ethics if we intend to use the predictor for applied purposes. Explain why
If conducting a correction for restriction in range of the predictor variable in a concurrent criterion-related validity study, who is the population referring to? How might you best estimate the
How could a small organization determine which selection tests might be appropriate for use in selection of new employees?
The unified view of test validation regards all aspects of validation as reaching for the same goal. What is the overall goal of test validation?
Explain why a thorough understanding of the construct measured is essential to the validation process.
What did Cronbach and Meehl (1955) mean by the term “nomological network”?
Can reliability estimates be used to provide evidence of the construct validity of test scores? Explain.
Explain how a researcher could conduct a “study of process” to provide evidence of the construct validity of test scores.
(a) Identify two established measures that could be used (other than those discussed previously) to examine the convergent validity of the Affective Empathy scale discussed above. (b) Identify two
Why is common method variance (CMV) a concern in construct validation studies that involve correlation matrices?
Correlations between what elements of an MTMM matrix would provide the best assessment of CMV?
How does use of an MTMM matrix provide evidence of the construct validity of test scores?
Messick (1995a) identified six aspects of construct validation. Choose any three of these aspects to discuss how Messick’s conceptualization has extended your awareness of the meaning of construct
Most papers and books on meta-analysis say one should include both published and unpublished studies on a given topic. How does one go about getting unpublished studies?
Based on the data in Figure 11.1, what would have happened if we had used a common regression line to predict suicide risk in all three age groups? (Figure 11.1, Suicide Risk for the Old, Young, and Middle Age groups, is not reproduced here.)
Can a single person conduct a meta-analysis or does it take a team of researchers? Why?
Assuming we did use the same regression line for all three groups, which group would be most likely to raise claims of test bias? Unfairness?
There are several options with regard to which analytical approach to use. How do you decide which one to use?
How do you decide which moderators to examine?
How does one go about narrowing down the seemingly endless list of potential “omitted variables” in moderated regression analysis used to determine test bias?
Why do you think that intercept bias is much more common than slope bias?
What other factors (besides a truly biased test or an omitted variable) might be falsely suggesting test bias when, in fact, the test is not biased?
Which stakeholders in the testing process are responsible for determining whether test bias actually exists or not?
Can a test that is determined to be biased still be a fair test? Alternatively, can a test that is determined to be unfair still be an unbiased test?
Describe the process of back translation.
Why is back translation insufficient to guarantee equivalence?
Provide an example of each of the four types of test equivalence identified by Lonner (1990).
If you had recently translated a test into a different cultural context, how would you assess each of the four types of equivalence?
What factors should be considered when determining whether a requested test accommodation is reasonable?
Why is test-wiseness a problem in tests of maximal performance?
What do you think of intentionally incorporating test-wise characteristics into item distracters? Defend your position.
What are the advantages and disadvantages of selected-response items?
What are the advantages and disadvantages of free-response items?
Why shouldn’t “all of the above” be included among multiple-choice response options?
Why shouldn’t test takers be given a choice among several different essay items?
Why are multiple short-answer items preferable to one long essay question?
Why is pretesting of items important in test construction?
In what ways does Anderson and Krathwohl’s (2001) revision differ from Bloom’s original taxonomy?
Who would be appropriate to fulfill the role of SME for a test designed to assess knowledge of: a. 12th-grade mathematics? b. Modern automotive repair? c. American pop culture?
What is the difference between an item difficulty index and an item discrimination index?
How do you know whether to calculate the discrimination index (which contrasts extreme groups), the biserial correlation, or the point biserial correlation coefficient as your item discrimination
How do you decide which external criterion to use when computing an item-criterion index?
What corrections, if any, might you make to items 1, 2, 4, 5, and 8 in Table 13.2? (Table 13.2, item statistics by sequence number, is not reproduced here.)
Is there ever a time when a .25 p value is good? How about a 1.00 p value?
Will your criteria for evaluating your item difficulty and discrimination indexes change if a test is norm referenced versus criterion referenced?
Will your criteria for evaluating your item difficulty and discrimination indexes change as the format of the item changes (e.g., true-false; three-, four-, or five-option multiple choice; Likert
Oftentimes in a classroom environment, you might have more students (subjects) than you have items. Does this pose a problem for interpreting your item analysis statistics?
How do we best define the “minimally competent person” when using judgmental methods such as the Angoff, Nedelsky, Ebel, and Bookmark methods?
When does a method for setting pass points go from being judgmental/empirical to empirical/judgmental? Does it really matter?
What legal issues do we need to be concerned with when setting cutoff scores?
Does where we set the cutoff score affect the validity of the test? The utility?
How do we know whether we should minimize false-positive or false-negative decisions? Will that decision impact the procedure we use to make the cutoff score decision?
Do we really even need to set cutoff scores? Why not just rank order all the test scores from highest to lowest and provide the valued outcome until it runs out?
What if we set a cutoff score and no one passes?
This module begins by discussing serious concerns with self-report measures. Do such concerns indicate we should abandon this type of inquiry? Explain.
Given the concerns in #1 above, do you think we should clearly provide respondents an option to respond “don’t know”? Explain.