The following exercise is from Introduction to Regression Modeling and refers to data taken from Higgins and Koch, "Variable Selection and Generalized Chi-Square Analysis of Categorical Data Applied to a Large Cross-Sectional Occupational Health Survey" [International Statistical Review (1977) 45: 51–62]. The data come from a large survey of workers in the cotton industry. The researchers wanted to study the factors that may be associated with brown lung disease, which results from inhaling particles of cotton, flax, hemp, or jute. The variables are as follows: number of workers suffering from the disease (yes); number of workers not suffering from the disease (no); dustiness of workplace (1 = high; 2 = medium; 3 = low); race (1 = white; 2 = other); sex (1 = male; 2 = female); smoking history (1 = smoker; 2 = nonsmoker); length of employment in the cotton industry (1 = less than 10 years; 2 = between 10 and 20 years; 3 = more than 20 years).
a. List the five covariates from most likely to least likely to be associated with the probability that a cotton worker has brown lung disease.
b. Do there appear to be any interactions between the covariates?
c. Use a statistical software package to obtain a prediction model using all five covariates.
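
As a possible starting point for parts a and b, the sketch below pools the grouped counts over all other covariates and reports the observed disease proportion at each level of one covariate. It is only an illustration under stated assumptions: the row layout (one dict per covariate combination with `yes` and `no` counts) and the key names such as `dust` are hypothetical, and the helpers `marginal_rates` and `rate_spread` are not part of the exercise or of any library. No survey data are included here; the counts have to be loaded from the exercise's data file.

```python
from collections import defaultdict


def marginal_rates(rows, covariate):
    """Pool grouped counts over all other covariates.

    rows      -- iterable of dicts, one per covariate combination, each with
                 integer 'yes' and 'no' counts (assumed layout, not the
                 exercise's file format)
    covariate -- key naming the covariate to summarize (hypothetical name)

    Returns {level: (yes_count, total_count, proportion_diseased)}.
    """
    yes = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        level = row[covariate]
        yes[level] += row["yes"]
        total[level] += row["yes"] + row["no"]
    # Skip levels with no workers to avoid dividing by zero.
    return {
        level: (yes[level], total[level], yes[level] / total[level])
        for level in sorted(total)
        if total[level] > 0
    }


def rate_spread(rows, covariate):
    """Largest minus smallest marginal proportion across the covariate's levels."""
    proportions = [p for _, _, p in marginal_rates(rows, covariate).values()]
    return max(proportions) - min(proportions)
```

Ranking the covariates by `rate_spread` gives only a rough ordering for part a, because marginal proportions ignore confounding between covariates. Repeating the tabulation within the levels of a second covariate is one informal way to look for the interactions asked about in part b. For part c, a binomial logistic regression on the (yes, no) counts is the usual choice; in Python, for example, statsmodels' `GLM` with the `Binomial` family accepts a two-column response of successes and failures.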

  • Created November 21, 2015