Refer to the study on predicting defects in software code written in the C language for a NASA spacecraft instrument, Exercise 9.29. The SPSS contingency table for the two categorical variables, actual defective status and predicted defective status using EVG, is reproduced.

a. Show that there are 11 possible contingency tables (including the observed table) with the same marginal totals as the observed table. 
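One way to check part a is by brute force. In a 2x2 table with fixed margins, the top-left cell determines the other three, so counting tables means counting the admissible values of that one cell. That count works out to one more than the smallest marginal total, so 11 tables suggests a smallest marginal total of 10. The sketch below is a minimal illustration, not the textbook's method: the function name `enumerate_tables` is hypothetical, and the margins (7, 5) and (4, 8) are made-up illustrative values, not the SWDEFECTS totals.

```python
def enumerate_tables(row_totals, col_totals):
    """Return every 2x2 table of non-negative counts with the given margins.

    Each table is ((a, b), (c, d)) with a + b = r1, c + d = r2,
    a + c = c1 and b + d = c2.
    """
    r1, r2 = row_totals
    c1, c2 = col_totals
    if r1 + r2 != c1 + c2:
        raise ValueError("row and column totals must sum to the same n")
    # All four cells must stay non-negative, which bounds the top-left cell a.
    lo = max(0, c1 - r2)
    hi = min(r1, c1)
    return [((a, r1 - a), (c1 - a, r2 - c1 + a)) for a in range(lo, hi + 1)]


# Illustrative margins only (assumed, not the exercise's data).
tables = enumerate_tables((7, 5), (4, 8))
for table in tables:
    print(table)
print(len(tables), "tables")  # smallest margin is 4, so 4 + 1 = 5 tables
```

Substituting the row and column totals from the SPSS printout for the illustrative margins should list the 11 tables asked for.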

b. Use the hypergeometric formula to find the probability of each of the 11 tables in part a.
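For part b, with row totals r1 and r2, first column total c1, and n = r1 + r2, the hypergeometric probability of a table whose top-left count is a is C(r1, a) C(r2, c1 - a) / C(n, c1). The following is a minimal sketch of that formula using only the standard library. The function name `table_probability` and the margins are illustrative assumptions, not values from the exercise.

```python
from math import comb


def table_probability(a, r1, r2, c1):
    """Hypergeometric probability of top-left count a, given fixed margins."""
    n = r1 + r2
    return comb(r1, a) * comb(r2, c1 - a) / comb(n, c1)


# Illustrative margins only: row totals 7 and 5, first column total 4.
r1, r2, c1 = 7, 5, 4
lo, hi = max(0, c1 - r2), min(r1, c1)  # admissible range of the top-left cell
probs = {a: table_probability(a, r1, r2, c1) for a in range(lo, hi + 1)}
for a, p in probs.items():
    print(f"a = {a}: P = {p:.4f}")
print("total:", sum(probs.values()))  # probabilities over all tables sum to 1
```

A useful sanity check for hand calculations is that the probabilities of all tables with the same margins sum to 1.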

c. Use the probabilities, part b, to find the p-value of Fisher’s exact test for independence. Verify your calculations by checking the p-value shown on the SPSS printout.
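For part c, a common convention for the two-sided Fisher's exact p-value is to sum the probabilities of every table that is no more likely than the observed one. The sketch below assumes that convention; whether it matches the specific line on the SPSS printout should be checked against the printout itself. The function name `fisher_exact_two_sided` and the margins are illustrative assumptions.

```python
from math import comb


def fisher_exact_two_sided(a_obs, r1, r2, c1):
    """Two-sided Fisher exact p-value for a 2x2 table with top-left count a_obs.

    Sums the hypergeometric probabilities of all tables (same margins)
    whose probability does not exceed that of the observed table.
    """
    n = r1 + r2
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Integer numerators keep the "no more likely than observed" comparison exact.
    weights = {a: comb(r1, a) * comb(r2, c1 - a) for a in range(lo, hi + 1)}
    w_obs = weights[a_obs]
    return sum(w for w in weights.values() if w <= w_obs) / comb(n, c1)


# Illustrative table only: row totals 7 and 5, first column total 4, observed a = 4.
print(fisher_exact_two_sided(4, 7, 5, 4))  # (5 + 35) / 495
```

Comparing integer numerators rather than floating-point probabilities avoids rounding problems when two tables are tied in probability.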

d. Since the sample size is large, the p-value for the asymptotic chi-square test should be approximately equal to Fisher’s exact test p-value. Is this true?
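For part d, the asymptotic p-value can be computed from the Pearson chi-square statistic, which for a 2x2 table with cells a, b, c, d reduces to n(ad - bc)^2 / (r1 r2 c1 c2) with 1 degree of freedom. The sketch below assumes no continuity correction and uses the identity P(X > x) = erfc(sqrt(x / 2)) for a chi-square variable with 1 degree of freedom. The function name `chi_square_2x2` and the cell counts are illustrative, not the exercise's data. Note that the quality of the approximation depends on the expected cell counts, not only on the overall sample size.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) and its p-value."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


# Illustrative cell counts only (assumed, not taken from the SPSS table).
stat, p_value = chi_square_2x2(4, 3, 0, 5)
print(f"chi-square = {stat:.4f}, p-value = {p_value:.4f}")
```

Running this on the observed cell counts and setting the result beside the exact p-value from part c gives the comparison the question asks for.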

Data from Exercise 9.29

The PROMISE Software Engineering Repository, hosted by the School of Information Technology and Engineering, University of Ottawa, provides researchers with data sets for building predictive software models. (See Statistics in Action, Chapter 3.) Data on 498 modules of software code written in the C language for a NASA spacecraft instrument are saved in the SWDEFECTS file. Recall that each module was analyzed for defects and classified as “true” if it contained defective code and “false” if not. One algorithm for predicting whether or not a module has defects is “essential complexity” (denoted EVG), where a module with at least 15 sub flow graphs with D-structured primes is predicted to have a defect. When the method predicts a defect, the predicted EVG value is “yes”; otherwise, it is “no.”