Practical Multivariate Analysis 6th Edition Abdelmonem Afifi, Susanne May, Virginia A. Clark, Robin Donatello - Solutions
Assuming a log-linear model for survival, does smoking status (i.e., the variables Smokbl and Smokfu) significantly affect survival?
Do the patterns of censoring appear to be the same for smokers at baseline, ex-smokers at baseline, and nonsmokers at baseline? What about for those who are smokers, ex-smokers, and nonsmokers at follow-up?
Repeat Problem 13.1, using a Cox proportional hazards model instead of a log-linear. Compare the results.
(a) Find the effect of Stagen and Hist upon survival by fitting a log-linear model. Check any assumptions and evaluate the fit using the graphical methods described in this chapter. (b) What happens in part (a) if you include Staget in your model along with Stagen and Hist as predictors of survival?
(Problem 12.30 continued) Using an appropriate method in your software package, obtain confidence intervals for the odds ratios you computed in parts a, b, and c of Problem 12.30.
(Problem 12.29 continued) Fit a logistic regression model (again using as the outcome “evacuate home” (V173)) which includes as the only covariates home owner status (rent/own, V449) and status of home damage (V127) and an appropriate interaction term. Is there a statistically significant
This problem and the following ones also use the Northridge earthquake data set. Perform an appropriate regression analysis using variable selection techniques for the following outcome: evacuate home (use V173 of the questionnaire). Choose from the following predictor variables: MMI (which is a
(Problem 12.22 continued) Perform diagnostic procedures to identify influential observations. Remove the four (4) most influential observations using the delta chi-square method. Rerun the analysis and compare the results. What is your conclusion?
(Problem 12.22 continued) Is there an interaction effect between age and home ownership, controlling for gender and ethnicity?
(Problem 12.22 continued) Is there an interaction effect between gender and home ownership? That is, are the estimated effects of home ownership upon reporting emotional injuries different for men and women, controlling for age and ethnicity?
(Problem 12.22 continued) Are the effects of ethnicity upon reporting emotional injuries statistically significant, controlling for home ownership status, age, and gender? Use a likelihood ratio test to answer this question.
(Problem 12.22 continued) Based on your results, what is the estimated probability of reporting emotional injuries for a 30-year-old white female renter? For a 50-year-old Latino home owner?
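Once a logistic model has been fitted, the estimated probability for a given covariate pattern is the inverse logit of the linear predictor. The following Python sketch shows that calculation only. The coefficient values and the covariate names (owner, age, female) are hypothetical placeholders, not results from the Northridge data, and the ethnicity term is left out on the assumption that the reference category contributes zero.

```python
import math

def predicted_probability(coefs, covariates):
    """Inverse logit of the linear predictor: p = 1 / (1 + exp(-(b0 + sum(b_j * x_j))))."""
    eta = coefs["intercept"] + sum(coefs[name] * value for name, value in covariates.items())
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients for illustration only (NOT fitted estimates).
example_coefs = {"intercept": -1.0, "owner": 0.3, "age": 0.01, "female": 0.5}

# A 30-year-old female renter under the assumed coding owner=0, female=1.
p_example = predicted_probability(example_coefs, {"owner": 0, "age": 30, "female": 1})
print(round(p_example, 3))
```

Replacing the placeholder dictionary with your own parameter estimates, and adding the dummy variables your coding of ethnicity requires, gives the probabilities the problem asks for.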
(Problem 12.22 continued) Fit a logistic regression model using emotional injury (yes/no, W238) as an outcome and using home ownership status (rent/own, V449), age (RAGE), gender (RSEX), and ethnicity (NEWETHN) as independent variables. Give parameter estimates and their standard errors. What are
This problem and the following ones use the Northridge earthquake data set. We wish to answer the questions: Were homeowners more likely than renters to report emotional injuries as a result of the Northridge earthquake, controlling for age (RAGE), gender (RSEX), and ethnicity (NEWETHN)? Use V449,
For the model in 12.20 use the referent time as given by the variable HMONTH to define the offset. Run the same model as in 12.20. Present rate ratios, including confidence intervals, and interpret the results. Describe any discrepancies compared to the results obtained in 12.20.
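In a Poisson regression with a log link, the offset is conventionally the log of the exposure time, and a rate ratio is the exponentiated coefficient. The Python sketch below shows both calculations under that assumption. The function names and the parameter referent_months are illustrative, and the Wald interval with z = 1.96 is one common choice rather than the only one.

```python
import math

def log_offset(referent_months):
    """Offset for a log-link Poisson model: the log of the exposure time."""
    return math.log(referent_months)

def rate_ratio_ci(beta, se, z=1.96):
    """Rate ratio exp(beta) with a Wald interval exp(beta -/+ z * se)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# With a one-month referent period for everyone, the offset is zero.
print(log_offset(1))

# Hypothetical coefficient and standard error, for illustration only.
print(rate_ratio_ci(0.0, 0.1))
```

A constant referent time therefore contributes nothing to the linear predictor, while person-specific referent times shift it by log(time) for each adolescent.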
For the Parental HIV data perform a Poisson regression on the number of days the adolescents were absent from school without a reason. For this analysis assume that the referent time period was one month for all adolescents. As independent variables consider gender, age, and to what degree the
For the model in 12.18 find an appropriate cutoff point to discriminate between adolescents who were absent without a reason and those who were not. Assess how well the model predicts the outcome using sensitivity, specificity, and the ROC curve.
Perform a binary logistic regression analysis using the Parental HIV data to model the probability of having been absent from school without a reason (variable HOOKEY). Find the variables that best predict whether an adolescent had been absent without a reason or not. Assess goodness-of-fit for the
Perform a nominal and ordinal logistic regression analysis using the health scale as the outcome variable and age and income as independent variables. Present and interpret the results for an increase in age of 10 years and an increase in income of $5,000. How do the results from the ordinal
For the family lung function data perform a nominal logistic regression where the outcome variable is place of residence and the predictors are those defined in Problem 12.11. Test the hypothesis that each of the variables can be dropped, while keeping all the others in the model. Compare your
For the depression data perform an ordinal logistic regression analysis using the same outcome categories as in Section 12.9 which reverses the order and hence compares more severe depression to less severe depression. Use age, income, and sex as the independent variables. Compare your results to
Generate a graph for income similar to Figure 12.2 to assess whether modeling income linearly seems appropriate.
Perform a logistic regression analysis for the depression data which includes income and sex and models age as a) a quadratic or b) a cubic function. Use likelihood ratio test statistics to determine whether these models are significantly better than a model which includes income and sex and models
Assume a logistic regression model includes a continuous variable like age and a categorical variable like gender and an interaction term for these. Is the P value for any of the main effects helpful in determining whether a) the interaction is significant, b) whether the main effect should be
For the family lung function data set, define a new variable VALLEY (residence in San Gabriel or San Fernando Valley) to be one if the family lives in Burbank or Glendora and zero otherwise. Using variable selection techniques, perform a logistic regression of VALLEY on mother’s age and FEV1,
Using the definition of low FEV1 given in Problem 12.9, perform a logistic regression of low FEV1 on area for the fathers. Include all four areas and use a dummy variable for each area. What is the intercept term? Is it what you expected (or could have expected)?
Define low FEV1 to be an FEV1 measurement below the median FEV1 of the fathers in the family lung function data set given in Appendix A. What are the odds that a father in this data set has low FEV1? What are the odds that a father from Glendora has low FEV1? What are the odds that a father from
Repeat Problem 12.7, but for chronic rather than acute illness.
Using the depression data set, appropriate variable selection techniques, and logistic regression, describe the probability of an acute illness as a function of age, education, income, depression, and regular drinking.
(a) Using the depression data set, fill in the following table: What are the odds that a woman is a regular drinker? That a man is a regular drinker? What is the odds ratio? (b) Repeat the tabulation and calculations for part (a) separately for people who are depressed and those who are not. Compare
Perform a logistic regression analysis on the data described in Problem 11.2.
Perform a logistic regression analysis with the same variables and data used in the example in Problem 11.13.
The accompanying table presents the number of individuals by smoking and disease status. What are the odds that a smoker will get disease A? That a nonsmoker will get disease A? What is the odds ratio?

             Disease A
Smoking    Yes    No    Total
Yes         80   120      200
No          20   280      300
Total      100   400      500
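The odds within a group are the number with the disease divided by the number without it, and the odds ratio is the ratio of the two groups' odds. A short Python sketch of that arithmetic follows, assuming the table reads as 80 with and 120 without disease A among smokers, and 20 with and 280 without among nonsmokers. The function and variable names are illustrative.

```python
def odds(events, non_events):
    """Odds of the event within one group: events divided by non-events."""
    return events / non_events

# Counts as read from the table (an assumption about its layout).
smoker_odds = odds(80, 120)
nonsmoker_odds = odds(20, 280)

# Odds ratio comparing smokers with nonsmokers.
odds_ratio = smoker_odds / nonsmoker_odds

print(round(smoker_odds, 3), round(nonsmoker_odds, 3), round(odds_ratio, 3))
```

The same ratio can be obtained from the cross-product of the table cells, (80 × 280) / (120 × 20), which is a useful check on the group-wise calculation.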
Using the formula odds = P/(1 − P), fill in the accompanying table.

Odds    P
0.25    0.20
0.5
1.0     0.5
1.5
2.0
2.5
3.0     0.75
5.0
If the probability of an individual getting a hit in baseball is 0.20, then the odds of getting a hit are 0.25. Check to determine that the previous statement is true. Would you prefer to be told that your chances are one in five of a hit or that for every four hitless times at bat you can expect
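The statement can be checked directly from the definition odds = P/(1 − P), and the inverse relation P = odds/(1 + odds) converts back. A minimal Python sketch, with illustrative function names:

```python
import math

def odds_from_probability(p):
    """odds = P / (1 - P); defined for 0 <= P < 1."""
    if not 0 <= p < 1:
        raise ValueError("probability must satisfy 0 <= p < 1")
    return p / (1 - p)

def probability_from_odds(odds):
    """Inverse relation: P = odds / (1 + odds); defined for odds >= 0."""
    if odds < 0:
        raise ValueError("odds must be non-negative")
    return odds / (1 + odds)

# A probability of 0.20 corresponds to odds of 0.25 (one hit per four hitless at-bats).
print(math.isclose(odds_from_probability(0.20), 0.25))
```

Odds of 0.25 mean one success for every four failures, which is the same information as one success in five attempts.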
Calculate the Parental Bonding Overprotection and Parental Bonding Care score for the Parental HIV data (see Appendix A and the codebook). Perform a discriminant function analysis to classify adolescents into a group who has been absent from school without a reason (HOOKEY) and a group who has not
Refer to the table of ideal weights given in Problem 10.8 and calculate the midpoint of each weight range for men and women. Pretending these represent a real sample, perform a discriminant function analysis to classify observations as male or female on the basis of height and weight. How could you
Is it possible to distinguish between men and women in the depression data set on the basis of income and level of depression? What is the classification function? What are your prior probabilities? Test whether the following variables help discriminate: EDUCAT, EMPLOY, HEALTH.
Divide the oldest children in the family lung function data set into two groups based on weight: less than or equal to 101 versus greater than 101. Perform a stepwise discriminant function analysis using the variables OCHEIGHT, OCAGE, MHEIGHT, MWEIGHT, FHEIGHT, and FWEIGHT. Now temporarily remove
(a) In the family lung function data in Appendix A divide the fathers into two groups: group I with FEV1 less than or equal to 4.09, and group II with FEV1 greater than 4.09. Assuming equal prior probabilities and costs, perform a stepwise discriminant function analysis using the variables height,
At the time the study was conducted, the population of Lancaster was 48,027 while Glendora had 38,654 residents. Using prior probabilities based on these population figures and those given in Problem 11.11, and the entire lung function data set (with four AREA-defined groups), perform a
From the family lung function data in Appendix A create a data set containing only those families from Burbank and Long Beach (AREA = 1 or 3). The observations now belong to one of two AREA-defined groups. (a) Assuming equal prior probabilities and costs, perform a discriminant function analysis for
(Continuation of Problem 11.7.) Do a variable selection analysis, using variables X4 to X9 only. Comment.
(Continuation of Problem 11.7.) Do a variable selection analysis for all nine variables. Comment.
(Continuation of Problem 11.7.) Perform a similar analysis, using only X1, X2, and X3. Test the hypothesis that these three variables do as well as all nine in classifying the observations. Comment.
(Y is not used here). Then for the first 50 cases, add 6 to X1, add 3 to X2, add 5 to X3, and leave the values for X4 to X9 as they are. For the last 50 cases, leave all the data as they are. Thus the first 50 cases represent a random sample from a multivariate normal population called population I
In this problem you will modify the data set created in Problem 8.7 to make it suitable for the theoretical exercises in discriminant analysis. Generate the sample data for X1, X2, ..., X9 as in Problem
(Continuation of Problem 11.2.) Now divide the companies into three groups: group I consists of those companies with a P/E of 7 or less, group II consists of those companies with a P/E of 8 to 10, and group III consists of those companies with a P/E greater than or equal to 11. Perform a stepwise
(Continuation of Problem 11.2.) Perform a variable selection analysis, using stepwise and best-subset programs. Compare the results with those of the variable selection analysis given in Chapter 9.
(Continuation of Problem 11.2.) Choose a different set of prior probabilities and costs of misclassification that seems reasonable and repeat the analysis.
(Continuation of Problem 11.2.) Test whether D/E alone does as good a classification job as all six variables.
For the data shown in Table 9.1, divide the chemical companies into two groups: group I consists of those companies with a P/E less than 9, and group II consists of those companies with a P/E greater than or equal to 9. Group I should be considered mature or troubled firms, and group II should be
Using the depression data set, perform a stepwise discriminant function analysis with age, sex, log(income), bed days, and health as possible variables. Compare the results with those given in Section 11.13.
For the variables describing the average number of cigarettes smoked during the past 3 months (SMOKEP3M) and the variable describing the mother’s education (EDUMO) in the Parental HIV data determine the percent with missing values. For each of the variables describe hypothetical scenarios which
Using the data from the Parents HIV/AIDS study, for those adolescents who have started to use alcohol, predict the age when they first start their use (AGEALC). Predictive variables should include NGHB11 (drinking in the neighborhood), GENDER, HOWREL (how religious). Choose suitable referent groups
Using dummy variables, run a regression analysis that relates CESD as the dependent variable to marital status in the depression data set given in Chapter 3. Do it separately for males and females. Repeat using the combined group, but including a dummy variable for sex and any necessary interaction
Using the family lung function data, find the regression of height for the oldest child on mother’s and father’s height. Include a dummy variable for the sex of the child and any necessary interaction terms.
Perform a ridge regression analysis of the family lung function data using FEV1 of the oldest child as the dependent variable and height, weight and age of the oldest child as the independent variables.
Using the family lung function data, relate FEV1 to height for the oldest child in three ways: simple linear regression (Problem 7.9), regression of FEV1 on height squared, and spline regression (split at HEI = 64). Which method is preferable?
In the depression data set, define Y = the square root of total depression score (CESD), X1 = log(income), X2 = Age, X3 = Health and X4 = Bed days. Set X1 = missing whenever X3 = 4 (poor health). Also set X2 = missing whenever X2 is between 50 and 59 (inclusive). Are these data missing at random? Try
Take the family lung function data described in Appendix A and delete (label as missing) the height of the middle child for every family with ID divisible by 6, that is, families 6, 12, 18, etc. (To find these, look for those IDs with ID/6 = integer part of (ID/6).) Delete the FEV1 of the middle child
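The divisibility check described above, ID/6 equal to its integer part, is the same as testing that the remainder of ID divided by 6 is zero. A small Python sketch of the height deletion follows. The records and the field names id and mc_height are hypothetical stand-ins for the real data set, and only the height rule stated in full above is shown.

```python
# Hypothetical stand-in records; the real data come from Appendix A.
families = [{"id": i, "mc_height": 60.0 + i} for i in range(1, 13)]

def delete_height_for_ids_divisible_by_six(records):
    """Return copies of the records with mc_height set to None when id % 6 == 0."""
    masked = []
    for record in records:
        copy = dict(record)
        if copy["id"] % 6 == 0:  # same test as ID/6 == integer part of ID/6
            copy["mc_height"] = None  # label as missing
        masked.append(copy)
    return masked

masked_families = delete_height_for_ids_divisible_by_six(families)
print([f["id"] for f in masked_families if f["mc_height"] is None])
```

Working on copies keeps the complete data available for comparing analyses with and without the deleted values.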
(Continuation of Problem 10.8.) Using the data in the table given in Problem 10.8, compute the midpoints of weight range for all frame sizes for men and women separately. Pretending that the results represent a real sample, so that each height has three Y values associated with it instead of one,
Use the data described in Problem 8.7. Since some of the X variables are intercorrelated, it may be useful to do a ridge regression analysis of Y on X1 to X9. Perform such an analysis, and compare the results to those of Problems 8.10 and 9.7.
Unlike the real data used in Problem 10.5, the accompanying data are “ideal” weights published by the Metropolitan Life Insurance Company for American men and women. Compute Y = midpoint of weight range for medium-framed men and women for the various heights shown in the table. Pretending that
(Continuation of Problem 10.5.) Do a similar analysis for the first boy and girl. Include age and age squared in the regression equation.
Another way to answer the question of interaction between the independent variables in Problem 8.13 is to define a dummy variable that indicates whether an observation is above the median weight, and an equivalent variable for height. Relate FEV1 for the fathers to these dummy variables, including
Use the lung function data described in Appendix A. For the parents we wish to relate Y = weight to X = height for both men and women in a single equation. Using dummy variables, write an equation for this purpose, including an interaction term. Interpret the parameters. Run a regression analysis,
Draw a ridge trace for the accompanying data.

                        Variable
Case                  X1      X2      X3      Y
1                     0.46    0.96    6.42    3.46
2                     0.06    0.53    5.53    2.25
3                     1.49    1.87    8.37    5.69
4                     1.02    0.27    5.37    2.36
5                     1.39    0.04    5.44    2.65
6                     0.91    0.37    6.28    3.31
7                     1.18    0.70    6.88    3.89
8                     1.00    0.43    6.43    3.27
Mean                  0.939   0.646   6.340
Standard deviation    0.475   0.566
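A ridge trace plots the standardized ridge coefficients against the ridge constant k. One common formulation computes them as b(k) = (R + kI)^(-1) r, where R is the correlation matrix of the X variables and r is the vector of correlations between each X and Y. The Python sketch below tabulates the trace under that formulation, using the eight cases as read from the flattened table. The grid of k values and all function names are illustrative choices, and plotting is left to the package of your choice.

```python
import math

# Cases 1-8 as read from the table in the problem (an assumption about its layout).
X = [
    [0.46, 0.96, 6.42], [0.06, 0.53, 5.53], [1.49, 1.87, 8.37], [1.02, 0.27, 5.37],
    [1.39, 0.04, 5.44], [0.91, 0.37, 6.28], [1.18, 0.70, 6.88], [1.00, 0.43, 6.43],
]
y = [3.46, 2.25, 5.69, 2.36, 2.65, 3.31, 3.89, 3.27]

def standardize(values):
    """Center and scale to unit sample standard deviation (n - 1 denominator)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def correlation(a, b):
    """Correlation of two already-standardized variables."""
    return sum(p * q for p, q in zip(a, b)) / (len(a) - 1)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

columns = [standardize([row[j] for row in X]) for j in range(3)]
zy = standardize(y)
R = [[correlation(ci, cj) for cj in columns] for ci in columns]
r_xy = [correlation(ci, zy) for ci in columns]

def ridge_coefficients(k):
    """Standardized ridge coefficients b(k) = (R + kI)^(-1) r_xy."""
    A = [[R[i][j] + (k if i == j else 0.0) for j in range(3)] for i in range(3)]
    return solve(A, r_xy)

# Illustrative grid of ridge constants; k = 0 gives ordinary least squares.
trace = {k: ridge_coefficients(k) for k in [0.0, 0.05, 0.1, 0.2, 0.5, 1.0]}
for k, b in trace.items():
    print(k, [round(v, 3) for v in b])
```

Plotting each coefficient against k gives the trace; the value of k at which the coefficients stabilize is the usual visual criterion for choosing the ridge constant.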
In the depression data set, determine whether religion has an effect on income when used as an independent variable along with age, sex, and educational level.
Repeat Problem 10.1, but now use a dummy variable for education. Divide the education level into three categories: did not complete high school, completed at least high school, and completed at least a bachelor’s degree. Compare the interpretation you would make of the effects of education on
In the depression data set described in Chapter 3, data on educational level, age, sex, and income are presented for a sample of adults from Los Angeles County. Fit a regression plane with income as the dependent variable and the other variables as independent variables. Use a dummy variable for
Using the Parental HIV data find the best model that predicts the age at which adolescents started drinking alcohol among those who have started drinking alcohol. Since the data were collected retrospectively, only consider variables which might be considered representative of the time before the
Using the Parental HIV data consider performing a confirmatory data analysis investigating the relationship between the age at which children started drinking alcohol (if they have already started) and gender without running any analysis first. Consider what variables might (a priori) be a potential
From among the candidate variables given in Problem 9.11, find the subset of three variables that best predicts height in the oldest child, separately for boys and girls. Are the two sets the same? Find the best subset of three variables for the group as a whole. Does adding OCSEX into the
Using the methods described in this chapter and the family lung function data described in Appendix A, and choosing from among the variables OCAGE, OCWEIGHT, MHEIGHT, MWEIGHT, FHEIGHT, and FWEIGHT, select the variables that best predict height in the oldest child. Show your analysis.
Force the variables you selected in Problem 9.9(a) into the regression equation with OCFEV1 as the dependent variable, and test whether including the FEV1 of the parents (i.e., the variables MFEV1 and FFEV1 taken as a pair) in the equation significantly improves the regression.
(a) For the lung function data set described in Appendix A with age, height, weight, and FVC as the candidate independent variables, use subset regression to find which variables best predict FEV1 in the oldest child. State the criteria you use to decide. (b) Repeat, using forward selection and
In Problem 8.7 the population multiple R^2 of Y on X4, X5, ..., X9 is zero. However, from the sample alone we don’t know this result. Perform a variable selection analysis on X4 to X9, using your sample, and comment on the results.
For the data from Problem 8.7, perform a variable selection analysis, using the methods described in this chapter. Comment on the results in view of the population parameters.
Use the data you generated from Problem 8.7, where X1, X2, ..., X9 are the independent variables and Y is the dependent variable. Use the generalized linear hypothesis test to test the hypothesis that β4 = β5 = ... = β9 = 0. Comment in light of what you know about the population parameters.
Using the data given in Table 9.1, repeat the analyses described in this chapter with (P/E)^(1/2) as the dependent variable instead of P/E. Do the results change much? Does it make sense to use the square root transformation?
For adult males it has been demonstrated that age and height are useful in predicting FEV1. Using the data described in Appendix A, determine whether the regression plane can be improved by also including weight.
includes data for both years (Forbes, vol. 127, no. 1 (January 5, 1981) and Forbes, vol. 131, no. 1 (January 3, 1983)). Do a forward stepwise regression analysis, using P/E as the dependent variable and ROR5, D/E, SALESGR5, EPS5, NPM1, and PAYOUTR1 as independent variables, on both years’ data and
Forbes gives, each year, the same variables listed in Table 9.1 for the chemical industry. The changes in lines of business and company mergers resulted in a somewhat different list of chemical companies in 1982. We have selected a subset of 13 companies that are listed in both years and whose main
Repeat Problem 9.1 using subset regression, and compare the results.
Use the depression data set described in Table 3.4. Using CESD as the dependent variable, and age, income, and level of education as the independent variables, run a forward stepwise regression program to determine which of the independent variables predict level of depression for women.
For the Parental HIV data generate a variable that represents the sum of the variables describing the neighborhood where the adolescent lives (NGHB1–NGHB11). Is the age at which adolescents start smoking different for girls compared to boys, after adjusting for the score describing the
Repeat Problem 8.15(a) for fathers’ measurements instead of those of the oldest children. Are the regression coefficients more stable? Why?
(Continuation of Problem 8.13.) a) For the oldest child, find the regression of FEV1 on (i) weight and age; (ii) height and age; (iii) height, weight, and age. Compare the three regression equations. In each regression, which coefficients are significantly different from zero? b) Find the correlation
(Continuation of Problem 8.13.) Find the partial correlation of FEV1 and age given height for the oldest child, and compare it to the simple correlation between FEV1 and age of the oldest child. Is either one significantly different from zero? Based on these results and without doing any further
For the lung function data described in Appendix A, find the regression of FEV1 on weight and height for the fathers. Divide each of the two explanatory variables into two intervals: greater than, and less than or equal to the respective median. Is there an interaction between the two explanatory
(Continuation of Problem 8.11.) For the regression of CESD on INCOME and AGE, choose 15 observations that appear to be influential or outlying. State your criteria, delete these points, and repeat the regression. Summarize the differences in the results of the regression analyses.
(Continuation of Problem 8.5.) Fit a regression plane for CESD on INCOME and AGE for males and females combined. Test whether the regression plane is helpful in predicting the values of CESD. Find a 95% prediction interval for a female with INCOME = 17 and AGE = 29 using this regression. Do the same
(Continuation of Problem 8.7.) Perform a multiple regression analysis, with the dependent variable = Y and the independent variables = X1 to X9, on the 100 generated cases. Summarize the results and state whether they came out the way you expected them to, considering how the data were generated.
(Continuation of Problem 8.7.) Calculate the population partial correlation coefficient between X2 and X3 after removing the linear effect of X1. Is it larger or smaller than r23? Explain. Also, obtain the corresponding sample partial correlation. Test whether it is equal to zero.
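The first-order partial correlation can be computed from the three pairwise correlations with the standard formula r23.1 = (r23 − r12 r13) / sqrt((1 − r12^2)(1 − r13^2)). A minimal Python sketch follows; the function name is illustrative and the numeric inputs are placeholders, not the population values from Problem 8.7.

```python
import math

def partial_correlation(r_ab, r_ac, r_bc):
    """Correlation of a and b after removing the linear effect of c.

    r_ab, r_ac, r_bc are the simple pairwise correlations.
    """
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

# For X2 and X3 given X1, call partial_correlation(r23, r12, r13).
# Placeholder correlations for illustration only.
print(round(partial_correlation(0.5, 0.5, 0.5), 4))
```

The same function applies to both the population correlations and their sample estimates, so the two partial correlations the problem asks for can be compared directly.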
Repeat Problem 8.7 using another statistical package and see if you get the same sample.
Using a statistical package of your choice, create a hypothetical data set which you will use for exercises in this chapter and some of the following chapters. Begin by generating 100 independent cases for each of ten variables using the standard normal distribution (means = 0 and variances = 1).
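The first step, 100 independent cases on ten standard normal variables, can be sketched in Python with the standard library alone. The function name and the seed value are arbitrary illustrative choices; fixing a seed makes the sample reproducible within one package, though a different package will generally produce a different sample.

```python
import random

def generate_cases(n_cases=100, n_vars=10, seed=12345):
    """Return n_cases rows of n_vars independent standard normal draws."""
    rng = random.Random(seed)  # private generator so the seed controls only this sample
    return [[rng.gauss(0.0, 1.0) for _ in range(n_vars)] for _ in range(n_cases)]

data = generate_cases()
print(len(data), len(data[0]))
```

Each row is one case and each column one of the ten variables, ready for whatever further steps the exercise specifies.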
Search for a suitable transformation for CESD if the normality assumption in Problem 8.5 cannot be made. State why you are not able to find an ideal transformation if that is the case.
From the depression data set described in Table 3.4, predict the reported level of depression as given by CESD, using INCOME, SEX, and AGE as independent variables. Analyze the residuals and decide whether or not it is reasonable to assume that they follow a normal distribution.
Fit the regression plane for mothers with MFVC as the dependent variable and age and height as the independent variables. Summarize the results in a tabular form. Test whether the regression results for mothers and fathers are significantly different.