Practical Multivariate Analysis 5th Edition Abdelmonem Afifi, Susanne May, Virginia A. Clark - Solutions
If the probability of an individual getting a hit in baseball is 0.20, then the odds of getting a hit are 0.25. Check to determine that the previous statement is true. Would you prefer to be told that your chances are one in five of a hit or that for every four hitless times at bat you can expect
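The check in the first sentence is the usual probability-to-odds conversion, odds = p / (1 - p), so 0.20 / 0.80 = 0.25. A minimal sketch follows; the function names `odds_from_probability` and `probability_from_odds` are illustrative and do not come from the textbook.

```python
def odds_from_probability(p):
    """Convert a probability p (0 <= p < 1) to odds in favour: p / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1.0 - p)


def probability_from_odds(odds):
    """Convert odds in favour back to a probability: odds / (1 + odds)."""
    if odds < 0.0:
        raise ValueError("odds must be non-negative")
    return odds / (1.0 + odds)


# Probability of a hit from the problem statement.
p_hit = 0.20
print(f"odds of a hit: {odds_from_probability(p_hit):.2f}")
```

Odds of 0.25 can also be read as "one hit for every four hitless at-bats", which is the second phrasing the problem asks about.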
Calculate the Parental Bonding Overprotection and Parental Bonding Care score for the Parental HIV data (see Appendix A and the codebook). Perform a discriminant function analysis to classify adolescents into a group who has been absent from school without a reason (HOOKEY) and a group who has not
Refer to the table of ideal weights given in Problem 10.8 and calculate the midpoint of each weight range for men and women. Pretending these represent a real sample, perform a discriminant function analysis to classify observations as male or female on the basis of height and weight. How could you
Is it possible to distinguish between men and women in the depression data set on the basis of income and level of depression? What is the classification function? What are your prior probabilities? Test whether the following variables help discriminate: EDUCAT, EMPLOY, HEALTH.
Divide the oldest children in the family lung function data set into two groups based on weight: less than or equal to 101 versus greater than 101. Perform a stepwise discriminant function analysis using the variables OCHEIGHT, OCAGE, MHEIGHT, MWEIGHT, FHEIGHT, and FWEIGHT. Now temporarily remove
(a) In the family lung function data in Appendix A divide the fathers into two groups: group I with FEV1 less than or equal to 4.09, and group II with FEV1 greater than 4.09. Assuming equal prior probabilities and costs, perform a stepwise discriminant function analysis using the variables height,
At the time the study was conducted, the population of Lancaster was 48,027 while Glendora had 38,654 residents. Using prior probabilities based on these population figures and those given in Problem 11.11, and the entire lung function data set (with four AREA-defined groups), perform a
From the family lung function data in Appendix A create a data set containing only those families from Burbank and Long Beach (AREA = 1 or 3). The observations now belong to one of two AREA-defined groups. (a) Assuming equal prior probabilities and costs, perform a discriminant function analysis for
(Continuation of Problem 11.7.) Do a variable selection analysis, using variables X4 to X9 only. Comment.
(Continuation of Problem 11.7.) Do a variable selection analysis for all nine variables. Comment.
(Continuation of Problem 11.7.) Perform a similar analysis, using only X1, X2, and X3. Test the hypothesis that these three variables do as well as all nine in classifying the observations. Comment.
In this problem you will modify the data set created in Problem 8.7 to make it suitable for the theoretical exercises in discriminant analysis. Generate the sample data for X1, X2, ..., X9 as in Problem 8.7 (Y is not used here). Then for the first 50 cases, add 6 to X1, add 3 to X2, add 5 to X3,
(Continuation of Problem 11.2.) Now divide the companies into three groups: group I consists of those companies with a P/E of 7 or less, group II consists of those companies with a P/E of 8 to 10, and group III consists of those companies with a P/E greater than or equal to 11. Perform a stepwise
(Continuation of Problem 11.2.) Perform a variable selection analysis, using stepwise and best-subset programs. Compare the results with those of the variable selection analysis given in Chapter 9.
(Continuation of Problem 11.2.) Choose a different set of prior probabilities and costs of misclassification that seems reasonable and repeat the analysis.
(Continuation of Problem 11.2.) Test whether D/E alone does as good a classification job as all six variables.
For the data shown in Table 9.1, divide the chemical companies into two groups: group I consists of those companies with a P/E less than 9, and group II consists of those companies with a P/E greater than or equal to 9. Group I should be considered mature or troubled firms, and group II should be
Using the depression data set, perform a stepwise discriminant function analysis with age, sex, log(income), bed days, and health as possible variables. Compare the results with those given in Section 11.13.
For the variables describing the average number of cigarettes smoked during the past 3 months (SMOKEP3M) and the variable describing the mother’s education (EDUMO) in the Parental HIV data, determine the percent with missing values. For each of the variables, describe hypothetical scenarios which
Using the data from the Parents HIV/AIDS study, for those adolescents who have started to use alcohol, predict the age when they first start their use (AGEALC). Predictive variables should include NGHB11 (drinking in the neighborhood), GENDER, and HOWREL (how religious). Choose suitable referent groups
Using dummy variables, run a regression analysis that relates CESD as the dependent variable to marital status in the depression data set given in Chapter 3. Do it separately for males and females. Repeat using the combined group, but including a dummy variable for sex and any necessary interaction
Using the family lung function data, find the regression of height for the oldest child on mother’s and father’s height. Include a dummy variable for the sex of the child and any necessary interaction terms.
Perform a ridge regression analysis of the family lung function data using FEV1 of the oldest child as the dependent variable and height, weight and age of the oldest child as the independent variables.
Using the family lung function data, relate FEV1 to height for the oldest child in three ways: simple linear regression (Problem 7.9), regression of FEV1 on height squared, and spline regression (split at HEI = 64). Which method is preferable?
In the depression data set, define Y = the square root of total depression score (CESD), X1 = log(income), X2 = Age, X3 = Health and X4 = Bed days. Set X1 = missing whenever X3 = 4 (poor health). Also set X2 = missing whenever X2 is between 50 and 59 (inclusive). Are these data missing at random? Try
Take the family lung function data described in Appendix A and delete (label as missing) the height of the middle child for every family with ID divisible by 6, that is, families 6, 12, 18 etc. (To find these, look for those IDs with ID/6 = integer part of (ID/6).) Delete the FEV1 of the middle child
(Continuation of Problem 10.8.) Using the data in the table given in Problem 10.8, compute the midpoints of weight range for all frame sizes for men and women separately. Pretending that the results represent a real sample, so that each height has three Y values associated with it instead of one,
Use the data described in Problem 8.7. Since some of the X variables are intercorrelated, it may be useful to do a ridge regression analysis of Y on X1 to X9. Perform such an analysis, and compare the results to those of Problems 8.10 and 9.7.
Unlike the real data used in Problem 10.5, the accompanying data are “ideal” weights published by the Metropolitan Life Insurance Company for American men and women. Compute Y = midpoint of weight range for medium-framed men and women for the various heights shown in the table. Pretending that
(Continuation of Problem 10.5.) Do a similar analysis for the first boy and girl. Include age and age squared in the regression equation.
Another way to answer the question of interaction between the independent variables in Problem 8.13 is to define a dummy variable that indicates whether an observation is above the median weight, and an equivalent variable for height. Relate FEV1 for the fathers to these dummy variables, including
Use the lung function data described in Appendix A. For the parents we wish to relate Y = weight to X = height for both men and women in a single equation. Using dummy variables, write an equation for this purpose, including an interaction term. Interpret the parameters. Run a regression analysis,
Draw a ridge trace for the accompanying data.
Case                 X1      X2      X3      Y
1                    0.46    0.96    6.42    3.46
2                    0.06    0.53    5.53    2.25
3                    1.49    1.87    8.37    5.69
4                    1.02    0.27    5.37    2.36
5                    1.39    0.04    5.44    2.65
6                    0.91    0.37    6.28    3.31
7                    1.18    0.70    6.88    3.89
8                    1.00    0.43    6.43    3.27
Mean                 0.939   0.646   6.340   3.360
Standard deviation   0.475
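One common way to build a ridge trace is to standardize the variables and solve (R + kI) b = r for a range of ridge constants k, where R is the correlation matrix of the X variables and r holds their correlations with Y. The sketch below follows that formulation in plain Python. It assumes the eight cases read row by row as (X1, X2, X3, Y), which agrees with the listed means; the helper names `corr`, `solve`, and `ridge_trace` and the grid of k values are illustrative choices, not taken from the textbook.

```python
from math import sqrt

# The eight cases from the problem, read as (X1, X2, X3, Y).
cases = [
    (0.46, 0.96, 6.42, 3.46),
    (0.06, 0.53, 5.53, 2.25),
    (1.49, 1.87, 8.37, 5.69),
    (1.02, 0.27, 5.37, 2.36),
    (1.39, 0.04, 5.44, 2.65),
    (0.91, 0.37, 6.28, 3.31),
    (1.18, 0.70, 6.88, 3.89),
    (1.00, 0.43, 6.43, 3.27),
]


def corr(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ridge_trace(rows, ks):
    """Return {k: standardized ridge coefficients} for each k in ks."""
    cols = list(zip(*rows))
    xs, y = cols[:-1], cols[-1]
    p = len(xs)
    r_xx = [[corr(xs[i], xs[j]) for j in range(p)] for i in range(p)]
    r_xy = [corr(xs[i], y) for i in range(p)]
    trace = {}
    for k in ks:
        a = [[r_xx[i][j] + (k if i == j else 0.0) for j in range(p)]
             for i in range(p)]
        trace[k] = solve(a, r_xy)
    return trace


# Illustrative grid of ridge constants; k = 0 is ordinary least squares.
for k, coefs in ridge_trace(cases, [0.0, 0.05, 0.1, 0.2, 0.5]).items():
    print(k, [round(c, 3) for c in coefs])
```

Plotting each coefficient against k gives the trace itself; the printed rows are the points that such a plot would connect.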
In the depression data set, determine whether religion has an effect on income when used as an independent variable along with age, sex, and educational level.
Repeat Problem 10.1, but now use a dummy variable for education. Divide the education level into three categories: did not complete high school, completed at least high school, and completed at least a bachelor’s degree. Compare the interpretation you would make of the effects of education on
In the depression data set described in Chapter 3, data on educational level, age, sex, and income are presented for a sample of adults from Los Angeles County. Fit a regression plane with income as the dependent variable and the other variables as independent variables. Use a dummy variable for
Using the Parental HIV data find the best model that predicts the age at which adolescents started drinking alcohol among those who have started drinking alcohol. Since the data were collected retrospectively, only consider variables which might be considered representative of the time before the
Using the Parental HIV data consider performing a confirmatory data analysis investigating the relationship between the age at which children started drinking alcohol (if they have already started) and gender without running any analysis first. Consider what variables might (a priori) be a potential
From among the candidate variables given in Problem 9.11, find the subset of three variables that best predicts height in the oldest child, separately for boys and girls. Are the two sets the same? Find the best subset of three variables for the group as a whole. Does adding OCSEX into the
Using the methods described in this chapter and the family lung function data described in Appendix A, and choosing from among the variables OCAGE, OCWEIGHT, MHEIGHT, MWEIGHT, FHEIGHT, and FWEIGHT, select the variables that best predict height in the oldest child. Show your analysis.
Force the variables you selected in Problem 9.9(a) into the regression equation with OCFEV1 as the dependent variable, and test whether including the FEV1 of the parents (i.e., the variables MFEV1 and FFEV1 taken as a pair) in the equation significantly improves the regression.
(a) For the lung function data set described in Appendix A with age, height, weight, and FVC as the candidate independent variables, use subset regression to find which variables best predict FEV1 in the oldest child. State the criteria you use to decide. (b) Repeat, using forward selection and
In Problem 8.7 the population multiple R² of Y on X4, X5, ..., X9 is zero. However, from the sample alone we don’t know this result. Perform a variable selection analysis on X4 to X9, using your sample, and comment on the results.
For the data from Problem 8.7, perform a variable selection analysis, using the methods described in this chapter. Comment on the results in view of the population parameters.
Use the data you generated from Problem 8.7, where X1, X2, ..., X9 are the independent variables and Y is the dependent variable. Use the generalized linear hypothesis test to test the hypothesis that b4 = b5 = ... = b9 = 0. Comment in light of what you know about the population parameters.
Using the data given in Table 9.1, repeat the analyses described in this chapter with (P/E)^(1/2) as the dependent variable instead of P/E. Do the results change much? Does it make sense to use the square root transformation?
For adult males it has been demonstrated that age and height are useful in predicting FEV1. Using the data described in Appendix A, determine whether the regression plane can be improved by also including weight.
Forbes gives, each year, the same variables listed in Table 9.1 for the chemical industry. The changes in lines of business and company mergers resulted in a somewhat different list of chemical companies in 1982. We have selected a subset of 13 companies that are listed in both years and whose main
Repeat Problem 9.1 using subset regression, and compare the results.
Use the depression data set described in Table 3.4. Using CESD as the dependent variable, and age, income, and level of education as the independent variables, run a forward stepwise regression program to determine which of the independent variables predict level of depression for women.
For the Parental HIV data generate a variable that represents the sum of the variables describing the neighborhood where the adolescent lives (NGHB1–NGHB11). Is the age at which adolescents start smoking different for girls compared to boys, after adjusting for the score describing the
Repeat Problem 8.15(a) for fathers’ measurements instead of those of the oldest children. Are the regression coefficients more stable? Why?
(Continuation of Problem 8.13.) a) For the oldest child, find the regression of FEV1 on (i) weight and age; (ii) height and age; (iii) height, weight, and age. Compare the three regression equations. In each regression, which coefficients are significantly different from zero? b) Find the correlation
(Continuation of Problem 8.13.) Find the partial correlation of FEV1 and age given height for the oldest child, and compare it to the simple correlation between FEV1 and age of the oldest child. Is either one significantly different from zero? Based on these results and without doing any further
For the lung function data described in Appendix A, find the regression of FEV1 on weight and height for the fathers. Divide each of the two explanatory variables into two intervals: greater than, and less than or equal to the respective median. Is there an interaction between the two explanatory
(Continuation of Problem 8.11.) For the regression of CESD on INCOME and AGE, choose 15 observations that appear to be influential or outlying. State your criteria, delete these points, and repeat the regression. Summarize the differences in the results of the regression analyses.
(Continuation of Problem 8.5.) Fit a regression plane for CESD on INCOME and AGE for males and females combined. Test whether the regression plane is helpful in predicting the values of CESD. Find a 95% prediction interval for a female with INCOME = 17 and AGE = 29 using this regression. Do the same
(Continuation of Problem 8.7.) Perform a multiple regression analysis, with the dependent variable = Y and the independent variables = X1 to X9, on the 100 generated cases. Summarize the results and state whether they came out the way you expected them to, considering how the data were generated.
(Continuation of Problem 8.7.) Calculate the population partial correlation coefficient between X2 and X3 after removing the linear effect of X1. Is it larger or smaller than r23? Explain. Also, obtain the corresponding sample partial correlation. Test whether it is equal to zero.
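The first-order partial correlation asked for here has the standard closed form r23.1 = (r23 - r12·r13) / sqrt((1 - r12²)(1 - r13²)). A minimal sketch of that formula follows; the function name `partial_corr` is illustrative, and the correlation values in the example call are hypothetical placeholders, not the population values from Problem 8.7.

```python
from math import sqrt


def partial_corr(r_ab, r_ac, r_bc):
    """Partial correlation of a and b after removing the linear effect of c.

    r_ab, r_ac, r_bc are the simple (zero-order) correlations between the
    pairs of variables; |r_ac| and |r_bc| must be strictly less than 1.
    """
    denom = sqrt((1.0 - r_ac ** 2) * (1.0 - r_bc ** 2))
    if denom == 0.0:
        raise ValueError("r_ac and r_bc must be strictly between -1 and 1")
    return (r_ab - r_ac * r_bc) / denom


# Hypothetical correlations, chosen only to illustrate the call.
r12, r13, r23 = 0.5, 0.5, 0.6
r23_given_1 = partial_corr(r23, r12, r13)
print(f"r23 = {r23:.3f}, r23.1 = {r23_given_1:.3f}")
```

Comparing the two printed values shows directly whether removing the linear effect of X1 raises or lowers the association between X2 and X3 for the correlations supplied.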
Repeat Problem 8.7 using another statistical package and see if you get the same sample.
Using a statistical package of your choice, create a hypothetical data set which you will use for exercises in this chapter and some of the following chapters. Begin by generating 100 independent cases for each of ten variables using the standard normal distribution (means = 0 and variances = 1).
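The generation step described so far can be sketched with Python's standard library alone. The function name `generate_cases` and the seed value are illustrative assumptions; fixing a seed makes the sample reproducible within one package, although another package will generally produce a different sample from the same seed.

```python
import random
from statistics import mean, pvariance


def generate_cases(n_cases=100, n_vars=10, seed=12345):
    """Return n_cases rows, each holding n_vars independent N(0, 1) draws."""
    rng = random.Random(seed)  # private generator so the seed is local
    return [[rng.gauss(0.0, 1.0) for _ in range(n_vars)]
            for _ in range(n_cases)]


data = generate_cases()

# Sample mean and variance of each variable; these should be near 0 and 1
# but will not match them exactly in a sample of 100 cases.
for j, column in enumerate(zip(*data), start=1):
    print(f"variable {j}: mean={mean(column):.3f}, var={pvariance(column):.3f}")
```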
Search for a suitable transformation for CESD if the normality assumption in Problem 8.5 cannot be made. State why you are not able to find an ideal transformation if that is the case.
From the depression data set described in Table 3.4, predict the reported level of depression as given by CESD, using INCOME, SEX, and AGE as independent variables. Analyze the residuals and decide whether or not it is reasonable to assume that they follow a normal distribution.
Fit the regression plane for mothers with MFVC as the dependent variable and age and height as the independent variables. Summarize the results in a tabular form. Test whether the regression results for mothers and fathers are significantly different.
Write the results for Problem 8.2 so they would be suitable for inclusion in a report. Include table(s) that present the results the reader should see.
Fit the regression plane for the fathers using FFVC as the dependent variable and age and height as the independent variables.
Using the chemical companies’ data in Table 9.1, predict the price earnings (P/E) ratio from the debt to equity (D/E) ratio, the annual dividends divided by the 12-months’ earnings per share (PAYOUTR1), and the percentage net profit margin (NPM1). Obtain the correlation matrix, and check the
Using the summary variable describing the neighborhood in Problem 7.15, generate a loess graph to examine the relationship between this variable and the age at which adolescents started using marijuana. Interpret the graph.
For the Parental HIV data generate a variable that represents the sum of the variables describing the neighborhood where the adolescent lives (NGHB1–NGHB11). Does the age at which adolescents start smoking depend on the score describing the neighborhood?
For the Parental HIV data produce a scatterplot of the age at which adolescents first started smoking versus the age at which they first started drinking alcohol. Based on the graph, do adolescents tend to start smoking before drinking alcohol, or vice versa? Calculate the correlation coefficient.
For the mother, perform a regression of FEV1 on weight. Test whether the coefficients are zero. Plot the regression line on a scatter diagram of MFEV1 versus MWE1. On this plot, identify the following groups of points: group 1: ID = 12, 33, 45, 42, 94, 144; group 2: ID = 7, 94, 105, 107, 115, 141,
Examine the residual plot from the regression of FEV1 on height for the oldest child. Choose an appropriate transformation, perform the regression with the transformed variable, and compare the results (statistics, plots) with the original regression analysis.
What is the correlation between height and weight in the oldest child? How would your answer to the last part of Problem 7.10 change if r = 1? r = -1? r = 0?
For the oldest child, perform the following regression analyses: FEV1 on weight, FEV1 on height, FVC on weight, and FVC on height. Note the values of the slope and correlation coefficient for each regression and test whether they are equal to zero. Discuss whether height or weight is more strongly
From the depression data set described in Table 3.4 create a data set containing only the variables AGE and INCOME. a) Find the regression of income on age. b) Successively add and then delete each of the following points (AGE, INCOME): (42, 120), (80, 150), (180, 15), and repeat the regression each time with the
(Continuation of Problem 7.7.) Calculate the variance of CESD for observations in each of the groups defined by income as follows: INCOME 59. For each observation, define a variable WEIGHT equal to 1 divided by the variance of a CESD within the income group to which it belongs. Obtain a weighted
Using the depression data set (see Table 3.4), perform a regression analysis of depression, as measured by CESD, on income. Plot the residuals. Does the normality assumption appear to be met? Repeat using the logarithm of CESD instead of CESD. Is the fit improved?
Examine the plot you produced in Problem 7.1 and choose some transformation for X and/or Y and repeat the analysis described there. Compare the correlation coefficients for the original and transformed variables, and decide whether the transformation helped. If so, which transformation was helpful?
Repeat Problem 7.2 using log(weight) and log(height) in place of the original variables. Using graphical and numerical devices, decide if the transformations help.
For the data in Problem 7.3, pretend that the index increases linearly in time and use linear regression to obtain an equation to forecast the index value as a function of time. Using “volume” as a weight variable, obtain a weighted least squares forecasting equation. Does weighted least squares
In Problem 5.8, the New York Stock Exchange Composite Index and daily volume for August 9 through September 17, 1982, were presented. Describe how volume appears to be affected by the price index, using regression analysis. Describe whether or not the residuals from your regression analysis are
From the family lung function data set in Appendix A, perform a regression analysis of weight on height for fathers. Repeat for mothers. Determine the correlation coefficient and the regression equation for fathers and mothers. Test that the coefficients are significantly different from zero for
In Table 9.1, financial performance data of 30 chemical companies are presented. Use growth in earnings per share, labelled EPS5, as the dependent variable and growth in sales, labelled SALESGR5, as the independent variable. (A description of these variables is given in Section 9.3.) Plot the data,
The Parental HIV data include information on the age at which adolescents started smoking. Where does this variable fit into Stevens’s classification scheme? In particular, comment on the issue of adolescents who had not started smoking by the time they were interviewed.
Suppose you would like to analyze the relationship between the number of times an adolescent has been absent from school without a reason and how much the adolescent likes/liked going to school for the Parental HIV data (Appendix A). Suggest ways to analyze this relationship. Suggest other variables
In the depression study, information was obtained on the respondent’s religion (Chapter 3). Describe why you think it is incorrect to obtain an average score for religion across the 294 respondents.
Using the lung function data described in the Appendix, an investigator would like to predict a child’s lung function based on that of the parents and the area they live in. What analyses would be appropriate to use?
A psychologist would like to predict whether or not a respondent in the depression study described in Chapter 3 is depressed. To do this, she would like to use the information contained in the following variables: MARITAL, INCOME, and AGE. Suggest analyses.
Two methods are currently used to treat a particular type of cancer. It is suspected that one of the treatments is twice as effective as the other in prolonging survival regardless of the severity of the disease at diagnosis. A study is carried out. After the data are collected, what analysis
A member of the admissions committee notices that there are several women with high grade point averages but low SAT scores. He wonders if this pattern holds for both men and women in general, only for women in general, or only in a few cases. Suggest ways to analyze this problem.
For the data described in Problem 6.6 we wish to relate health data such as infant mortality (the proportion of children dying before the age of one year) and life expectancy (the expected age at death of a person born today if the death rates remain unchanged) to other data such as gross national
For the data described in the prior problem we wish to put together similar countries into groups. Suggest possible analyses.
Large amounts of data are available from the United Nations and other international organizations such as the World Bank for each country and sovereign state of the world, including health, education, and commercial data. An economist would like to invent a descriptive system for the degree of
Data on men and women who have died have been obtained from health maintenance organization records. These data include age at death, height and weight, and several physiological and lifestyle measurements such as blood pressure, smoking status, dietary intake, and usual amount of exercise. The
A college admissions committee wishes to predict which prospective students will successfully graduate. To do so, the committee intends to obtain the college grade point averages for a sample of college seniors and compare these with their high school grade point averages and Scholastic Aptitude
A coach has made numerous measurements on successful basketball players, such as height, weight, and strength. He also knows which position each player is successful at. He would like to obtain a function from these data that would predict which position a new player would be best at. Suggest an
An investigator is attempting to determine the health effects on families of living in crowded urban apartments. Several characteristics of the apartment have been measured, including square feet of living area per person, cleanliness, and age of the apartment. Several illness characteristics for
Compute an appropriate measure of the center of the distribution for the following variables from the depression data set: MARITAL, INCOME, AGE, and HEALTH.
Using the lung cancer data described in Appendix A, examine the distribution of the variable days separately for those who died (death=1) and for those who did not (death=0). Plot a normal probability plot, a histogram, and a boxplot for each. Use the methods described in this chapter to choose
Using the Parental HIV data calculate an overall Brief Symptom Inventory (BSI) score for each adolescent (see the codebook for details). Log-transform the BSI score. Obtain a normal probability plot for the log-transformed variables. Does the log-transformed variable seem to be normally