Question: This problem may be considered a complete SAS project. Import an Excel data set named air pollution.xls using proc import to create a SAS data

This problem may be considered a complete SAS project. Import an Excel data set named air pollution.xls using proc import to create a SAS data set. The data shown in Table B.7 consists of air pollution measured as SO2 content in the air and related other variables for 41 U.S. cities. SO2 is the response variable y, and the explanatory variables are AvTemp (x1), NumFirms (x2), Population (x3), WindSpeed (x4), AvPrecip (x5), and PrecipDays(x6), respectively.

In a second SAS program, access this SAS dataset and perform a variable subset selection procedure using proc reg as described below.

a. Use proc sgscatter to obtain a scatter plot matrix and proc reg to perform a preliminary assessment of the pairwise relationships between y and x1, x2, x3, x4, x5, and x6. On this basis alone, select a few explanatory variables that may good predictors in a multiple regression model. Using the plot and the correlation matrix, find the four variables that are most strongly correlated among the explanatory variables. Based on above analysis alone suggest the explanatory variables that are most strongly involved in multicollinearity when fitting the full model.

b. Use a SAS procedure to fit a first-order multiple regression model to all 41 cities. Discuss the fit of this model using the ANOVA table, R2, and the estimates table. Use other diagnostic tools including output statistics and plots of residuals to examine the adequacy of the model

(use the diagnostic panel of plots). Examining the plots, Obs #31 to be an influential y-outlier. Use the diagnostic statistics to confirm this conjecture.

c. Identify any least squares estimates of the regression coefficients (βˆ’s)

from the fit of the model in part b), that have a sign (positive or negative) that is different from what you would expect for the parameter—

an indication of multicollinearity? Use the standard errors of the parameter estimates to show that these are poorly estimated. Do the variance inflation factors (VIF’s) identify these parameters?

d. Remove the explanatory variables x3 and x6 from the five-variable model and use a multiple regression model to relate y to x1, x2, x4 and x5 only. What can you observe about the multicollinearity in the new model? Is there improvement in the accuracy of estimation of parameters of this model (e.g., decreases standard errors, more t-statistics are significant etc.)? Justify your answers.

e. Remove the case you determined above in part

b) to be a possible outlier from the data and use all 6 variables for the analyses described in the following three parts:

i. Use a SAS procedure to do all possible regressions containing no less than 2 and no more than 4 explanatory variables. Print statistics for only the 4 best models in each case. Construct a plot of the Cp statistic for all models with “reasonable” Cp values. Select a single model, each with 2, 3, and 4 explanatory variables, respectively, for the purpose of predicting annual mean concentration of sulfur dioxide in a city, indicating your reasons for selection of each model. There may be several possible choices, i.e., there may be many “good” models but give arguments for each of your choices. Primarily, use s2, R2, and Cp in your arguments. Select one of these models as your final model and provide arguments supporting your choice.

ii. Use the backward elimination subset selection procedure with significance level of 0.05 for deleting variables to select a possible model. State the model selected and report estimates of parameters and the analysis of variance table for this model.

iii. Use the stepwise subset selection procedure, with significance levels of 0.10 for entry and 0.05 for deletion of variables, respectively, to select a possible model. State the model selected and report estimates of parameters and the analysis of variance table for this model.

A selected subset of the SAS data set baseball available from the SASHELP library containing data on baseball player salaries in 1986/87 is used in the following two problems. There are 21 variables and 71 observations in this data set. To obtain information about the variables, run the SAS code ods select position;proc contents data=baseball varnum; run;. Note that the variable attributes table is named position when the option varnum is used with proc contents. You may print the first 5 observations by running the SAS code proc print data=baseball(obs=5);run; to observe a few values for the above variables. Fit logsalary as a first order multiple regression model of the variables nAtBat, nHits, nHome, nRuns, nRBI, nBB, yrMajor, crAtBat, crHits, crHome, crRuns, crRbi, crBB, nOuts, nAssts and nError. Use the glmselect procedure to perform model selection using two different approaches as described below:

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer

Step: 1 Unlock blur-text-image

Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock

Step: 3 Unlock

Students Have Also Explored These Related Categorical Data Analysis Questions!

Ms. Anderson wants to use a SAS program to compute the total score, assign letter grades, and compute summary statistics for her college Stat 101 class. A maximum of 50 points each could be earned...

Scenario Motorhome Spares Wales (MSW) is a company that sells new and second-hand replacement parts for motor homes and camper vans. It does this primarily from its network of 13 shops in and around...

Data Management and Data Analytics . Objective of the project . Base SAS Programming Using SAS Studio on SAS Viya C. SAS Visual Data Mining and machine Learning (In some questions Base SAS...

I need help with this project. I have attached all of the instructions and relevant information. The final draft is due August 20th. Course Project: A Financial Statement Analysis A Comparative...

need help on how to complete this project in a step by step/ click by click way. sani New Perspectives Access 2016 Modules 1-4: SAM Capstone Project la Sandra Coates Skating Club CREATING TABLES,...

Can someone help me with the Accounting Course Project. Instructions are there in the Course Project Instruction and the 10k for Nike and UNder Armour attached. I have also attached the Teamplate....

Assignment 3: Creating variables In this assignment, you will use a sample of the National Trauma Data Bank, the dataset is in the course directory with the name NTDB. Up to this point you have just...

please I need this completed asap. If you need the username and password to log-in let me know \fInstructions City of Bingham Computerized Cumulative Problem For use with McGraw-Hill/Irwin Accounting...

Using SAS link to excel file: https://docs.google.com/spreadsheets/d/1cHOgMt1qiGMzPFLW7EvTPm4OIVQvqT3fHZXi8mcmaf0/edit?usp=sharing 9. Use code to import Excel data. a. Write the SAS code to import...

Tasks The goal of the project is to complete the code for the NgramAnalyser, MarkovModel, ModelMatcher and MatcherController classes, as detailed below, and to add test code to a new JUnit test...

2.1-3 Find the power of a sinusoid C cos (@nt + 0).

Examine Reitmans' balance sheets. Which accounts do you think would require adjustments at the year-end? Explain.

Cordelion Communications is considering issuing new outstanding debt. The stock issue would have no effect on t tax rate. Which of the following is likely to occur if the comp a . The times interest...

Using the information provided in Part 1 of Problem 1 A, journalize each of the transactions assuming a periodic inventory system. In Problem Part 1 Prepare General Journal entries to record the...