Question: In this assignment, use the HBAT_200 data file to build a regression that predicts Recommend (willingness to recommend HBAT) from ratings of HBAT's TechSup, Advertising, Sales, Warranty, Billing, and Delivery, plus dummy-coded variables representing Industry and Distribution, using the mixed stepwise and best subsets approaches.
HBAT_200.xls data file (copy and pasteable to .R format):
https://docs.google.com/spreadsheets/d/1E0JuWnv7dsU2HxGCdFnp4LpoVVjVO0fM/edit?usp=sharing&ouid=107429429411018927337&rtpof=true&sd=true
Modify the following code (copy the text below into a new script file in R Studio, then edit it) so that its variables are replaced with the variables named in the assignment description above.
#Multiple Regression
#Install packages if needed
install.packages("ppcor")
install.packages("caret")
install.packages("lm.beta")
install.packages("leaps")
install.packages("dplyr")
install.packages("car")
install.packages("carData")
install.packages("fastDummies")
install.packages("ggplot2")
install.packages("ggplotgui")
install.packages("MVN")
#load packages if needed
library(ppcor)
library(caret)
library(lm.beta)
library(leaps)
library(dplyr)
library(car)
library(carData)
library(fastDummies)
library(ggplot2)
library(ggplotgui)
library(MVN)
#
#Import data----
HBAT_200 <- data.frame(readRDS("Module 5/HBAT_200.RDS"))
HBAT_200 <- dummy_cols(HBAT_200)
names(HBAT_200)
#subset data for graphing
HBAT_200b <- subset.data.frame(
HBAT_200, select=c("Satisfaction", "Etail","CompResolve","Products","Pricing","NewProd","PriceFlex","Customer_btwn_1_5yr","Customer_Over5yr", "Size_Large"))
#
#ggplotgui for easy histograms and scatterplots
ggplot_shiny(HBAT_200b) #histograms of the DV and IVs and scatterplots of DV against each IV
#Check for multivariate outliers
mvnObj <- mvn(
data= HBAT_200b[ ,2:7], #THESE NUMBERS ARE THE COLUMNS FOR THE IVs IN HBAT_200b, ADJUST AS NEEDED
mvnTest="royston",
univariateTest = "Lillie",
multivariatePlot = "qq",
multivariateOutlierMethod = "adj",
showOutliers = TRUE,
showNewData = TRUE)
#from the graph produced by the above, are there outliers? If so, the next line will tell you which obs they are
mvnObj[["multivariateOutliers"]][["Observation"]]
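#put the row numbers between the parentheses in the following to delete the outliers and save to a new data frame--no quotes
#NOTE: fill in at least one row number before running; with c() left empty, -c() selects zero rows and HBAT_200c will be empty (if there are no outliers, use HBAT_200c <- HBAT_200b instead)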
HBAT_200c <- HBAT_200b[-c(),]
#MULTIPLE REGRESSION ALL IVs
regAll <- lm(Satisfaction ~ Etail + CompResolve + Products + Pricing + NewProd+ PriceFlex + Customer_btwn_1_5yr + Customer_Over5yr +Size_Large, HBAT_200c)
summary(regAll)
vif(regAll)
plot(regAll)
#MIXED STEP-WISE REGRESSION
#define intercept-only model
intercept_only <- lm(Satisfaction ~ 1, data = HBAT_200c)
#define model with all predictors
all <- lm(Satisfaction ~ Etail + CompResolve + Products + Pricing + NewProd+ PriceFlex + Customer_btwn_1_5yr + Customer_Over5yr + Size_Large, data=HBAT_200c)
#perform mixed stepwise regression
both <- step(intercept_only, direction='both', scope=formula(all), trace=0)
#view results of mixed stepwise regression
both$anova
#view final model
both$coefficients
#BEST SUBSETS REGRESSION
Best_HBATreg <- regsubsets(Satisfaction ~ Etail + CompResolve + Products + Pricing + NewProd+ PriceFlex + Customer_btwn_1_5yr + Customer_Over5yr + Size_Large,
data =HBAT_200c,
nbest = 1, # 1 best model for each number of predictors
nvmax = NULL, # NULL for no limit on number of variables
force.in = NULL, force.out = NULL,
method = "exhaustive")
summary(Best_HBATreg)
summary_best_subset <- summary(Best_HBATreg)
as.data.frame(summary_best_subset$outmat)
summary_best_subset$adjr2
summary_best_subset$cp
which.max(summary_best_subset$adjr2)
which.min(summary_best_subset$cp)
# choose the model balancing the desire for the largest adj r-sq, the lowest CP, and the fewest IVs (parsimony)
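summary_best_subset$which[7,] #7 is the number of IVs in the chosen model (with nbest = 1 the row number equals the number of IVs); adjust as needed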
finalHBATreg <- lm(Satisfaction ~ Etail + Products + Pricing + NewProd + Customer_btwn_1_5yr+ Customer_Over5yr + Size_Large, HBAT_200c)
summary(finalHBATreg)
lm.beta(finalHBATreg)
vif(finalHBATreg)
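When adapting the template, the exact dummy column names depend on the category labels in the data, which are not shown here. A minimal sketch of one way to look them up is given below. It uses a toy data frame whose values and whose Industry and Distribution levels ("A", "B", "Direct", "Indirect") are made up for illustration; only the `<column>_<level>` naming pattern reflects how dummy_cols() labels its output.

```r
# Toy stand-in for HBAT_200 after dummy_cols(); values and level names are hypothetical
toy <- data.frame(
  Recommend = c(7.1, 6.4, 8.0, 5.9),
  TechSup = c(5.2, 4.8, 6.1, 3.9),
  Industry_A = c(1, 0, 1, 0),
  Industry_B = c(0, 1, 0, 1),
  Distribution_Direct = c(1, 1, 0, 0),
  Distribution_Indirect = c(0, 0, 1, 1)
)

# List the dummy columns created for each categorical variable
industry_dummies <- grep("^Industry_", names(toy), value = TRUE)
distribution_dummies <- grep("^Distribution_", names(toy), value = TRUE)

# Keep all but one dummy per variable; the dropped level becomes the reference category
ivs <- c("TechSup", industry_dummies[-1], distribution_dummies[-1])

# Build the model formula from the names instead of typing it by hand
model_formula <- reformulate(ivs, response = "Recommend")
print(model_formula)
```

Running names(HBAT_200) on the real data, as the template already does, shows which dummy names to substitute.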
Question 1
Looking at the scatterplots of the DV against each IV, do you have any concerns about violations of the assumptions of regression? If so, what, and what would you recommend to address them?
_____________________________________________________________________________________________________________________
Question 2
Examining the multivariate distribution of the IVs, how many outliers are there? Delete them before running your models (instructions are in the code).
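The idea behind the outlier step can be sketched in base R. Note that this is not the template's method: it swaps in the classical squared Mahalanobis distance with a chi-square cutoff instead of the adjusted-quantile method used by mvn(), and it runs on simulated data rather than HBAT_200. The helper drop_rows() is a hypothetical name; it guards against the case where no rows are flagged, because indexing with an empty negative vector returns zero rows.

```r
# Simulated stand-in for the IV columns (hypothetical data, not HBAT_200)
set.seed(1)
ivs <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
ivs[5, ] <- c(8, -8, 8)  # plant one obvious multivariate outlier

# Squared Mahalanobis distance of each row from the column means
d2 <- mahalanobis(ivs, colMeans(ivs), cov(ivs))

# Flag rows beyond the 97.5th percentile of a chi-square with df = number of IVs
cutoff <- qchisq(0.975, df = ncol(ivs))
outlier_rows <- which(d2 > cutoff)

# Drop the flagged rows, returning the data unchanged when nothing is flagged
drop_rows <- function(df, rows) {
  if (length(rows) == 0) return(df)
  df[-rows, , drop = FALSE]
}

cleaned <- drop_rows(ivs, outlier_rows)
cat("Rows flagged:", length(outlier_rows), "\n")
cat("Rows kept:", nrow(cleaned), "\n")
```

The count of flagged rows answers the "how many outliers" part; the same guard applies when deleting the observations reported by mvn().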
_____________________________________________________________________________________________________________________
Question 3
Run the model with all the IVs entered. Is the model significant overall? What does that mean?
How much of the variability in Recommend is explained by the model? Which IVs are significant
at the 95% level? Do the VIFs indicate issues with multicollinearity? Which independent variable
has the most impact on Recommend? Any concerns looking at the graphs of residuals vs
predicted values? Any large residuals or influential observations to be concerned with?
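Each part of this question maps to a piece of the summary() output. The sketch below shows where to find them, using simulated data with made-up variables x1 to x3 in place of the HBAT ratings. The standardized betas and VIFs are computed by hand in base R here as an illustration of what lm.beta() and vif() report; it is not a replacement for those calls in the template.

```r
# Simulated data (hypothetical): x1 has a strong effect, x2 a weaker one, x3 none
set.seed(42)
n <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- 1 + 2 * sim$x1 + 0.5 * sim$x2 + rnorm(n)

fit <- lm(y ~ x1 + x2 + x3, data = sim)
s <- summary(fit)

# Overall significance: p-value of the model F-test
f <- s$fstatistic
overall_p <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))

# Share of variability explained
r2 <- s$r.squared
adj_r2 <- s$adj.r.squared

# IVs significant at the 95% level
coef_table <- s$coefficients
sig_ivs <- setdiff(rownames(coef_table)[coef_table[, "Pr(>|t|)"] < 0.05], "(Intercept)")

# Standardized betas: slope times sd(IV) divided by sd(DV); largest magnitude = most impact
b <- coef(fit)[-1]
std_beta <- b * sapply(sim[names(b)], sd) / sd(sim$y)

# VIF for each IV: 1 / (1 - R-squared from regressing that IV on the other IVs)
vif_by_hand <- sapply(names(b), function(v) {
  others <- setdiff(names(b), v)
  r2_v <- summary(lm(reformulate(others, response = v), data = sim))$r.squared
  1 / (1 - r2_v)
})

print(list(overall_p = overall_p, r2 = r2, adj_r2 = adj_r2, significant = sig_ivs))
print(round(std_beta, 3))
print(round(vif_by_hand, 3))
```

A common rule of thumb treats VIFs above 5 or 10 as a sign of problematic multicollinearity; which threshold the course uses is not stated here.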
_____________________________________________________________________________________________________________________
Question 4
What is the final model using the mixed stepwise approach? How much of the variability in
Recommend is explained by the model? Which IVs are kept in the model?
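The quantities this question asks for can be read off the object that step() returns. A minimal sketch on simulated data follows; the variables x1 to x4 are made up, and only the step() call itself mirrors the template.

```r
# Simulated data (hypothetical): only x1 and x2 truly affect y
set.seed(7)
n <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
sim$y <- 1 + 2 * sim$x1 + 1 * sim$x2 + rnorm(n)

# Start from the intercept-only model and allow terms to enter or leave
intercept_only <- lm(y ~ 1, data = sim)
full <- lm(y ~ x1 + x2 + x3 + x4, data = sim)
both <- step(intercept_only, direction = "both", scope = formula(full), trace = 0)

# IVs kept in the final model
kept <- attr(terms(both), "term.labels")

# Variability explained by the final model
final_r2 <- summary(both)$r.squared
final_adj_r2 <- summary(both)$adj.r.squared

print(formula(both))
print(kept)
print(c(r2 = final_r2, adj_r2 = final_adj_r2))
```

Because step() selects on AIC, it can retain a term whose individual p-value is above 0.05, so it is worth checking summary(both) as well.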
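_____________________________________________________________________________________________________________________
Question 5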
What is the final model using the best subset approach? How much of the variability in
Recommend is explained by the model? Which IVs are kept in the model?
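To make the adjusted R-squared and Cp columns easier to interpret, the sketch below carries out an exhaustive search by hand in base R on simulated data. It is an illustration of the idea behind regsubsets() under stated assumptions (made-up variables x1 to x4, every subset fitted rather than only the best per size), not the leaps implementation.

```r
# Simulated data (hypothetical): only x1 and x2 truly affect y
set.seed(11)
n <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
sim$y <- 1 + 2 * sim$x1 + 1 * sim$x2 + rnorm(n)

candidates <- c("x1", "x2", "x3", "x4")
full <- lm(reformulate(candidates, response = "y"), data = sim)
sigma2_full <- summary(full)$sigma^2  # error variance estimate from the full model

# Enumerate every non-empty subset of the candidate IVs
subsets <- unlist(
  lapply(seq_along(candidates), function(k) combn(candidates, k, simplify = FALSE)),
  recursive = FALSE
)

# Fit each subset and record adjusted R-squared and Mallows' Cp
results <- do.call(rbind, lapply(subsets, function(vars) {
  fit <- lm(reformulate(vars, response = "y"), data = sim)
  rss <- sum(resid(fit)^2)
  p <- length(vars) + 1  # number of coefficients, including the intercept
  data.frame(
    model = paste(vars, collapse = " + "),
    n_ivs = length(vars),
    adjr2 = summary(fit)$adj.r.squared,
    cp = rss / sigma2_full - n + 2 * p
  )
}))

best_by_cp <- results[which.min(results$cp), ]
best_by_adjr2 <- results[which.max(results$adjr2), ]
print(results[order(results$cp), ])
```

A model with Cp close to its number of coefficients and few IVs is usually a reasonable candidate; the full model always has Cp equal to its coefficient count, so that value alone is not informative.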
_____________________________________________________________________________________________________________________
Question 6
How do the two models (mixed stepwise and best subsets) compare to each other? If they
differ, do you prefer one over the other, and why?
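One way to put two candidate models side by side is sketched below on simulated data. model_a and model_b are hypothetical stand-ins for the stepwise and best subsets results, and the partial F-test at the end only applies when one model's IVs are a subset of the other's.

```r
# Simulated data (hypothetical): x3 has no true effect on y
set.seed(3)
n <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- 1 + 2 * sim$x1 + 1 * sim$x2 + rnorm(n)

model_a <- lm(y ~ x1 + x2, data = sim)       # stand-in for one final model
model_b <- lm(y ~ x1 + x2 + x3, data = sim)  # stand-in for the other final model

# Side-by-side summary: parsimony, fit, and AIC (lower is better)
comparison <- data.frame(
  model = c("A", "B"),
  n_ivs = c(length(coef(model_a)) - 1, length(coef(model_b)) - 1),
  adjr2 = c(summary(model_a)$adj.r.squared, summary(model_b)$adj.r.squared),
  aic = c(AIC(model_a), AIC(model_b))
)
print(comparison)

# Partial F-test: does the extra IV in model B add explanatory power?
nested_test <- anova(model_a, model_b)
print(nested_test)
```

If the larger model gains little adjusted R-squared and the F-test is not significant, parsimony favors the smaller one.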
_____________________________________________________________________________________________________________________
