Question:
I am using RStudio to run an ANOVA on a data set with 14 columns and 89350 rows. I am assessing the variation between WEATHER and FATALS (fatalities in traffic accidents), where FATALS is the dependent variable.
First I use the function lm() to fit a linear model of FATALS on WEATHER, as shown here:
weather.lm <- lm(formula = dat$FATALS ~ dat$WEATHER, data = dat)
Second, I use anova() to analyze the variation between the test variable (WEATHER) and the base variable (FATALS). I don't have a problem here.
(a <- anova(dui.mod2, dui.mod3))  # dui.mod2 and dui.mod3 are fitted models not shown here
Lastly, I use predict() to predict values of FATALS given values of WEATHER.
weather.new <- data.frame(WEATHER = c(1, 2, 3))  # data frame of new WEATHER values
# Predict FATALS for the new WEATHER values using predict()
wet <- predict(object = weather.lm,    # the weather.lm regression model
               newdata = weather.new)  # data frame of new data
My question: I am giving predict() only 3 values of WEATHER, so I expect only 3 values of FATALS. Instead, the function returns a prediction for each of the 89350 rows of my main table, and the predicted value is the same for every row with a given value of WEATHER.
How do I predict one for one, i.e. I give 1 test value and get 1 prediction, or I give 2 and get 2, and so on?
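A likely cause, sketched under assumptions: because the formula in lm() is written as dat$FATALS ~ dat$WEATHER, the fitted model's predictor is literally named "dat$WEATHER". predict() then cannot match it to the WEATHER column of newdata and falls back to the original dat, usually with a warning that newdata had 3 rows but the variables found have 89350 rows. Writing the formula with bare column names and passing data = dat appears to avoid this. The toy data below are simulated for illustration only, and the names bad.lm, good.lm, bad.pred, and good.pred are hypothetical.

```r
# Toy stand-in for the real data. Column names follow the question;
# the values are simulated (an assumption for illustration only).
set.seed(1)
dat <- data.frame(WEATHER = sample(1:3, 200, replace = TRUE))
dat$FATALS <- 1 + 0.1 * dat$WEATHER + rnorm(200, sd = 0.2)

# Formula written with dat$...: the predictor is named "dat$WEATHER",
# so predict() cannot match it to a column of newdata.
bad.lm <- lm(dat$FATALS ~ dat$WEATHER, data = dat)

# Formula written with bare column names: predict() looks up WEATHER in newdata.
good.lm <- lm(FATALS ~ WEATHER, data = dat)

weather.new <- data.frame(WEATHER = c(1, 2, 3))

# suppressWarnings() hides the row-count mismatch warning from the first call.
bad.pred  <- suppressWarnings(predict(bad.lm, newdata = weather.new))
good.pred <- predict(good.lm, newdata = weather.new)

length(bad.pred)   # one prediction per row of dat
length(good.pred)  # one prediction per row of weather.new
```

If WEATHER is a categorical code rather than a measured quantity, wrapping it as factor(WEATHER) in the formula may be the more appropriate model. In that case the values passed in newdata would need to be levels that occur in the original data.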
