I am using RStudio to run an ANOVA test on a data set with 14 columns and 89350 rows. I am assessing the variation between WEATHER and FATALS (fatalities in traffic accidents), where FATALS is the dependent variable.

First, I use the function lm() to fit a linear model of FATALS on WEATHER, as shown here:

weather.lm <- lm(formula = dat$FATALS ~ dat$WEATHER, data = dat)

Second, I use anova() to analyze the variation between the test variable (WEATHER) and the base variable (FATALS). I don't have a problem here.

(a <- anova(dui.mod2, dui.mod3))
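
For comparison, here is a minimal sketch of the two common ways anova() is called on lm fits. The call above compares two models, dui.mod2 and dui.mod3, that are not defined in the question, so this sketch uses synthetic data instead. The names toy, null.mod and weather.mod, and the weather labels, are hypothetical.

```r
# Synthetic data for illustration only; it is not the data from the question.
set.seed(42)
toy <- data.frame(WEATHER = factor(rep(c("clear", "rain", "snow"), each = 20)))
toy$FATALS <- rnorm(nrow(toy), mean = 1.1, sd = 0.3)

# Intercept-only model and a model with WEATHER as a categorical predictor.
null.mod    <- lm(FATALS ~ 1, data = toy)
weather.mod <- lm(FATALS ~ WEATHER, data = toy)

# ANOVA table for a single fit: one row for WEATHER, one for the residuals.
one_model <- anova(weather.mod)

# F-test comparing two nested fits: one row per model.
two_models <- anova(null.mod, weather.mod)

print(one_model)
print(two_models)
```

With a single predictor, both calls test the same hypothesis, so the F statistic for WEATHER in the first table matches the F statistic in the second.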

Lastly, I use predict() to predict values of FATALS given values of WEATHER.

weather.new <- data.frame(WEATHER = c(1, 2, 3))  # data frame of new WEATHER values

# Predict FATALS for the new WEATHER values
wet <- predict(object = weather.lm,    # the weather.lm regression model
               newdata = weather.new)  # data frame of new data

My question: I give predict() only 3 values of WEATHER, so I expect only 3 predicted values of FATALS. Instead, the function runs across all 89350 rows of my main table, and the predicted value is identical for every row that has the same value of WEATHER.

How do I predict one for one, i.e. give 1 test value and get 1 prediction, give 2 and get 2, and so on?
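
One likely explanation, offered as a sketch rather than a confirmed diagnosis: because the formula is written as dat$FATALS ~ dat$WEATHER, the model stores its predictor under the literal name `dat$WEATHER`. When predict() receives newdata, it finds no column with that name, so R evaluates dat$WEATHER from the original data frame and returns one prediction per row of dat. This usually comes with a warning that newdata had 3 rows but the variables found have 89350. Writing the formula with bare column names lets predict() match WEATHER in newdata. The example below reproduces both behaviours on synthetic data. The names toy, fit_dollar and fit_plain are hypothetical.

```r
# Synthetic stand-in for the real data; values are made up for illustration.
set.seed(1)
toy <- data.frame(WEATHER = rep(c(1, 2, 3), times = 10))
toy$FATALS <- 1 + 0.1 * toy$WEATHER + rnorm(nrow(toy), sd = 0.2)

# Pattern from the question: the predictor is stored under the name `toy$WEATHER`.
fit_dollar <- lm(toy$FATALS ~ toy$WEATHER, data = toy)

# Bare column names: the predictor is stored as WEATHER.
fit_plain <- lm(FATALS ~ WEATHER, data = toy)

weather.new <- data.frame(WEATHER = c(1, 2, 3))

# newdata has no column called `toy$WEATHER`, so R evaluates toy$WEATHER itself
# and predicts for every row of toy (normally with a warning, suppressed here).
pred_dollar <- suppressWarnings(predict(fit_dollar, newdata = weather.new))
length(pred_dollar)  # one value per row of toy

# WEATHER is found in newdata, so there is one prediction per new row.
pred_plain <- predict(fit_plain, newdata = weather.new)
length(pred_plain)   # one value per row of weather.new
```

Applied to the question, this would mean refitting with lm(FATALS ~ WEATHER, data = dat) and calling predict() as before. If WEATHER is a categorical code rather than a measured quantity, factor(WEATHER) in the formula would be the usual choice. The question does not confirm that, so treat it as an assumption about the data.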
