Part 3: Writing your own regression function (30 pts total)
For this question, you will write and test your own substitute for the lm() function. Your
function will take in a dataframe with columns labeled y and x and fit the regression of the
form:
$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i$$
Your function should return a list() object with named components:
data, which is a dataframe that contains the original data used to estimate the regression.
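params, which is a dataframe with the estimates $\widehat{\beta}_0, \widehat{\beta}_1$ and their corresponding standard
errors $\hat{\sigma}_{\hat{\beta}_0}, \hat{\sigma}_{\hat{\beta}_1}$, which can be computed as follows:
$$\hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$$
$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$
$$\hat{\sigma}_{\hat{\beta}_1} = \sqrt{\frac{\hat{\sigma}^2}{\sum_i (x_i - \bar{x})^2}}$$
$$\hat{\sigma}_{\hat{\beta}_0} = \sqrt{\hat{\sigma}^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{\sum_i (x_i - \bar{x})^2} \right)} = \hat{\sigma}_{\hat{\beta}_1} \sqrt{\frac{\sum_i x_i^2}{n}}$$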
where n is the number of observations.
pred, which is a vector of the predicted y values, $\hat{y}_i$, where
$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$$
resid, which is a vector of residuals:
$$e_i = y_i - \hat{y}_i$$
mse, which is the estimate of the error variance:
$$\hat{\sigma}^2 = \frac{1}{n-2} \sum_i e_i^2$$
rsq, which is the $R^2$, or coefficient of determination, which is the proportion of variance
explained by the regression:
$$R^2 = 1 - \frac{\sum_i e_i^2}{\sum_i (y_i - \bar{y})^2}$$
plot, which is a ggplot object that contains a scatterplot of the data with a line of
best fit superimposed. To add the line, use geom_abline() and your estimates of $\hat{\beta}_0, \hat{\beta}_1$.
Make sure the line is a fun color!
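The closed-form formulas for the estimates and standard errors can be sketched in base R as below. This is only one possible layout, not the required solution: the helper names ols_params and ols_rsq, the column names of the returned dataframe, and the simulated data are all assumptions made for illustration.

```r
# Hypothetical helper: slope, intercept, and their standard errors,
# computed directly from the closed-form expressions.
ols_params <- function(d) {
  n    <- nrow(d)
  xbar <- mean(d$x)
  ybar <- mean(d$y)
  sxx  <- sum((d$x - xbar)^2)                    # sum of squared x deviations

  b1 <- sum((d$x - xbar) * (d$y - ybar)) / sxx   # slope estimate
  b0 <- ybar - b1 * xbar                         # intercept estimate

  e   <- d$y - (b0 + b1 * d$x)                   # residuals
  mse <- sum(e^2) / (n - 2)                      # error variance estimate

  se_b1 <- sqrt(mse / sxx)
  se_b0 <- sqrt(mse * (1 / n + xbar^2 / sxx))

  data.frame(term     = c("b0", "b1"),
             estimate = c(b0, b1),
             se       = c(se_b0, se_b1))
}

# Hypothetical helper: R^2 from a dataframe and the params table above.
ols_rsq <- function(d, p) {
  e <- d$y - (p$estimate[1] + p$estimate[2] * d$x)
  1 - sum(e^2) / sum((d$y - mean(d$y))^2)
}

# Small simulated example (coefficients chosen arbitrarily).
set.seed(1)
d <- data.frame(x = rnorm(50))
d$y <- 1 + 2 * d$x + rnorm(50)
p <- ols_params(d)
p
```

Keeping each quantity in its own small function makes it easier to compare one piece at a time against lm().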
I suggest beginning by writing individual functions for each of these tasks and testing these
functions repeatedly against the actual output of lm() to be sure you're getting similar results.
A good way to simulate univariate linear regression data to compare to is:
N <- 100
d_reg <- data.frame(x = rnorm(N))
d_reg$y <- 3.14 + 2.72 * d_reg$x + rnorm(N)
m <- lm(y ~ x, d_reg)
summary(m)

Call:
lm(formula = y ~ x, data = d_reg)
Residuals:
Min 1Q Median 3Q Max
Coefficients:
Estimate Std. Error t value Pr(>|t|)
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.9838 on 98 degrees of freedom
Multiple R-squared: 0.8649, Adjusted R-squared: 0.8635
F-statistic: 627.3 on 1 and 98 DF, p-value: < 2.2e-16
Note that you can vary the coefficients, and the means/variances/distributions used to generate
x and the noise in y.
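One way to vary these ingredients is to wrap the simulation in a small function. The sketch below is an illustration only: the name simulate_reg and its arguments x_gen and noise_gen are hypothetical, and the particular distributions in the example call are arbitrary choices.

```r
# Hypothetical helper: simulate y = b0 + b1 * x + noise.
# `x_gen` and `noise_gen` are any functions that take n and return n draws.
simulate_reg <- function(n, b0, b1,
                         x_gen = rnorm,
                         noise_gen = rnorm) {
  d <- data.frame(x = x_gen(n))
  d$y <- b0 + b1 * d$x + noise_gen(n)
  d
}

set.seed(42)  # fix the seed so the draw is reproducible
# Uniform x on [0, 10] with noisier errors (sd = 2).
d_unif <- simulate_reg(200, b0 = -1, b1 = 0.5,
                       x_gen = function(n) runif(n, 0, 10),
                       noise_gen = function(n) rnorm(n, sd = 2))
head(d_unif)
```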
Your function should:
Take only a dataframe with columns labeled x and y
Output a list object with named entities
Reproduce the corresponding output of the lm() function exactly
Write your function, unpack the contents of the list() object you produce, and show that
they produce a reasonable plot and reproduce the results from lm() for some simulated data
(it need not be the data I produced above).
Hint: If you can't find something you need in the summary(), read the documentation for
lm()!
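As a rough guide to where the reference values live, the sketch below pulls each quantity out of a fitted lm object using standard extractor functions and components of the summary. The variable names (lm_params, lm_pred, and so on) and the seed are assumptions made for illustration.

```r
set.seed(7)
d <- data.frame(x = rnorm(100))
d$y <- 3.14 + 2.72 * d$x + rnorm(100)

m <- lm(y ~ x, d)
s <- summary(m)

# Coefficient table: a matrix with columns
# "Estimate", "Std. Error", "t value", "Pr(>|t|)".
lm_params <- data.frame(
  estimate = unname(coef(s)[, "Estimate"]),
  se       = unname(coef(s)[, "Std. Error"])
)

lm_pred  <- unname(fitted(m))     # predicted y values
lm_resid <- unname(residuals(m))  # residuals
lm_mse   <- s$sigma^2             # "Residual standard error", squared
lm_rsq   <- s$r.squared           # "Multiple R-squared"
```

unname() drops the observation and coefficient labels that lm() attaches, so that all.equal() against plain vectors does not fail on names alone.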
# TODO: write your function, fit it to some simulated data, demonstrate that
# it reproduces the output from lm() by unpacking the contents of the model
# object you build and comparing to the results from lm()
MyLM <- function(d) {
}
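A minimal sketch of how the pieces might be assembled is given below. It is one possible implementation under stated assumptions, not the official solution: it assumes ggplot2 is installed, the row names of params and the line colour are arbitrary choices, and the comparison uses all.equal(), which checks agreement up to floating-point tolerance rather than bit-for-bit equality.

```r
library(ggplot2)  # assumed to be installed

MyLM <- function(d) {
  n    <- nrow(d)
  xbar <- mean(d$x)
  ybar <- mean(d$y)
  sxx  <- sum((d$x - xbar)^2)

  b1 <- sum((d$x - xbar) * (d$y - ybar)) / sxx
  b0 <- ybar - b1 * xbar

  pred  <- b0 + b1 * d$x
  resid <- d$y - pred
  mse   <- sum(resid^2) / (n - 2)
  rsq   <- 1 - sum(resid^2) / sum((d$y - ybar)^2)

  params <- data.frame(
    estimate  = c(b0, b1),
    se        = c(sqrt(mse * (1 / n + xbar^2 / sxx)), sqrt(mse / sxx)),
    row.names = c("(Intercept)", "x")
  )

  # Scatterplot with the fitted line drawn from the estimates.
  plot <- ggplot(d, aes(x = x, y = y)) +
    geom_point() +
    geom_abline(intercept = b0, slope = b1, colour = "deeppink")

  list(data = d, params = params, pred = pred, resid = resid,
       mse = mse, rsq = rsq, plot = plot)
}

# Simulated data (seed chosen arbitrarily).
set.seed(123)
d_reg <- data.frame(x = rnorm(100))
d_reg$y <- 3.14 + 2.72 * d_reg$x + rnorm(100)

fit <- MyLM(d_reg)
m   <- lm(y ~ x, d_reg)
s   <- summary(m)

# Unpack each component and compare against lm().
all.equal(fit$params$estimate, unname(coef(m)))
all.equal(fit$params$se, unname(coef(s)[, "Std. Error"]))
all.equal(fit$pred, unname(fitted(m)))
all.equal(fit$resid, unname(residuals(m)))
all.equal(fit$mse, s$sigma^2)
all.equal(fit$rsq, s$r.squared)

fit$plot  # displays the scatterplot with the fitted line
```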