Question:

1. In the lecture, we wrote an R function to apply the batch gradient descent algorithm to fit a linear regression model describing the relationship between the variables dist and speed in the cars data. Modify that function to make it implement the stochastic gradient descent algorithm to solve the same problem. Compare the results and computing time for the two algorithms (batch versus stochastic gradient descent). To ensure a fair comparison, use the same inputs for both algorithms (initial values, learning rate, convergence threshold, and maximum number of iterations). For these parameters, use the same inputs we used in the lecture.

In your comparison, include the following:

a. How close are the estimated coefficients from each method to the coefficients obtained from the built-in lm function?
b. Which algorithm takes longer to converge? (Hint: to monitor the time each algorithm takes, use the command system.time when you apply each of the two algorithms to the data.)

If you name the two functions gd.lreg and sgd.lreg, you can make R report the time each function takes to run on the data using the following code:

    system.time({gd.lreg(speed, dist, 0.001, 1e-10, 1000000)})
    system.time({sgd.lreg(speed, dist, 0.001, 1e-10, 1000000)})

Excerpt from the lecture notes (optimization.html) showing the end of gd.lreg. The start of the function and the rest of the convergence test are cut off in the source:

    # ... ignoring the constant 2
    b0 = b0_new
    b1 = b1_new
    yhat = b0 + b1 * x
    MSE_new = sum((y - yhat)^2)
    if (abs(MSE_new - MSE) ...
    iter = iter + 1
    if (iter > max.iter) {
      converged = TRUE
      return(cat("Intercept at last iter:", b0, "\n",
                 "Slope at last iter:", b1, "\n",
                 "MSE at last iter values:", MSE_new))
    }

Run the function on the cars data:

    gd.lreg(speed, dist, 0.001, 1e-10, 1000000)

    mod = lm(dist ~ speed, data = cars)
    summary(mod)

    ##
    ## Call:
    ## lm(formula = dist ~ speed, data = cars)
    ##
    ## Residuals:
    ##     Min      1Q  Median      3Q     Max
    ## -29.069  -9.525  -2.272   9.215  43.201
    ##
    ## Coefficients:
    ##             Estimate Std. Error t value Pr(>|t|)
    ## (Intercept) -17.5791     6.7584  -2.601   0.0123 *
    ## speed         3.9324     0.4155   9.464 1.49e-12 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## Residual standard error: 15.38 on 48 degrees of freedom
    ## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438
    ## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12
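The two variants the question asks about could be sketched roughly as follows. This is not the lecture's code, which is only partly visible in the question. The argument names lr and tol, the zero starting values, the returned list, and the use of the mean (rather than the sum) of squared errors are all assumptions made for illustration. The only intended difference between the two functions is the update step: the batch version averages the gradient over all observations, while the stochastic version uses one randomly drawn observation per iteration.

```r
# Sketch only. Assumptions (not from the lecture): argument names lr and tol,
# starting values b0 = b1 = 0, mean squared error, and a list as return value.

gd.lreg <- function(x, y, lr, tol, max.iter, b0 = 0, b1 = 0) {
  mse <- mean((y - (b0 + b1 * x))^2)
  converged <- FALSE
  iter <- 0
  for (iter in seq_len(max.iter)) {
    res <- y - (b0 + b1 * x)
    # Batch step: gradient averaged over all observations (constant 2 ignored)
    b0 <- b0 + lr * mean(res)
    b1 <- b1 + lr * mean(res * x)
    mse_new <- mean((y - (b0 + b1 * x))^2)
    converged <- abs(mse_new - mse) < tol
    mse <- mse_new
    if (converged) break
  }
  list(intercept = b0, slope = b1, mse = mse, iter = iter, converged = converged)
}

sgd.lreg <- function(x, y, lr, tol, max.iter, b0 = 0, b1 = 0) {
  n <- length(y)
  mse <- mean((y - (b0 + b1 * x))^2)
  converged <- FALSE
  iter <- 0
  for (iter in seq_len(max.iter)) {
    # Stochastic step: gradient from a single randomly chosen observation
    i <- sample.int(n, 1)
    res_i <- y[i] - (b0 + b1 * x[i])
    b0 <- b0 + lr * res_i
    b1 <- b1 + lr * res_i * x[i]
    # Same stopping rule as the batch version, evaluated on the full data
    mse_new <- mean((y - (b0 + b1 * x))^2)
    converged <- abs(mse_new - mse) < tol
    mse <- mse_new
    if (converged) break
  }
  list(intercept = b0, slope = b1, mse = mse, iter = iter, converged = converged)
}

# Example calls with the question's settings (the SGD call can be slow):
# system.time(fit_gd  <- gd.lreg(cars$speed, cars$dist, 0.001, 1e-10, 1000000))
# system.time(fit_sgd <- sgd.lreg(cars$speed, cars$dist, 0.001, 1e-10, 1000000))
# coef(lm(dist ~ speed, data = cars))  # reference values for part (a)
```

With a fixed learning rate, the stochastic updates generally keep fluctuating around the least-squares solution instead of settling on it, so under this stopping rule the SGD version may well run until max.iter while the batch version stops earlier. Calling set.seed before sgd.lreg makes a run reproducible.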

