Question:

Assume you are training a deep neural network using stochastic gradient descent.

(a) Sketch a typical curve showing the training loss as a function of training steps for each of the following cases: too small a learning rate, too large a learning rate, and the optimal learning rate.

(b) Explain two reasons the training cost might go up when performing stochastic gradient descent, and what can be done to partly alleviate these problems.

(c) Explain the difference between batch, mini-batch and stochastic gradient descent. Why, then, do we usually use mini-batch gradient descent?

(d) One of the problems with backpropagation is the vanishing gradient when networks get very deep. Explain how GoogLeNet and ResNet attempt to alleviate this problem.
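Parts (a) and (c) can be explored numerically. The sketch below is a hedged illustration, not a full answer. It assumes a toy one-dimensional linear-regression problem (y = 3x + 2 plus Gaussian noise) in place of a deep network, and the helpers `make_data`, `mse` and `train` are hypothetical names introduced here. A single `train` function covers all three variants from part (c): `batch_size` equal to the dataset size gives batch gradient descent, `batch_size=1` gives stochastic gradient descent, and anything in between is mini-batch. The recorded loss history is the curve part (a) asks you to sketch. The learning rates used are tuned only to this toy loss, where batch gradient descent becomes unstable above a learning rate of roughly 1.

```python
import random


def make_data(n=200, seed=0):
    """Toy data (assumption, stands in for a real network's training set):
    y = 3x + 2 + Gaussian noise with standard deviation 0.1."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [3.0 * x + 2.0 + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys


def mse(w, b, xs, ys):
    """Mean squared error of the model y_hat = w * x + b on the full dataset."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, lr, batch_size, epochs, seed=0):
    """Gradient descent on the MSE loss.

    batch_size == len(xs) -> batch gradient descent (one update per epoch)
    batch_size == 1       -> stochastic gradient descent (one update per example)
    otherwise             -> mini-batch gradient descent
    Returns the full-dataset loss measured after every parameter update.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    history = []
    for _ in range(epochs):
        rng.shuffle(order)  # visit examples in a new random order each epoch
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            # Average gradient of (w*x + b - y)^2 over the examples in this batch.
            grad_w = sum(2.0 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2.0 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
            history.append(mse(w, b, xs, ys))
    return history


if __name__ == "__main__":
    xs, ys = make_data()
    n = len(xs)

    # Part (a): full-batch runs with a too-small, a reasonable and a too-large learning rate.
    for lr in (0.001, 0.5, 1.5):
        hist = train(xs, ys, lr=lr, batch_size=n, epochs=20)
        print(f"lr={lr}: first loss {hist[0]:.4g}, final loss {hist[-1]:.4g}")

    # Part (c): same learning rate, different batch sizes.
    for name, bs in (("batch", n), ("mini-batch", 16), ("stochastic", 1)):
        hist = train(xs, ys, lr=0.1, batch_size=bs, epochs=10)
        print(f"{name}: {len(hist)} updates, final loss {hist[-1]:.4g}")
```

Plotting each returned history against its step index (for example with matplotlib, not included here) should reproduce the qualitative shapes from part (a): the 0.001 run barely moves, the 0.5 run drops quickly and flattens near the noise floor, and the 1.5 run grows without bound. The stochastic run makes 200 updates per epoch instead of one, and its loss history is not monotone: individual single-example steps occasionally increase the full training loss, which is one of the effects part (b) asks about.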
