Question: (d) Suppose max pooling is applied on an 8×8 image with a 2×2 filter and stride 2 pixels. What will be

(d) Suppose max pooling is applied on an 8×8 image with a 2×2 filter
and stride 2 pixels. What will be the number of parameters in this
layer?
(1 mark)
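As a rough check on part (d), the sketch below computes the pooled output size with the usual floor((n - k) / s) + 1 formula and records that max pooling has no learnable weights. The function names are hypothetical and used for illustration only; zero padding is assumed because the question mentions none.

```python
def pool_output_size(input_size: int, kernel: int, stride: int) -> int:
    """Spatial size after pooling with no padding: floor((n - k) / s) + 1."""
    return (input_size - kernel) // stride + 1


def max_pool_param_count() -> int:
    """Max pooling only takes the maximum over each window, so it learns nothing."""
    return 0


side = pool_output_size(input_size=8, kernel=2, stride=2)
print(f"output: {side}x{side}, parameters: {max_pool_param_count()}")
```

The filter size and stride are fixed hyperparameters, not trained values, so they do not count as parameters of the layer.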
(e) Consider the following plot of the number of stochastic gradient
descent (SGD) iterations required to reach a given loss, as a function
of the batch size:
For small batch sizes, the number of iterations required to reach the target
loss decreases as the batch size increases. Why is that?
(2 marks)
(f) Write down the number of parameters in each field. Assume the
convolution filter is of shape 3×3×64, what would be the values in
the fields II, III, and V?
(3 marks)
(g) You are given a black box optimizer which produces the loss curve
shown in Figure A. You see a big red button on the optimizer and
decide to push it. After doing this, you notice the loss curve shown in
Figure B. You press the button one more time and finally notice the
loss curve shown in Figure C.
[Figures A, B and C: loss curves, not reproduced here]
The red button modifies a single hyperparameter. Which hyperparameter is
most likely to be modified by pressing the button?
(1 mark)
Also, of experiments 1, 2 and 3, which corresponds to the largest magnitude of
the hyperparameter?
(1 mark)
Lastly, the loss curve for experiment 3 seems to be the most desirable.
Despite this, give two reasons why you would choose the hyperparameter
in experiment 2 for training your model.
(2 marks)
Neural networks.
(a) Let us say you have a training set $s$ containing $m$ pairs $(x_i, y_i)$, where
each vector $x_i$ is to be assigned to one of $K$ classes in a supervised setting
and the labels $y_i$ are vectors in $\{0,1\}^K$ containing a single 1
representing the target class, i.e., if there are 5 classes and some $x_i$
should be assigned to class 2 then $y_i = (0,1,0,0,0)$. To do this, it is
proposed that you use $K$ neural networks. The $i$th network has
parameters $w_i$ and computes the function $h(w_i, x)$. You may make no
further assumptions regarding the function $h$.
You aim to treat the output of the $i$th network as an estimate of the
probability $P(x \text{ in class } i \mid x, w)$ that $x$ should be in the $i$th class,
where $w$ collects together all the $K$ vectors $w_1, \dots, w_K$. It is
proposed that to do this you should modify the setup described to
compute
$$P(x \text{ in class } i \mid x, w) = \mathrm{prob}(i, x) = \frac{\exp(h(w_i, x))}{\sum_{j=1}^{K} \exp(h(w_j, x))}$$
Explain why this modification is required, and how it achieves the
stated aim?
(4 marks)
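The normalization in part (a) can be tried numerically. The sketch below is a minimal illustration, assuming the K network outputs h(w_j, x) are already available as a plain list of real numbers; the `scores` values are made up for the example. Subtracting the maximum before exponentiating is a standard numerical-stability step and does not change the result.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Map K arbitrary real scores to K positive values that sum to 1."""
    shift = max(scores)  # stability shift; cancels in the ratio
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw outputs h(w_j, x) of K = 3 networks; they may be negative
# and need not sum to 1, so they are not probabilities on their own.
scores = [2.0, -1.0, 0.5]
probs = softmax(scores)
print(probs, sum(probs))
```

Because exp is positive and increasing, every output is strictly positive and the ordering of the raw scores is preserved, while dividing by the sum forces the K values to add up to 1.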
(b) Suppose a convolution layer takes a 32×32×3 input volume, and
applies ten 5×5 filters with stride 1 pixel and padding 2 pixels. What
will be the size of the output volume?
(2 marks)
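For part (b), the arithmetic can be checked with the standard output-size formula floor((n + 2p - k) / s) + 1 per spatial dimension, with output depth equal to the number of filters. The function name below is hypothetical, and the sketch assumes each filter spans the full input depth, as is usual for a convolution layer.

```python
def conv_output_shape(height: int, width: int, num_filters: int,
                      kernel: int, stride: int, padding: int) -> tuple[int, int, int]:
    """Output (height, width, depth) of a convolution layer with square filters."""
    out_h = (height + 2 * padding - kernel) // stride + 1
    out_w = (width + 2 * padding - kernel) // stride + 1
    # Each filter produces one output channel, regardless of input depth.
    return (out_h, out_w, num_filters)


shape = conv_output_shape(height=32, width=32, num_filters=10,
                          kernel=5, stride=1, padding=2)
print(shape)
```

With a 5×5 filter, padding 2 and stride 1, the padding exactly offsets the shrinkage from the filter, so the spatial size is unchanged.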
(c) Given the graphs of testing and training error, do you think an
evident problem here is overfitting? Yes or no, please justify your
answer!
(4 marks)