4.2
If we keep the hidden layer parameters above fixed but add and train additional hidden layers (applied after
this layer) to further transform the data, could the resulting neural network solve this classification problem?
yes
no
Suppose we stick to the 2-layer architecture but add many more ReLU hidden units, all of them without offset
parameters. Would it be possible to train such a model to perfectly separate these points?
Note: Assume that no 2 data points lie on the same line through the origin.
yes
no

Which of the following statements is correct?
The gradient calculated in the backpropagation algorithm consists of the partial derivatives of the loss
function with respect to each network weight.
True
False
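A minimal sketch of the claim above: for a tiny one-hidden-unit ReLU network, backpropagation produces exactly the partial derivatives of the loss with respect to each weight, which we can confirm against finite differences. All weights and data here are made-up illustration values:

import math

# Tiny network: y_hat = v * relu(w * x); squared-error loss.
x, y = 2.0, 1.0          # one made-up training example
w, v = 0.5, -0.3         # made-up weights

z = w * x                # hidden pre-activation
h = max(0.0, z)          # ReLU activation
y_hat = v * h
loss = 0.5 * (y_hat - y) ** 2

# Backpropagation: partial derivatives of the loss w.r.t. each weight.
d_yhat = y_hat - y
d_v = d_yhat * h                                  # dL/dv
d_w = d_yhat * v * (1.0 if z > 0 else 0.0) * x    # dL/dw

# Check against a finite-difference approximation.
eps = 1e-6
def loss_at(w_, v_):
    return 0.5 * (v_ * max(0.0, w_ * x) - y) ** 2

print(d_w, (loss_at(w + eps, v) - loss_at(w - eps, v)) / (2 * eps))
print(d_v, (loss_at(w, v + eps) - loss_at(w, v - eps)) / (2 * eps))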
Initialization of the parameters is often important when training large feed-forward neural networks.
If the weights in a neural network with sigmoid units are initialized to values close to zero, then during the early stochastic gradient descent steps the network represents a nearly linear function of the inputs.
True
False
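A small sketch of why this holds: near zero, sigmoid(z) is approximately 1/2 + z/4 (its first-order Taylor expansion), so with tiny weights every unit operates in this linear regime. The weight scale below is a made-up illustration value:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((4, 3))   # tiny near-zero weights (made-up scale)
x = rng.standard_normal(3)

z = W @ x
exact = sigmoid(z)
linear = 0.5 + z / 4.0                   # first-order Taylor expansion at z = 0

print(np.max(np.abs(exact - linear)))    # ~1e-7: effectively linear in x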
On the other hand, if we randomly set all the weights to very large values, or don't scale them properly
with the number of units in the layer below, then the sigmoid units would behave like sign units. Here,
"behave like sign units" allows for shifting or rescaling of the sign function.
(Note that a sign unit is a unit with activation function sign(x) = 1 if x > 0 and sign(x) = -1 if x < 0. For the purpose of this question, it does not matter what sign(0) is.)
True
False
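A quick sketch of the saturation effect: as the weight scale grows, sigmoid(z) collapses toward a shifted and rescaled sign function, (sign(z) + 1)/2. The scales below are made-up illustration values:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-1, 1, 9)
for scale in (1.0, 10.0, 100.0):          # made-up weight scales
    out = sigmoid(scale * z)
    # (sign(z) + 1) / 2 is the sign function shifted/rescaled into [0, 1]
    target = (np.sign(z) + 1) / 2
    print(scale, np.max(np.abs(out - target)))   # gap shrinks as scale grows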
If we use only sign units in a feedforward neural network, then the stochastic gradient descent update will
almost never change any of the weights
change the weights by large amounts at random
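The reason, sketched below: sign(z) is piecewise constant, so its derivative is zero wherever it is defined, and by the chain rule every weight gradient is zero almost everywhere. The toy network, weight, and data are made-up:

import numpy as np

def sign_net(w, x):
    # One sign hidden unit feeding a linear output (made-up toy network).
    return 2.0 * np.sign(w * x)

x, y, w = 1.5, 1.0, 0.7                  # made-up example and weight
eps = 1e-6

def loss(w_):
    return 0.5 * (sign_net(w_, x) - y) ** 2

# Numerical dL/dw is zero: nudging w cannot move sign(w * x)
# unless w * x crosses exactly through 0.
print((loss(w + eps) - loss(w - eps)) / (2 * eps))   # 0.0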
Stochastic gradient descent differs from (true) gradient descent by updating only one network weight
during each gradient descent step.
True
False
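For contrast, a minimal sketch of the actual difference: true gradient descent averages the gradient over the whole dataset before one update, while stochastic gradient descent updates all the weights using the gradient from a single example. Data, model, and learning rate below are made-up:

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))        # made-up dataset
y = X @ np.array([1.0, -2.0, 0.5])       # made-up linear targets
w = np.zeros(3)
lr = 0.01

def grad(w, X_, y_):
    # Gradient of mean squared error w.r.t. ALL weights at once.
    return 2 * X_.T @ (X_ @ w - y_) / len(y_)

# True gradient descent: one step uses every example.
w_gd = w - lr * grad(w, X, y)

# Stochastic gradient descent: one step uses a single example,
# but still updates every weight, not just one.
i = rng.integers(len(y))
w_sgd = w - lr * grad(w, X[i:i+1], y[i:i+1])

print(w_gd, w_sgd)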
There are many good reasons to use convolutional layers in CNNs as opposed to replacing them with fully connected layers. Please check T or F for each statement.
Since we apply the same convolutional filter throughout the image, we can learn to recognize the same feature
wherever it appears.
True
False
A fully connected layer for an image has more parameters than the size of the image.
True
False
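A quick parameter count illustrating the scale involved, using made-up but typical sizes (a 224x224x3 input, 256 output units, and a 3x3 convolution with 256 filters):

# Fully connected: every output unit connects to every input pixel/channel.
h, w, c = 224, 224, 3                    # made-up image size
n_out = 256                              # made-up number of output units
fc_params = h * w * c * n_out            # weights alone, ignoring biases

# Convolutional: one small shared filter per output channel.
k, n_filters = 3, 256                    # made-up kernel size / filter count
conv_params = k * k * c * n_filters

print(fc_params)     # 38,535,168 -- far more than the 150,528 input values
print(conv_params)   # 6,912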
A fully connected layer can learn to recognize features anywhere in the image, even if the features appeared preferentially in one location during training.
True
False

Consider the 2-layer neural network defined in the figure below. Note that hidden units have no offset parameters in this problem. The output of the network is

f(z1) v1 + f(z2) v2 + v0, where
z1 = x1 w11 + x2 w21
z2 = x1 w12 + x2 w22
f(zj) = max{0, zj}

The values of the weights in the hidden layer are set such that they result in the z1 and z2 "classifiers" as shown in the (x1, x2)-space in the figure below:
The z1 "classifier" with the normal w1=[w11w21]T is the line given by z1=x*w1=0.
Similarly, the z2 "classifier" with the normal w2=[w12w22]T is the line given by z2=x*w2=0.
The arrows labeled w1 and w2 point in the positive directions of the respective normal vectors.
The regions labeled I, II, III, IV are the 4 regions defined by these two lines not including the
boundaries.
Choose the region(s) in (x1, x2)-space which are mapped into each of the following regions in (f1, f2)-space, the 2-dimensional space of hidden unit activations f(z1) and f(z2). (For example, for the second column below, choose the region(s) in (x1, x2)-space which are mapped into the f1-axis in (f1, f2)-space.)
(Choose all that apply for each column.)
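A small sketch of the mapping in question, using made-up normals standing in for the figure's w1 and w2 (the actual values come from the figure, which is not reproduced here): each point x is sent to (f(z1), f(z2)), and any region where a zj is negative collapses onto an axis of (f1, f2)-space.

import numpy as np

# Made-up normals standing in for the figure's w1 and w2.
w1 = np.array([1.0, 1.0])
w2 = np.array([-1.0, 1.0])

def hidden_map(x):
    """Map a point in (x1, x2)-space to (f1, f2) = (max(0, z1), max(0, z2))."""
    z1, z2 = x @ w1, x @ w2
    return max(0.0, z1), max(0.0, z2)

# One made-up sample point from each of the four regions cut out by the lines:
# they land in the interior, on the f1-axis, on the f2-axis, and at the origin.
for x in ([0, 2], [2, 0], [-2, 0], [0, -2]):
    print(x, hidden_map(np.array(x, dtype=float)))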