Question: This question and the next one use the following context. Consider a modified version of the one-layer Deep Averaging Network (DAN) with the following architecture:
Input: Sequence of word embeddings x_1, x_2, ..., x_n, each of dimension 50.
PyTorch layers: (1) Linear layer (input=50, output=50); (2) ReLU; (3) Averaging layer; (4) Linear layer (input=50, output=50); (5) ReLU; (6) Linear layer (input=50, output=2); (7) Softmax
Given the network as is, what will be the biggest reason that it may fail to learn a task compared to a basic DAN?
A. The softmax cannot "peak" enough on the right answer (the logits are too small)
B. It doesn't correctly implement a nonlinear computation
C. There are too many linear layers, leading to too many parameters
D. None of the above
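
To make the layer ordering concrete, the seven-layer stack described in the question could be written roughly as follows. This is only a sketch under assumptions not stated in the question: the class name `ModifiedDAN`, the attribute names, the batch-first input shape `(batch, n, 50)`, and averaging over unpadded sequences of equal length are all illustrative choices.

```python
import torch
import torch.nn as nn


class ModifiedDAN(nn.Module):
    """Hypothetical sketch of the layer stack listed in the question."""

    def __init__(self, embed_dim: int = 50, hidden_dim: int = 50, num_classes: int = 2):
        super().__init__()
        self.pre_avg = nn.Linear(embed_dim, hidden_dim)  # layer (1)
        self.hidden = nn.Linear(hidden_dim, hidden_dim)  # layer (4)
        self.out = nn.Linear(hidden_dim, num_classes)    # layer (6)

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        # embeddings: (batch, n, 50); assumes no padding tokens in the batch
        h = torch.relu(self.pre_avg(embeddings))  # (1) + (2), applied to each word
        h = h.mean(dim=1)                         # (3) average over the n words
        h = torch.relu(self.hidden(h))            # (4) + (5)
        logits = self.out(h)                      # (6)
        return torch.softmax(logits, dim=-1)      # (7) class probabilities


torch.manual_seed(0)
model = ModifiedDAN()
x = torch.randn(4, 7, 50)   # 4 sequences of 7 random "embeddings"
probs = model(x)
num_params = sum(p.numel() for p in model.parameters())
print(probs.shape)          # one probability pair per sequence
print(num_params)           # total trainable parameters
```

Unlike a basic DAN, which averages the raw embeddings first, this variant applies a Linear layer and ReLU to each word before averaging; the sketch only mirrors that ordering and does not by itself settle which of the options A-D is correct.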
