Question: This question and the next one use the following context. Consider a modified version of the one-layer Deep Averaging Network (DAN) with the following architecture:
Input: a sequence of word embeddings, each of a fixed dimension
PyTorch layers: Linear layer (input, output); ReLU; Averaging layer; Linear layer (input, output); ReLU; Linear layer (input, output); Softmax
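The layer sequence above can be sketched in PyTorch. This is a minimal illustration, not the question's official reference code; the exact dimensions are elided in the original, so the sizes below are placeholder assumptions:

```python
import torch
import torch.nn as nn

# Placeholder sizes -- the original question does not specify them.
EMB_DIM, HIDDEN, NUM_CLASSES = 50, 100, 5

class ModifiedDAN(nn.Module):
    """Modified one-layer DAN: per-word Linear + ReLU, then averaging,
    then Linear + ReLU, a final Linear, and Softmax."""
    def __init__(self):
        super().__init__()
        self.pre = nn.Linear(EMB_DIM, HIDDEN)      # applied to each word, before averaging
        self.hidden = nn.Linear(HIDDEN, HIDDEN)
        self.out = nn.Linear(HIDDEN, NUM_CLASSES)

    def forward(self, embs):                       # embs: (seq_len, EMB_DIM)
        h = torch.relu(self.pre(embs))             # per-word Linear + ReLU
        avg = h.mean(dim=0)                        # averaging layer
        h2 = torch.relu(self.hidden(avg))          # Linear + ReLU
        return torch.softmax(self.out(h2), dim=-1) # Linear + Softmax

model = ModifiedDAN()
probs = model(torch.randn(7, EMB_DIM))             # 7-word input sequence
# probs is a distribution over NUM_CLASSES classes (sums to 1)
```

Note that, unlike a basic DAN (which averages the raw embeddings first), this variant applies a Linear + ReLU transform to each word before averaging.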
Given the network as-is, what is the biggest reason it may fail to learn a task compared to a basic DAN?
A. The softmax cannot "peak" enough on the right answer; the logits are too small
B. It doesn't correctly implement a nonlinear computation
C. There are too many linear layers, leading to too many parameters
D. None of the above
