Question in computer science
Question No: 02
This is a subjective question, so write your answer in the text field given below.
Consider a visual captioning system that combines LeNet-5 with a single-layer RNN and is trained to generate the sequence of lowercase characters spelling each of the ten digits, 0 to 9. For example, an input image of '7' generates the output character sequence s, e, v, e, n. An input image that is not a digit generates the output n, o, n, e. Assume a one-hot representation is used for both input and output.
(a) What is the minimum number of input nodes and the minimum number of output nodes required in the RNN?
(b) Assuming linear combinations of the output of the last convolution layer (after subsampling and unrolling) are used to initialize the RNN hidden layer, how many trainable parameters are needed, excluding the CNN convolution parameters? Assume 50 hidden nodes are used in the RNN. Show all steps clearly. No attention is used.
(c) Over how many time steps must the loss function be evaluated during training? [1]
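
The counting behind parts (a) to (c) can be sketched in a few lines of Python. This is a rough sketch under stated assumptions, not an official answer key. The question does not fix the token conventions, so the sketch assumes one extra start token on the input side and one extra end token on the output side. It also assumes the classic LeNet-5 layout, in which the last subsampling layer yields 16 feature maps of size 5x5 (400 values after unrolling), and it assumes every linear map carries a bias. The helper name `rnn_param_count` is hypothetical.

```python
# Words the captioner must spell: the ten digit names plus "none".
WORDS = ["zero", "one", "two", "three", "four", "five",
         "six", "seven", "eight", "nine", "none"]

# (a) Distinct lowercase letters that appear in any target word.
letters = sorted(set("".join(WORDS)))
num_letters = len(letters)

# Assumption: one <start> token on the input side, one <end> token on the output side.
num_inputs = num_letters + 1
num_outputs = num_letters + 1


def rnn_param_count(feature_dim, hidden, n_in, n_out, bias=True):
    """Count trainable parameters outside the CNN (hypothetical helper)."""
    init = feature_dim * hidden   # CNN features -> initial hidden state
    w_xh = n_in * hidden          # input -> hidden
    w_hh = hidden * hidden        # hidden -> hidden
    w_hy = hidden * n_out         # hidden -> output
    # Biases assumed on the init map, the hidden update, and the output layer.
    biases = (hidden + hidden + n_out) if bias else 0
    return init + w_xh + w_hh + w_hy + biases


# (b) Assumption: classic LeNet-5, 16 maps of 5x5 after the last subsampling layer.
feature_dim = 16 * 5 * 5
total_params = rnn_param_count(feature_dim, 50, num_inputs, num_outputs)

# (c) Longest word plus the assumed <end> token.
max_steps = max(len(w) for w in WORDS) + 1

print(num_letters, num_inputs, num_outputs, total_params, max_steps)
```

Under these assumptions the script reports 15 distinct letters, 16 input and 16 output nodes, 24216 trainable parameters, and 6 time steps for the longest words (three, seven, eight). Dropping the biases or the start/end tokens changes the totals accordingly, so the conventions should be stated alongside any final answer.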