Question:


Problem 2
Final due May 15, 2023 07:59 EDT

We are given a recommender problem with n users a ∈ {1, ..., n} and m items i ∈ {1, ..., m}. We will use the labels {-1, +1} to represent the target rating (dislikes, likes). Each user is likely to provide feedback for only a small subset of possible items, and hence we must constrain the model so as not to overfit. Our goal in this problem is to understand how a simple neural network model applies to this problem, and what the constraints of the model are.

[Figure: schematic representation of the simple neural network model, with user input units in the left column, item input units in the top row, two hidden units, and one output unit.]

Input units. Consider the simple neural network with one hidden layer depicted in the figure above. We use an input unit for each user (the nodes in the left column) and for each item (the nodes in the top row), so in total there are n + m input units. When making a prediction for a selected entry (a, i), only the a-th user input unit and the i-th item input unit are active (i.e., set to the value 1); all other inputs are set to 0 and will not affect the prediction. In other words, only the outgoing weights from these two units matter for predicting the label (+1 or -1) for entry (a, i).

Hidden units. User a has two outgoing weights, U_{a1} and U_{a2}, and item i has two outgoing weights, V_{i1} and V_{i2}. These weights are fed as inputs to the two hidden units in the model. The hidden units evaluate

    z_1 = U_{a1} + V_{i1},    f(z_1) = max{0, z_1}
    z_2 = U_{a2} + V_{i2},    f(z_2) = max{0, z_2}

Output. Thus, for the (a, i) entry, our network outputs

    F(a, i; θ) = W_1 f(z_1) + W_2 f(z_2) + W_0

where θ denotes all the weights U, V, and W. Finally, a sign function is applied to F(a, i; θ) for the classification.

In vector notation, each user a has a two-dimensional vector of outgoing weights u_a = [U_{a1}, U_{a2}]^T, and similarly each item i has a two-dimensional vector of outgoing weights v_i = [V_{i1}, V_{i2}]^T. The input received by the hidden units is represented as the vector

    z = [z_1, z_2]^T = u_a + v_i

In the problems below, we will consider a simple version of this problem, which has only two users, {a, b}, and two items, {1, 2}, so the recommendation problem can be represented as a 2 × 2 matrix. We will initialize the first-layer weights as shown in the figure below.

[Figure: the initial first-layer weight vectors u_a, u_b, v_1, v_2 plotted in the plane; both axes run from -2.0 to 2.0. The numeric values are not recoverable from this transcription.]
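Since the weight values live only in the figure and do not survive this transcription, here is a minimal runnable sketch of the forward pass. The first-layer values below are hypothetical placeholders, not the figure's values, and the names relu, forward, u, and v are introduced purely for illustration:

```python
import numpy as np

def relu(z):
    """Hidden-unit activation f(z) = max{0, z}, applied elementwise."""
    return np.maximum(0.0, z)

def forward(u_a, v_i, W1, W2, W0):
    """Forward pass for one (user, item) entry:
    z = u_a + v_i, then F = W1*f(z1) + W2*f(z2) + W0."""
    h = relu(u_a + v_i)
    return W1 * h[0] + W2 * h[1] + W0

# HYPOTHETICAL first-layer weights: placeholders standing in for the
# figure's (unrecoverable) values, chosen only so the sketch runs.
u = {"a": np.array([1.0, -1.0]), "b": np.array([-1.0, 1.0])}
v = {1: np.array([1.0, 1.0]), 2: np.array([-1.0, 1.0])}

# Feature representation (f(z1), f(z2)) of each user-item pair -- exactly
# what question 2.(1) below asks you to match to the points A, B, C, D.
for user in ("a", "b"):
    for item in (1, 2):
        print((user, item), "->", relu(u[user] + v[item]))
```

Substituting the actual figure values for u and v prints the four coordinates to match against A, B, C, D.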
2.(1) 4 points possible (graded, results hidden)

Which user-item pair is mapped to each of the points A, B, C, D in the (f(z_1), f(z_2))-plane below? In other words, for each feature representation [f(z_1), f(z_2)], find the corresponding user-item pair (e.g., (a, 1) is the pair user a and item 1) that is mapped to it using the input-to-hidden-layer weights.

[Figure: the four points A, B, C, D plotted in the (f(z_1), f(z_2))-plane; both axes run from -2.0 to 2.0.]

(Choose one for each of the four points.)

A: (a, 1) / (a, 2) / (b, 1) / (b, 2)
B: (a, 1) / (a, 2) / (b, 1) / (b, 2)
C: (a, 1) / (a, 2) / (b, 1) / (b, 2)
D: (a, 1) / (a, 2) / (b, 1) / (b, 2)

2.(2) 1 point possible (graded, results hidden)

Recall that the initial values of the first-layer weights (the U's and V's) are as in the figure above. Suppose we keep these input-to-hidden-layer weights at their initial values and only estimate the weights W corresponding to the output layer. Different choices of the output-layer weights will result in different predicted 2 × 2 matrices of {-1, +1} labels. Which of the following matrices is one that the neural network cannot reproduce with any choice of the output-layer weights W_1, W_2, and W_0?

[The two candidate 2 × 2 label matrices appeared as images and are not recoverable here.]

- Matrix 1
- Matrix 2
- Both of the above
- None of the above

2.(3) 1 point possible (graded, results hidden)

Learning a new representation for examples (hidden-layer activations) is always harder than learning the linear classifier operating on that representation. In neural networks, the representation is learned together with the end classifier using stochastic gradient descent. We initialize the output-layer weights as W_1 = W_2 = 1 and W_0 = 1, and assume that all the other weights are initialized as previously (the U's and V's as in the figure above).

What is the class label, +1 or -1, that the network would predict in response to (b, 2) (user b, item 2)?

2.(4) 1 point possible (graded, results hidden)

Assume that we observe the opposite label from your answer to the previous question; in other words, there is a training signal at the network output. All the weights are initialized as previously, i.e., W_1 = W_2 = 1 and W_0 = 1, and the U's and V's are given by the figure above. Which of the weights, depicted in blue in the schematic diagram below, would change (have a nonzero update) based on a single stochastic gradient descent step in response to (b, 2), with our specific weight initialization and the target label? Note that the input units a, b and 1, 2 are activated with 0's and 1's as shown inside the circles. You are not asked whether W_0 would change.

[Figure: the network schematic with the candidate first-layer weights drawn in blue; the answer choices begin with U_{a1} and are truncated in the source.]

(Choose all that apply.)
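For question 2.(2): with the U's and V's frozen, the network reduces to a linear classifier sign(W_1 h_1 + W_2 h_2 + W_0) over the four fixed feature vectors h = (f(z_1), f(z_2)). The sketch below (continuing the code above, still with the hypothetical u and v) checks the simplest obstruction; which matrix is actually unreproducible depends on the real figure values.

```python
from collections import defaultdict

# If two user-item pairs collapse to the SAME feature vector, no choice of
# (W1, W2, W0) can give them different labels, so any 2x2 matrix that labels
# them differently is unreproducible. (Four distinct points arranged in an
# XOR pattern would be another obstruction, since the output layer is linear.)
groups = defaultdict(list)
for user in ("a", "b"):
    for item in (1, 2):
        h = tuple(float(x) for x in relu(u[user] + v[item]))
        groups[h].append((user, item))

for h, pairs in groups.items():
    if len(pairs) > 1:
        print(pairs, "share the feature", h, "and must get equal labels")
```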

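For question 2.(3): the prediction is just the sign of the forward pass under the stated output-layer initialization. Continuing the same sketch:

```python
# Output-layer initialization given in the problem: W1 = W2 = 1, W0 = 1.
W1, W2, W0 = 1.0, 1.0, 1.0

F = forward(u["b"], v[2], W1, W2, W0)
label = +1 if F > 0 else -1   # sign(F); ties at 0 follow the course convention
print("F(b, 2) =", F, "-> predicted label", label)
```

With the placeholder weights, the hidden activations for (b, 2) are (0, 2), so F = 0 + 2 + 1 = 3 and the sketch predicts +1; the graded answer depends on the actual figure values.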
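For question 2.(4): a first-layer weight receives a nonzero SGD update only if its input unit is active (here, user b and item 2) and the hidden unit it feeds has z_k > 0, so the ReLU passes gradient through. A one-step sketch, assuming a squared-error loss L = (F - y)^2 / 2 (the problem does not specify the loss; the gating argument is identical for any loss with dL/dF != 0):

```python
# Target label: the opposite of the prediction above (assumed here as -1).
y = -1.0

z = u["b"] + v[2]             # hidden pre-activations for the (b, 2) entry
h = relu(z)
F = W1 * h[0] + W2 * h[1] + W0

dL_dF = F - y                 # nonzero, so there is a training signal
gate = (z > 0).astype(float)  # ReLU (sub)gradient: 1 where z_k > 0, else 0

# Chain rule for the first-layer weights feeding this entry:
#   dL/dU_{bk} = dL/dV_{2k} = (dL/dF) * W_k * gate_k
# so a weight changes only when its hidden unit's gate is open.
grad_u_b = dL_dF * np.array([W1, W2]) * gate
grad_v_2 = grad_u_b.copy()    # same expression, because z = u_b + v_2
print("dL/du_b =", grad_u_b, " dL/dv_2 =", grad_v_2)
```

With the placeholder weights, z = (-2, 2): the first gate is closed, so U_{b1} and V_{21} would not change, while U_{b2} and V_{22} would. Which gates are open under the real initialization determines the graded choices; the weights for user a and item 1 get no update either way, because their input units are 0.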
