Question: Recall that skip-gram with negative sampling attempts to predict whether pairs of words occur within the same context. In this problem, we'll show that (under certain assumptions) this is an implicit matrix factorization. To simplify the math, we'll work with the special case in which we draw one negative sample per positive (word, context) tuple. We'll use the following notation:

- $T$ is the length of the corpus.
- $V$ is the vocabulary.
- $u_w, v_c \in \mathbb{R}^d$ are the center and context word vectors for $w$ and $c$, for all $w, c \in V$.
- $\mathrm{Count}(w, c)$ denotes the number of occurrences of $c$ in the context of $w$.
- $\mathrm{Count}(w)$, for any word $w \in V$, is the number of occurrences of $w$ in the corpus.

Suppose we draw the negative sample according to the empirical unigram distribution, i.e., the probability of sampling a word $c_N$ is $P(c_N) = \mathrm{Count}(c_N)/T$. Our loss function for a single $(w, c)$ pair, for one occurrence of this pair, is
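As a rough illustration of the quantities defined in the question, the sketch below computes Count(w, c), Count(w), and the empirical unigram distribution P(c_N) = Count(c_N)/T on a toy corpus, then draws one negative sample. The toy corpus, the symmetric window of size 1, and all function names are assumptions made for this example. The loss in `sgns_loss` is the commonly used one-negative-sample SGNS form, $-\log\sigma(u_w^\top v_c) - \log\sigma(-u_w^\top v_{c_N})$, which is also an assumption and may differ from the loss intended in the question.

```python
import math
import random
from collections import Counter


def cooccurrence_counts(corpus, window=1):
    """Return (Count(w, c), Count(w)) for a tokenized corpus.

    The symmetric context window is an assumption; the question does not fix one.
    """
    pair_counts = Counter()
    word_counts = Counter(corpus)
    for i, w in enumerate(corpus):
        lo = max(0, i - window)
        hi = min(len(corpus), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pair_counts[(w, corpus[j])] += 1
    return pair_counts, word_counts


def unigram_distribution(word_counts):
    """Empirical unigram distribution P(c) = Count(c) / T."""
    T = sum(word_counts.values())
    return {w: n / T for w, n in word_counts.items()}


def softplus(x):
    """Numerically stable log(1 + exp(x)), which equals -log(sigmoid(-x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sgns_loss(u_w, v_c, v_neg):
    """Assumed standard SGNS loss with one negative sample:
    -log sigmoid(u_w . v_c) - log sigmoid(-u_w . v_neg).
    """
    return softplus(-dot(u_w, v_c)) + softplus(dot(u_w, v_neg))


# Hypothetical toy corpus, T = 6 tokens.
corpus = "the cat sat on the mat".split()
pair_counts, word_counts = cooccurrence_counts(corpus)
P = unigram_distribution(word_counts)

# Draw one negative context word c_N ~ P (seeded for reproducibility).
rng = random.Random(0)
c_neg = rng.choices(list(P), weights=list(P.values()), k=1)[0]
```

With all-zero vectors every dot product is 0, so each sigmoid equals 1/2 and the assumed loss evaluates to 2 log 2, which makes a convenient sanity check.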
