Question: Recall that skip-gram with negative sampling attempts to predict whether pairs of words occur within the same context. In this problem, we'll show that (under certain assumptions) this is an implicit matrix factorization. To simplify the math, we'll work with the special case in which we draw one negative sample per positive (word, context) tuple. We'll use the following notation: $T$ is the length of the corpus; $V$ is the vocabulary; $u_w, v_c \in \mathbb{R}^d$ are the center and context word vectors for $w$ and $c$, for all $w, c \in V$; $\mathrm{Count}(w, c)$ denotes the number of occurrences of $c$ in the context of $w$; and $\mathrm{Count}(w)$, for any word $w \in V$, is the number of occurrences of $w$ in the corpus. Suppose we draw the negative sample according to the empirical unigram distribution, i.e., the probability of sampling a word $c_N$ is $P(c_N) = \mathrm{Count}(c_N)/T$. Our loss function for a single $(w, c)$ pair, for one occurrence of this pair, is
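The counting notation in the question can be made concrete on a toy corpus. The following is a minimal sketch, not part of the original problem: it assumes a symmetric context window (the `window` parameter is an assumption, since the question does not define "context"), and the helper names `cooccurrence_counts`, `unigram_probs`, and `sample_negative` are hypothetical. It computes $T$, $\mathrm{Count}(w)$, $\mathrm{Count}(w, c)$, and $P(c_N) = \mathrm{Count}(c_N)/T$, then draws one negative sample from the empirical unigram distribution.

```python
from collections import Counter
import random


def cooccurrence_counts(tokens, window=2):
    """Return (Count(w), Count(w, c), T) for a tokenized corpus.

    Assumption: the context of position i is every other position
    within `window` tokens on either side.
    """
    T = len(tokens)
    word_count = Counter(tokens)   # Count(w)
    pair_count = Counter()         # Count(w, c)
    for i, w in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(T, i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pair_count[(w, tokens[j])] += 1
    return word_count, pair_count, T


def unigram_probs(word_count, T):
    """Empirical unigram distribution P(c_N) = Count(c_N) / T."""
    return {w: n / T for w, n in word_count.items()}


def sample_negative(probs, rng):
    """Draw one negative context word c_N according to P."""
    words = list(probs)
    weights = [probs[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]


# Toy corpus, chosen only for illustration.
tokens = "the cat sat on the mat".split()
word_count, pair_count, T = cooccurrence_counts(tokens, window=1)
probs = unigram_probs(word_count, T)

rng = random.Random(0)  # seeded so repeated runs draw the same sample
negative = sample_negative(probs, rng)
```

Because each positive $(w, c)$ occurrence is paired with exactly one draw from `probs`, frequent words such as "the" are chosen as negatives in proportion to their corpus frequency.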
