Question: Can someone explain Question 2 (2 + 2 + 2 + 2 = 8 marks) to me clearly, step by step?

Question 2 (2 + 2 + 2 + 2 = 8 marks)
Tweets have started to appear from unknown sources, using an alien language. The three most recent tweets are:

- do da da da do
- di di di do do
- da da da da da da

(a) Should we perform stop word removal and/or stemming on these three tweets?
(b) Construct the document term frequency matrix.
(c) Construct the cosine similarity score of each document to the query "da di" by using term frequencies.
(d) Which tweet is most similar to the query? Justify your answer.

Answer:

(a) Stop word lists and stemming rules are specific to a language. Since we are unfamiliar with the alien language, we should neither remove stop words nor stem.

(b) Count how often each term occurs in each tweet (D1, D2, D3 are the three tweets in order):

        do   da   di
  D1     2    3    0
  D2     2    0    3
  D3     0    6    0

(c) In the same (do, da, di) order, the query "da di" has the term frequency vector $q = (0, 1, 1)$.

Dot products: $d_1 \cdot q = 3$, $d_2 \cdot q = 3$, $d_3 \cdot q = 6$.

Vector lengths: $\|d_1\| = \sqrt{13}$, $\|d_2\| = \sqrt{13}$, $\|d_3\| = 6$, $\|q\| = \sqrt{2}$.

Cosine similarities:

$s(d_1, q) = \frac{3}{\sqrt{13}\,\sqrt{2}} = 0.588$
$s(d_2, q) = \frac{3}{\sqrt{13}\,\sqrt{2}} = 0.588$
$s(d_3, q) = \frac{6}{6\,\sqrt{2}} = 0.707$

(d) Document 3 is most similar to the query, because its cosine similarity score (0.707) is the highest of the three.
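If it helps to check the arithmetic for parts (b) to (d) of Question 2, here is a minimal Python sketch that counts the terms in the three tweets and computes the cosine similarity to the query "da di". The names `vocab`, `tf_vector` and `cosine` are my own illustrative choices, not anything from the question, and I am assuming terms are separated by spaces.

```python
import math
from collections import Counter

# The three tweets and the query, taken from the question.
tweets = {
    "D1": "do da da da do",
    "D2": "di di di do do",
    "D3": "da da da da da da",
}
query = "da di"

# Fixed term order for every vector (assumed: do, da, di).
vocab = ["do", "da", "di"]


def tf_vector(text):
    """Return raw term counts of `text` in `vocab` order."""
    counts = Counter(text.split())
    return [counts[term] for term in vocab]


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Part (b): document term frequency matrix, one row per tweet.
tf = {name: tf_vector(text) for name, text in tweets.items()}

# Part (c): cosine similarity of each tweet to the query.
q = tf_vector(query)
scores = {name: cosine(vec, q) for name, vec in tf.items()}

for name in tf:
    print(name, tf[name], round(scores[name], 3))

# Part (d): the tweet with the highest score.
print("most similar:", max(scores, key=scores.get))
```

Running it should print the rows `[2, 3, 0]`, `[2, 0, 3]`, `[0, 6, 0]` with scores 0.588, 0.588 and 0.707, and report D3 as the most similar tweet.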
