Problem 6 (11 points). Term Frequency - Inverse Document Frequency (TF-IDF) is one of the most popular term-weighting schemes today, and 83% of text-based recommender systems in digital libraries use TF-IDF.

Term Frequency (TF) is defined as the number of times a word w appears in a document d divided by the total number of words (i.e., N_d) in the document d. Every document has its own term frequency.

$$tf_{w,d} = \frac{n_{w,d}}{N_d}$$

where n_{w,d} is the number of times word w appears in d.

Inverse Document Frequency (IDF) is the log of the number of documents divided by the number of documents that contain the word w. Inverse document frequency determines the weight of rare words across all documents in the corpus.

$$idf_w = \log\left(\frac{D}{df_w}\right)$$

where D represents the number of documents, and df_w denotes the number of documents containing word w.

The TF-IDF value of word w in document d is

$$\text{TF-IDF}_{w,d} = tf_{w,d} \cdot idf_w = \frac{n_{w,d}}{N_d} \log\left(\frac{D}{df_w}\right)$$

Question: use TF-IDF values to represent each document as a vector in Problem 4 (the log base is 3).
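The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a solution: the documents of Problem 4 are not reproduced here, so the toy corpus `docs` is a hypothetical stand-in, and the function name `tfidf_vectors` is an assumption made for this example. It assumes every document is a non-empty list of tokens and uses log base 3, as the question specifies.

```python
import math
from collections import Counter


def tfidf_vectors(docs, log_base=3):
    """Return (vocab, vectors) for a list of tokenized, non-empty documents.

    Each vector has one TF-IDF entry per vocabulary word, in sorted order.
    """
    D = len(docs)  # number of documents in the corpus
    vocab = sorted({w for doc in docs for w in doc})
    # df[w]: number of documents that contain word w at least once
    df = {w: sum(1 for doc in docs if w in doc) for w in vocab}

    vectors = []
    for doc in docs:
        counts = Counter(doc)  # n_{w,d}; missing words count as 0
        N_d = len(doc)         # total number of words in document d
        vectors.append(
            [(counts[w] / N_d) * math.log(D / df[w], log_base) for w in vocab]
        )
    return vocab, vectors


# Hypothetical toy corpus (NOT the documents from Problem 4).
docs = [
    ["apple", "banana", "apple"],
    ["banana", "cherry"],
    ["cherry", "cherry", "date"],
]

vocab, vectors = tfidf_vectors(docs)
for vec in vectors:
    print([round(x, 3) for x in vec])
```

To apply this to Problem 4, replace `docs` with the tokenized documents from that problem. A word that appears in every document gets idf = log(1) = 0, so its entry is 0 in every vector.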
