Question: Online Code Test || Given a corpus C of documents (as a list of strings), a word token and a document index, find the tfidf of the token in the document

Given a corpus C of documents (as a list of strings), a word token and a document index, find the term frequency - inverse document frequency (tfidf) of the token in the document relative to the corpus. A document can be considered to be a sequence of tokens separated by a space. We will assume the following definitions:

  • term frequency (tf) of token $t$ in a document: the number of times the token appears in the given document

  • inverse document frequency (idf) of token $t$: $1 + \log_2\left(\frac{C}{1 + n_t}\right)$, where $C$ is the size of the corpus (i.e. the number of documents in C), $n_t$ is the total number of documents that contain the token $t$, and $\log_2$ is the logarithm to the base 2

Finally, tfidf = tf * idf (i.e. a product of tf and idf).
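
As a quick check of these definitions, take a hypothetical corpus of 4 documents (numbers chosen only for illustration) in which the token occurs in exactly one document, and occurs twice in that document. Then C = 4, n_t = 1 and tf = 2, so:

\[
\mathrm{idf} = 1 + \log_2\left(\frac{4}{1 + 1}\right) = 1 + 1 = 2,
\qquad
\mathrm{tfidf} = \mathrm{tf} \times \mathrm{idf} = 2 \times 2 = 4
\]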

For the purposes of computation, the case of the token in the document should be ignored (e.g. The, THE and the should be treated as the same token).

  • [execution time limit] 4 seconds (py)

  • [input] array.string corpus

    List of documents in the corpus

  • [input] integer doc_idx

    index (0 based) of the document in the corpus

  • [input] string token

    input token for computing tfidf

  • [output] float

    tfidf value

[Python 2] Syntax Tips

    # Prints help message to the console
    # Returns a string
    def helloWorld(name):
        print "This prints to the console when you Run Tests"
        return "Hello, " + name
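
One possible solution, following the definitions in the problem statement, is sketched below. It is not an official answer. The function name `tfidf` is an assumption, since the statement only names the inputs `corpus`, `doc_idx` and `token`. The sketch also assumes that splitting on whitespace is an acceptable reading of "tokens separated by a space", and it is written to run under both Python 2 and Python 3.

```python
import math


def tfidf(corpus, doc_idx, token):
    """Return tf * idf of `token` in corpus[doc_idx], ignoring case.

    Hypothetical helper: the name and structure are illustrative only.
    """
    token = token.lower()
    # Lowercase every document and split it into tokens on whitespace
    # (assumption: runs of spaces do not produce empty tokens).
    docs = [doc.lower().split() for doc in corpus]

    # tf: number of times the token appears in the chosen document.
    tf = docs[doc_idx].count(token)

    # n_t: number of documents that contain the token at least once.
    n_t = sum(1 for doc in docs if token in doc)

    # idf = 1 + log2(C / (1 + n_t)); float() avoids integer division
    # under Python 2.
    idf = 1 + math.log(float(len(docs)) / (1 + n_t), 2)

    return tf * idf


if __name__ == "__main__":
    sample = ["The cat and THE hat", "a dog", "a bird", "fish swim"]
    # tf = 2, C = 4, n_t = 1, so the result is approximately 4.0.
    print(tfidf(sample, 0, "the"))
```

Because n_t appears as 1 + n_t in the denominator, the formula stays defined even when the token occurs in no document. In that case tf is 0, so the result is 0.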
