Question: Online Code Test || Only 30 minutes remaining


Given a corpus C of documents (as a list of strings), a word token and a document index, find the term frequency - inverse document frequency (tfidf) of the token in the document relative to the corpus. A document can be considered to be a sequence of tokens separated by a space.

We will assume the following definitions:

term frequency (tf) of token $t$ in a document: the number of times the token appears in the given document

inverse document frequency (idf) of token $t$: $1 + \log_2\left(\frac{C}{1 + n_t}\right)$

where $C$ is the size of the corpus (i.e. the number of documents in C), $n_t$ is the total number of documents that contain the token $t$, and $\log_2$ is the logarithm to the base 2.

Finally, tfidf = tf * idf (i.e. a product of tf and idf).

For the purposes of computation, the case of the token in the document should be ignored (e.g. The, THE and the should be treated as the same token).
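As a quick check of the definitions, here is a worked example on a made-up corpus (not part of the original question): the four documents "The the cat", "dog", "bird" and "fish", with token "the" and document index 0. Ignoring case, the token appears twice in document 0 and in only one document overall, so:

```latex
\begin{align*}
\mathrm{tf} &= 2, \qquad C = 4, \qquad n_t = 1 \\
\mathrm{idf} &= 1 + \log_2\!\left(\frac{C}{1 + n_t}\right) = 1 + \log_2\!\left(\frac{4}{2}\right) = 2 \\
\mathrm{tfidf} &= \mathrm{tf} \times \mathrm{idf} = 2 \times 2 = 4
\end{align*}
```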

[execution time limit] 4 seconds (py)
[input] array.string corpus

List of documents in the corpus
[input] integer doc_idx

index (0 based) of the document in the corpus
[input] string token

input token for computing tfidf
[output] float

tfidf value
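
One possible way to put the definitions and the input/output description together is sketched below. The function name `tfidf` is an assumption (the question does not name the function), and tokens are split with `str.split()`, which treats any run of whitespace as a separator rather than strictly a single space. The code avoids `math.log2` and integer division so that it should behave the same on Python 2 and Python 3.

```python
import math


def tfidf(corpus, doc_idx, token):
    """Sketch of tf * idf for `token` in corpus[doc_idx] (name is hypothetical)."""
    # Case is ignored, so lowercase both the token and every document.
    token = token.lower()
    docs = [doc.lower().split() for doc in corpus]

    # tf: number of times the token appears in the chosen document.
    tf = docs[doc_idx].count(token)

    # n_t: number of documents that contain the token at least once.
    n_t = sum(1 for words in docs if token in words)

    # idf = 1 + log2(C / (1 + n_t)); float() forces true division on Python 2.
    idf = 1 + math.log(float(len(corpus)) / (1 + n_t), 2)

    return tf * idf
```

For a token that does not occur in the chosen document, tf is 0 and so the result is 0 regardless of idf. Because of the `1 + n_t` in the denominator, the logarithm's argument is always positive, even when no document contains the token.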

[Python 2] Syntax Tips

    # Prints help message to the console
    # Returns a string
    def helloWorld(name):
        print "This prints to the console when you Run Tests"
        return "Hello, " + name
