Question:

/**
 * Inserts a document into the search engine for later analysis and retrieval.
 *
 * The document is uniquely identified by a documentId; attempts to re-insert the same
 * document are ignored.
 *
 * The document is supplied as a Reader; this method stores the document contents for
 * later analysis and retrieval.
 *
 * @param documentId
 * @param reader
 * @throws IOException iff the reader throws an exception
 */
public void addDocument(DocumentId documentId, Reader reader) throws IOException {
    BufferedReader br = new BufferedReader(reader);
    String s;
    while ((s = br.readLine()) != null) {
        // Lower-case the line and split it on runs of non-word characters.
        list = Arrays.asList(s.toLowerCase().split("\\W+"));
        for (String word : list) {
            // split() yields an empty token for a blank line or a line that starts with punctuation.
            if (word.isEmpty()) {
                continue;
            }
            // Create the set of documents on first sight of the word, then record this document.
            // Adding an id that is already in the set leaves the set unchanged.
            map.computeIfAbsent(word, k -> new HashSet<>()).add(documentId);
        }
    }
}

/**
 * Returns the set of DocumentIds contained within the search engine that contain a given term.
 *
 * @param term
 * @return the set of DocumentIds that contain a given term
 */
public Set<DocumentId> indexLookup(String term) {
    // Terms are stored in lower case, so normalize the query the same way.
    String k = term.toLowerCase();
    // Exact-match lookup on the term; an unknown term yields an empty set rather than null.
    Set<DocumentId> t = new HashSet<>();
    if (map.containsKey(k)) {
        t.addAll(map.get(k));
    }
    return t;
}

/**
 * Returns the term frequency of a term in a particular document.
 *
 * The term frequency is the number of times the term appears in a document.
 *
 * See
 *
 * @param documentId
 * @param term
 * @return the term frequency of a term in a particular document
 * @throws IllegalArgumentException if the documentId has not been added to the engine
 */
public int termFrequency(DocumentId documentId, String term) throws IllegalArgumentException {
}
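
The map field records only which documents contain a term, not how often each term occurs, so term frequency needs some extra bookkeeping. Below is a minimal sketch, assuming a second map from each document to its per-term counts. The class name TermCountSketch, the field termCounts, and the use of String in place of DocumentId are hypothetical stand-ins and not part of the assignment's code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: String stands in for DocumentId.
class TermCountSketch {
    // Assumed extra field: document id -> (term -> number of occurrences).
    private final Map<String, Map<String, Integer>> termCounts = new HashMap<>();

    void addDocument(String documentId, Reader reader) throws IOException {
        // Re-inserting a known document is ignored, as the Javadoc requires.
        if (termCounts.containsKey(documentId)) {
            return;
        }
        Map<String, Integer> counts = new HashMap<>();
        BufferedReader br = new BufferedReader(reader);
        String line;
        while ((line = br.readLine()) != null) {
            for (String word : line.toLowerCase().split("\\W+")) {
                // Skip the empty token split() produces for blank or punctuation-led lines.
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        termCounts.put(documentId, counts);
    }

    int termFrequency(String documentId, String term) {
        Map<String, Integer> counts = termCounts.get(documentId);
        if (counts == null) {
            throw new IllegalArgumentException("unknown document: " + documentId);
        }
        // A term that never occurs in the document has frequency 0.
        return counts.getOrDefault(term.toLowerCase(), 0);
    }
}
```

Keying the counts by document also gives a cheap way to detect re-insertion and to count the total number of documents, neither of which the term-to-documents map can provide on its own.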

/**
 * Returns the inverse document frequency of a term across all documents in the index.
 *
 * For our purposes, IDF is defined as log ((1 + N) / (1 + M)) where
 * N is the number of documents in total, and M
 * is the number of documents where the term appears.
 *
 * @param term
 * @return the inverse document frequency of term
 */
public double inverseDocumentFrequency(String term) {
}
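
The IDF formula in the comment above can be sketched as a pure function of the two counts. In the engine, M would presumably be indexLookup(term).size(), and N would need a separate count of added documents. The names IdfSketch and idf are hypothetical, and the natural logarithm is an assumption, since the Javadoc says only "log" without naming a base.

```java
// Hypothetical sketch of the formula log((1 + N) / (1 + M)).
class IdfSketch {
    // totalDocuments is N, documentsWithTerm is M.
    // Natural log is an assumption; the Javadoc does not name a base.
    static double idf(int totalDocuments, int documentsWithTerm) {
        // Using 1.0 forces floating-point division; (1 + N) / (1 + M) on ints would truncate.
        return Math.log((1.0 + totalDocuments) / (1.0 + documentsWithTerm));
    }
}
```

A term that appears in every document gets an IDF of 0 under this definition, and rarer terms get larger values.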

/**
 * Returns a sorted list of documents, most relevant to least relevant, for the given term.
 *
 * A document with a larger tfidf score is more relevant than a document with a lower tfidf score.
 *
 * Each document in the returned list must contain the term.
 *
 * @param term
 * @return a list of documents sorted in descending order by tfidf
 */
public List<DocumentId> relevanceLookup(String term) {
}
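
The ranking described above comes down to three steps: keep only the documents that contain the term, score each one as tf × idf, and sort in descending order of score. The sketch below assumes the per-document term frequencies for one term are already available in a map. The assignment's TfIdfComparator is not shown in the question, so an inline comparator lambda stands in for it here; RelevanceSketch, rank, and the String document ids are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: String stands in for DocumentId.
class RelevanceSketch {
    // tfByDocument maps each document id to the term's frequency in that document.
    static List<String> rank(Map<String, Integer> tfByDocument, double idf) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : tfByDocument.entrySet()) {
            // Only documents that actually contain the term may be returned.
            if (e.getValue() > 0) {
                result.add(e.getKey());
            }
        }
        result.sort((a, b) -> {
            // Descending by tf-idf: compare b's score against a's.
            int byScore = Double.compare(tfByDocument.get(b) * idf, tfByDocument.get(a) * idf);
            // Break ties by id so the order is deterministic (an assumption, not a stated requirement).
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return result;
    }
}
```

Since the IDF is the same for every document once the term is fixed, the order is driven by term frequency alone whenever the IDF is positive.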

I have this code, and I want each method to do what the comment above it says. I have the following imported and declared:

import comparators.TfIdfComparator;
import documents.DocumentId;

public Map<String, Set<DocumentId>> map = new HashMap<>();
public List<String> list;

Please help me with the rest of the methods, using the map variable. Also check whether I have implemented the first two (addDocument and indexLookup) correctly.
