Question: Please write the program in Java. Separate the result into three parts, as described in section 4 below (Part a, Part b, Part c).

The answer must be in three separate Java files:

PART a: Index.java

PART b: IndexRunner.java

PART c: GlobalRunner.java

In this project, you will be working with input, output and threads. The purpose of this project is to understand how multithreading works.

You are expected to investigate the Java library and use the classes and methods in the Collections library as much as possible. You will also need to look into the File class to see how to use folders and files. I would encourage you to use the Scanner class to read in data from the files as it is very easy to use. However, a BufferedReader and a StringTokenizer will provide you with better runtimes for very large files.

In short, you are creating a word index in a few different ways. An index helps you find information in your files faster. An index can also be very useful when comparing how similar two documents are to one another. One way of determining how similar two documents are is to compare the number of uncommon words they share. This might be useful in a recommender type system. For example, if I really like a book that often contains the words California, surfing, sunshine and beach, then odds are good I will like another book that contains a similar number of those keywords.

You must implement the following steps:

1. You will write a program that goes through a text file and creates a word index of every word in the file. The index will be the "page" that a particular word is found on. Since a text file only contains text and does not contain any metadata, "pages" will be created depending on the number of characters read in so far (not including delimiters). The number of characters that define a page will be specified at runtime.

2. The user will specify 3 command-line arguments. The first argument is the folder where all of the text files are saved, the second argument is the output folder that the output file(s) will be stored to and the third argument is the number of characters that represent a page.

Assuming the java file is called Index.java, the following command:

java Index myFolder outputFolder 100

would indicate that all input files are stored in a folder called myFolder that is in the same folder as your Java class file. All of the input text files are in myFolder. All text files that you need to account for will have an extension of .txt. Only *.txt files will be in the input folder. The second argument is the folder you will store your output files in. You can assume this empty output folder has already been created. In the command, outputFolder will not have a slash attached to it. In other words, your code must insert the slash so that the files end up in the correct folder.
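One possible way to handle the two folder arguments is sketched below. It is only an illustration, not the required approach: the class and method names (FolderSketch, listTextFiles, inOutputFolder) are made up for this example. It relies on the two-argument File constructor to insert the separator between the folder and the file name.

```java
import java.io.File;
import java.util.Arrays;

// Hypothetical helper class; the names are assumptions for illustration.
final class FolderSketch {
    // Lists the *.txt files in the input folder, sorted by pathname.
    static File[] listTextFiles(String inputFolder) {
        File[] files = new File(inputFolder)
                .listFiles((dir, name) -> name.endsWith(".txt"));
        if (files == null) {
            // listFiles returns null when the path is not a readable directory.
            throw new IllegalArgumentException("Not a readable folder: " + inputFolder);
        }
        Arrays.sort(files); // File implements Comparable<File>
        return files;
    }

    // Joins the folder and the file name with the platform separator,
    // so the caller never has to append a slash by hand.
    static File inOutputFolder(String outputFolder, String fileName) {
        return new File(outputFolder, fileName);
    }
}
```

For example, FolderSketch.inOutputFolder("outputFolder", "a_output.txt") refers to the file a_output.txt inside outputFolder.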

The third argument is the number of characters on a page. Assume that number is K. The number of characters on each page goes up to but doesn't exceed K actual characters. For instance, if K is 100 and you have read in 98 actual characters so far and the next word is badger, the word badger would be the first word on the next page (do not split up the word and put part of it on one page and the other part on another page). To make this a bit easier, you must ignore delimiter characters with respect to the number of characters on a page. You can assume that no single word will be longer than the number of characters on a page. In this example, no word will be more than 100 characters.
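The paging rule above can be sketched as follows. This is a minimal illustration under the stated assumptions (delimiters are not counted, and no word is longer than K); the class and method names are hypothetical and are not part of the assignment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; names are assumptions for illustration only.
final class PagingSketch {
    // Returns the 1-based page number of each word, in reading order.
    // pageSize is K, the maximum number of non-delimiter characters per page.
    static List<Integer> assignPages(List<String> words, int pageSize) {
        List<Integer> pages = new ArrayList<>();
        int page = 1;
        int charsOnPage = 0;
        for (String word : words) {
            // A word that would push the page past K starts the next page whole.
            if (charsOnPage + word.length() > pageSize) {
                page++;
                charsOnPage = 0;
            }
            charsOnPage += word.length();
            pages.add(page);
        }
        return pages;
    }
}
```

With K = 10, the words aaaa, bbbb, ccc land on pages 1, 1 and 2: the first two words use 8 characters, and ccc would bring the total to 11.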

3. Create a word index of each file and store that index into an output file. If your input file is called a.txt, then your output file must be called a_output.txt and your output will be stored in the output folder that was specified. You will create an output file for each input file. Your word index should be created in the following way:

  1. Read in all words. A word is any consecutive sequence of letters, numbers, apostrophes, special symbols, etc. The only things that delimit words are a space, tab and newline (i.e. whitespace). In my sample files, I am using the default delimiters specified by the Scanner class. If you use something different, you may get different results. You should store the words into one of the following Collections: TreeSet, HashSet, TreeMap or HashMap. If you don't, your code will likely run very slowly. All words are case insensitive. For example, the words cat, Cat, CAT and cAT are all considered the same word. Punctuation and other symbols are not to be filtered out. Therefore, cat: and cat are two separate words (note the colon after the first cat).
  2. Along with reading in all of the words, remember which "page" the word was on. We will start counting from page 1.
  3. For each file, after you have read in all of the words, you should write out your word index to that file's output file. You should write out each word that appears in the file and for each word that you write out, you must also write what page(s) that word appears on. You must write out the words in alphabetical order and only put 1 word per line. If the word cat appears on pages 4, 10 and 16, your output line should be formatted in the following fashion: cat 4, 10, 16

In other words, it is the word, followed by a space, followed by the page(s) that word appeared on where each page is separated by a comma (see sample input/output). Page numbers must appear in ascending order. Words that appear multiple times on the same page should not show up multiple times in the final output. For example, if the word cat appears on page 4 a total of 3 times, page 4 should only show up once in the output.
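The indexing and output rules in step 3 could be sketched with a TreeMap of TreeSets, as below. This is one possible arrangement, not the only acceptable one, and the names (IndexSketch, record, formatLines, outputNameFor) are hypothetical. It also assumes that TreeMap's natural String ordering matches the "alphabetical order" used in the sample output, which should be checked against the posted files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.StringJoiner;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper; names are assumptions for illustration only.
final class IndexSketch {
    // Records one occurrence. Lower-casing makes cat, Cat and CAT one word;
    // the TreeSet drops repeated pages and keeps them in ascending order.
    static void record(TreeMap<String, TreeSet<Integer>> index, String word, int page) {
        index.computeIfAbsent(word.toLowerCase(Locale.ROOT), k -> new TreeSet<>()).add(page);
    }

    // Produces one "word p1, p2, p3" line per word, in the map's key order.
    static List<String> formatLines(TreeMap<String, TreeSet<Integer>> index) {
        List<String> lines = new ArrayList<>();
        for (String word : index.keySet()) {
            StringJoiner pages = new StringJoiner(", ");
            for (int page : index.get(word)) {
                pages.add(Integer.toString(page));
            }
            lines.add(word + " " + pages);
        }
        return lines;
    }

    // "a.txt" -> "a_output.txt"; assumes the name ends with ".txt".
    static String outputNameFor(String inputName) {
        String base = inputName.substring(0, inputName.length() - ".txt".length());
        return base + "_output.txt";
    }
}
```

Recording Cat on page 10, cat on page 4, CAT on page 4 and cat on page 16 yields the single line cat 4, 10, 16.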

4. You must solve this project in 3 different ways:

Part a. (22 pts) The first way is without using threads. You must call your file that does not use threads Index.java. You must time your code and determine how long it took to solve without using threads. Your program will create the appropriate files and then print out 1 thing to the terminal window: the amount of time it took to execute in milliseconds.
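A minimal sketch of the timing requirement, assuming the indexing work is wrapped in a Runnable. The class name TimingSketch and the method timeMillis are hypothetical. System.nanoTime is used here because it is intended for measuring elapsed time; System.currentTimeMillis would also be a reasonable choice.

```java
// Hypothetical helper; names are assumptions for illustration only.
final class TimingSketch {
    // Runs the work once and returns the elapsed wall-clock time in milliseconds.
    static long timeMillis(Runnable work) {
        long start = System.nanoTime();
        work.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        long elapsed = timeMillis(() -> {
            // Placeholder: the real program would index every input file here.
        });
        // The only thing printed to the terminal is the elapsed time.
        System.out.println(elapsed);
    }
}
```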

Part b. (44 pts) Once you have your solution to 4a, modify it so that it uses threads in some fashion. The most natural way to use threads is to create a new thread for each file you read in. If you are testing this on a machine with multiple cores, you should notice a significant decrease in time (assuming you are using large enough files). You must call your file that uses threads IndexRunner.java. Note that this is not saying you are only allowed to create 1 file. Your main method must be in IndexRunner.java but you can create as many files as you want. Your program will create the appropriate files and then print out 1 thing to the terminal window: the amount of time it took to execute in milliseconds. Be sure to wait for your threads to finish before reporting a time.
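The thread-per-file idea, including the wait before reporting a time, might look like the sketch below. It assumes each file's work has been packaged as a Runnable; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; names are assumptions for illustration only.
final class ThreadPerFileSketch {
    // Starts one thread per task, then blocks until every thread has finished.
    static void runAll(List<Runnable> tasks) throws InterruptedException {
        List<Thread> threads = new ArrayList<>();
        for (Runnable task : tasks) {
            Thread thread = new Thread(task);
            thread.start();
            threads.add(thread);
        }
        // Joining every thread guarantees the work is done before timing stops.
        for (Thread thread : threads) {
            thread.join();
        }
    }
}
```

Starting all threads before joining any of them matters: joining inside the first loop would run the files one at a time.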

Part c. (44 pts) In short, you are creating a global word index. This differs from the previous question in that you need to send your results back to the master thread so that the master thread can produce the output. Only the master thread is allowed to produce any output for this part. You must name this master file GlobalRunner.java. You can create more than 1 Java file. Your main method must be in GlobalRunner.java but you can create as many Java files as you want. Create a word index for each file and store each index into a single output file called output.txt that is stored in the specified output folder. The output will be a comma separated file. The first line of your output file must be this heading: Word, first.txt, second.txt, third.txt, xfourth.txt, etc. In other words, the first word should be Word. This is followed by the names of the input files in alphabetical order.

Your outputs will be combined. The words will be case insensitive and will appear in alphabetical order. For example, assume the word cat is in a.txt on pages 2, 4, 6, is in b.txt on 3, 5, 7 and is in c.txt on 2, 5, 7. Your output line for cat would be:

cat, 2:4:6, 3:5:7, 2:5:7

Since commas are used to delimit the files, colons will be used to delimit page numbers. If the word dog is in a.txt on pages 2 and 5, is not in b.txt but is in c.txt on page 8, then your output line for dog would be:

dog, 2:5, , 8
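The combined line format can be sketched as below, assuming the per-file indexes are already available as maps and are passed in the same alphabetical order as the file names in the heading. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeSet;

// Hypothetical helper; names are assumptions for illustration only.
final class GlobalLineSketch {
    // Builds the heading: "Word, a.txt, b.txt, ..." from sorted file names.
    static String header(List<String> sortedFileNames) {
        return "Word, " + String.join(", ", sortedFileNames);
    }

    // Builds one row: the word, then one colon-separated page list per file.
    // A file that does not contain the word contributes an empty field.
    static String line(String word, List<Map<String, TreeSet<Integer>>> perFileIndexes) {
        StringJoiner row = new StringJoiner(", ");
        row.add(word);
        for (Map<String, TreeSet<Integer>> index : perFileIndexes) {
            StringJoiner pages = new StringJoiner(":");
            for (int page : index.getOrDefault(word, new TreeSet<>())) {
                pages.add(Integer.toString(page));
            }
            row.add(pages.toString());
        }
        return row.toString();
    }
}
```

The empty field for a missing word is what produces the ", ," in the dog example.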

You are welcome to use whatever callback or synchronization solution you want but you have to make sure that only the master thread creates the output. No worker thread is allowed to print anything. Your program will create the appropriate output file and then print out 1 thing to the terminal window: the amount of time it took to execute in milliseconds.
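One callback-style option is to have each worker return its result through a Future, so that only the master thread ever writes anything. The sketch below uses ExecutorService.invokeAll for this. It is just one of several valid synchronization choices, and the names (MasterCollectSketch, collect) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper; names are assumptions for illustration only.
final class MasterCollectSketch {
    // Runs every worker on its own pool thread and hands the results back
    // to the calling (master) thread, in the same order as the worker list.
    static <T> List<T> collect(List<Callable<T>> workers)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, workers.size()));
        try {
            List<T> results = new ArrayList<>();
            // invokeAll blocks until all workers have completed.
            for (Future<T> future : pool.invokeAll(workers)) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

In this arrangement each Callable would build and return one file's index without printing, and the master thread would merge the returned indexes and write output.txt itself.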

Sample Input/Output (Java)

You can find sample input and output files at (see README files for how to run them):

For problems 4a and 4b:

http://faculty.cs.uwosh.edu/faculty/krohn/ds730/JavaProj.zip

For problem 4c:

http://faculty.cs.uwosh.edu/faculty/krohn/ds730/MoreJavaProj.zip

Producing identical output to the posted output does not necessarily mean everything is correct. For example, if you ignore the multithreading requirements of 4b or ignore the printing requirements of 4c, your program may produce the correct output but still be incorrect.
