4 Sort Words (20%)
Implement a Java program that takes an input directory and an output directory as arguments, sorts the words read from each input file by frequency in descending order, and writes the sorted words and their frequencies to a corresponding file in the output directory.
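A minimal sketch of the core step follows. It assumes that each input line is a `word count` pair as produced by the previous program, that files are UTF-8, and that Java 16+ is available (for records). The names `FrequencySort`, `WordCount`, `parse` and `readSorted` are illustrative and are not prescribed by the assignment. `List.sort` is guaranteed to be stable, so words with equal counts keep their input order.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class FrequencySort {
    // One (word, count) pair as written by the previous program.
    record WordCount(String word, long count) {}

    // Parses "word count" lines; blank or malformed lines are skipped (an assumed policy).
    static List<WordCount> parse(List<String> lines) {
        List<WordCount> entries = new ArrayList<>(lines.size());
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length != 2) {
                continue;
            }
            try {
                entries.add(new WordCount(parts[0], Long.parseLong(parts[1])));
            } catch (NumberFormatException e) {
                // Second field is not a number: skip the line.
            }
        }
        return entries;
    }

    // Sorts by count, highest first. List.sort is stable, so ties keep input order.
    static void sortByFrequencyDesc(List<WordCount> entries) {
        entries.sort(Comparator.comparingLong(WordCount::count).reversed());
    }

    // Reads one input file (assumed UTF-8) and returns its pairs in sorted order.
    static List<WordCount> readSorted(Path inputFile) throws IOException {
        List<WordCount> entries = parse(Files.readAllLines(inputFile, StandardCharsets.UTF_8));
        sortByFrequencyDesc(entries);
        return entries;
    }
}
```

`Files.readAllLines` loads a whole file into memory, which is reasonable for per-document count files; a streaming reader would be the alternative for very large inputs.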
The output files must follow the same folder structure as the input files. For example, if the program sorts the words in the input file CountedDataset1/folder6/document265.txt, it must store the sorted words in SortedDataset1/folder6/document265.txt, where CountedDataset1 is the input directory and SortedDataset1 is the output directory. This program uses the output of the previous program as its input.
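One way to mirror the folder structure is to walk the input tree with `Files.walk`, take each file's path relative to the input root, and resolve that relative path against the output root. This is a sketch; the class `OutputPaths` and its method names are hypothetical. It relies on the file paths starting with the same input root they are relativized against, which holds for paths returned by `Files.walk(inputRoot)`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class OutputPaths {
    // Maps inputRoot/folder/file.txt to outputRoot/folder/file.txt.
    static Path mirror(Path inputRoot, Path inputFile, Path outputRoot) {
        return outputRoot.resolve(inputRoot.relativize(inputFile));
    }

    // Lists every regular file under inputRoot, at any depth.
    static List<Path> listInputFiles(Path inputRoot) throws IOException {
        // Files.walk keeps directory handles open, so close the stream when done.
        try (Stream<Path> paths = Files.walk(inputRoot)) {
            return paths.filter(Files::isRegularFile).collect(Collectors.toList());
        }
    }

    // Computes the mirrored output path and creates any missing parent folders.
    static Path prepareOutputFile(Path inputRoot, Path inputFile, Path outputRoot) throws IOException {
        Path out = mirror(inputRoot, inputFile, outputRoot);
        Path parent = out.getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        return out;
    }
}
```

In a `main` method, the two roots would presumably come from `Path.of(args[0])` and `Path.of(args[1])`.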
When the program finishes sorting the words from an input file, it must write them to the corresponding output file with one entry per line: the word and its number of occurrences, separated by a space, in the same format as the previous program.
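A sketch of the writing step, assuming UTF-8 output and `\n` line endings (a Linux VM is assumed; `BufferedWriter.newLine()` would instead use the platform separator). `SortedWriter`, `formatLine` and `writeSorted` are hypothetical names.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;

class SortedWriter {
    // Formats one output line: the word, a single space, then its count.
    static String formatLine(String word, long count) {
        return word + " " + count;
    }

    // Writes already-sorted pairs to outFile, one pair per line.
    // The BufferedWriter batches small writes instead of issuing one per line.
    static void writeSorted(List<Map.Entry<String, Long>> sorted, Path outFile) throws IOException {
        try (BufferedWriter w = Files.newBufferedWriter(outFile, StandardCharsets.UTF_8)) {
            for (Map.Entry<String, Long> e : sorted) {
                w.write(formatLine(e.getKey(), e.getValue()));
                w.write('\n');
            }
        }
    }
}
```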
For example, for the following input file:
filed 1
in 2
a 2
different 1
way 1
The 3
year 1
of 4
release 1
date 1
is 4
longer 1
part 1
the 6
directory 1
path 3
based 1
number 1
which 1
identical 1
to 3
filename 3
The program needs to create the corresponding output file that contains:
the 6
of 4
is 4
The 3
path 3
to 3
filename 3
in 2
a 2
filed 1
different 1
way 1
year 1
release 1
date 1
longer 1
part 1
directory 1
based 1
number 1
which 1
identical 1
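As a quick check of the expected behaviour, the sketch below loads the example input (reading the pair for the word `a` as count 2) into a `LinkedHashMap`, which preserves insertion order, and then applies a stable descending sort to its entries. Words are case-sensitive, so `The` and `the` stay separate. `ExampleCheck` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ExampleCheck {
    // The example input, one "word count" pair per line.
    static final List<String> INPUT = List.of(
        "filed 1", "in 2", "a 2", "different 1", "way 1", "The 3", "year 1",
        "of 4", "release 1", "date 1", "is 4", "longer 1", "part 1", "the 6",
        "directory 1", "path 3", "based 1", "number 1", "which 1", "identical 1",
        "to 3", "filename 3");

    static List<String> sortLines(List<String> lines) {
        // LinkedHashMap keeps insertion order; the stable sort preserves it for ties.
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            counts.put(parts[0], Long.parseLong(parts[1]));
        }
        List<Map.Entry<String, Long>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        List<String> out = new ArrayList<>(entries.size());
        for (Map.Entry<String, Long> e : entries) {
            out.add(e.getKey() + " " + e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        sortLines(INPUT).forEach(System.out::println);
    }
}
```

Running `main` prints `the 6`, `of 4`, `is 4`, `The 3`, `path 3`, `to 3`, `filename 3`, `in 2`, `a 2`, then the count-1 words in their input order, starting with `filed 1`.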
Evaluate your program on the 5 datasets and measure, inside the program, the number of words read from the input and the wall-clock time it takes to sort the words in all files. Plot a chart showing how the total number of words in a dataset affects the throughput of your program, measured in words per second (the total number of words in the dataset divided by the total time taken to sort that dataset). Answer the following questions:
What data structure(s) did you use to implement the program and why?
What algorithm did you use to sort the data and why?
Is your program compute-intensive, memory-intensive or IO-intensive and why?
Why would the total number of words in a dataset influence the performance of your program on the virtual machine?
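For the in-program measurement described in the evaluation paragraph, a sketch using `System.nanoTime()`, a monotonic clock intended for measuring elapsed time, could look like the following. `Throughput`, `Result` and `measure` are hypothetical names, and the workload in `main` is only a stand-in for sorting a real dataset. Whether "words" means lines read or the sum of the counts is left to the assignment's definition; the job simply reports the number it considers processed.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.function.LongSupplier;

class Throughput {
    // Words processed and elapsed wall-clock nanoseconds for one run.
    record Result(long words, long nanos) {
        double wordsPerSecond() {
            // Guard against a zero interval on extremely fast runs.
            return words / (Math.max(nanos, 1L) / 1_000_000_000.0);
        }
    }

    // Times the job once; the job returns how many words it handled.
    static Result measure(LongSupplier job) {
        long start = System.nanoTime();
        long words = job.getAsLong();
        long nanos = System.nanoTime() - start;
        return new Result(words, nanos);
    }

    public static void main(String[] args) {
        // Hypothetical stand-in workload; replace with "sort every file in the dataset".
        Result r = measure(() -> {
            long[] data = new Random(42).longs(1_000_000).toArray();
            Arrays.sort(data);
            return data.length;
        });
        System.out.printf("%d words in %.3f s -> %.0f words/s%n",
                r.words(), r.nanos() / 1e9, r.wordsPerSecond());
    }
}
```

One data point per dataset (total words, words per second) is then what the throughput chart would plot.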
