Question:

Using the data from Homework 1:
1. Read the data (all the files in the data directory) into an RDD using the function textFile.
2. Take only the text part of each file and count the frequency of every word (convert the text to lowercase first).
3. Remove (filter out) any word whose frequency is less than 3.
4. Report the following:
   1. The total size (the word count) of the output data after filtering.
   2. The five most frequent words across all files.
   3. The word with the maximum frequency in each file individually.

Step by Step Solution
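
One way to approach the four parts of the question is sketched below in PySpark. It is a minimal sketch under several assumptions, not a definitive answer: the layout of the Homework 1 files is not given here, so `extract_text` is a hypothetical pass-through hook that you would replace with whatever isolates the "text part" of each record; the regular expression defining a "word" is also an assumption; "total size" is read as the number of distinct words left after filtering; and the per-file maximum is computed from each file's own counts. Because `textFile` on a directory does not keep track of which file a line came from, the per-file part swaps in `wholeTextFiles`, which yields (path, content) pairs.

```python
import re
from operator import add

# Assumption: a "word" is a run of lowercase letters, digits, or apostrophes.
WORD_RE = re.compile(r"[a-z0-9']+")


def extract_text(line):
    """Hypothetical hook: return only the text part of a line.

    The Homework 1 file layout is not specified here, so this placeholder
    passes the line through unchanged.
    """
    return line


def tokenize(line):
    """Lowercase the text part of a line and split it into words."""
    return WORD_RE.findall(extract_text(line).lower())


def run(data_dir, min_freq=3, top_n=5):
    # Imported lazily so the helpers above can be used without Spark installed.
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()

    # Steps 1-2: read every file in the directory and count words.
    lines = sc.textFile(data_dir)
    counts = lines.flatMap(tokenize).map(lambda w: (w, 1)).reduceByKey(add)

    # Step 3: keep only words that occur at least min_freq times.
    filtered = counts.filter(lambda kv: kv[1] >= min_freq).cache()

    # Step 4.1: number of distinct words that survive the filter.
    distinct_words = filtered.count()

    # Step 4.2: the top_n most frequent words across all files.
    top_words = filtered.takeOrdered(top_n, key=lambda kv: -kv[1])

    # Step 4.3: most frequent word per file, using (path, content) pairs.
    per_file_max = (
        sc.wholeTextFiles(data_dir)
        .flatMap(lambda fc: [((fc[0], w), 1)
                             for line in fc[1].splitlines()
                             for w in tokenize(line)])
        .reduceByKey(add)
        .map(lambda kv: (kv[0][0], (kv[0][1], kv[1])))
        .reduceByKey(lambda a, b: a if a[1] >= b[1] else b)
        .collect()
    )
    return distinct_words, top_words, per_file_max
```

Calling `run("data")` (the directory name is a placeholder) returns the three reported values as a tuple. If "total size" is meant as the sum of all remaining frequencies instead, `filtered.values().sum()` would give that figure.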
