Question:
Complete the following using Spark:
1. Read the data (all the files in the data directory) using the function textFile.
2. Take only the text part of each article and count the frequency of all the words (convert the text into lowercase first).
3. Remove (filter out) any word whose frequency is less than […].
4. Report the following:
   a. The total size of the output data after the filtering.
   b. The frequency of the following words: congress, london, washington, football.
   c. The word with maximum frequency for each month. Hint: to read only one month's articles, you can use a file-name pattern. E.g., for February, […] represents all files starting with […], i.e., files belonging to Feb.
   d. The list of words that appeared on […] but not on […].
   e. The frequency of the word monsoon for all months.
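The tasks above could be approached roughly as in the PySpark sketch below. It is only an illustration under several assumptions, not a verified solution: the layout of the article files is not given, so the hypothetical helper `extract_text` simply treats each line as article text; the frequency threshold, the per-month file pattern, and the two dates are missing from the question, so they are left as parameters (`min_freq`, `path`, `path_a`, `path_b`) that the caller must supply. All function names are invented for this example.

```python
import re
from operator import add

WORD_RE = re.compile(r"[a-z]+")

def extract_text(line):
    """Hypothetical placeholder: return the text part of one input line.

    The article layout is not specified in the question, so this sketch
    treats the whole line as text. Replace with real parsing as needed.
    """
    return line

def tokenize(line):
    """Lowercase the text and split it into alphabetic words."""
    return WORD_RE.findall(extract_text(line).lower())

def word_counts(sc, path, min_freq):
    """Return an RDD of (word, count) pairs with count >= min_freq.

    `path` may be a directory, a single file, or a glob pattern.
    """
    return (sc.textFile(path)
              .flatMap(tokenize)
              .map(lambda w: (w, 1))
              .reduceByKey(add)
              .filter(lambda kv: kv[1] >= min_freq))

def lookup(counts, words):
    """Frequencies of selected words (0 if absent or filtered out)."""
    wanted = set(words)
    found = dict(counts.filter(lambda kv: kv[0] in wanted).collect())
    return {w: found.get(w, 0) for w in words}

def top_word(counts):
    """The (word, count) pair with the highest count."""
    return counts.max(key=lambda kv: kv[1])

def words_only_in(sc, path_a, path_b):
    """Distinct words that occur under path_a but not under path_b."""
    a = sc.textFile(path_a).flatMap(tokenize).distinct()
    b = sc.textFile(path_b).flatMap(tokenize).distinct()
    return a.subtract(b).collect()

def demo(data_path, min_freq):
    """Run the overall report; requires PySpark to be installed."""
    from pyspark import SparkContext
    sc = SparkContext.getOrCreate()
    counts = word_counts(sc, data_path, min_freq).cache()
    print("words kept after filtering:", counts.count())
    print(lookup(counts, ["congress", "london", "washington", "football"]))
    print("most frequent word:", top_word(counts))
```

Here "total size of the output" is interpreted as the number of distinct words left after filtering, which is one possible reading. For the per-month questions, `word_counts` and `top_word` can be called once per month with that month's file pattern, and `lookup(counts, ["monsoon"])` gives the monsoon frequency for each.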
