Question: Please solve using the file path only; I can handle the rest, as the file is not accessible due to a technical issue on the server.
Complete the Python program file assignment2.py using Spark to do the following:
- Read the data (all the files in the data directory) using the function textFile
- Take only the text part of each article and count the frequency of all the words (convert the text into lowercase)
- Remove (Filter) any word whose frequency is less than 10
- Report the following:
  - Total size of the output data (after the filtering) [1 point]
  - Frequency of the following words: congress, london, washington, football [1 point]
  - The word with maximum frequency for each month (hint: to read only one month's articles you can use *. E.g. for February, 2012-02* represents all files starting with 2012-02, i.e. files belonging to Feb) [3 points]
  - List of words that appeared on 2012-09-01 but not on 2012-08-01 [3 points]
  - Frequency of the word monsoon for all months [2 points]
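The tasks above could be sketched roughly as follows, using only a file path, as requested. This is a minimal sketch and not a definitive solution, and several things in it are assumptions because the question does not state them: `DATA_DIR` is a placeholder path, the file names are assumed to start with a 2012 date (as in the hint), and `extract_text` assumes each line is one article whose text is the last tab-separated field. "Total size" is read as the number of distinct words left after filtering, and the monthly counts are left unfiltered so that low monsoon counts are still reported. A month pattern that matches no files will make Spark raise an error, so trim `MONTHS` to the months actually present.

```python
import re
from operator import add

# Hypothetical path to the data directory; replace with the real one.
DATA_DIR = "data"
MIN_FREQ = 10
# Assumption: the articles are from 2012, as the file names in the hint suggest.
MONTHS = [f"2012-{m:02d}" for m in range(1, 13)]
WORD_RE = re.compile(r"[a-z]+")


def extract_text(line):
    """Return the text part of one article line.

    Assumption: one article per line, tab-separated fields, text last.
    Adjust this to the real file layout.
    """
    return line.rsplit("\t", 1)[-1]


def tokenize(line):
    """Lowercase the article text and split it into alphabetic words."""
    return WORD_RE.findall(extract_text(line).lower())


def word_counts(sc, pattern, min_freq=MIN_FREQ):
    """Count words in all files matching `pattern`, keeping counts >= min_freq."""
    return (sc.textFile(pattern)
              .flatMap(tokenize)
              .map(lambda w: (w, 1))
              .reduceByKey(add)
              .filter(lambda kv: kv[1] >= min_freq))


def words_on(sc, day):
    """Distinct words in the files of a single day, e.g. '2012-09-01'."""
    return sc.textFile(f"{DATA_DIR}/{day}*").flatMap(tokenize).distinct()


def main():
    from pyspark import SparkContext  # imported lazily so the helpers above need no Spark

    sc = SparkContext(appName="assignment2")
    counts = word_counts(sc, DATA_DIR).cache()  # a directory path reads every file in it

    # "Total size" is interpreted as the number of distinct words kept.
    print("words kept:", counts.count())

    targets = {"congress", "london", "washington", "football"}
    found = dict(counts.filter(lambda kv: kv[0] in targets).collect())
    for word in sorted(targets):
        print(word, found.get(word, 0))  # prints 0 if the word was filtered out

    for month in MONTHS:
        monthly = word_counts(sc, f"{DATA_DIR}/{month}*", min_freq=1).cache()
        top_word, top_freq = monthly.max(key=lambda kv: kv[1])
        monsoon = sum(monthly.lookup("monsoon"))  # empty lookup sums to 0
        print(month, "top:", top_word, top_freq, "monsoon:", monsoon)
        monthly.unpersist()

    new_words = words_on(sc, "2012-09-01").subtract(words_on(sc, "2012-08-01"))
    print(sorted(new_words.collect()))

    sc.stop()
```

To run it, add a call to `main()` at the bottom of assignment2.py and launch the file with `spark-submit assignment2.py`. The helper functions are kept free of Spark imports so they can be checked on a sample line before the data directory is reachable.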
