Question: Hadoop/PySpark
Write a PySpark program to:
1. Iterate through a folder of files in a Hadoop file system (HDFS) directory.
2. Open each file.
3. Calculate the variance of the data in the file.
4. Write the results (filename, variance) to a new file.
5. Print the average variance.
Each file is ASCII text in the following format:
123.0
562.0
792.9
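
One possible approach to the five steps above is sketched here. It is not a definitive solution and rests on a few assumptions the question does not settle: each file is small enough to be read whole with `SparkContext.wholeTextFiles`, each file holds one number per line, and "variance" means population variance (divide by n, not n - 1). The names `parse_values`, `population_variance`, `file_variance`, and `main`, and the two command-line arguments, are illustrative choices.

```python
import sys


def parse_values(text):
    """Parse one float per non-blank line of a file's contents."""
    return [float(line) for line in text.splitlines() if line.strip()]


def population_variance(values):
    """Population variance (divides by n). Assumes at least one value."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def file_variance(path_and_text):
    """Map a (path, contents) pair to a (path, variance) pair."""
    path, text = path_and_text
    return path, population_variance(parse_values(text))


def main(input_dir, output_dir):
    # Imported here so the helpers above can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("FileVariance").getOrCreate()
    sc = spark.sparkContext

    # Steps 1-2: wholeTextFiles yields one (path, contents) pair per file.
    files = sc.wholeTextFiles(input_dir)

    # Step 3: skip empty files (an assumption), then compute each variance.
    results = files.filter(lambda kv: kv[1].strip()).map(file_variance).cache()

    # Step 4: write "filename,variance" lines. Spark writes a directory of
    # part files here, not a single file.
    results.map(lambda kv: f"{kv[0]},{kv[1]}").saveAsTextFile(output_dir)

    # Step 5: average of the per-file variances.
    print("Average variance:", results.values().mean())

    spark.stop()


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Saved under a hypothetical name such as `file_variance.py`, this could be run with `spark-submit file_variance.py <input_dir> <output_dir>`, where both arguments are HDFS paths of your choosing and the output directory must not already exist. If sample variance is wanted instead, divide by `n - 1` in `population_variance` and decide how to handle files that contain a single value.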
