Question:
Functions to be used: PySpark SQL aggregate functions (collect_set(), avg(), countDistinct(), count(), first(), last())
Write a program that does the following:
Create your own data file as a CSV file and use this file in your code.
Create the schema.
Use all six aggregate functions listed above.
Display the output for each use of a function.
Write comments among the statements explaining what you are doing.
Place each comment in a print statement so that it appears in the output as well as in the source code, for example print("# This is a comment").
Use PySpark and PyCharm.
