Question:

Check the schemas:

[ ] gameclicks.printSchema()

root
 |-- timestamp: string (nullable = true)
 |-- clickid: string (nullable = true)
 |-- userId: string (nullable = true)
 |-- userSessionId: string (nullable = true)
 |-- ishit: string (nullable = true)
 |-- teamId: string (nullable = true)
 |-- teamLevel: string (nullable = true)

[ ] adclicks.printSchema()

root
 |-- timestamp: string (nullable = true)
 |-- txId: string (nullable = true)
 |-- userSessionId: string (nullable = true)
 |-- teamId: string (nullable = true)
 |-- userId: string (nullable = true)
 |-- adid: string (nullable = true)
 |-- adCategory: string (nullable = true)

Question 1: How many users in each team?
Keywords: DataFrame API, SQL, group by, sort

Use the DataFrame API to group the users by teamId and count how many distinct users are in each team. Sort the result in descending order.

[ ] team_counts = # your code goes here (Q1a: 4 points)
    team_counts.show()

Now rewrite the above question using pure SQL:

[ ] gameclicks.registerTempTable("gameclicks")
    query = # your code goes here (Q1b: 2 points)
    team_counts = spark.sql(query)
    team_counts.show()

Question 2: Now use the ad-clicks dataset to find the number of ad clicks in each hour.
Keywords: group by, parse timestamp, plot

[ ] timestamp_only = adclicks.selectExpr(["to_timestamp(timestamp) as timestamp"])
    click_count_by_hour = # your code goes here (Q2a: 4 points)
    click_count_by_hour.show(24)
