Question:
Description: Given time-series data, a clickstream of user activity stored in flat files of any format, the task is to enrich the data with a session id.
Session Definition:
A session expires after 30 minutes of inactivity; during inactivity no clickstream record is generated.
A session remains active for a total duration of at most 2 hours.
Steps:
Load the data in any flat file format.
Read the data and use Spark batch (PySpark/Scala) to do the computation.
Save the enriched data as Parquet.
Note: Please do not use Spark SQL directly.
Given Dataset:
timestamp             userid
2018-01-01T11:00:00Z  u1
2018-01-01T12:00:00Z  u1
2018-01-01T11:00:00Z  u2
2018-01-02T11:00:00Z  u2
2018-01-01T12:15:00Z  u1
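
One possible way to approach the steps above is sketched here in PySpark, using the RDD API rather than SQL queries. This is only a sketch under stated assumptions, not a reference solution. The function names (parse_line, assign_sessions, run_job), the session id format (userid-sN) and the file paths are all hypothetical. It also assumes that a gap of exactly 30 minutes, or an event exactly 2 hours after the session start, opens a new session, since the question does not say how those boundaries should be treated.

```python
from datetime import datetime, timedelta

INACTIVITY_LIMIT = timedelta(minutes=30)   # a gap this long (or longer) ends a session
MAX_SESSION_LENGTH = timedelta(hours=2)    # cap on the total duration of one session
TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_line(line):
    """Split a 'timestamp userid' line into (userid, datetime)."""
    ts, user = line.split()
    return user, datetime.strptime(ts, TS_FORMAT)


def assign_sessions(user, timestamps):
    """Return (timestamp, userid, session_id) tuples for one user's events."""
    rows = []
    session_no = 0
    session_start = None
    previous = None
    for ts in sorted(timestamps):
        # Open a new session on the first event, after a long gap,
        # or once the current session has reached its maximum length.
        if (previous is None
                or ts - previous >= INACTIVITY_LIMIT
                or ts - session_start >= MAX_SESSION_LENGTH):
            session_no += 1
            session_start = ts
        previous = ts
        rows.append((ts.strftime(TS_FORMAT), user, f"{user}-s{session_no}"))
    return rows


def run_job(input_path, output_path):
    """Spark batch wiring: read a flat file, sessionize per user, write Parquet."""
    # Imported here so the helpers above can be tried without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("sessionize").getOrCreate()
    lines = spark.sparkContext.textFile(input_path)
    enriched = (
        lines
        # Assumption: skip blank lines and an optional 'timestamp userid' header.
        .filter(lambda l: l.strip() and not l.startswith("timestamp"))
        .map(parse_line)
        .groupByKey()  # collects all events of one user together
        .flatMap(lambda kv: assign_sessions(kv[0], kv[1]))
    )
    df = spark.createDataFrame(enriched, ["timestamp", "userid", "session_id"])
    df.write.mode("overwrite").parquet(output_path)
    spark.stop()
```

Under these assumptions, the given dataset would yield two sessions per user: u1's 11:00 event stands alone, while 12:00 and 12:15 share a session (the gap is 15 minutes), and u2's two events are a day apart and so fall in separate sessions. Note that groupByKey holds all events of one user in memory on a single executor, which may need rethinking for very active users.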
