Question:

Description: Given time series data, a clickstream of user activity stored in flat files, the task is to enrich the data with a session id.

Session Definition: A session expires after 30 minutes of inactivity; no clickstream records are generated while the user is inactive.

A session remains active for a total duration of at most 2 hours.

Steps:

Load the data in any flat file format.

Read the data and use Spark batch processing (PySpark or Scala) to do the computation.

Save the enriched results in Parquet format.

Note: Please do not use Spark SQL queries directly.

Given Dataset:

timestamp             userid
2018-01-01T11:00:00Z  u1
2018-01-01T12:00:00Z  u1
2018-01-01T11:00:00Z  u2
2018-01-02T11:00:00Z  u2
2018-01-01T12:15:00Z  u1
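One possible way to approach the session definition and steps above is sketched here in PySpark. This is a sketch under stated assumptions, not a reference solution. The assumptions: a new session starts when the gap since the user's previous event is strictly greater than 30 minutes, or when the event falls strictly more than 2 hours after the session's first event; the input is a headerless, space-delimited text file; and the names `assign_sessions`, `run_job`, and the `<userid>-s<n>` session id format are hypothetical. The sketch uses RDD transformations rather than SQL queries. The PySpark import is placed inside `run_job` so that the per-user logic can be exercised without a Spark installation.

```python
from datetime import datetime, timedelta

# Thresholds taken from the session definition; boundary handling is an assumption.
INACTIVITY_LIMIT = timedelta(minutes=30)
MAX_SESSION_LENGTH = timedelta(hours=2)
TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def assign_sessions(userid, timestamps):
    """Return (timestamp, userid, session_id) tuples for one user's events."""
    events = sorted(timestamps, key=lambda ts: datetime.strptime(ts, TS_FORMAT))
    rows = []
    session_no = 0
    session_start = None
    previous = None
    for ts in events:
        current = datetime.strptime(ts, TS_FORMAT)
        # Start a new session on the first event, after a long gap,
        # or once the current session exceeds its maximum length.
        if (previous is None
                or current - previous > INACTIVITY_LIMIT
                or current - session_start > MAX_SESSION_LENGTH):
            session_no += 1
            session_start = current
        # Hypothetical id format: "<userid>-s<n>".
        rows.append((ts, userid, f"{userid}-s{session_no}"))
        previous = current
    return rows


def run_job(input_path, output_path):
    """Batch job: read a flat file, add session ids, write Parquet."""
    from pyspark.sql import SparkSession  # lazy import; requires pyspark

    spark = SparkSession.builder.appName("sessionize").getOrCreate()
    # Assumed layout: one "<timestamp> <userid>" pair per line, no header.
    lines = spark.sparkContext.textFile(input_path)
    pairs = (lines.map(lambda line: line.split())
                  .filter(lambda parts: len(parts) == 2)
                  .map(lambda parts: (parts[1], parts[0])))
    # Group each user's events, then apply the pure-Python session logic.
    enriched = pairs.groupByKey().flatMap(
        lambda kv: assign_sessions(kv[0], list(kv[1])))
    (spark.createDataFrame(enriched, ["timestamp", "userid", "session_id"])
          .write.mode("overwrite").parquet(output_path))
    spark.stop()
```

Under these assumptions, the u1 events in the given dataset would receive `u1-s1` for 11:00 and `u1-s2` for both 12:00 and 12:15, and the two u2 events would fall into separate sessions. Note that `groupByKey` collects all of a user's events in memory on one executor, which is acceptable for a sketch but may need rethinking for very active users.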
