This exercise covers some basic Scala and some basic Spark.

Given the training set: X (age: Double), Y (yearly visits: Double)

X = (10 5 1 6 7 3 4 5 1 8)

Y = (2 4 4 2 4 5 4 5 6 4)

1. Write your own file containing this data, as a CSV or text file.

You can use scala.io to read the file in Scala and "spark.read. .... " to read it in Spark.
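
The file step could be sketched in plain Scala as follows. This is only a sketch under assumptions: the file name `visits.csv`, the `age,visits` header row, and the object name `VisitsCsv` are hypothetical (the exercise does not fix them), and `scala.util.Using` requires Scala 2.13 or later.

```scala
import java.nio.file.{Files, Paths}
import scala.io.Source
import scala.util.Using

object VisitsCsv {
  // Hypothetical file name; the exercise leaves the choice open.
  val path = "visits.csv"

  // Training data copied from the exercise statement.
  val ages: Seq[Double]   = Seq[Double](10, 5, 1, 6, 7, 3, 4, 5, 1, 8)
  val visits: Seq[Double] = Seq[Double](2, 4, 4, 2, 4, 5, 4, 5, 6, 4)

  // Write a header line followed by one "age,visits" row per observation.
  def write(): Unit = {
    val lines = "age,visits" +: ages.zip(visits).map { case (x, y) => s"$x,$y" }
    Files.write(Paths.get(path), lines.mkString("\n").getBytes("UTF-8"))
  }

  // Read the rows back with scala.io.Source, skipping the header line.
  def read(): Seq[(Double, Double)] =
    Using.resource(Source.fromFile(path)) { src =>
      src.getLines().drop(1).map { line =>
        val fields = line.split(",").map(_.trim.toDouble)
        (fields(0), fields(1))
      }.toList // materialize before the source is closed
    }

  def main(args: Array[String]): Unit = {
    write()
    read().foreach(println)
  }
}
```

Calling `.toList` inside the `Using.resource` block matters because `getLines()` is lazy and the file is closed as soon as the block returns.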

PART I: Do a Scala analysis and compute the regression coefficient, intercept, SST, SSR, SSE, correlation coefficient, R, R^2, and the angle between x and y. Draw the statistical triangle and identify the legs.
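
One way to approach Part I is ordinary least squares on the centered data. The sketch below makes two assumptions that the exercise does not spell out: the "angle between x y" is taken between the mean-centered vectors (so its cosine equals the correlation coefficient), and the "statistical triangle" is read as the right triangle with legs sqrt(SSR) and sqrt(SSE) and hypotenuse sqrt(SST), which follows from SST = SSR + SSE. The names `SimpleRegression` and `Fit` are hypothetical.

```scala
object SimpleRegression {
  // All quantities requested in Part I, bundled together.
  final case class Fit(slope: Double, intercept: Double,
                       sst: Double, ssr: Double, sse: Double,
                       r: Double, r2: Double, angleDeg: Double)

  def fit(x: Seq[Double], y: Seq[Double]): Fit = {
    require(x.nonEmpty && x.length == y.length, "x and y must have the same non-zero length")
    val xBar = x.sum / x.length
    val yBar = y.sum / y.length
    val dx = x.map(_ - xBar)                               // centered x
    val dy = y.map(_ - yBar)                               // centered y
    val sxx = dx.map(d => d * d).sum
    val sxy = dx.zip(dy).map { case (a, b) => a * b }.sum
    val sst = dy.map(d => d * d).sum                       // total sum of squares
    val slope = sxy / sxx                                  // regression coefficient
    val intercept = yBar - slope * xBar
    val yHat = x.map(xi => intercept + slope * xi)         // fitted values
    val ssr = yHat.map(h => (h - yBar) * (h - yBar)).sum   // explained by the regression
    val sse = y.zip(yHat).map { case (yi, hi) => (yi - hi) * (yi - hi) }.sum // residual
    val r = sxy / math.sqrt(sxx * sst)                     // correlation coefficient
    val r2 = ssr / sst
    val angleDeg = math.toDegrees(math.acos(r))            // angle between centered x and y
    Fit(slope, intercept, sst, ssr, sse, r, r2, angleDeg)
  }

  def main(args: Array[String]): Unit = {
    val x = Seq[Double](10, 5, 1, 6, 7, 3, 4, 5, 1, 8)
    val y = Seq[Double](2, 4, 4, 2, 4, 5, 4, 5, 6, 4)
    val f = fit(x, y)
    println(f)
    // Sides of the triangle under the assumption stated above.
    println(s"legs: ${math.sqrt(f.ssr)}, ${math.sqrt(f.sse)}; hypotenuse: ${math.sqrt(f.sst)}")
  }
}
```

For this data set the means are 5 and 4, giving Sxx = 76, Sxy = -22 and SST = 14. Hand calculation therefore yields slope = -22/76 ≈ -0.2895, intercept ≈ 5.4474, SSR ≈ 6.3684, SSE ≈ 7.6316, r ≈ -0.6745, R^2 ≈ 0.4549, and an angle of about 132.4 degrees between the centered vectors. The triangle sides come out at roughly 2.52, 2.76 and 3.74.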

PART II: Do a Spark analysis of this data set.

0. Construct your schema to match the incoming data

1. Read in your file (which will give you a DataFrame (DF)).

2. Transform your DF to a DS (show your DF and DS).

3. Construct a VectorAssembler to convert the age column to a features vector.

(Can you use basic Spark datatypes to do this manually?)

4. Make sure the "visits" column is called "label" (that is, get your DS ready for regression).

5. Do a Spark regression and verify the Scala values calculated earlier.
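
Steps 0 through 5 could be sketched with Spark ML roughly as follows. This is an illustration under assumptions, not a reference solution: it assumes a Spark 3.x build with `spark-sql` and `spark-mllib` on the classpath, a CSV file named `visits.csv` with an `age,visits` header, and a local master. The names `Visit`, `SparkRegression` and `fitFromCsv` are hypothetical.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.regression.{LinearRegression, LinearRegressionModel}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

// Top-level case class so Spark can derive an Encoder for the Dataset.
final case class Visit(age: Double, label: Double)

object SparkRegression {
  def fitFromCsv(spark: SparkSession, path: String): LinearRegressionModel = {
    import spark.implicits._

    // Step 0: schema matching the incoming columns.
    val schema = StructType(Seq(
      StructField("age", DoubleType, nullable = false),
      StructField("visits", DoubleType, nullable = false)))

    // Step 1: read the file into a DataFrame.
    val df = spark.read.option("header", "true").schema(schema).csv(path)
    df.show()

    // Steps 2 and 4: rename "visits" to "label", then convert to a typed Dataset.
    val ds = df.withColumnRenamed("visits", "label").as[Visit]
    ds.show()

    // Step 3: assemble the age column into a "features" vector column.
    val assembler = new VectorAssembler()
      .setInputCols(Array("age"))
      .setOutputCol("features")
    val training = assembler.transform(ds)

    // Manual alternative to the assembler, building the vector by hand.
    val manual = ds.map(v => (Vectors.dense(v.age), v.label)).toDF("features", "label")
    manual.show()

    // Step 5: linear regression; the default regParam of 0.0 gives plain least squares.
    new LinearRegression().fit(training)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("visits-regression")
      .master("local[*]") // assumption: run locally
      .getOrCreate()
    val model = fitFromCsv(spark, "visits.csv")
    println(s"slope = ${model.coefficients(0)}, intercept = ${model.intercept}")
    println(s"R^2 = ${model.summary.r2}")
    spark.stop()
  }
}
```

The printed slope, intercept and R^2 are the values to compare against the Part I results; the other sums of squares can be rebuilt from the model's predictions if a fuller check is wanted.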
