Question:

Figure 1: MapReduce-based iterative programming
This challenge arises because Hadoop is designed to utilize the storage space in the cluster. However, each MapReduce program must write its output to the hard drive. This leads to a large amount of reading from and writing to HDFS, which significantly limits performance.
Spark Programming
The Spark system implements the Resilient Distributed Dataset (RDD) to make the most of the memory space in the cluster. With RDDs, most operations are done in memory.
Figure 2: Hadoop vs. Spark
To develop a K-Means algorithm in Spark, you just need to transform the previous RDD into a
new one for the next iteration.
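The iteration pattern described here can be sketched in plain Python, without a cluster. This is only an illustration under stated assumptions, not the lab's reference solution: the function names (closest, kmeans_step, kmeans) are hypothetical, points are coordinate tuples, and an empty cluster simply keeps its old centroid. In PySpark, the "map" and "reduce by key" phases below would typically become rdd.map(...) and reduceByKey(...) calls applied to the RDD on each iteration.

```python
from collections import defaultdict
from math import dist


def closest(point, centroids):
    """Index of the centroid nearest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))


def kmeans_step(points, centroids):
    """One iteration: key points by nearest centroid, then average per key."""
    # "Map" phase: key every point by its nearest centroid.
    keyed = [(closest(p, centroids), p) for p in points]

    # "Reduce by key" phase: accumulate coordinate sums and counts per cluster.
    sums = {}
    counts = defaultdict(int)
    for k, p in keyed:
        if k in sums:
            sums[k] = tuple(a + b for a, b in zip(sums[k], p))
        else:
            sums[k] = tuple(p)
        counts[k] += 1

    # New centroid = mean of assigned points; an empty cluster keeps its old one.
    return [
        tuple(s / counts[i] for s in sums[i]) if i in sums else tuple(centroids[i])
        for i in range(len(centroids))
    ]


def kmeans(points, centroids, max_iter=20, tol=1e-9):
    """Repeat kmeans_step until the centroids stop moving or max_iter is hit."""
    for _ in range(max_iter):
        new_centroids = kmeans_step(points, centroids)
        shift = max(dist(a, b) for a, b in zip(new_centroids, centroids))
        centroids = new_centroids
        if shift < tol:
            break
    return centroids
```

Each call to kmeans_step produces a new list of centroids from the previous one, which mirrors how each Spark iteration derives a new RDD from the previous RDD instead of writing intermediate results to disk.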
Programming in Lab 2
In this lab, please implement the K-Means algorithm based on your previous code. You may use
any Spark-related library package.
Part 1: Please redo Project 1 Part 1 Question 1 with different levels of parallelism: 2, 3, 4, and 5.
You can change the parallelism level by adding one option to the spark-submit command in test.sh. For example, adding
--conf spark.default.parallelism=2
after spark-submit sets the parallelism level to 2.
Part 2: Please redo Project 1 Part 2 Question 2.
Part 3: Please redo Project 1 Bonus Question (K-Means in Spark).
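As a rough aid for Part 1, the sketch below builds the spark-submit command line for each requested parallelism level. It is an assumption-laden illustration: the script name kmeans.py and the helper submit_command are hypothetical, and the actual contents of test.sh are not shown in this handout. Only the --conf spark.default.parallelism option comes from the instructions above.

```python
import shlex

# Hypothetical application script name; substitute the file your test.sh submits.
APP_SCRIPT = "kmeans.py"


def submit_command(parallelism, app=APP_SCRIPT):
    """Build the spark-submit argument list for one parallelism level."""
    return [
        "spark-submit",
        "--conf", f"spark.default.parallelism={parallelism}",
        app,
    ]


if __name__ == "__main__":
    # Print one command per parallelism level required in Part 1.
    for level in (2, 3, 4, 5):
        print(shlex.join(submit_command(level)))
```

The printed lines can be pasted into test.sh one at a time, so that each run differs only in the parallelism setting.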
Installing the Spark cluster: GitHub Link.
Grading Rubric
Up to 2 students in a group.
(50%) Part 1;
(20% * 2) Parts 2 and 3;
(10%) Report.