Problem 1. Rob is installing Spark in Ubuntu 16.04 OS. Please help him with the installation.

Step 1: Since Spark needs Scala, Rob needs to install Scala first. He downloads Scala (scala-2.12.4.tgz), extracts the archive in the /home/rob/ directory, and renames the extracted folder to scala. Therefore /home/rob/scala is the root directory for Scala. Please tell Rob how to update the environment variables SCALA_HOME and PATH.

Answer: Open the .bashrc file by using the following command:

$

Add the following two lines to the end of the file to update the environment variables SCALA_HOME and PATH.

1:

2:
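A possible answer for Step 1, as a sketch: the file can be opened with any text editor, for example `nano ~/.bashrc` (nano is an assumption here; gedit or vi work equally well). The two lines below assume the Scala launchers sit in the bin subdirectory of /home/rob/scala, which is the layout of the standard Scala archive.

```shell
# Lines to append to the end of ~/.bashrc
# SCALA_HOME points at the Scala root directory named in Step 1
export SCALA_HOME=/home/rob/scala
# Append Scala's bin directory (assumed location of the scala and scalac launchers) to PATH
export PATH="$PATH:$SCALA_HOME/bin"
```

After saving the file, run `source ~/.bashrc` or open a new terminal so the changes take effect; `scala -version` should then report the installed version.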

Step 2: He downloads Spark (spark-2.2.1-bin-hadoop2.7.tgz), extracts the archive in the /home/rob/ directory, and renames the extracted folder to spark. Thus /home/rob/spark is the root directory for Spark. Now Rob needs to update the environment variables SPARK_HOME and PATH.

Answer: Open the .bashrc file again and add the following two lines to the end of the file.

1:

2:
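A possible answer for Step 2, following the same pattern as Step 1. It assumes the Spark launchers (pyspark, spark-shell, spark-submit) sit in the bin subdirectory of /home/rob/spark, as they do in the prebuilt Spark archives.

```shell
# Lines to append to the end of ~/.bashrc
# SPARK_HOME points at the Spark root directory named in Step 2
export SPARK_HOME=/home/rob/spark
# Append Spark's bin directory (assumed location of pyspark, spark-shell, spark-submit) to PATH
export PATH="$PATH:$SPARK_HOME/bin"
```

As before, `source ~/.bashrc` or a new terminal is needed before the new variables are visible.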

Step 3: After installing Spark, we can verify the installation by starting its interactive shells. To start the Python Spark shell, we should type:

$

To start the Scala Spark shell, we should type:

$
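For Step 3, the Python shell is normally started with `pyspark` and the Scala shell with `spark-shell`; both launchers ship in Spark's bin directory. Because both commands are interactive, the sketch below only checks that they can be found on PATH before Rob launches them. The function name `check_spark_launchers` is hypothetical and introduced only for illustration.

```shell
# Report whether each Spark shell launcher is reachable through PATH.
# check_spark_launchers is an illustrative helper, not part of Spark.
check_spark_launchers() {
  for launcher in pyspark spark-shell; do
    if command -v "$launcher" >/dev/null 2>&1; then
      echo "$launcher: $(command -v "$launcher")"
    else
      echo "$launcher: not found on PATH"
    fi
  done
}

check_spark_launchers
```

If both launchers are found, typing `pyspark` or `spark-shell` at the prompt opens the corresponding shell.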

Step 4: Rob wants to run the WordCount example in batch mode. Suppose the Python source code is in the file WordCount.py. Please give the command for running this Python Spark source file. The input file name and output directory are hard-coded in the source code, so you do not need to pass them on the command line.

$
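For Step 4, batch jobs are normally submitted with `spark-submit`, so the command would be `spark-submit WordCount.py`, run from the directory that contains the script. The sketch below wraps that command in a guard; the guard and the `submit_status` variable are illustrative additions, not part of the expected answer.

```shell
# Submit WordCount.py only if the launcher and the script are both available.
if command -v spark-submit >/dev/null 2>&1 && [ -f WordCount.py ]; then
  spark-submit WordCount.py
  submit_status=$?   # exit code returned by spark-submit
else
  echo "spark-submit or WordCount.py not found; check PATH and the working directory" >&2
  submit_status=127  # conventional shell code for "command not found"
fi
```

No extra arguments are needed because the input file and output directory are hard-coded in the script.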
