Please do everything as described; it is simple, but I need help with it.

Part 1: Spark Setup

In this exercise you will set up an Ubuntu virtual machine and install Spark on it.

Download and install VirtualBox and Ubuntu from the following sites, as we did in class.

https://www.virtualbox.org/wiki/Downloads
https://www.ubuntu.com/download/desktop

Once the installation is complete, you will need to install Java. Issue the following commands:

sudo apt-get update

sudo apt-get install default-jre

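After the installation is done, check the version using the following command:

java -version

Spark 2.2.0 is built for Java 8, and its Scala 2.11 shell is generally reported not to start on Java 9 or later, so if this reports a newer version, install a Java 8 runtime before continuing.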

Next, install Scala by downloading https://downloads.lightbend.com/scala/2.12.3/scala-2.12.3.tgz. It will be saved in the Downloads folder.

Open a terminal in that folder (for example with cd ~/Downloads) and decompress the tgz archive using the following command:

tar -xvzf scala-2.12.3.tgz

The archive is extracted to a scala-2.12.3 folder. Move this folder to /usr/local/scala using the following command:

sudo mv scala-2.12.3 /usr/local/scala

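Add the Scala bin directory to the PATH environment variable using the following command:

export PATH=$PATH:/usr/local/scala/bin

This setting lasts only for the current terminal session; to keep it in new terminals, append the same export line to the end of ~/.bashrc.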

Test that the installation was successful by checking the version:

scala -version
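As an extra check that the compiler and runtime actually work (not just the launcher), a tiny script can print the Scala library version it runs against. This is a minimal sketch; the file name check.scala is hypothetical, and the script would be run with scala check.scala.

```scala
// Prints the version of the Scala library loaded at run time.
// After the steps above it should report the 2.12.3 release.
val version: String = scala.util.Properties.versionNumberString
println(s"Scala runtime version: $version")
```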

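Now install Spark by downloading it from https://d3kbcqa49mib13.cloudfront.net/spark-2.2.0-bin-hadoop2.7.tgz. If that link no longer works, the same file is available from the Apache release archive at https://archive.apache.org/dist/spark/spark-2.2.0/spark-2.2.0-bin-hadoop2.7.tgz.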

Decompress it, again from the Downloads folder, using:

tar -xvzf spark-2.2.0-bin-hadoop2.7.tgz

and move it to /usr/local/spark using the following command:

sudo mv spark-2.2.0-bin-hadoop2.7 /usr/local/spark

Finally, add the Spark bin directory to PATH (append this line to ~/.bashrc as well if it should persist across sessions):

export PATH=$PATH:/usr/local/spark/bin

Now issue the following command to check that the installation was successful:

spark-shell

It takes a while to start, but you should see some log messages and an ASCII-art banner showing Spark version 2.2.0, followed by the scala> prompt.

Part 2: Using Spark to Work with a Dataset

For this exercise, please read Chapter 2 of the textbook and use the dataset available at

http://bit.ly/1Aoywaq.

Using the dataset, complete the following tasks:

1. Please create a raw RDD for all the CSV files.
2. Please remove all headers from the RDD.
3. Please convert each record in the RDD to a case class record.
4. Please sample 20 records from the RDD.
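The four tasks can be sketched in Scala. This is a minimal sketch, not the textbook's solution: it assumes each CSV line follows a record-linkage layout (two integer IDs, nine comparison scores with ? marking a missing value, and a TRUE/FALSE match flag) and that every file starts with a header line containing id_1. The names MatchData, isHeader, toDouble and parse are illustrative. So the parsing logic can be tried without Spark, a local Seq of made-up lines stands in for the RDD; filter and map behave the same way on an RDD[String].

```scala
// ASSUMPTION: line layout is id_1,id_2,<9 scores>,is_match with "?" for missing scores.

// Task 3: one case class record per CSV data line (hypothetical field names).
case class MatchData(id1: Int, id2: Int, scores: Array[Double], matched: Boolean)

// Task 2: a header line is recognised by the id_1 column name it contains.
def isHeader(line: String): Boolean = line.contains("id_1")

// Missing scores ("?") become NaN so every record keeps nine scores.
def toDouble(s: String): Double = if (s == "?") Double.NaN else s.toDouble

def parse(line: String): MatchData = {
  // Split on commas and drop any surrounding double quotes.
  val pieces = line.split(',').map(_.replace("\"", ""))
  val id1 = pieces(0).toInt
  val id2 = pieces(1).toInt
  val scores = pieces.slice(2, 11).map(toDouble) // columns 2..10
  val matched = pieces(11).toBoolean             // TRUE/FALSE, case-insensitive
  MatchData(id1, id2, scores, matched)
}

// Made-up sample lines (not from the real dataset); the repeated header
// mimics reading two CSV files into one collection.
val header = "\"id_1\",\"id_2\",\"cmp_fname_c1\",\"cmp_fname_c2\",\"cmp_lname_c1\"," +
  "\"cmp_lname_c2\",\"cmp_sex\",\"cmp_bd\",\"cmp_bm\",\"cmp_by\",\"cmp_plz\",\"is_match\""
val rawLines: Seq[String] = Seq(
  header,
  "1,2,1,?,1,?,1,1,1,1,1,TRUE",
  "3,4,0.5,?,0,?,1,0,1,0,0,FALSE",
  header,
  "5,6,1,1,1,1,1,1,1,1,1,TRUE"
)

// Tasks 2 and 3: drop headers, then parse each remaining line.
val records: Seq[MatchData] = rawLines.filter(line => !isHeader(line)).map(parse)

// Task 4 on local data: shuffle and keep up to 20 records.
val sample: Seq[MatchData] = scala.util.Random.shuffle(records).take(20)

sample.foreach { r =>
  val scoreText = r.scores.mkString("[", ",", "]")
  println(s"${r.id1},${r.id2} matched=${r.matched} scores=$scoreText")
}
```

In spark-shell, where sc is predefined, the same definitions can be pasted and applied to an RDD instead: val rawblocks = sc.textFile("linkage/block_*.csv") reads every matching CSV file into one raw RDD (the linkage/ directory and file pattern are assumptions about where the unzipped files live), val parsed = rawblocks.filter(line => !isHeader(line)).map(parse) removes the headers and builds MatchData records, and parsed.takeSample(false, 20) returns 20 random records to the driver.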
