Question: Can someone please help me with this Python assignment?
Part 3. PySpark Orientation (Functional Programming Examples and Tasks)
Spark's shell provides a simple way to learn the API, as well as a powerful tool to analyze data interactively. It is available in either Scala (which runs on the Java VM and is thus a good way to use existing Java libraries) or Python. Since this is a Python course, we will start the Python shell by running the following in the Spark directory:
./bin/pyspark
3.1 Word searching
>>> textFile = sc.textFile("README.md")  # can be any file on your system
>>> linesWithSpark = textFile.filter(lambda line: "Spark" in line)
>>> textFile.filter(lambda line: "Spark" in line).count()
# How many lines contain "Spark"?
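Note that linesWithSpark, defined above, can be reused directly, so the following is equivalent and avoids repeating the filter:
>>> linesWithSpark.count()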
Put your answer here
3.2 Word counting: Let's find the line with the most words:
>>> textFile.map(lambda line: len(line.split())).reduce(lambda a, b: a if (a > b) else b)
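This returns only the word count of the longest line. As an optional variation (a sketch, not part of the assignment), you can carry each line along in a pair so the reduce also tells you which line it was:
>>> textFile.map(lambda line: (len(line.split()), line)).reduce(lambda a, b: a if a[0] > b[0] else b)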
Put your answer here
3.3 Define your own max function (same as 3.2)
>>> def max(a, b):
...     if a > b:
...         return a
...     else:
...         return b
...
>>> textFile.map(lambda line: len(line.split())).reduce(max)
Put your answer here
3.4 Word count MapReduce Example. One common data flow pattern is MapReduce. Here, we combine the flatMap, map, and reduceByKey transformations to compute the per-word counts in the file as an RDD of (string, int) pairs. To collect the word counts in our shell, we can use the collect action:
>>> wordCounts = textFile.flatMap(lambda line: line.split()).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b)
>>> wordCounts.collect()
>>> wordCounts.count()
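Since collect() returns every (word, count) pair, it can produce a lot of output on a large file. As an optional aside (not required by the assignment), PySpark's takeOrdered action can pull just the most frequent words:
>>> wordCounts.takeOrdered(10, key=lambda kv: -kv[1])  # ten most frequent words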
Put your answer here
3.5 Word count App
Use the following command to run a simple word count app in Spark (wordcount.py expects an input file as its argument; README.md is used here as an example):
./bin/spark-submit examples/src/main/python/wordcount.py README.md
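For reference, a self-contained word count app submitted this way looks roughly like the sketch below. This is a minimal sketch, not the actual examples/src/main/python/wordcount.py shipped with Spark (which may differ in details); the file name wc_sketch.py is hypothetical.

# wc_sketch.py -- hypothetical minimal word count app
# Run with: ./bin/spark-submit wc_sketch.py <input file>
import sys
from pyspark import SparkContext

if __name__ == "__main__":
    sc = SparkContext(appName="PythonWordCount")
    counts = (sc.textFile(sys.argv[1])
                .flatMap(lambda line: line.split())
                .map(lambda word: (word, 1))
                .reduceByKey(lambda a, b: a + b))
    for word, count in counts.collect():
        print("%s: %i" % (word, count))
    sc.stop()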
Put your answer here
3.6 Word count program performance evaluation.
We provide three different ways to do the frequency counting in Python. You need to compare the outputs and report whether they return the same results. If not, try to explain why.
Method A: Using Python's built-in Counter
import re
from collections import Counter
words = re.findall(r'\w+', open('hamlet.txt').read().lower())
Counter(words).most_common(10)  # top ten most common words
Method B: Using a Python dictionary
import operator

file = open("hamlet.txt", "r")
wordcount = {}
for word in file.read().split():
    if word not in wordcount:
        wordcount[word] = 1
    else:
        wordcount[word] += 1
newdict = sorted(wordcount.items(), key=operator.itemgetter(1))
print([i for i in newdict[::-1][:10]])  # reverse the order to get the ten most frequent
Method C: Using Spark
./bin/pyspark
>>> from pyspark import SparkContext, SparkConf  # optional in the shell; sc is already defined
>>> count = sc.textFile('hamlet.txt').flatMap(lambda line: line.split()).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b)
>>> output = count.map(lambda kv: (kv[1], kv[0])).sortByKey(False).take(10)
>>> print(["%s: %i" % (value, key) for key, value in output])
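A hint for the comparison: the three methods do not tokenize identically. Method A lowercases the whole text and extracts words with the \w+ regex, which strips punctuation, while Methods B and C split on whitespace only, so "Hamlet", "hamlet", and "hamlet," are counted as three different words. The sketch below (an illustration, not part of the assignment) applies Method A's tokenization to the dictionary approach of Method B, after which the two should agree:

import re
from collections import Counter

# Tokenize exactly as Method A does: lowercase, then \w+ word characters only.
words = re.findall(r'\w+', open('hamlet.txt').read().lower())

# Dictionary counting as in Method B, but over the normalized tokens.
wordcount = {}
for word in words:
    wordcount[word] = wordcount.get(word, 0) + 1

# With identical tokenization, both approaches count the same words.
print(sorted(wordcount.items(), key=lambda kv: kv[1], reverse=True)[:10])
print(Counter(words).most_common(10))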
Put your answer here