
Hello, please modify this code:

    import pyspark

    # Create the Spark context, the entry point for RDD operations
    sc = pyspark.SparkContext()


    def NASDAQ(line):
        """Keep NASDAQ lines with 9 comma-separated fields whose date starts
        with a numeric year and whose volume is an integer."""
        try:
            fields = line.split(',')
            if len(fields) != 9:
                return False
            int(fields[2][:4])  # year part of the date
            int(fields[7])      # volume, converted with int() in the map below
            return True
        except ValueError:
            return False


    def COMPANYLIST(line):
        """Keep companylist lines with 5 tab-separated fields, skipping the header."""
        fields = line.split('\t')
        if len(fields) != 5 or ("IPOyear" in line and "Symbol" in line):
            return False
        return True


    # Load files and clean
    nasdaq = sc.textFile("/home/kinivera/BigData/Partial2/input/NASDAQsample.csv")
    companylist = sc.textFile("/home/kinivera/BigData/Partial2/input/companylist.tsv")

    nasdaq = nasdaq.filter(NASDAQ)
    companylist = companylist.filter(COMPANYLIST)

    # (symbol, (year, volume))
    nasdaq = nasdaq.map(lambda l: (l.split(',')[1],
                                   (l.split(',')[2][:4], int(l.split(',')[7]))))

    # (symbol, sector)
    companylist = companylist.map(lambda l: (l.split('\t')[0], l.split('\t')[3]))

    # Inner join on symbol: (symbol, ((year, volume), sector))
    joined_rdd = nasdaq.join(companylist)
    # print(joined_rdd.take(10))

    # ((sector, year), volume)
    features = joined_rdd.map(lambda row: ((row[1][1], row[1][0][0]), row[1][0][1]))

    # Reduce by key (sector, year) by adding the number of operations:
    # [((sector, year), total), ...]
    sector_counts = features.reduceByKey(lambda x, y: x + y)
    # print(sector_counts.take(10))

    # Find the sector with the highest number of operations for each year.
    # Re-key by year: [(year, (total, sector)), ...]
    max_sector_per_year = sector_counts.map(lambda x: (x[0][1], (x[1], x[0][0])))

    # Keep the larger total per year: [(year, (max total, sector)), ...]
    result = max_sector_per_year.reduceByKey(lambda x, y: x if x[0] > y[0] else y)
    # print(result.take(10))

    # Order by year
    result = result.sortByKey()

    # Convert each record to a "Sector, year, total" string; saving the
    # tuple directly would write lines like "('Finance', '1996, 20090342')"
    max_sector_per_year_formatted = result.map(
        lambda x: "{}, {}, {}".format(x[1][1], x[0], x[1][0]))

    # Save the RDD as text; coalesce(1) yields a single part file inside the
    # 1_out directory (saveAsTextFile fails if 1_out already exists)
    max_sector_per_year_formatted = max_sector_per_year_formatted.coalesce(1)
    max_sector_per_year_formatted.saveAsTextFile("1_out")

This was the statement given for the exercise: RDD manipulation using transformation and action operations, and performance optimization using RDDs, are evaluated. The execution time is also evaluated.
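Since execution time is part of the evaluation, one option is to time the Spark actions inside the script itself. A minimal sketch, assuming wall-clock time measured in the driver is what matters; the `timed` helper and the `timings` dict are hypothetical names, not part of Spark:

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, timings):
    """Store the wall-clock seconds spent inside the with-block in timings[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the block raises, so failed runs are still timed.
        timings[label] = time.perf_counter() - start
```

In the script, wrap an action such as `result.collect()` or the final `saveAsTextFile(...)` in `with timed("top_sector", timings):`. Transformations like `map` and `join` are lazy, so timing them alone measures almost nothing. If an RDD feeds more than one action (for example `joined_rdd` when the commented-out `take(10)` calls are enabled), calling `.cache()` on it avoids recomputing the join for each action.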

This point uses the NASDAQ and companylist datasets. Remember the data format: for NASDAQ, exchange, stock symbol, date, stock opening price, stock high price, stock low price, stock closing price, stock volume, and adjusted closing price of the stock. For companylist: symbol, name, initial public offering year (IPOyear), and industry sector.
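The field order above fixes the indexes the script relies on: symbol is index 1, date 2, closing price 6 and volume 7. A minimal parsing sketch in plain Python (usable as an `rdd.map` function), assuming dates begin with a four-digit year such as `1996-01-02` and that volume is a whole number; `parse_nasdaq` is a hypothetical name:

```python
def parse_nasdaq(line):
    """Parse one NASDAQ CSV line into (symbol, year, close, volume).

    Field order (from the exercise): exchange, symbol, date, open, high,
    low, close, volume, adjusted close. Returns None for malformed lines.
    """
    fields = line.strip().split(",")
    if len(fields) != 9:
        return None
    try:
        year = int(fields[2][:4])  # ASSUMPTION: date starts with YYYY
        close = float(fields[6])
        volume = int(fields[7])    # ASSUMPTION: volume is a whole number
    except ValueError:
        return None
    return (fields[1], year, close, volume)
```

In Spark this could be applied as `sc.textFile(path).map(parse_nasdaq).filter(lambda t: t is not None)`, which splits each line once instead of calling `split(',')` three times per record.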

1. Calculate, for each year of the dataset given for point 1, which sector had the greatest number of operations. The output must show the year, the name of the sector and the overall value of operations. The result should look like:

    Finance, 1996, 20090342
    Pharma, 1996, 12312312
    Finance, 1997, 25612312
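To check the Spark output by hand on a few lines, the same aggregation can be written in plain Python. This sketch mirrors the script's pipeline (inner join on symbol, sum per (sector, year), then the largest total per year), assuming "number of operations" means traded volume, as in the script; the function name is hypothetical:

```python
from collections import defaultdict


def top_sector_per_year(trades, sectors):
    """trades: iterable of (symbol, year, volume); sectors: dict symbol -> sector.

    Returns one "Sector, year, total" line per year, ordered by year.
    """
    totals = defaultdict(int)
    for symbol, year, volume in trades:
        sector = sectors.get(symbol)
        if sector is None:  # inner join: drop symbols missing from companylist
            continue
        totals[(sector, year)] += volume

    best = {}  # year -> (total, sector)
    for (sector, year), total in totals.items():
        if year not in best or total > best[year][0]:
            best[year] = (total, sector)

    return [f"{sector}, {year}, {total}" for year, (total, sector) in sorted(best.items())]
```

Ties between sectors are resolved arbitrarily here, just as `x if x[0] > y[0] else y` does in the Spark version.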

Deliverable 1: a Spark script that uses RDDs to solve the problem, named 1_topsectorperyear.py. The fundamental lines of the code must be explained within the script.

Deliverable 2: an output file with the results, named 1_out.txt.
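`saveAsTextFile("1_out")` writes a directory named `1_out` containing `part-*` files (one per partition) rather than a file called `1_out.txt`. A small sketch for producing the deliverable file, assuming the output was written to the local filesystem of the VM; `merge_spark_output` is a hypothetical helper:

```python
import glob
import os
import shutil


def merge_spark_output(output_dir, target_file):
    """Concatenate the part-* files written by saveAsTextFile into one file.

    Other files in the directory (such as a _SUCCESS marker) are ignored.
    Returns the number of part files merged.
    """
    parts = sorted(glob.glob(os.path.join(output_dir, "part-*")))
    with open(target_file, "wb") as out:
        for part in parts:  # sorted, so partition order is preserved
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return len(parts)
```

After the job finishes, `merge_spark_output("1_out", "1_out.txt")` produces the deliverable; from the shell, `cat 1_out/part-* > 1_out.txt` does the same.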

Now we have to solve this statement of a Big Data exercise, and DataFrames cannot be used.

2. Calculate, for each business sector and year, which company grew the most, also listing its percentage of growth. The results should be in a format similar to:

    Finance, 1996, ABCD, 46%
    Finance, 1997, VFER, 64%
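The exercise does not define growth. One reasonable reading, used in this sketch as an ASSUMPTION, is the percentage change between a company's first and last closing price within each year. The plain-Python version below shows the two aggregation steps a solution would need (per (symbol, year), then per (sector, year)); in RDD terms these correspond to two `reduceByKey` calls with a `join` on symbol between them. It also assumes dates are strings that sort chronologically (YYYY-MM-DD); the function name is hypothetical:

```python
def top_company_per_sector(records, sectors):
    """records: iterable of (symbol, date, close); sectors: dict symbol -> sector.

    ASSUMPTION: growth = (last close - first close) / first close * 100 within
    a calendar year. Returns "Sector, year, SYMBOL, NN%" lines sorted by
    sector and year.
    """
    # Step 1: earliest and latest (date, close) per (symbol, year)
    span = {}
    for symbol, date, close in records:
        key = (symbol, date[:4])
        first, last = span.get(key, ((date, close), (date, close)))
        span[key] = (min(first, (date, close)), max(last, (date, close)))

    # Step 2: keep the company with the highest growth per (sector, year)
    best = {}
    for (symbol, year), ((_, first_close), (_, last_close)) in span.items():
        sector = sectors.get(symbol)
        if sector is None or first_close == 0:
            continue  # not in companylist, or growth undefined
        growth = (last_close - first_close) / first_close * 100
        if (sector, year) not in best or growth > best[(sector, year)][0]:
            best[(sector, year)] = (growth, symbol)

    return [f"{sector}, {year}, {symbol}, {growth:.0f}%"
            for (sector, year), (growth, symbol) in sorted(best.items())]
```

In Spark, step 1 could key each record by `(symbol, year)` with the value `((date, close), (date, close))` and use `reduceByKey(lambda a, b: (min(a[0], b[0]), max(a[1], b[1])))`; step 2 is a join with `(symbol, sector)` followed by a second `reduceByKey` keyed by `(sector, year)`.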

Deliverable 3: a Spark script that uses RDDs to solve the problem, named 2_topcompanypersector.py. The fundamental lines of the Big Data/Spark code must be explained within the script.

Deliverable 4: an output file with the results, named 2_out.txt.

This must be done in a Linux virtual machine. The companylist.tsv file has the headers Symbol, Name, IPOyear, Sector and industry. Some IPOyear values are n/a and others are years. The NASDAQsample.csv file has no header row.

| Symbol | Name | IPOyear | Sector | industry |
|---|---|---|---|---|
| FLWS | 1-800 FLOWERS.COM, Inc. | 1999 | Consumer Services | Other Specialty Stores |
| FCTY | 1st Century Bancshares | n/a | Finance | Major Banks |
| FCCY | 1st Constitution Bancorp | n/a | Finance | Savings Institutions |
| SRCE | 1st Source Corporation | n/a | Finance | Major Banks |
| FUBC | 1st United Bancorp, Inc. | n/a | Finance | Major Banks |
| VNET | 21Vianet Group, Inc. | n/a | Technology | Computer Software: |
| SSRX | 3SBio Inc. | 2007 | Consumer Durables | Major Pharmaceuticals |
| JOBS | 51job, Inc. | 2004 | Technology | Diversified Commercial |
| EGHT | 8x8 Inc | n/a | Public Utilities | T… |
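Because company names contain commas (for example "1-800 FLOWERS.COM, Inc."), the file has to be split on tabs, as the original script does with `split('\t')`. A sketch of a parser that also drops the header row and turns an `n/a` IPOyear into `None`, assuming exactly five tab-separated columns and a header row whose first field is `Symbol`; `parse_company` is a hypothetical name:

```python
def parse_company(line):
    """Parse one companylist.tsv line into (symbol, name, ipo_year, sector, industry).

    Returns None for the header row or for lines without exactly five
    tab-separated fields. ipo_year is an int, or None when the file says "n/a".
    """
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 5 or fields[0] == "Symbol":
        return None
    symbol, name, ipo, sector, industry = fields
    ipo_year = int(ipo) if ipo.isdigit() else None  # "n/a" -> None
    return (symbol, name, ipo_year, sector, industry)
```

For task 1 only the symbol and sector are needed, e.g. `sc.textFile(path).map(parse_company).filter(lambda c: c is not None).map(lambda c: (c[0], c[3]))`.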
