Question: Write a BFS search function using Apache Spark and pandas DataFrames:
import pandas as pd  # assumes a live SparkSession named `spark`, as in the course notebook

# This is a pairs notation of the edges, for simplicity of visualization
graph = [('A','B'), ('A','C'), ('A','D'), ('C','F'),
         ('F','A'), ('B','G'), ('G','H'), ('D','E')]

# Here's an equivalent dictionary representation that we can use for a
# Pandas DataFrame...
simple_dict = {'from_node': ['A','A','A','C','F','B','G','D'],
               'to_node':   ['B','C','D','F','A','G','H','E']}

simple_graph_df = pd.DataFrame.from_dict(simple_dict)
simple_graph_sdf = spark.createDataFrame(simple_graph_df)
simple_graph_sdf.show()
As you can see, each row of this DataFrame represents an edge between two nodes. Although the nodes are labeled "from" and "to", the edges are actually undirected, meaning that A-->B represents the same edge as B-->A.
Let's define our starting node as follows:
smallOrig = [{'node': 'A'}]
TODO: Write spark_bfs_1_round(visited_nodes) that takes the current DataFrame of visited_nodes, performs one round of BFS, and returns an updated visited-nodes DataFrame. You should assume that a temporary sdf G already exists.
def spark_bfs_1_round(visited_nodes):
    """
    :param visited_nodes: dataframe with columns node and distance
    :return: dataframe of updated visited nodes, with columns node and distance
    """
    # TODO
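Here is one possible sketch of a single round, not an official solution: it assumes the graph is registered as a temp view named G with columns from_node and to_node, treats every edge as undirected by joining in both directions, and keeps the smallest distance seen per node. The temp view name 'visited' is an arbitrary choice for this sketch.

def spark_bfs_1_round(visited_nodes):
    """
    Expand every visited node one hop along the undirected edges in the
    temp view G, then merge the new arrivals back in, keeping the
    minimum distance observed for each node.
    """
    visited_nodes.createOrReplaceTempView('visited')  # arbitrary view name for this sketch
    return spark.sql("""
        SELECT node, MIN(distance) AS distance
        FROM (
            SELECT node, distance FROM visited                        -- already-visited nodes
            UNION ALL
            SELECT G.to_node AS node, v.distance + 1 AS distance      -- follow edges forward
            FROM visited v JOIN G ON v.node = G.from_node
            UNION ALL
            SELECT G.from_node AS node, v.distance + 1 AS distance    -- ...and backward (undirected)
            FROM visited v JOIN G ON v.node = G.to_node
        ) t
        GROUP BY node
    """)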
Now, run the inner function on simple_1_round_bfs_sdf, the result of 1 round of BFS on the simple graph, and store the results in simple_bfs_result. This is ultimately what the output of BFS to depth 2 should look like.
simple_graph_sdf.createOrReplaceTempView('G')
simple_bfs_result = # TODO
simple_bfs_result.show()
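Under that reading, a minimal sketch (assuming the notebook supplies simple_1_round_bfs_sdf with columns node and distance):

simple_graph_sdf.createOrReplaceTempView('G')
# one more round on top of the provided 1-round result reaches depth 2
simple_bfs_result = spark_bfs_1_round(simple_1_round_bfs_sdf)
simple_bfs_result.show()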
Convert this result to Pandas, sorted by the node.
simple_bfs_test = #TODO
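A minimal sketch of the conversion; toPandas() collects the result to the driver, which is fine at this scale:

simple_bfs_test = (simple_bfs_result
                   .toPandas()
                   .sort_values('node')
                   .reset_index(drop=True))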
Now, we will fully implement spark_bfs. This function should iteratively call your implemented spark_bfs_1_round and return its output at max_depth.
You are also responsible for initializing the starting DataFrame, that is, converting the list of origin nodes into a Spark DataFrame with the nodes recorded at distance 0.
Consider the following:
from pyspark.sql.types import StructType, StructField, StringType

schema = StructType([StructField("node", StringType(), True)])
my_sdf = spark.read.format("csv").schema(schema).load("my.csv")
The schema specifies the structure of the Spark DataFrame: a single string node column. spark.read then loads the CSV with this schema applied. Also, you are responsible for ensuring that a view of your graph is available within this function. (Note: you will also need to add in a distance column.)
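Following that hint, one hedged way to write the schema with the extra distance column the note asks for (the name bfs_schema is an assumption of this sketch):

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

bfs_schema = StructType([
    StructField("node", StringType(), True),
    StructField("distance", IntegerType(), True),  # the added distance column
])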
TODO: implement spark_bfs(G, origins, max_depth) and run it on review_graph_sdf initialized in 4.3. Note: you may want to run tests on the simple_graph example, as the review_graph_sdf will take quite some time to run.
# TODO: iterative search over undirected graph
# Worth 5 points directly, but will be needed later
def spark_bfs(G, origins, max_depth):
    """
    runs distributed BFS to a specified max depth
    :param G: graph dataframe from 4.3
    :param origins: list of origin nodes stored as {"node": nodeValue}
    :param max_depth: integer value of max depth to run BFS to
    :return: dataframe with columns node, distance of all visited nodes
    """
    # TODO

# Remember that if you want to go from Pandas dataframes to Spark dataframes,
# you may need to write to a CSV and read it back.
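Putting the pieces together, a sketch of the full driver. It reuses the spark_bfs_1_round sketch above and builds the starting DataFrame with spark.createDataFrame rather than the CSV round-trip the comment mentions; if your environment really does require the CSV detour, swap that step in.

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

def spark_bfs(G, origins, max_depth):
    """Distributed BFS to max_depth, layered on spark_bfs_1_round above."""
    # make the graph visible to the SQL inside spark_bfs_1_round
    G.createOrReplaceTempView('G')

    # starting DataFrame: every origin node recorded at distance 0
    schema = StructType([
        StructField("node", StringType(), True),
        StructField("distance", IntegerType(), True),
    ])
    visited = spark.createDataFrame([(o['node'], 0) for o in origins], schema)

    # each round extends the frontier by one hop, so max_depth rounds
    # visit exactly the nodes within max_depth hops of an origin
    for _ in range(max_depth):
        visited = spark_bfs_1_round(visited)
    return visited

For example, spark_bfs(simple_graph_sdf, smallOrig, 2).show() should reproduce the depth-2 result from the simple graph above.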
