Question:

pagerank_data.txt:

1 2
1 3
2 3
3 4
4 1
2 1
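This is a hedged cross-check for any DataFrame solution. The plain-Python sketch below runs 10 iterations on this edge list. It assumes the update rule of Spark's bundled RDD PageRank example, which the skeleton does not state: each page splits its rank evenly over its out-links, and its new rank is 0.15 + 0.85 × the sum of its incoming contributions. Every page starts at rank 1, as the skeleton's `lit(1)` does.

```python
# Edge list copied from pagerank_data.txt, one "src dst" pair per line
data = """1 2
1 3
2 3
3 4
4 1
2 1"""

links = [tuple(line.split()) for line in data.splitlines()]

# Out-degree of each source page (what the skeleton's `outdegrees` holds)
outdegree = {}
for src, _ in links:
    outdegree[src] = outdegree.get(src, 0) + 1

# Every page with an out-link starts at rank 1, like the skeleton's lit(1)
ranks = {src: 1.0 for src in outdegree}

for _ in range(10):
    # Each source sends rank / out-degree along every outgoing link
    incoming = {}
    for src, dst in links:
        if src not in ranks:  # mirrors an inner join dropping unranked pages
            continue
        incoming[dst] = incoming.get(dst, 0.0) + ranks[src] / outdegree[src]
    # Damped update (assumed constants 0.15 and 0.85); pages with no
    # incoming contribution drop out, as in the RDD example
    ranks = {page: 0.15 + 0.85 * total for page, total in incoming.items()}

for page, rank in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(rank, 4))
```

With this data, page 1 ends highest (about 1.298), pages 3 and 4 stay at 1.0, and page 2 ends near 0.702. Every page has at least one outgoing link, so seeding `ranks` from `outdegrees` covers all four pages. Every page also has an incoming link, so none drops out.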

Rewrite the PageRank example using the DataFrame API. Here is a skeleton of the code; your job is to fill in the missing part:

from pyspark.sql.functions import *

# Assumes `spark` is an existing SparkSession (e.g. the one the pyspark shell creates)
numOfIterations = 10

lines = spark.read.text("pagerank_data.txt")
a = lines.select(split(lines[0], ' '))
links = a.select(a[0][0].alias('src'), a[0][1].alias('dst'))
outdegrees = links.groupBy('src').count()
ranks = outdegrees.select('src', lit(1).alias('rank'))

for iteration in range(numOfIterations):
    # FILL IN THIS PART
    pass

ranks.orderBy(desc('rank')).show()
