Question:
pagerank_data.txt (one directed link per line: source page, then destination page):
1 2
1 3
2 3
3 4
4 1
2 1
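As a hand-checkable reference, here is the first iteration on this data. It assumes the update rule used by Spark's bundled RDD PageRank example: every page starts at rank 1, and a page's new rank is 0.15 plus 0.85 times the sum of rank divided by out-degree over its incoming links. The out-degrees here are d_1 = d_2 = 2 and d_3 = d_4 = 1.

```latex
\begin{aligned}
r_v^{(k+1)} &= 0.15 + 0.85 \sum_{u \to v} \frac{r_u^{(k)}}{d_u} \\
r_1^{(1)} &= 0.15 + 0.85\left(\tfrac{1}{1} + \tfrac{1}{2}\right) = 1.425 && (4 \to 1,\ 2 \to 1) \\
r_2^{(1)} &= 0.15 + 0.85 \cdot \tfrac{1}{2} = 0.575 && (1 \to 2) \\
r_3^{(1)} &= 0.15 + 0.85\left(\tfrac{1}{2} + \tfrac{1}{2}\right) = 1.0 && (1 \to 3,\ 2 \to 3) \\
r_4^{(1)} &= 0.15 + 0.85 \cdot \tfrac{1}{1} = 1.0 && (3 \to 4)
\end{aligned}
```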
Rewrite the PageRank example using the DataFrame API. Here is a skeleton of the code; your job is to fill in the missing part:
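from pyspark.sql.functions import *

# Assumes `spark` is an existing SparkSession (as in the pyspark shell).
numOfIterations = 10
# One row per line of the file, in a single string column.
lines = spark.read.text("pagerank_data.txt")
# Split each line on the space into an array [src, dst].
a = lines.select(split(lines[0], ' '))
links = a.select(a[0][0].alias('src'), a[0][1].alias('dst'))
# Number of outgoing links per page (columns: src, count).
outdegrees = links.groupBy('src').count()
# Every page with an outgoing link starts with rank 1.
ranks = outdegrees.select('src', lit(1).alias('rank'))
for iteration in range(numOfIterations):
    # FILL IN THIS PART
    pass
ranks.orderBy(desc('rank')).show()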
