Question: Modify the word count query so that the streaming query only returns results where the word count is greater than two.

from pyspark.sql import SparkSession
from pyspark.sql.functions import explode
from pyspark.sql.functions import split


spark = (
    SparkSession
    .builder
    .appName("Assignment 7.1")
    .getOrCreate()
)

# Read lines of text from a socket source on localhost:9999
lines = (
    spark
    .readStream
    .format("socket")
    .option("host", "localhost")
    .option("port", 9999)
    .load()
)

# Split the lines into words
words = lines.select(
    explode(
        split(lines.value, " ")
    ).alias("word")
)

# Generate running word count
wordCounts = words.groupBy("word").count()

try:
    # Print the complete running counts to the console on every trigger
    query = (
        wordCounts
        .writeStream
        .outputMode("complete")
        .format("console")
        .start()
    )

    query.awaitTermination()
except KeyboardInterrupt:
    print('Stopping query')

Step by Step Solution

from pyspark.sql import SparkSession
from pyspark.sql.functions ...
