Python 3

The goal is to design a program that processes a list of tweets and generates a word cloud.

This is the incomplete code to process the tweets:

from string import ascii_letters

def clean_line(line):
    """Eliminates all non-alphabetical characters, except ' ', '@' and '#',
    from the line. This function also lower-cases the input line.

    >>> clean_line("This is a #hashtag!, this a is a Number123 @userName")
    'this is a #hashtag this a is a number @username'
    """
    # your code here
    return line

def get_tweet_text(line):
    """Receives a line from the tweets file, gets the tweet text from it and
    returns it as a string. The line has the following structure:
    source,text,created_at,retweet_count,favorite_count,is_retweet,id_str

    >>> get_tweet_text("Twitter,Thank you!,11-15-2017 10:58:18,96,433,false,9307")
    'Thank you!'
    """
    # your code here
    return line

def read_stopwords():
    """Reads the stopwords from the 'stopwords.txt' file and returns a list
    with the words."""
    # your code here
    return []

def process_tweet_text(text):
    """Receives the tweet text and processes it:
    - eliminates any non-alphabetical character, except ' ', '@' and '#'
    - converts it to lowercase
    - separates it into words
    - filters out all the words which are stopwords
    - returns the remaining words as a list

    >>> process_tweet_text("this is a #hashtag this a is a number @username")
    ['#hashtag', 'number', '@username']

    In the previous output, {this, is, a} are stopwords, so they are removed.
    """
    stopwords = read_stopwords()
    words = clean_line(text).split()
    result = []
    # your code here
    return result

def process_tweet_file(file_name):
    """Receives the name of a file containing tweets and processes it to get
    all the tweet texts. Extracts the words from them and counts their
    frequencies. Returns the result as a dictionary, where the keys are the
    words and the values are the word frequencies."""
    word_freqs = {}
    with open(file_name) as tweets:
        for line in tweets:
            text = get_tweet_text(line)
            words = process_tweet_text(text)
            # your code here
    return word_freqs

def print_statistics(word_freqs):
    """Receives a dictionary with word frequencies and prints statistics."""
    # your code here
    print('The total number of words is:', tot_num_of_words)
    print('The total number of different words is:', tot_num_of_different_words)
    print('The most frequent word is:', most_frequent_word)
    print('With a frequency of:', most_frequent_word_freq)

def write_words(word_freqs, file_name):
    """Writes down the words along with their frequencies, one word per line,
    with the word and the frequency separated by a space. Ex.:
    great 484
    fabulous 200
    """
    # your code here

wf = process_tweet_file('tweets.txt')
print_statistics(wf)
write_words(wf, 'words.txt')
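The text-processing stubs in the skeleton (clean_line, get_tweet_text, read_stopwords and process_tweet_text) could be filled in roughly as follows. This is a minimal sketch, not the expected answer. It assumes tweets.txt is plain comma-separated text (no CSV quoting) in which only the text field may contain commas, and that stopwords.txt holds whitespace-separated words. The `file_name` parameter of read_stopwords and the `stopwords` parameter of process_tweet_text are hypothetical additions, with defaults that keep the original call signatures working.

```python
from string import ascii_letters

# Characters kept by clean_line: ASCII letters, space, '@' and '#'
ALLOWED = set(ascii_letters + " @#")


def clean_line(line):
    """Remove every character not in ALLOWED, then lower-case the result."""
    return "".join(ch for ch in line if ch in ALLOWED).lower()


def get_tweet_text(line):
    """Return the text field of a line laid out as
    source,text,created_at,retweet_count,favorite_count,is_retweet,id_str

    Assumption: the file is not quoted CSV and only the text field may
    contain commas, so the text is everything between the first field
    and the last five fields.
    """
    fields = line.rstrip("\n").split(",")
    return ",".join(fields[1:-5])


def read_stopwords(file_name="stopwords.txt"):
    """Return the words of the stopwords file as a list.

    The file_name parameter is a hypothetical addition; its default is the
    file named in the skeleton. Words are assumed whitespace-separated.
    """
    with open(file_name) as f:
        return f.read().split()


def process_tweet_text(text, stopwords=None):
    """Clean and lower-case the text, split it into words, drop stopwords.

    The optional stopwords argument is a hypothetical addition so the list
    can be passed in instead of re-reading the file for every tweet.
    """
    if stopwords is None:
        stopwords = read_stopwords()
    # A set gives fast membership tests; lower() matches the cleaned words
    stopword_set = {word.lower() for word in stopwords}
    return [word for word in clean_line(text).split() if word not in stopword_set]
```

With this version, process_tweet_file could call `read_stopwords()` once before its loop and pass the result as `process_tweet_text(text, stopwords)`, so stopwords.txt is read once per run rather than once per tweet.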

These are extra files that may be needed to verify accuracy:

https://www.dropbox.com/s/pipeiwypvm516u0/stopwords.txt?dl=0

https://www.dropbox.com/s/rfwegnv71n054m7/tweets.txt?dl=0
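The counting and reporting stubs of the skeleton (the loop body of process_tweet_file, print_statistics and write_words) could be completed along the lines below. This is a hedged sketch: `update_frequencies` and `compute_statistics` are hypothetical helpers introduced here so the logic can be checked without the data files, and writing the words sorted by descending frequency is an assumption based on the "great 484, fabulous 200" example, not a stated requirement.

```python
def update_frequencies(word_freqs, words):
    """Hypothetical helper for the '# your code here' step inside the loop
    of process_tweet_file: add 1 to the count of every word in 'words'."""
    for word in words:
        word_freqs[word] = word_freqs.get(word, 0) + 1


def compute_statistics(word_freqs):
    """Hypothetical helper: return (total words, distinct words,
    most frequent word, its frequency) for a word -> count dictionary."""
    tot_num_of_words = sum(word_freqs.values())
    tot_num_of_different_words = len(word_freqs)
    if not word_freqs:
        return tot_num_of_words, tot_num_of_different_words, None, 0
    # On ties, max() keeps the first such word in dictionary (insertion) order
    most_frequent_word = max(word_freqs, key=word_freqs.get)
    return (tot_num_of_words, tot_num_of_different_words,
            most_frequent_word, word_freqs[most_frequent_word])


def print_statistics(word_freqs):
    """Print the statistics using the messages from the skeleton."""
    (tot_num_of_words, tot_num_of_different_words,
     most_frequent_word, most_frequent_word_freq) = compute_statistics(word_freqs)
    print('The total number of words is:', tot_num_of_words)
    print('The total number of different words is:', tot_num_of_different_words)
    print('The most frequent word is:', most_frequent_word)
    print('With a frequency of:', most_frequent_word_freq)


def write_words(word_freqs, file_name):
    """Write one 'word frequency' pair per line. Ordering by descending
    frequency, then alphabetically, is an assumption."""
    ordered = sorted(word_freqs.items(), key=lambda item: (-item[1], item[0]))
    with open(file_name, "w") as out:
        for word, freq in ordered:
            out.write(f"{word} {freq}\n")
```

In process_tweet_file, the `# your code here` line inside the loop would then become `update_frequencies(word_freqs, words)`. The standard library's `collections.Counter` is an alternative for the counting: `Counter.update(words)` adds one per occurrence, and `most_common(1)` returns the top `(word, count)` pair.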
