Question (Python 3):
The goal is to design a program that processes a list of tweets and generates a word cloud.
This is the incomplete code to process the tweets:
from string import ascii_letters

def clean_line(line):
    '''
    Eliminates all non-alphabetical characters, except ' ', '@' and '#',
    from the line. This function could also lower case the input line.

    >>> clean_line("This is a #hashtag!, this a is a Number123 @userName")
    'this is a #hashtag this a is a number @username'
    '''
    # your code here
    return line

def get_tweet_text(line):
    '''
    Receives a line from the tweets file, gets the tweet text from it
    and returns it as a string. The line has the following structure:
    source,text,created_at,retweet_count,favorite_count,is_retweet,id_str

    >>> get_tweet_text("Twitter,Thank you!,11-15-2017 10:58:18,96,433,false,9307")
    'Thank you!'
    '''
    # your code here
    return line

def read_stopwords():
    '''
    Read the stopwords from the 'stopwords.txt' file and
    return a list with the words.
    '''
    # your code here
    return []

def process_tweet_text(text):
    '''
    Receives the tweet text and processes it:
    - eliminates any non-alphabetical character, except ' ', '@' and '#'
    - converts it to lowercase
    - separates it into words
    - filters out all the words which are stopwords
    - returns the remaining words as a list

    >>> process_tweet_text("this is a #hashtag this a is a number @username")
    ['#hashtag', 'number', '@username']

    In the previous output, {this, is, a} are stopwords, so they are removed.
    '''
    stopwords = read_stopwords()
    words = clean_line(text).split()
    result = []
    # your code here
    return result

def process_tweet_file(file_name):
    '''
    Receives the name of a file containing tweets.
    Processes it to get all the tweet texts.
    Extracts the words from them and counts their frequencies.
    Returns the result as a dictionary, where the keys are the words
    and the values are the word frequencies.
    '''
    word_freqs = {}
    with open(file_name) as tweets:
        for line in tweets:
            text = get_tweet_text(line)
            words = process_tweet_text(text)
            # your code here
    return word_freqs

def print_statistics(word_freqs):
    '''
    Receives a dictionary with word frequencies and prints statistics.
    '''
    # your code here
    print('The total number of words is:', tot_num_of_words)
    print('The total number of different words is:', tot_num_of_different_words)
    print('The most frequent word is:', most_frequent_word)
    print('With a frequency of:', most_frequent_word_freq)

def write_words(word_freqs, file_name):
    '''
    Write down the words along with their frequencies, one word per line
    with the word and the frequency separated by a space:
    Ex.
    great 484
    fabulous 200
    '''
    # your code here

wf = process_tweet_file('tweets.txt')
print_statistics(wf)
write_words(wf, 'words.txt')

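One possible way to fill in the text-handling functions is sketched below. It is not the official solution, and it rests on a few assumptions that the question does not confirm: the source field of each line contains no comma, the tweet text may contain commas, and the five trailing fields never do. As a deviation from the skeleton, process_tweet_text here takes the stopwords as a second argument so that it can be tried without the stopwords file.

```python
from string import ascii_letters

# Characters that survive cleaning: letters plus ' ', '@' and '#'.
ALLOWED = set(ascii_letters + " @#")


def clean_line(line):
    """Drop every character outside ALLOWED, then lower-case the result."""
    return "".join(ch for ch in line if ch in ALLOWED).lower()


def get_tweet_text(line):
    """Extract the text field from a line shaped like
    source,text,created_at,retweet_count,favorite_count,is_retweet,id_str
    """
    # Assumption: the source has no comma, so the first comma ends it.
    _source, rest = line.rstrip("\n").split(",", 1)
    # Assumption: the last five fields have no commas, so splitting from
    # the right leaves the (possibly comma-containing) text in position 0.
    return rest.rsplit(",", 5)[0]


def process_tweet_text(text, stopwords):
    """Clean the text, split it into words and drop the stopwords.

    The stopwords parameter is a change from the skeleton, which calls
    read_stopwords() inside the function.
    """
    words = clean_line(text).split()
    return [word for word in words if word not in stopwords]


if __name__ == "__main__":
    line = "Twitter,Thank you!,11-15-2017 10:58:18,96,433,false,9307"
    print(get_tweet_text(line))
    print(process_tweet_text("this is a #hashtag @username", {"this", "is", "a"}))
```

Splitting from both ends avoids breaking on commas inside the tweet itself. If the real tweets.txt uses quoted fields, the standard csv module would be a safer choice than manual splitting.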
These are extra files that may be needed to verify accuracy:
https://www.dropbox.com/s/pipeiwypvm516u0/stopwords.txt?dl=0
https://www.dropbox.com/s/rfwegnv71n054m7/tweets.txt?dl=0
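The layout of these files is not shown in the question, so the following is only a sketch of the file-related parts under stated assumptions: stopwords.txt is taken to hold whitespace-separated words, and the output is written most frequent word first, which the example in the docstring suggests but does not require. The helpers count_words and summarize are hypothetical names, not part of the skeleton, and read_stopwords returns a set rather than the list the skeleton asks for, since membership tests on a set are faster.

```python
from collections import Counter


def read_stopwords(file_name="stopwords.txt"):
    """Return the stopwords as a set.

    Assumption: the file holds whitespace-separated words.
    """
    with open(file_name, encoding="utf-8") as f:
        return set(f.read().lower().split())


def count_words(word_lists):
    """Hypothetical helper: count words across an iterable of word lists."""
    freqs = Counter()
    for words in word_lists:
        freqs.update(words)
    return dict(freqs)


def summarize(word_freqs):
    """Hypothetical helper: return (total words, distinct words,
    most frequent word, its frequency)."""
    total = sum(word_freqs.values())
    distinct = len(word_freqs)
    if not word_freqs:
        return total, distinct, None, 0
    top_word = max(word_freqs, key=word_freqs.get)
    return total, distinct, top_word, word_freqs[top_word]


def write_words(word_freqs, file_name):
    """Write one 'word frequency' pair per line.

    Assumption: highest frequency first, ties broken alphabetically.
    """
    ordered = sorted(word_freqs.items(), key=lambda kv: (-kv[1], kv[0]))
    with open(file_name, "w", encoding="utf-8") as out:
        for word, freq in ordered:
            out.write(f"{word} {freq}\n")
```

In the skeleton, the four values returned by summarize would correspond to tot_num_of_words, tot_num_of_different_words, most_frequent_word and most_frequent_word_freq in print_statistics.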
