Question: Python 3 program that processes a list of tweets and generates a word cloud

Python 3

The goal is to design a program that processes a list of tweets and generates a word cloud.

This is the incomplete code to process the tweets:

from string import ascii_letters

def clean_line(line):
    '''
    Eliminates all non-alphabetical characters, except ' ', '@' and '#',
    from the line. This function could also lowercase the input line.

    >>> clean_line("This is a #hashtag!, this a is a Number123 @userName")
    'this is a #hashtag this a is a number @username'
    '''
    # your code here
    return line
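One way the body of clean_line might be completed is sketched here. It assumes "alphabetical" means the ASCII letters from the already imported ascii_letters, so accented letters, digits, punctuation and newlines are all dropped. The ALLOWED_CHARS name is introduced only for illustration.

```python
from string import ascii_letters

# Hypothetical helper name: the set of characters that survive cleaning.
ALLOWED_CHARS = set(ascii_letters + " @#")


def clean_line(line):
    """Keep only ASCII letters, ' ', '@' and '#', then lowercase the result."""
    kept = "".join(ch for ch in line if ch in ALLOWED_CHARS)
    return kept.lower()
```

Filtering first and lowercasing afterwards is equivalent to the reverse order here, because ascii_letters contains both cases.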

def get_tweet_text(line):
    '''
    Receives a line from the tweets file, gets the tweet text from it and
    returns it as a string. The line has the following structure:
    source,text,created_at,retweet_count,favorite_count,is_retweet,id_str

    >>> get_tweet_text("Twitter,Thank you!,11-15-2017 10:58:18,96,433,false,9307")
    'Thank you!'
    '''
    # your code here
    return line
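A possible completion of get_tweet_text is sketched here. Since the tweet text itself may contain commas, the sketch cuts off the source at the first comma and the five trailing fields from the right. This rests on the assumption that the source and the last five fields never contain commas. Returning an empty string for malformed lines is also an assumption, not something the assignment specifies.

```python
def get_tweet_text(line):
    """Return the text field of a line shaped like
    source,text,created_at,retweet_count,favorite_count,is_retweet,id_str
    """
    # Drop the source: everything up to the first comma.
    parts = line.split(",", 1)
    if len(parts) < 2:
        return ""  # assumption: malformed lines yield no text

    # Drop the five trailing fields, counting commas from the right,
    # so that commas inside the tweet text are preserved.
    fields = parts[1].rsplit(",", 5)
    if len(fields) < 6:
        return ""  # assumption: malformed lines yield no text
    return fields[0]
```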

def read_stopwords():
    '''
    Reads the stopwords from the 'stopwords.txt' file and returns a list
    with the words.
    '''
    # your code here
    return []
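A minimal sketch of read_stopwords follows. The layout of stopwords.txt is not shown in the question, so the sketch assumes the words are separated by whitespace (one per line or several per line both work). The optional file_name parameter is a hypothetical addition that makes the function easy to try on another file; calling it with no arguments matches the original signature.

```python
def read_stopwords(file_name="stopwords.txt"):
    """Return the stopwords in the file as a list of lowercase strings."""
    with open(file_name) as handle:
        # split() with no arguments splits on any whitespace and
        # discards empty strings, so blank lines are ignored.
        return handle.read().lower().split()
```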

def process_tweet_text(text):
    '''
    Receives the tweet text and processes it:
    - eliminates any non-alphabetical character, except ' ', '@' and '#'
    - converts it to lowercase
    - separates it into words
    - filters out all the words which are stopwords
    - returns the remaining words as a list

    >>> process_tweet_text("this is a #hashtag this a is a number @username")
    ['#hashtag', 'number', '@username']

    In the previous output, {this, is, a} are stopwords, so they are removed.
    '''
    stopwords = read_stopwords()
    words = clean_line(text).split()
    result = []
    # your code here
    return result
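The missing filtering step in process_tweet_text could look like the sketch below. To keep the example runnable on its own, clean_line is given a compact body and read_stopwords is replaced by a stand-in that returns a fixed list instead of reading stopwords.txt. Both are assumptions made for illustration only.

```python
from string import ascii_letters

ALLOWED_CHARS = set(ascii_letters + " @#")  # hypothetical helper name


def clean_line(line):
    """Compact cleaning step: keep ASCII letters, ' ', '@', '#'; lowercase."""
    return "".join(ch for ch in line if ch in ALLOWED_CHARS).lower()


def read_stopwords():
    """Stand-in: returns a fixed list instead of reading 'stopwords.txt'."""
    return ["this", "is", "a"]


def process_tweet_text(text):
    """Clean the text, split it into words and drop the stopwords."""
    stopwords = set(read_stopwords())  # a set makes each membership test fast
    words = clean_line(text).split()
    result = []
    for word in words:
        if word not in stopwords:
            result.append(word)
    return result
```

Because the skeleton calls read_stopwords() inside process_tweet_text, the real file would be read once per tweet. Loading the stopwords once and reusing them is a possible refinement if that turns out to be slow.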

def process_tweet_file(file_name):
    '''
    Receives the name of a file containing tweets. Processes it to get all
    the tweet texts, extracts the words from them and counts their
    frequencies. Returns the result as a dictionary, where the keys are the
    words and the values are the word frequencies.
    '''
    word_freqs = {}
    with open(file_name) as tweets:
        for line in tweets:
            text = get_tweet_text(line)
            words = process_tweet_text(text)
            # your code here
    return word_freqs
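The counting step that belongs inside the loop of process_tweet_file can be sketched as a small helper. The name update_word_freqs is hypothetical; in the skeleton, its loop body would simply replace the "# your code here" comment.

```python
def update_word_freqs(word_freqs, words):
    """Add one count to word_freqs for every word in words (in place)."""
    for word in words:
        # dict.get supplies 0 the first time a word is seen.
        word_freqs[word] = word_freqs.get(word, 0) + 1
    return word_freqs


# Usage with in-memory data standing in for two processed tweets.
freqs = {}
update_word_freqs(freqs, ["great", "fabulous"])
update_word_freqs(freqs, ["great"])
print(freqs)  # {'great': 2, 'fabulous': 1}
```

collections.Counter from the standard library does the same bookkeeping, but a plain dictionary stays closest to the skeleton, which initializes word_freqs as {}.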

def print_statistics(word_freqs):
    '''
    Receives a dictionary with word frequencies and prints statistics.
    '''
    # your code here
    print('The total number of words is:', tot_num_of_words)
    print('The total number of different words is:', tot_num_of_different_words)
    print('The most frequent word is:', most_frequent_word)
    print('With a frequency of:', most_frequent_word_freq)
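The four values that print_statistics prints could be computed as in the sketch below. Splitting the computation into a separate compute_statistics function is a hypothetical design choice that makes the numbers easy to check. Treating an empty dictionary as "no most frequent word" is also an assumption.

```python
def compute_statistics(word_freqs):
    """Return (total words, distinct words, most frequent word, its count)."""
    tot_num_of_words = sum(word_freqs.values())
    tot_num_of_different_words = len(word_freqs)
    if word_freqs:
        # max over the keys, ranked by their frequency; on a tie the
        # first such key in dictionary order is returned.
        most_frequent_word = max(word_freqs, key=word_freqs.get)
        most_frequent_word_freq = word_freqs[most_frequent_word]
    else:
        most_frequent_word, most_frequent_word_freq = None, 0
    return (tot_num_of_words, tot_num_of_different_words,
            most_frequent_word, most_frequent_word_freq)


def print_statistics(word_freqs):
    """Print the statistics in the format given by the skeleton."""
    (tot_num_of_words, tot_num_of_different_words,
     most_frequent_word, most_frequent_word_freq) = compute_statistics(word_freqs)
    print('The total number of words is:', tot_num_of_words)
    print('The total number of different words is:', tot_num_of_different_words)
    print('The most frequent word is:', most_frequent_word)
    print('With a frequency of:', most_frequent_word_freq)
```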

def write_words(word_freqs, file_name):
    '''
    Writes the words along with their frequencies, one word per line, with
    the word and the frequency separated by a space. Ex.
    great 484
    fabulous 200
    '''
    # your code here
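A sketch of write_words follows. The docstring does not state an order for the lines. Sorting by descending frequency, with ties broken alphabetically, is an assumption suggested by the example, where the more frequent word comes first.

```python
def write_words(word_freqs, file_name):
    """Write 'word frequency' pairs to file_name, one pair per line."""
    # Assumption: most frequent words first, ties in alphabetical order.
    ordered = sorted(word_freqs.items(), key=lambda item: (-item[1], item[0]))
    with open(file_name, "w") as out:
        for word, freq in ordered:
            out.write(f"{word} {freq}\n")
```

This "word frequency" layout is the kind of input many word cloud generators accept, which is presumably why the assignment asks for it.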

wf = process_tweet_file('tweets.txt')
print_statistics(wf)
write_words(wf, 'words.txt')

These are extra files that may be needed to verify accuracy:

https://www.dropbox.com/s/pipeiwypvm516u0/stopwords.txt?dl=0

https://www.dropbox.com/s/rfwegnv71n054m7/tweets.txt?dl=0
