Question: In this project you will develop tools for performing sentiment analysis on a database of tweets from across the country. When the project is complete

In this project you will develop tools for performing sentiment analysis on a database of tweets from across the country. When the project is complete you should be able to estimate the sentiment of tweets filtered by content.

There are 4 files provided here:

all_tweets.txt is the large collection of tweets

some_tweets.txt is a subset of all_tweets that's more manageable to prototype on

sentiments.csv a csv with word sentiment values

zips.csv (not required, see below)

We will go over the format of each of these files in class.

Tweets: We will represent a tweet using a Python dictionary with the following entries:

text: a string, the text of the tweet all in lowercase

time: a datetime object, date and time of the tweet

latitude: a float, the latitude of the tweet's location

longitude: a float, the longitude of the tweet's location

Problem 1 Write a function called make_tweets that takes as input a file name and returns a list of dictionaries. Each dictionary corresponds to a tweet.

In [ ]:

def make_tweets(filename): #your code here

Problem 2 Write a function add_sentiment to determine the sentiment of each tweet by taking the average sentiment over all of the words in the tweet. The function should return a new list of tweets where each tweet has a new key 'sentiment' with a numeric value between -1 and 1, or None representing the sentiment of the tweet. Note: words without a sentiment do not have sentiment 0. Your function should take as input a list of tweets (dictionaries) together with the name of the sentiment file. Be careful that your function does not alter the original list (no side effects!)

In [ ]:

def add_sentiment(tweets,filename): #your code here

Problem 3 Write a function called tweet_filter that will return a new list of tweets filtered by the content of the tweet text. The input for this function should be a list of tweets and a list of words (strings). The function should return a list of tweets that each include all of the words in the word list ignoring case and punctuation. Note: Since you are not changing the tweets, as long as the returned list is new, you don't have to worry about side-effects on the tweets here.

In [ ]:

def tweet_filter(tweets, words): #your code here

Problem 4 Use your work above and below to answer the following questions:

What is the average sentiment of tweets containing the word 'beer'

What is the average sentiment of tweets containing the word 'coffee'

Consider the average sentiment of the tweets containing at least one of the words 'beer', 'movie', coffee', 'work'. Which word leads to a list of tweets with the lowest average sentiment?

In [ ]:

# Include the code you wrote for Problem 4 here

Write the answers to problem 4 here:

Average sentiment:

Lowest average sentiment:

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer

Step: 1 Unlock blur-text-image

Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock

Step: 3 Unlock

Students Have Also Explored These Related Databases Questions!

Please answer questions 2 and 4: Please use python! In this project, you will develop tools for performing sentiment analysis on a database of tweets from across the country. When the project is...

Samples of the files can be found here: http://www.cs.columbia.edu/~cannon/tweet_data/ Please Use Python to Answer these questions separately! In this project you will develop tools for performing...

Please answer questions 2 and 4: Please use python! In this project, you will develop tools for performing sentiment analysis on a database of tweets from across the country. When the project is...

PLEASE USE PYTHON In this project you will develop tools for performing sentiment analysis on a database of tweets from across the country. When the project is complete you should be able to estimate...

ead the Harvard case study on Procter and Gamble. Then, answer the following questions. 1. What is business analytics? Why is it important to make real-time decisions at P&G? What are the benefits of...

You need to make business plan for music company. company name is mg3. you can make plan according to you im attaching example of plan of another company, you can get idea from that example. do like...

contributed articles DOI:10.1145/ 2602574 How to use, and influence, consumer social communications to improve business performance, reputation, and profit. BY WEIGUO FAN AND MICHAEL D. GORDON The...

you need to business plan on music company , company name is MG3 . you can do according to you just make plan , business for music companu i attached pictures of another company , you have to do same...

contributed articles DOI:10.1145/ 2602574 How to use, and influence, consumer social communications to improve business performance, reputation, and profit. BY WEIGUO FAN AND MICHAEL D. GORDON The...

INTERNATIONAL REVIEW OF L AW C OMPUTERS & TECHNOLOGY , VOLUME 11, N UMBER 2, P AGES 251-261, 1997 The Data Mart: A New Approach to Data Warehousing PAM ELA PIPE Introduction Vendors have recently...

Evaluate Ronens financial statement insurance proposal.

College athletes, especially males, are often perceived as having very little interest in the academic side of their college experience. One common problem is class attendance. To address the problem...

Valdez received a distribution from a tradnional 4 0 1 M ) account this yeac in which of the following stuations will Valdez be subject fo an early distribution penalty? Mulluple Cheice Veidec is 6 2...

Seved Help 14 Wisconsin Snowmobile Corp. is considering a switch to level production Cost efficiencies would occur under level production, and aftertax costs would decline by $31,500, but inventory...

How do Dimensional Database Models differ from Relational Models?

What type of processing do Relational Databases support?

Describe several aggregation operators.