Here is the assignment, using Python 3 and MongoDB.

Semi-structured Data Processing

The main outline of your assignment is to write a program that will read in JSON formatted data from a MongoDB collection or from a file. This will be in a format that is structured with lines of data representing one type of unit, for example, one tweet for Twitter or one post from Facebook. Your program will contain the data as lists of JSON structures, which are just Python dictionaries and lists. Your program may also contain pandas dataframes for processed data.

The program will do some processing to collect data from some of the fields that will answer one or more questions, as described below, and write a file with the data suitable for answering each question. Remember that some fields may be optional or have null values, so you may need to test for those conditions. Graphing is definitely optional.

Questions:

Types of questions:

- Process one collection of data and summarize information from a number of fields. This is similar to the example programs for Twitter hashtags or Facebook counts but must access different and more fields than in those examples.
- Process one collection of data and separate it into different categories and give some summary statistics on those categories. For example, bin the tweets by day or by hour and report on the number of tweets per day or hour.
- Process two or more collections of data and compare some summary data about the two collections.

Here is the code I have created thus far. I would like to keep the way I imported the data the same. What I need help with is the questions I want to answer:

1. Who had more retweets, Packers or Vikings?
2. Were more people tweeting in Minnesota, Green Bay, or neither (using the location field)?
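For the second question, one possible approach is sketched below. It assumes "the location field" means the free-text `tweet['user']['location']` string that users type into their profile, which may be empty or null. The keyword tuples, the function names `classify_location` and `count_locations`, and the sample tweets are all hypothetical, and simple substring matching is only a rough heuristic (for example, "Duluth, MN" would not be counted as Minnesota here).

```python
from collections import Counter

# Hypothetical keyword lists; extend them to match the locations seen in the data.
GREEN_BAY_TERMS = ('green bay', 'greenbay')
MINNESOTA_TERMS = ('minnesota', 'minneapolis', 'st. paul', 'saint paul')


def classify_location(tweet):
    """Return 'Green Bay', 'Minnesota', or 'Neither' for one tweet dict."""
    # 'user' or 'location' may be missing or None, so fall back to empty values.
    user = tweet.get('user') or {}
    location = (user.get('location') or '').lower()
    if any(term in location for term in GREEN_BAY_TERMS):
        return 'Green Bay'
    if any(term in location for term in MINNESOTA_TERMS):
        return 'Minnesota'
    return 'Neither'


def count_locations(tweets):
    """Count how many tweets fall into each location category."""
    return Counter(classify_location(tweet) for tweet in tweets)


# Made-up sample tweets standing in for a list such as gb_pack or mn_vike.
sample_tweets = [
    {'user': {'location': 'Green Bay, WI'}},
    {'user': {'location': 'Minneapolis, MN'}},
    {'user': {'location': ''}},
    {'user': {'location': None}},
    {'user': {'location': 'St. Paul, Minnesota'}},
    {},
]

location_counts = count_locations(sample_tweets)
for category in ('Minnesota', 'Green Bay', 'Neither'):
    print(category, location_counts[category])
```

Calling `count_locations` on the combined lists from all three collections would give one overall tally. Calling it per collection would allow a comparison between the Packers and Vikings searches.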

Below are the searches I ran in an Anaconda Prompt to get the data:

C:\Users\vfowl\programsvf>python run_twitter_simple_search_save.py #packers 200 valerie packers

C:\Users\vfowl\programsvf>python run_twitter_simple_search_save.py #vikings 200 valerie vikings

C:\Users\vfowl\programsvf>python run_twitter_simple_search_save.py #SundayNightFootball 200 valerie snf

Using Spyder:

import pymongo

client = pymongo.MongoClient('localhost', 27017)
client.database_names()

valerie = client.valerie
valerie.collection_names()

#Green Bay Packers
gb_ftbll = valerie.packers
tweets = gb_ftbll.find()
gb_pack = [tweet for tweet in tweets]
len(gb_pack)

def print_tweet_data(tweets):
    for tweet in tweets:
        print(' Date:', tweet['created_at'])
        print('From:', tweet['user']['name'])
        print('Message', tweet['text'])
        if tweet['place'] is not None:
            print('Place:', tweet['place']['full_name'])

print_tweet_data(gb_pack[:1])

#vikings
mn_ftbll = valerie.vikings
tweets = mn_ftbll.find()
mn_vike = [tweet for tweet in tweets]
len(mn_vike)
print_tweet_data(mn_vike)

#Sunday Night Football
snf_ftbll = valerie.snf
tweets = snf_ftbll.find()
sunday = [tweet for tweet in tweets]
len(sunday)
print_tweet_data(sunday)

#Print one tweet from each list to help identify fields
print(gb_pack[:1])
print(mn_vike[:1])
print(sunday[:1])

#question 1 - Which team had more retweets?
#Find how many retweets each team had, then compare
from collections import Counter
count = 0
def retweet_data(tweets):
    for tweet in tweets:
        if count <= tweet['retweet_count']:
            print('The number of packers retweets:', tweet['retweet_count'])
retweet_data(gb_pack)

This runs fine, but I want it to print the sum of all the retweets, not a line with the number of retweets for each of the 200 tweets.

This is what it prints:

The number of packers retweets: 6
The number of packers retweets: 35
The number of packers retweets: 35
The number of packers retweets: 0

I would like it to print

The total number of packers retweets: X

Thanks in advance!
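A minimal sketch of one way to get a single total per team: accumulate `retweet_count` inside the loop, return the sum, and print once after the loop ends. The function name `total_retweets` and the sample lists are hypothetical stand-ins for `gb_pack` and `mn_vike`. The sketch also assumes a missing or null `retweet_count` should count as 0.

```python
def total_retweets(tweets):
    """Sum retweet_count over a list of tweet dicts."""
    total = 0
    for tweet in tweets:
        # Treat a missing or None retweet_count as 0 instead of raising an error.
        total += tweet.get('retweet_count') or 0
    return total


# Made-up sample data standing in for gb_pack and mn_vike.
sample_packers = [
    {'retweet_count': 6},
    {'retweet_count': 35},
    {'retweet_count': None},
    {},
]
sample_vikings = [
    {'retweet_count': 10},
    {'retweet_count': 0},
]

packers_total = total_retweets(sample_packers)
vikings_total = total_retweets(sample_vikings)

# Print once per team, after the whole list has been summed.
print('The total number of packers retweets:', packers_total)
print('The total number of vikings retweets:', vikings_total)

if packers_total > vikings_total:
    print('Packers had more retweets')
elif vikings_total > packers_total:
    print('Vikings had more retweets')
else:
    print('Both teams had the same number of retweets')
```

With the real data the calls would be `total_retweets(gb_pack)` and `total_retweets(mn_vike)`. One caveat: if the search results include retweets of the same original tweet, each copy may carry the original's `retweet_count` (the repeated 35 in the output above could be such a case), so the sum may count some retweets more than once.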
