Question: Using Python

Since I can't attach the data set, I will describe it:

News articles for each day (Jan 1, 2012 to Dec 31, 2012), 366 files in total
Every file:
1. Contains all the articles published on that day
2. Has each article on its own line
3. Stores each article in JSON format
4. JSON fields for each article:
  1. 'city', 'code', 'title', 'text', 'source', 'date'
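
As a rough illustration of the file layout described above, one line of a daily file could be parsed as follows. This is a minimal sketch: the sample record and the helper name article_text are hypothetical, and only the six field names come from the question.

```python
import json

# Hypothetical sample record; only the field names are taken from the question.
SAMPLE_LINE = json.dumps({
    "city": "London",
    "code": "GB",
    "title": "Example headline",
    "text": "Football returns to London.",
    "source": "example-source",
    "date": "2012-09-01",
})


def article_text(line):
    """Return the lowercased 'text' field of one JSON line, or '' if unusable."""
    try:
        article = json.loads(line)
    except json.JSONDecodeError:
        return ""  # blank or malformed line
    if not isinstance(article, dict):
        return ""
    text = article.get("text")
    return text.lower() if isinstance(text, str) else ""


print(article_text(SAMPLE_LINE))  # football returns to london.
```

Returning an empty string for malformed lines keeps a single bad record from stopping the whole job; whether the real data contains such lines is not known from the question.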

Instructions using Python:

Read the data (all the files in the data directory) using the function textFile.
Take only the text part of each article and count the frequency of all the words (convert the text to lowercase first).
Remove (filter out) any word whose frequency is less than 10.
Report the following:
1. Total size of the output data (after the filtering)
2. Frequency of each of the following words: congress, london, washington, football
3. The word with the maximum frequency for each month (hint: to read only one month's articles you can use *. E.g. for February, 2012-02* matches all files starting with 2012-02, i.e. the files belonging to February)
4. List of words that appeared on 2012-09-01 but not on 2012-08-01
5. Frequency of the word monsoon for each month
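
One possible way to approach these steps is sketched below. It assumes that textFile refers to PySpark's SparkContext.textFile and that sc is an existing SparkContext. The directory name data, the helper names, and the simple letters-only tokenization are assumptions. The sketch also reads "total size of the output data" as the number of distinct words left after filtering, and it applies the frequency-10 filter only to the full-year counts, not to the monthly or single-day reports.

```python
import json
import re
from operator import add

DATA_DIR = "data"   # assumption: directory holding the 366 daily files
MIN_COUNT = 10      # words seen fewer times than this are dropped
TARGETS = ("congress", "london", "washington", "football")


def words_in_line(line):
    """Lowercase words from the 'text' field of one JSON article line."""
    try:
        article = json.loads(line)
    except json.JSONDecodeError:
        return []  # skip blank or malformed lines
    text = article.get("text") if isinstance(article, dict) else None
    if not isinstance(text, str):
        return []
    # Assumption: a "word" is a run of ASCII letters.
    return re.findall(r"[a-z]+", text.lower())


def word_counts(sc, pattern, min_count=1):
    """RDD of (word, count) pairs for files matching pattern."""
    return (sc.textFile(pattern)
              .flatMap(words_in_line)
              .map(lambda word: (word, 1))
              .reduceByKey(add)
              .filter(lambda pair: pair[1] >= min_count))


def words_on_day(sc, day):
    """RDD of the distinct words in the file(s) whose name starts with day."""
    return sc.textFile(f"{DATA_DIR}/{day}*").flatMap(words_in_line).distinct()


def report(sc):
    counts = word_counts(sc, f"{DATA_DIR}/*", MIN_COUNT).cache()

    # 1. Size of the output, read here as the number of distinct words kept.
    print("words kept:", counts.count())

    # 2. Frequencies of the target words (0 if filtered out or absent).
    found = dict(counts.filter(lambda pair: pair[0] in TARGETS).collect())
    for word in TARGETS:
        print(word, found.get(word, 0))

    # 3 and 5. Per month: most frequent word and frequency of "monsoon".
    for month in range(1, 13):
        monthly = word_counts(sc, f"{DATA_DIR}/2012-{month:02d}*").cache()
        top_word, top_count = monthly.max(key=lambda pair: pair[1])
        monsoon = sum(monthly.lookup("monsoon"))
        print(f"2012-{month:02d}", top_word, top_count, "monsoon:", monsoon)
        monthly.unpersist()

    # 4. Words that appear on 2012-09-01 but not on 2012-08-01.
    only_sept = words_on_day(sc, "2012-09-01").subtract(
        words_on_day(sc, "2012-08-01"))
    print(sorted(only_sept.collect()))
```

In the pyspark shell, where sc is already defined, the report can be produced with report(sc). A standalone script would first need to create a SparkContext from the pyspark package. Since no stop-word removal is requested, the most frequent word in each month will likely be a common function word such as "the".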
