2.1 Preprocessing of restaurant and review data (15 points) Load and preprocess restaurant and customer review...
Question:
2.1 Preprocessing of restaurant and review data (15 points)

Load and preprocess restaurant and customer review data that is stored in JSON files. The files store information regarding the restaurants and customer reviews for a very small subset of the Yelp dataset. Look at the format of the data in the two files to understand its structure or read the dataset documentation here. The Python JSON API is documented here.

[ ]:
    import json

    business = json.load(open('../data/business.json'))
    reviews = json.load(open('../data/review.json'))

    # Create the `id2business` table that maps the "business_id" to the corresponding restaurant object.
    id2business = {}
    # YOUR CODE HERE (5 points)

    # Create the `location2business` table that maps a tuple (city, state) to a list of restaurants at that location.
    location2business = {}
    # YOUR CODE HERE (5 points)

    # For each location show the number of restaurants. For example, the largest number (8) should be in Philadelphia.
    for location in location2business:
        print(location, len(location2business[location]))

    # The "categories" field of a restaurant lists a number of categories, usually related to food served or cuisine.
    # For each unique category that is in the data, show the total number of restaurants.
    # For example, there are 5 restaurants that are categorized as Mexican.
    cat2freq = {}
    # YOUR CODE HERE (5 points)

    # Print all category --> count mappings. For example, cat2freq['Mexican'] should be 5.
    print(cat2freq)
Expert Answer:
The image you provided describes a problem in which JSON files containing restaurant and review data need to be preprocessed. The tasks are to create t...
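The three tables named in the question could be built along the following lines. This is only a sketch under stated assumptions, not the posted expert answer: the sample records are hypothetical stand-ins for the contents of `../data/business.json`, the helper name `build_tables` is invented for illustration, and the `categories` field is assumed to be either a comma-separated string, a list, or missing, since the exact format depends on the dataset version.

```python
from collections import Counter, defaultdict

# Hypothetical sample records standing in for
# json.load(open('../data/business.json')); the real file is not shown here.
business = [
    {"business_id": "b1", "name": "Taco Stop", "city": "Philadelphia",
     "state": "PA", "categories": "Mexican, Restaurants"},
    {"business_id": "b2", "name": "Pho Corner", "city": "Philadelphia",
     "state": "PA", "categories": "Vietnamese, Restaurants"},
    {"business_id": "b3", "name": "El Sol", "city": "Tucson",
     "state": "AZ", "categories": ["Mexican", "Restaurants"]},
    {"business_id": "b4", "name": "No Tags", "city": "Tucson",
     "state": "AZ", "categories": None},
]


def build_tables(business):
    """Return (id2business, location2business, cat2freq) for a list of records."""
    # business_id -> restaurant object
    id2business = {b["business_id"]: b for b in business}

    location2business = defaultdict(list)  # (city, state) -> list of restaurants
    cat2freq = Counter()                   # category -> number of restaurants

    for b in business:
        location2business[(b["city"], b["state"])].append(b)

        # Assumption: categories may be a comma-separated string, a list, or None.
        cats = b.get("categories") or []
        if isinstance(cats, str):
            cats = [c.strip() for c in cats.split(",")]
        # Use a set so a restaurant counts at most once per category.
        cat2freq.update({c for c in cats if c})

    return id2business, dict(location2business), dict(cat2freq)


id2business, location2business, cat2freq = build_tables(business)

for location in location2business:
    print(location, len(location2business[location]))
print(cat2freq)
```

With the real data, replacing the sample list by the loaded `business` object should allow the checks stated in the question (8 restaurants in Philadelphia, 5 categorized as Mexican) to be verified directly.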