2.1 Preprocessing of restaurant and review data (15 points) Load and preprocess restaurant and customer review...
Fantastic news! We've Found the answer you've been seeking!
Question:
Transcribed Image Text:
2.1 Preprocessing of restaurant and review data (15 points) Load and preprocess restaurant and customer review data that is stored in JSON files. The files store information regarding the restaurants and customer reviews for a very small subset of the Yelp dataset. Look at the format of the data in the two files to understand its structure or read the dataset documentation here. The Python JSON API is documented here. []: import json business = json.load(open('../data/business.json')) reviews = json.load(open('../data/review.json')) # Create the id2business` table that maps the "business_id" to the corresponding restaurant object. id2business = {} # YOUR CODE HERE (5 points) #Create the location2business` table that maps a tuple (city, state) to au list of restaurants at that location. location2business {} # YOUR CODE HERE (5 points) # For each location show the number of restaurants. For example, the largestu number (8) should be in Philadelphia. for location in location2business: print (location, len(location2business [location])) # The "categories" field of a restaurant lists a number of categories, usually related to food served or cuisine. 2 # For each unique category that is in the data, show the total number ofu restaurants. # For example, there are 5 restaurants that are categorized as Mexican. cat2freq 0} # YOUR CODE HERE (5 points) #Print all category --> count mappings. For example, cat2freq['Merican] should be 5. print (cat2freq) 2.1 Preprocessing of restaurant and review data (15 points) Load and preprocess restaurant and customer review data that is stored in JSON files. The files store information regarding the restaurants and customer reviews for a very small subset of the Yelp dataset. Look at the format of the data in the two files to understand its structure or read the dataset documentation here. The Python JSON API is documented here. []: import json business = json.load(open('../data/business.json')) reviews = json.load(open('../data/review.json')) # Create the id2business` table that maps the "business_id" to the corresponding restaurant object. id2business = {} # YOUR CODE HERE (5 points) #Create the location2business` table that maps a tuple (city, state) to au list of restaurants at that location. location2business {} # YOUR CODE HERE (5 points) # For each location show the number of restaurants. For example, the largestu number (8) should be in Philadelphia. for location in location2business: print (location, len(location2business [location])) # The "categories" field of a restaurant lists a number of categories, usually related to food served or cuisine. 2 # For each unique category that is in the data, show the total number ofu restaurants. # For example, there are 5 restaurants that are categorized as Mexican. cat2freq 0} # YOUR CODE HERE (5 points) #Print all category --> count mappings. For example, cat2freq['Merican] should be 5. print (cat2freq)
Expert Answer:
Answer rating: 100% (QA)
The image you provided describes a problem in which JSON files containing restaurant and review data need to be preprocessed The tasks are to create t... View the full answer
Related Book For
Auditing A Practical Approach with Data Analytics
ISBN: 978-1119401742
1st edition
Authors: Raymond N. Johnson, Laura Davis Wiley, Robyn Moroney, Fiona Campbell, Jane Hamilton
Posted Date:
Students also viewed these programming questions
-
Question A) What is the SWEATT model, and why is it essential to an organization? Question B) How can the SWEATT model be applied to the USPS or other organizations? Can you align each of...
-
Why should taxpayers keep records of personal, nonbusiness assets that they own, in a tax sense?
-
Planning is one of the most important management functions in any business. A front office managers first step in planning should involve determine the departments goals. Planning also includes...
-
Why do we use the absolute value of x or of g(x) in the derivative formulas for the natural logarithm?
-
Consider a satellite at the altitude of geostationary satellites but whose orbital plane is inclined to the equatorial plane by an angle. To a stationary user on the earth's surface at north...
-
How can the matrix for , the complement of the relation R, be found from the matrix representing R, when R is a relation on a finite set A?
-
The gas mixture in a helium-neon laser (end mirrors removed) emits light at \(633 \mathrm{~nm}\) with a Doppler-broadened spectral width of about \(1.5 \times 10^{9} \mathrm{~Hz}\). Calculate the...
-
Lauras Dress Delivery operates a mail-order business that sells clothes designed for frequent travelers. It had sales of $800,000 in December. Because Lauras Dress Delivery is in the mail-order...
-
Using a recent (September 2023- present) Microeconomics related article in the media (the Economist, Globe and Mail, New York Times, etc.), attempt to explain parts or all of it using concepts in...
-
You are assessing internal control in the audit of the payroll and personnel cycle for Rogers Products Company, a manufacturing company specializing in assembling computer parts. Rogers employs...
-
The following information is available for Whynot Ltd. For its year ended December 31,20X1: I) Net income before tax = $1,800,000. II) The net income before tax include $20,000 of dividends received...
-
What does Weber say about jobs in an ideal bureaucracy? They are well-defined and workers become highly skilled at performing them. They offer the workers opportunities for empowerment. They satisfy...
-
Several government department employees who regularly meet unemployed citizens say they experience considerable stress. The problem, they claim, is that they try to act pleasant and sympathetic to...
-
In a period of fluctuating prices, which method of inventory costing would provide the most consistent net income?
-
Solve the equation by factoring: r2_ 5x 6=0
-
Identify the sources that can be used to develop an expectation for an analytical procedure.
-
Slip of the induction machine is 0.02 and the stator supply frequency is 50 Hz. What will be the frequency of the rotor induced emf? (A) 10 Hz. (B) 50 Hz. (C) 1 Hz. (D) 2500 Hz.
-
What is an access control list?
-
Katrina Ellis is the engagement partner of the audit of Champion Securities, an investment company. Most of Champions assets and liabilities are financial and their valuation is critical to the...
-
The primary objective of a CPAs observation of a clients physical inventory count is to: a. Discover whether a client has counted a particular inventory item or group of items. b. Obtain direct...
-
Martin Rorke is reviewing the results of the review of subsequent cash receipts. There are several receipts listed from customers that were considered doubtful at the end of the year (December 31)....
-
A dimensionless grouping of variables and parameters that are important in pipe flow is called the Reynolds number: where $V$ is the average velocity over the cross section in a pipe of diameter $D$...
-
A layer of water is flowing down a flat plate that is inclined at an angle of $20^{\circ}$ to the vertical. If the depth of the layer is $1 / 4 \mathrm{in}$., what is the shear stress exerted by the...
-
Basic Concepts by considering the simple one-dimensional steady flow between two parallel plates, Figure 1.1. The area of each plate $A_{y}=1 \mathrm{~m}^{2}$ and the gap between the two plates is...
Study smarter with the SolutionInn App