Question:
1. We need to get the data from the file assets/companies_small_set.data into a DataFrame. The problem is that the data on each line of the file is in either JSON or tab-separated values (TSV) format. The JSON lines are already in the correct format; they just need to be converted to native Python dicts. The TSV lines need to be converted into dicts that match the JSON format.

Write a generator gen_fixed_data that takes an iterator as an argument. It should parse the values in the iterator and yield each value in the correct format: a dict with the keys:

    company
    catch_phrase
    phone
    timezone
    client_count

Note that your solution should be a generator function; it should not return a DataFrame.

Question 2

The data in assets/server_metrics.csv represents the time it takes to handle requests in a start-up company's web application. Let's imagine we are asked to write some code that gives us a DataFrame that just contains the entries where processing_time is greater than 160 milliseconds. We could solve that problem like this:

[ ]: import pandas as pd
     df = pd.read_csv('assets/server_metrics.csv')

[ ]: outliers = df[df['processing_time'] > 160]

[ ]: %matplotlib inline
     import matplotlib.pyplot as plt
     _ = outliers['processing_time'].plot.hist(title="Times > 160")

But imagine that instead of dealing with millions of rows, we have to deal with billions or trillions and the data set is too big to fit comfortably in memory, or that the data is coming to us not from a local file but over the network. Generators can be a nice way to help in that situation.

Here is a generator that yields a dict for each line in assets/server_metrics.csv.

Note that your solution should be a generator function; it should not return a DataFrame.
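For Question 1, one possible approach is to try parsing each line as JSON and fall back to TSV when that fails. The sketch below is only an illustration under several assumptions that the question does not confirm: the TSV columns appear in the same order as the listed keys, client_count should be an integer, and blank lines can be skipped. The KEYS constant is a hypothetical helper name.

```python
import json

# Assumption: TSV columns appear in the same order as the keys in the question.
KEYS = ("company", "catch_phrase", "phone", "timezone", "client_count")


def gen_fixed_data(data):
    """Yield one dict per input line, whether the line is JSON or TSV."""
    for line in data:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines (assumption: they carry no data)

        try:
            # JSON lines are already in the target shape.
            record = json.loads(line)
        except json.JSONDecodeError:
            record = None

        if not isinstance(record, dict):
            # Not a JSON object, so treat the line as tab-separated values.
            record = dict(zip(KEYS, line.split("\t")))
            # Assumption: client_count should be an int, as a JSON number would be.
            record["client_count"] = int(record["client_count"])

        yield record
```

Because the function only yields dicts, building the DataFrame stays with the caller, for example by opening assets/companies_small_set.data and passing list(gen_fixed_data(f)) to pd.DataFrame.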
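For Question 2, the generator the text refers to is not shown above. As a rough sketch of the idea, the code below reads rows lazily and filters them one at a time, so the full data set never has to sit in memory. The names gen_rows and gen_slow_requests are hypothetical, and the code assumes the CSV has a header row that includes a processing_time column.

```python
import csv


def gen_rows(lines):
    """Yield one dict per CSV data row, reading lazily from an iterator of lines."""
    # csv.DictReader takes the column names from the first line (assumed header).
    for row in csv.DictReader(lines):
        yield row


def gen_slow_requests(rows, threshold=160):
    """Yield only the rows whose processing_time exceeds threshold (milliseconds)."""
    for row in rows:
        # CSV values arrive as strings, so convert before comparing.
        if float(row["processing_time"]) > threshold:
            yield row
```

The two generators can be chained over an open file handle or any other iterator of lines, such as a network stream, and only the rows that pass the filter need to be collected into a DataFrame afterwards.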
