Question:

We wish to separate this into smaller strings by splitting it at commas. We will then coerce each resulting token into the
appropriate data type according to the following schema:
test_schema = [int, int, str, str, float]
The function call process_line(test_line, test_schema, ',') should return the following list:
[6, 174, 'Blah', 'Hello World', 7.37]
Write a function named process_line() that accepts three parameters line, schema, and sep as described above.
The function should perform the following steps:
1. Use the split() method to split the string line on the character provided by sep. Store the resulting list in a
variable named tokens.
2. Create an empty list named result.
3. Simultaneously loop over the elements of the lists tokens and schema (which are intended to have the same
number of elements). Each time the loop executes, perform the following steps:
a. Store the current element of tokens in a variable named t.
b. Store the current element of schema in a variable named dt.
c. Note that dt will contain a data type, while t will hold a string. Coerce t into a value with the desired
data type using dt(t). Append the converted value into the list result.
4. Return result.
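The four steps above can be sketched as follows. This is one possible implementation, not an official solution; it uses zip() for the simultaneous loop, which is one of several ways to iterate over two lists in parallel.

```python
def process_line(line, schema, sep):
    # Step 1: split the line into string tokens on the separator.
    tokens = line.split(sep)
    # Step 2: start with an empty list of converted values.
    result = []
    # Step 3: walk both lists in parallel; t is a string, dt is a type.
    for t, dt in zip(tokens, schema):
        # Calling the type (e.g. int('174')) coerces the string.
        result.append(dt(t))
    # Step 4: return the converted values.
    return result
```

Note that zip() stops at the shorter of the two lists, so a line with more tokens than schema entries would be silently truncated rather than raising an error.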
We will apply this function in the next example, but we will first test it to make sure that it is working correctly.
In a new code cell, create the following objects:
test_schema = [int, int, str, str, float]
test_line = '6,174,Blah,Hello World,7.37'
Use the process_line() function to split the string test_line at the commas, coercing the individual tokens into
the data types stored in test_schema. Print the result.
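A test cell along these lines might look like the sketch below. The function definition is repeated only so that the cell runs on its own; the variable name test_result is an illustrative choice, not part of the assignment.

```python
def process_line(line, schema, sep):
    # Same logic as described in the steps: split, then coerce token by token.
    tokens = line.split(sep)
    result = []
    for t, dt in zip(tokens, schema):
        result.append(dt(t))
    return result

test_schema = [int, int, str, str, float]
test_line = '6,174,Blah,Hello World,7.37'

# test_result is a hypothetical name used here to hold the output.
test_result = process_line(test_line, test_schema, ',')
print(test_result)  # prints [6, 174, 'Blah', 'Hello World', 7.37]
```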
Problem 3: Processing File Input
In this problem, you will write a function named read_file_to_list() that will read rows of text from a data file,
tokenize the rows into individual values, coerce those values into the proper data types, and then return the results in the
form of a list of lists.
The function should accept three parameters named path, schema, and sep.
path should be a string representing the path to a data file.
schema should be a list of data types indicating the desired types for the columns stored in the data file.
sep should be a string indicating the character used to separate values stored in the lines of the data file.
We will assume that the first line of the data files used will contain header information that assigns a name to each column.
This line will be tokenized (split), but the values will be left as strings, rather than being coerced according to schema.
As an example, assume that a datafile located at path contains the following lines of text:
Name,Age,PayRate
Anna,27,15.25
Bradley,31,16.75
Catherine,23,15.50
Define my_schema = [str, int, float]. Then the function call read_file_to_list(path, my_schema, ',')
should return the following list of lists:
[['Name', 'Age', 'PayRate'], ['Anna', 27, 15.25], ['Bradley', 31, 16.75], ['Catherine', 23, 15.5]]
Write a function named read_file_to_list() that accepts three parameters path, schema, and sep as described above.
The function should perform the following steps:
1. Use with, open(), and read() to read the contents of the file into a string named contents.
2. Use split() to separate the string into a list named lines by splitting on the newline character.
3. Create an empty list named data. This will eventually contain the list that is to be returned.
4. The first line contains header information and will be processed differently from the other lines. Split the first
string in lines on sep. Store the resulting list into the list data.
5. We no longer need the first element of lines. For convenience, you may delete it.
6. Loop over the remaining elements of lines. Each time the loop executes, use the function process_line()
to process the current line, appending the resulting list into data. Use the values provided to the parameters
schema and sep.
7. Return data.
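The seven steps above can be sketched as follows, again as one possible implementation rather than an official solution. Two things here are assumptions that go beyond the stated steps: the sketch skips empty strings in lines (a file ending in a newline would otherwise produce a trailing '' that cannot be coerced to int or float), and it writes the example rows to a temporary file named sample.txt purely for demonstration.

```python
import os
import tempfile

def process_line(line, schema, sep):
    # Split the line and coerce each token using the matching type.
    tokens = line.split(sep)
    result = []
    for t, dt in zip(tokens, schema):
        result.append(dt(t))
    return result

def read_file_to_list(path, schema, sep):
    # Step 1: read the whole file into a single string.
    with open(path) as f:
        contents = f.read()
    # Step 2: split into lines on the newline character.
    lines = contents.split('\n')
    # Step 3: the list of lists that will be returned.
    data = []
    # Step 4: the header is split but its values stay as strings.
    data.append(lines[0].split(sep))
    # Step 5: drop the header so only data rows remain.
    del lines[0]
    # Step 6: coerce each remaining row according to schema.
    for line in lines:
        if line == '':
            # Assumption: skip blank lines, e.g. from a trailing newline.
            continue
        data.append(process_line(line, schema, sep))
    # Step 7: return the result.
    return data

# Demonstration with the example rows, written to a hypothetical temp file.
sample_text = 'Name,Age,PayRate\nAnna,27,15.25\nBradley,31,16.75\nCatherine,23,15.50\n'
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, 'sample.txt')
    with open(sample_path, 'w') as f:
        f.write(sample_text)
    my_schema = [str, int, float]
    sample_data = read_file_to_list(sample_path, my_schema, ',')

# Print each inner list on its own line.
for row in sample_data:
    print(row)
```

The same loop-and-print pattern applies to the real data files once they are in the notebook's directory.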
We will now test the read_file_to_list() function on two small data files. We will start with a data file that contains
10 observations from the Diamonds Dataset. Make sure that the file diamonds_partial.txt is in the same directory as
your notebook. I suggest opening this file so that you can see what its contents look like.
Use read_file_to_list() to read the contents of the file diamonds_partial.txt. Individual values within the
rows of this data file are separated by commas. The datatypes for the columns in this data set are given by the following
schema: [float, str, str, str, int]. Store the resulting list in a variable named diamond_data. Use a loop to
print the lists contained in diamond_data with each list appearing on its own line.
We will now test our function with a file containing 10 observations from the Titanic Dataset. Make sure that the file
titanic_partial.txt is in the same directory as your notebook. I suggest taking a look at the contents of this file.
Use read_file_to_list() to read the contents of the file titanic_partial.txt. Individual values
