We wish to separate this into smaller strings by splitting it at commas. We will then coerce each resulting token into the
appropriate data type according to the following schema:
testschema int int, str str float
The function call processline(testline, testschema, ',') should return the following list:
'Blah', 'Hello World',
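This works because, in Python, str, int, and float are callable objects. A list of them can therefore serve as a schema, and calling a type on a token performs the coercion. Here is a minimal sketch. The line and schema below are hypothetical stand-ins, not the assignment's actual test data:

```python
# Hypothetical example line and schema (not the assignment's actual test data).
line = "Blah,17,Hello World,2.5"
schema = [str, int, str, float]

tokens = line.split(",")  # ['Blah', '17', 'Hello World', '2.5']
# Each schema entry is a type; calling it on the matching token converts it.
values = [dt(t) for t, dt in zip(tokens, schema)]
print(values)  # ['Blah', 17, 'Hello World', 2.5]
```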
Write a function named processline that accepts three parameters line, schema, and sep as described above.
The function should perform the following steps:
Use the split method to split the string line on the character provided by sep. Store the resulting list in a
variable named tokens.
Create an empty list named result.
Simultaneously loop over the elements of the lists tokens and schema (which are intended to have the same
number of elements). Each time the loop executes, perform the following steps:
a) Store the current element of tokens in a variable named t.
b) Store the current element of schema in a variable named dt.
c) Note that dt will contain a data type, while t will hold a string. Coerce t into a value with the desired
data type using dt(t). Append the converted value to the list result.
Return result.
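One possible way to write the steps above is sketched below. It uses zip to walk tokens and schema in parallel. Unpacking each pair into t and dt covers steps (a) and (b) in a single line. An index-based loop over range(len(tokens)) would work just as well.

```python
def processline(line, schema, sep):
    """Split line on sep and coerce each token using the matching type in schema."""
    # Split the string into a list of string tokens.
    tokens = line.split(sep)
    # Start with an empty list to hold the converted values.
    result = []
    # zip pairs tokens[i] with schema[i]; it stops at the shorter of the two lists.
    for t, dt in zip(tokens, schema):
        # dt is a type such as int, float, or str; calling it converts the string t.
        result.append(dt(t))
    return result


# Hypothetical usage (these values are illustrative only):
print(processline("Blah,17,Hello World,2.5", [str, int, str, float], ","))
```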
We will apply this function in the next example, but we will first test it to make sure that it is working correctly.
In a new code cell, create the following objects:
testschema int int, str str float
testline Blah,Hello World,
Use the processline function to split the string testline at the commas, coercing the individual tokens into
the data types stored in testschema. Print the result.
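When testing, keep in mind that zip silently stops at the shorter list. A line with a missing field would therefore produce a short result rather than an error. If you would rather fail loudly, the variant below adds a length check. The name processline_checked and the sample values are hypothetical, chosen only for this illustration:

```python
def processline_checked(line, schema, sep):
    """Like processline, but raise ValueError if the token count differs from the schema length.

    processline_checked is a hypothetical helper, not part of the assignment.
    """
    tokens = line.split(sep)
    if len(tokens) != len(schema):
        raise ValueError(
            f"expected {len(schema)} fields but found {len(tokens)}: {line!r}"
        )
    return [dt(t) for t, dt in zip(tokens, schema)]


# Hypothetical schema and lines for illustration.
sample_schema = [str, int, str, float]
print(processline_checked("Blah,17,Hello World,2.5", sample_schema, ","))

try:
    processline_checked("Blah,17,Hello World", sample_schema, ",")  # one field missing
except ValueError as err:
    print("error:", err)
```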
Problem : Processing File Input
In this problem, you will write a function named readfiletolist that will read rows of text from a data file,
tokenize the rows into individual values, coerce those values into the proper data types, and then return the results in the
form of a list of lists.
The function should accept three parameters named path, schema, and sep.
path should be a string representing the path to a data file.
schema should be a list of data types indicating the desired types for the columns stored in the data file.
sep should be a string indicating the character used to separate values stored in the lines of the data file.
We will assume that the first line of the data files used will contain header information that assigns a name to each column.
This line will be tokenized (split), but the values will be left as strings rather than being coerced according to schema.
As an example, assume that a data file located at path contains the following lines of text:
Name,Age,PayRate
Anna,
Bradley,
Catherine,
Define myschema = [str, int, float]. Then the function call readfiletolist(path, myschema, ',')
should return the following list of lists:
[['Name', 'Age', 'PayRate'], ['Anna', …], ['Bradley', …], ['Catherine', …]]
Write a function named readfiletolist that accepts three parameters path, schema, and sep as described above.
The function should perform the following steps:
Use with, open(), and read() to read the contents of the file into a string named contents.
Use split to separate the string into a list named lines by splitting on the newline character.
Create an empty list named data. This will eventually contain the list that is to be returned.
The first line contains header information and will be processed differently from the other lines. Split the first
string in lines on sep. Store the resulting list into the list data.
We no longer need the first element of lines. For convenience, you may delete it.
Loop over the remaining elements of lines. Each time the loop executes, use the function processline
to process the current line, appending the resulting list to data. Use the values provided to the parameters
schema and sep.
Return data.
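A sketch of readfiletolist following the steps above is shown below. A compact processline is repeated here so that the block runs on its own. One assumption goes beyond the stated steps: blank lines are skipped. If a file ends with a newline, splitting on "\n" leaves an empty string at the end, and int('') or float('') would raise ValueError. The demonstration uses a temporary file whose names and numbers are made up. They are not the values from the example above.

```python
import os
import tempfile


def processline(line, schema, sep):
    """Split line on sep and coerce each token with the matching type in schema."""
    return [dt(t) for t, dt in zip(line.split(sep), schema)]


def readfiletolist(path, schema, sep):
    """Read a delimited text file into a list of lists (header row left as strings)."""
    # Read the whole file into a single string.
    with open(path) as f:
        contents = f.read()

    # Split into lines on the newline character.
    lines = contents.split("\n")

    # The header row is split on sep but not coerced.
    data = []
    data.append(lines[0].split(sep))
    del lines[0]

    for line in lines:
        # Assumption beyond the stated steps: skip blank lines (e.g. a trailing newline).
        if line.strip() == "":
            continue
        data.append(processline(line, schema, sep))

    return data


# Demonstration with a temporary file; the rows below are made-up values.
sample = "Name,Age,PayRate\nDana,30,18.5\nEli,41,22.0\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    sample_path = tmp.name

myschema = [str, int, float]
rows = readfiletolist(sample_path, myschema, ",")
for row in rows:  # print each list on its own line
    print(row)
os.remove(sample_path)
```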
We will now test the readfiletolist function on two small data files. We will start with a data file that contains
observations from the Diamonds Dataset. Make sure that the file diamondspartial.txt is in the same directory as
your notebook. I suggest opening this file so that you can see what its contents look like.
Use readfiletolist to read the contents of the file diamondspartial.txt. Individual values within the
rows of this data file are separated by commas. The data types for the columns in this data set are given by the following
schema: [float, str, str, str, int]. Store the resulting list in a variable named diamonddata. Use a loop to
print the lists contained in diamonddata, with each list appearing on its own line.
We will now test our function with a file containing observations from the Titanic Dataset. Make sure that the file
titanicpartial.txt is in the same directory as your notebook. I suggest taking a look at the contents of this file.
Use readfiletolist to read the contents of the file titanicpartial.txt. Individual values
