Question:
PYTHON3; DO NOT IMPORT ANY PACKAGES
Please read carefully
Suppose we have access to many data files in txt format, where each line contains comma-separated items. The first item in each line is the key of that line, and the remaining items are its values. To load a file into Python, we want to process each line into a tuple of (key, values), with all values wrapped in a tuple. For example:
key1,value1,value2,value3 => ( key1, (value1, value2, value3) )
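The processing rule above can be sketched as a small helper. The name `process_line` is hypothetical (it is not part of the assignment), and the sketch assumes the only cleanup a line needs is dropping its trailing newline:

```python
def process_line(line):
    # Hypothetical helper: turn "key,v1,v2" into ("key", ("v1", "v2")).
    # Assumption: items are used exactly as written, with no extra trimming.
    key, *values = line.rstrip("\n").split(",")
    # The starred target collects the values in a list; wrap them in a tuple.
    return key, tuple(values)
```

For instance, `process_line("key1,value1,value2,value3\n")` returns `('key1', ('value1', 'value2', 'value3'))`. Because the values are collected with a starred target, the helper works for any number of values per line.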
After defining the processing rule, we want to create a generator that traverses through the file (with the given path), processes each line with the rule we just defined, and yields the processed tuple. However, our generator should only yield the first appearance of each key. In other words, if any key has already been yielded, our generator should skip all new data tuples with the same key.
Notes:
- You can assume that each line will have at least two items (one key and at least one value). However, you should not assume that all lines will have the same number of items.
- You can assume that the path argument will always point to a valid data file.
- We have provided infiles/data{1,2}.txt to run the provided doctest. You can read these txt files and their corresponding expected outputs in the doctest to help understand this question.
Hint:
You can store keys in a container within a generator! Think about which kind of container best helps you identify the keys that have already been yielded.
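One container that fits the hint is a set, since membership tests on a set take constant time on average, whereas a list has to be scanned item by item. The following is only a sketch of the idea on an in-memory sequence of keys. The name `first_seen` is hypothetical and not part of the assignment:

```python
def first_seen(keys):
    # Hypothetical illustration: yield each key only the first time it appears.
    seen = set()  # keys that have already been yielded
    for key in keys:
        if key not in seen:
            seen.add(key)
            yield key
```

For instance, `list(first_seen(["Colin", "James", "Colin", "Yuri", "James"]))` gives `['Colin', 'James', 'Yuri']`. The set lives inside the generator, so each generator object keeps its own record of yielded keys.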
def unique_data_generator(path):
    """
    >>> gen1 = unique_data_generator("infiles/data1.txt")
    >>> [next(gen1, None) for _ in range(3)]
    [('key1', ('val1', 'val2', 'val3')), ('key2', ('val1', 'val2')), \
('key3', ('val1', 'val2', 'val3', 'val4'))]
    >>> [next(gen1, None) for _ in range(5)]
    [('key4', ('val4', 'val5', 'val6')), \
('key5', ('val3', 'val4', 'val5', 'val6', 'val7')), None, None, None]
    >>> gen2 = unique_data_generator("infiles/data2.txt")
    >>> [next(gen2, None) for _ in range(5)]
    [('Colin', ('02-08-2021', '120')), \
('James', ('02-08-2021', '100')), ('Yuri', ('02-09-2021', '115')), \
('Michelle', ('02-09-2021', '120')), ('Sean', ('02-10-2021', '150'))]
    """
    # YOUR CODE GOES HERE #
Text in files data1 and data2
Data1:
key1,val1,val2,val3
key1,val1,val2
key2,val1,val2
key2,val4,val5
key3,val1,val2,val3,val4
key4,val4,val5,val6
key5,val3,val4,val5,val6,val7
Data2:
Colin,02-08-2021,120
James,02-08-2021,100
Colin,02-09-2021,10
Yuri,02-09-2021,115
James,02-09-2021,85
Michelle,02-09-2021,120
Colin,02-10-2021,90
Sean,02-10-2021,150
Colin,02-11-2021,30
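Putting the pieces together, one possible way to fill in the function body is sketched below. It is not the official solution. It assumes each record sits on its own line of the file and that blank lines, if any, should be skipped. It uses only built-ins, so no imports are needed:

```python
def unique_data_generator(path):
    seen = set()  # keys that have already been yielded
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # assumption: blank lines carry no data
            key, *values = line.split(",")
            if key in seen:
                continue  # skip later rows that repeat a yielded key
            seen.add(key)
            yield key, tuple(values)
```

Iterating over the file object reads one line at a time, so the whole file never has to be held in memory. Once the file is exhausted the generator simply ends, which is why `next(gen1, None)` returns `None` for the extra calls in the doctest.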
