Question: Perform the following steps in a single code cell. 1 . Create a function named process _ row ( ) . This function should accept

Perform the following steps in a single code cell.
1. Create a function named process_row(). This function should accept a single parameter named row, which is intended to represent individual elements of the nasa_raw RDD. The function should perform the follow processing tasks on the string contained in row, in the order described.
a. Use the string replace() method to replace double quotes with empty strings.
b. Tokenize the string on space characters using the split() method.
c. If the last token (indicating bytes) is equal to a hyphen, replace it with 0.
d. Coerce the bytes token to an integer. (Note that the status code could be interpreted as an integer, but we will leave it as a string to more easily reflect that this is categorical information).
e. Return the resulting list of tokens.
2. Apply the process_row() function to the elements of nasa_raw to create a new RDD named nasa. The new RDD should contain the same number of elements as nasa_raw, but these elements should be lists instead of strings.
3. Persist the nasa RDD to memory.
4. Print the first 10 elements of the nasa RDD, with each element appearing on a different line of
output.

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Programming Questions!