Task 4: Trim whitespace
All our replacements have left some blank or mangled titles. First, identify any titles that begin or end with one or more apostrophes, and replace those apostrophes with nothing. Be careful not to disturb apostrophes in the interior of the title. Next, trim leading and trailing whitespace from each title. Be careful not to remove all whitespace, since we need the spaces between words.
To match the tests and simplify debugging, do not combine the above two operations into one, but rather (1) handle apostrophes first and then (2) handle whitespace as a second line of code.
In the code, find the incomplete function clean and complete it according to the instructions above.
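The two Task 4 steps can be sketched in Raku as below. This is a minimal sketch, not the assignment's solution: the helper name trim-title is hypothetical, and in the assignment these two lines belong inside clean, whose exact structure is not shown here. It keeps the required order (apostrophes first, whitespace second) as two separate statements.

```raku
# Hypothetical standalone helper illustrating Task 4; the real work
# belongs inside the assignment's clean function.
sub trim-title(Str $title --> Str) {
    my $t = $title;    # parameters are read-only, so work on a copy
    # (1) Remove one or more apostrophes anchored at the very start (^)
    #     or the very end ($). Interior apostrophes, as in "Don't", stay.
    $t ~~ s:g/ ^ \'+ | \'+ $ //;
    # (2) trim strips leading and trailing whitespace only; the spaces
    #     between words are left alone.
    $t = $t.trim;
    return $t;
}

say trim-title("''Don't Stop''");   # Don't Stop
say trim-title("'Yesterday '");     # Yesterday
```

Because the apostrophe step runs first, an apostrophe that sits behind leading or trailing whitespace is not at the string boundary and survives; that is a consequence of the ordering the tests require.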
Task 5: Filter out non-English characters
We want to ignore all song titles that contain a non-English character. Although this may eliminate some titles in languages like Spanish or Danish, we need to drop the Unicode garbage characters found in many of the titles.
If a title contains only valid English alphanumeric characters ('a' to 'z' and 0 to 9), apostrophes, or spaces, we keep it. Otherwise, we discard the song title entirely. Use the Raku command next; to skip to the next iteration of the loop, which bypasses pushing the title onto the array to be returned. Make sure to accept both uppercase and lowercase letters.
In the code, continue editing the function clean to achieve the above goal.
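A minimal sketch of the Task 5 filter, assuming the titles arrive as an array and the kept ones are pushed onto a new array. The helper name keep-english is hypothetical; in the assignment this check sits inside the loop of clean. The negated character class matches any single character outside the allowed set, and next skips the push.

```raku
# Hypothetical standalone helper illustrating Task 5.
sub keep-english(@titles --> Array) {
    my @kept;
    for @titles -> $title {
        # <-[ ... ]> matches one character NOT in the set: ASCII letters
        # in either case, digits, the apostrophe, or a space (\x20).
        # One such character anywhere disqualifies the whole title.
        next if $title ~~ / <-[a..z A..Z 0..9 \' \x20]> /;
        @kept.push($title);
    }
    return @kept;
}

say keep-english(["Hey Jude", "Caf\x[e9]", "Don't Stop"]);   # [Hey Jude Don't Stop]
```

Explicit ranges are used instead of \w because \w also matches non-ASCII letters such as é, which this task wants to reject.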
Task 6: Skip blank titles
After all the replacements, some titles are left with nothing. Continue editing the function clean to throw out any empty titles. Use a regular expression to check whether a title is empty or contains only whitespace; if so, do not retain it in the array you return. Again, use the Raku command next; to skip the title. Likewise, if the title contains only an apostrophe, discard it.
To match the tests, do not combine the above two operations into one, but (1) handle whitespace first and then (2) handle apostrophes as a second line of code.
In the code, continue editing the function clean to achieve the above goal.
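The two Task 6 checks can be sketched as follows, keeping the required order (whitespace first, apostrophe second) as two separate lines. The helper name drop-blank is hypothetical and stands in for the corresponding lines inside clean.

```raku
# Hypothetical standalone helper illustrating Task 6.
sub drop-blank(@titles --> Array) {
    my @kept;
    for @titles -> $title {
        # (1) Skip titles that are empty or consist only of whitespace.
        next if $title ~~ / ^ \s* $ /;
        # (2) Skip titles that consist of a single apostrophe.
        next if $title ~~ / ^ \' $ /;
        @kept.push($title);
    }
    return @kept;
}

say drop-blank(["Hey Jude", "", "   ", "'", "Don't Stop"]);   # [Hey Jude Don't Stop]
```

An apostrophe-only title can still appear after Task 4: a title such as " ' " keeps its apostrophe in the first step (it is not at the string boundary) and is reduced to "'" by the trim.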
Task 7: Set to lowercase
Convert all words in each title to lowercase. Raku has a simple built-in function to do this for you. Go back and edit the clean function one more time to convert all titles to lowercase.
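For reference, Raku's built-in lc method returns a lowercased copy of a string. The short example below applies it to a hypothetical list of titles with map; inside clean it would be applied to each title before it is pushed.

```raku
# Hypothetical sample titles for illustration.
my @titles = "Hey Jude", "Don't Stop Me Now";
# *.lc calls lc on each element; the originals are not modified.
my @lower = @titles.map(*.lc);
say @lower;   # [hey jude don't stop me now]
```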
On completion of Tasks 4-7, you can check yourself with test t06.
Task 8: Filtering out common words
In the domain of Natural Language Processing (NLP), stop words are common words that are often filtered out during preprocessing. Edit the function stopwords to filter the following common stop words out of the song title.
a an and by for from in of on or out the to with
Be careful to replace only entire words, not every occurrence of these letters; otherwise you will turn words like outstanding into stg.
The regex assertion <|w>, which matches a word boundary, will be quite useful here. Additionally, remove the single whitespace character that follows the second word boundary. Removing that space keeps titles from being littered with double spaces. This has the unintended consequence that a stop word is retained if it is the final word of the title (because there is no following space to match), but for simplicity, let's not worry about that.
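A minimal sketch of the stop-word removal described above, assuming titles are already lowercase. The helper name remove-stopwords is hypothetical and the assignment's stopwords function may be structured differently. It loops over the Task 8 list and, for each word, deletes every whole-word occurrence together with one following whitespace character.

```raku
# The stop words listed in Task 8.
my @stop = <a an and by for from in of on or out the to with>;

# Hypothetical standalone helper illustrating Task 8.
sub remove-stopwords(Str $title --> Str) {
    my $t = $title;
    for @stop -> $w {
        # <|w> on both sides forces $w to be a whole word, so "out" never
        # matches inside "outstanding". \s then consumes exactly one
        # whitespace character after the word, avoiding double spaces.
        $t ~~ s:g/ <|w> $w <|w> \s //;
    }
    return $t;
}

say remove-stopwords("the end of the road");       # end road
say remove-stopwords("outstanding out of time");   # outstanding time
```

As the task notes, a stop word at the very end of a title has no following space, so remove-stopwords("rock the") leaves "rock the" unchanged.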
We may not always want to filter these words, but we want to have the option. You can now use the command filter stopwords, and you can toggle this mode with the commands stopwords on and stopwords off.
On completion of Task 8, you can check yourself with test t07.
Postlude: Putting it all together
You can now use the command preprocess, which runs all the individual filter tasks.
On completion of Tasks 1-8, you can check yourself with test t08 (stopwords off) and test t09 (stopwords on).