Question: Task 4: Trim whitespace
All our replacements have left some titles blank or mangled. First, identify any titles that begin or end with an apostrophe and replace those apostrophes with nothing. Be careful not to disturb apostrophes in the interior of the title. Next, trim leading and trailing whitespace from each title. Be careful not to remove all whitespace, as we still need whitespace between words.
To match the tests and simplify debugging, do not combine these two operations into one; handle apostrophes first, then handle whitespace in a second line of code.
In the code, find the incomplete function clean and complete the instructions to achieve the above goal.
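The two-step trim described above might look something like the following. This is only a sketch: trim-title is a hypothetical helper name, and the real clean function in the assignment's starter code may be structured differently.

```raku
# Sketch only: trim-title is a hypothetical helper, not the assignment's clean function.
sub trim-title(Str $title is copy --> Str) {
    # Line 1: drop an apostrophe at the very start or very end of the title.
    # ^ and $ anchor to the ends of the string, so interior apostrophes survive.
    $title ~~ s:g/ ^ \' | \' $ //;

    # Line 2: drop leading and trailing whitespace only.
    # Whitespace between words is never adjacent to ^ or $, so it is kept.
    $title ~~ s:g/ ^ \s+ | \s+ $ //;

    return $title;
}

# Hypothetical sample title, not taken from the data set.
say trim-title("' don't stop  '");
```

Keeping the two substitutions on separate lines follows the instruction above and makes it easy to print the title between steps when debugging.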
Task: Filter out non-English characters
We want to ignore all song titles that contain a non-English character. Although this may eliminate some titles in languages like Spanish or Danish, we need to drop the Unicode garbage characters found in many of the titles.
If a title contains only valid English alphanumeric characters (a to z and 0 to 9), the apostrophe, or a space, we keep it. Otherwise, we discard the song title entirely. Use the Raku command next; to skip to the next iteration of the loop, which bypasses pushing the title onto the array to return. Make sure to accept both uppercase and lowercase letters.
In the code, continue editing the function clean to achieve the above goal.
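One possible shape for this filter is sketched below. The helper name keep-english and the sample titles are assumptions made for illustration; in the assignment the check would sit inside the loop of clean.

```raku
# Sketch only: keep-english is a hypothetical helper standing in for the loop inside clean.
sub keep-english(@titles) {
    my @kept;
    for @titles -> $title {
        # Keep the title only if every character is a-z, A-Z, 0-9,
        # an apostrophe, or a plain space (\x20); otherwise skip it.
        next unless $title ~~ / ^ <[a..z A..Z 0..9 \' \x20]>* $ /;
        @kept.push($title);
    }
    return @kept;
}

# Hypothetical sample titles, not taken from the data set.
.say for keep-english(["Don't Stop", "Señorita", "Track 9"]);
```

Listing both a..z and A..Z in the character class is what makes the check accept uppercase and lowercase letters. Note that this pattern also lets an empty title through; blank titles are a separate concern.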
Task: Skip blank titles
After all the replacements, some titles are left with nothing. Continue to edit the function clean to throw out any empty titles. Use a regular expression to check whether a title is empty or contains only whitespace. If it is, do not retain it in the array you return. Again use the Raku command next; to skip this title. Likewise, if the title contains only an apostrophe, discard it.
To match the tests, do not combine the above two operations into one; handle whitespace first, then handle apostrophes in a second line of code.
In the code, continue editing the function clean to achieve the above goal.
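The two skip checks could be sketched as follows, with the whitespace test first and the apostrophe test second. The helper name drop-blank and the sample input are hypothetical; they are not part of the assignment's starter code.

```raku
# Sketch only: drop-blank is a hypothetical helper standing in for the loop inside clean.
sub drop-blank(@titles) {
    my @kept;
    for @titles -> $title {
        next if $title ~~ / ^ \s* $ /;  # empty, or whitespace only
        next if $title ~~ / ^ \' $ /;   # nothing but a single apostrophe
        @kept.push($title);
    }
    return @kept;
}

# Hypothetical sample titles, not taken from the data set.
.say for drop-blank(["", "   ", "'", "hello"]);
```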
Task: Set to lowercase
Convert all words in the title to lowercase. Raku has a simple function to do this for you. Go back and edit the clean function one more time to convert all titles to lowercase.
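Assuming the simple function meant here is Raku's built-in lc method on strings, a minimal illustration with a hypothetical sample title would be:

```raku
# Hypothetical sample title, not taken from the data set.
my $title = "Don't Stop Believin'";

# lc returns a lowercased copy; assign it back to keep the change.
$title = $title.lc;   # equivalently: $title .= lc;

say $title;           # prints: don't stop believin'
```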
On completion of Tasks you can check yourself with test t
Task: Filtering out common words
In the domain of Natural Language Processing (NLP), stop words are common words that are often filtered out in preprocessing. Edit the function stopwords to filter out the following common stop words from the song title.
a an and by for from in of on or out the to with
Be careful to replace only entire words, not just any occurrence of these letters. Otherwise you will turn words like outstanding into stg.
The regex assertion <|w>, which matches a word boundary, will be quite useful here. Additionally, remove a single whitespace character following the second word boundary. This prevents your titles from being littered with double spaces, because we also remove the single space that follows the word. This has the unintended consequence that a stop word is retained if it is the final word of the title, because there is no space to match after it, but for simplicity, let's not worry about that.
We may not want to always filter these words, but we want to have the option. You can now use the command filter stopwords. You can toggle this mode using commands stopwords on and stopwords off.
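A rough sketch of the stop-word removal described above is shown here. The helper name remove-stopwords and the sample titles are assumptions for illustration; the assignment's own stopwords function may take different arguments or loop over titles itself.

```raku
# The stop words listed in the task.
my @stopwords = <a an and by for from in of on or out the to with>;

# Sketch only: remove-stopwords is a hypothetical helper, not the assignment's stopwords function.
sub remove-stopwords(Str $title is copy --> Str) {
    for @stopwords -> $word {
        # <|w> asserts a word boundary on each side of the stop word, so
        # "out" inside "outstanding" does not match. The \s after the second
        # boundary consumes exactly one following whitespace character.
        $title ~~ s:g/ <|w> $word <|w> \s //;
    }
    return $title;
}

# Hypothetical sample titles, not taken from the data set.
say remove-stopwords("the sound of music");
say remove-stopwords("outstanding");
say remove-stopwords("come on");
```

Because the pattern requires a whitespace character after the word, a stop word at the very end of a title (as in the last sample) is left in place, which matches the limitation the task text accepts.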
On completion of Task you can check yourself with test t
Postlude: Putting it all together
You can now use the command preprocess, which runs all the individual filter tasks.
On completion of Tasks you can check yourself with test tstopwords off and test tstopwords on
