Question: As a data scientist working in the higher education field, you aim to extract useful information from an article about at-risk calculus students. To achieve this, you plan to perform the following tasks on the 'AIEDatriskpred.pdf' file:
- Extract all text from the given PDF file.
- Use a regular expression to locate any years mentioned in the document.
- Use a regular expression to identify all words that start with a capital letter, which may help find key concepts.
- Use a regular expression to find specific machine learning algorithms (e.g., Logistic Regression, Support Vector Machines, Random Forest) mentioned in the document.
- Remove all special characters and punctuation from the document using a regular expression.
- Remove hyperlinks (URLs) from the document using a regular expression.
- Remove all words of at most two characters, such as "a", "an", "in", "on", and "if", using a regular expression.
- Remove the following four words using a regular expression: "are", "but", "very", "could".
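For the text-extraction task, here is a minimal sketch that assumes the third-party pypdf package (`pip install pypdf`); the question does not name a library, so `pypdf.PdfReader` and the helper name `extract_pdf_text` are choices made for illustration.

```python
def extract_pdf_text(path: str) -> str:
    """Return the text of every page in the PDF at `path`, joined by newlines."""
    # Assumption: pypdf is installed; imported here so the module loads without it
    from pypdf import PdfReader

    reader = PdfReader(path)
    # extract_text() may return an empty string for image-only (scanned) pages
    return "\n".join(page.extract_text() or "" for page in reader.pages)
```

Usage: `text = extract_pdf_text("AIEDatriskpred.pdf")`. Extracted PDF text often keeps hard line breaks and hyphenation from the layout, so later patterns should tolerate newlines inside phrases.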
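To locate years, one reasonable sketch assumes a "year" is a standalone four-digit number from 1900 to 2099; the range and the name `find_years` are assumptions, and the sample sentence is invented, not taken from the article.

```python
import re

# Assumption: years are four-digit numbers 1900-2099 bounded by non-word characters.
# The group is non-capturing so findall() returns the whole match.
YEAR_PATTERN = re.compile(r"\b(?:19|20)\d{2}\b")


def find_years(text: str) -> list[str]:
    return YEAR_PATTERN.findall(text)


# Hypothetical sample text
sample = "Data from 2015 to 2019 were compared with a 1998 cohort; 12345 is not a year."
print(find_years(sample))
```

The `\b` anchors stop the pattern from picking "2345" or similar out of longer numbers such as "12345".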
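For capitalized words, a sketch like the following matches an uppercase ASCII letter followed by letters; the name `capitalized_words` and the optional de-duplication are illustrative choices. Sentence-initial words such as "The" are matched too, so the result is a candidate list, not a list of confirmed key concepts.

```python
import re

# An uppercase letter followed by any letters, as a whole word (also matches acronyms)
CAPITALIZED = re.compile(r"\b[A-Z][A-Za-z]*\b")


def capitalized_words(text: str, unique: bool = True) -> list[str]:
    words = CAPITALIZED.findall(text)
    # dict.fromkeys removes duplicates while keeping first-seen order
    return list(dict.fromkeys(words)) if unique else words
```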
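To find machine learning algorithms, one option is a single case-insensitive alternation. The first three names come from the question; the extra names (decision trees, naive Bayes, neural networks) are illustrative additions, not a claim about what the article mentions. `\s+` between words tolerates line breaks introduced by PDF extraction.

```python
import re
from collections import Counter

# First three from the task; the rest are hypothetical additions for illustration.
ALGORITHMS = [
    r"logistic\s+regression",
    r"support\s+vector\s+machines?",
    r"random\s+forests?",
    r"decision\s+trees?",
    r"naive\s+bayes",
    r"neural\s+networks?",
]
ALGO_PATTERN = re.compile(r"\b(?:" + "|".join(ALGORITHMS) + r")\b", re.IGNORECASE)


def find_algorithms(text: str) -> Counter:
    # Normalize internal whitespace and case so "Random\nForest" counts as "random forest"
    return Counter(" ".join(m.group(0).split()).lower() for m in ALGO_PATTERN.finditer(text))


# Hypothetical sample text
sample = ("We compared Logistic Regression, a Random\nForest and "
          "Support Vector Machines; logistic regression performed best.")
print(find_algorithms(sample))
```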
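For removing special characters and punctuation, a minimal sketch treats everything that is not a word character or whitespace (plus the underscore) as removable. Because Python's `\w` is Unicode-aware for `str` patterns, accented letters survive. The helper name `remove_punctuation` is hypothetical.

```python
import re

# Non-word, non-whitespace characters, plus "_" (which \w would otherwise keep)
SPECIAL = re.compile(r"[^\w\s]|_")


def remove_punctuation(text: str) -> str:
    # Replace with a space, not "", so "at-risk" becomes "at risk" rather than "atrisk"
    cleaned = SPECIAL.sub(" ", text)
    # Collapse the runs of spaces/tabs left behind
    return re.sub(r"[ \t]+", " ", cleaned).strip()
```

Replacing with a space is a design choice: deleting outright would merge hyphenated words, which is how "at-risk" turns into "atrisk".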
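For hyperlinks, a simple sketch assumes a URL starts with `http://`, `https://`, or `www.` and runs to the next whitespace. This should run before punctuation removal, since stripping punctuation first splits a URL into ordinary-looking words. Limitations: a trailing period after a URL is removed with it, and bare domains such as `doi.org/...` without a scheme are not matched. The name `remove_urls` is hypothetical.

```python
import re

# Assumption: a hyperlink begins with http://, https:// or www. and ends at whitespace
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)


def remove_urls(text: str) -> str:
    without_urls = URL_PATTERN.sub("", text)
    # Tidy the double spaces left where URLs were
    return re.sub(r"[ \t]+", " ", without_urls).strip()
```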
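For words of at most two characters, one sketch matches one or two letters between word boundaries. It is restricted to letters, so short numbers like "12" and tokens like "v2" are kept; use `\b\w{1,2}\b` instead if those should go too. Run it after punctuation removal, otherwise fragments such as the "s" in "it's" behave unpredictably. The name `remove_short_words` is hypothetical.

```python
import re

# One or two ASCII letters forming a whole word: "a", "an", "in", "on", "if", ...
SHORT_WORD = re.compile(r"\b[A-Za-z]{1,2}\b")


def remove_short_words(text: str) -> str:
    # Delete the short words, then collapse the leftover spaces
    return re.sub(r"[ \t]+", " ", SHORT_WORD.sub("", text)).strip()
```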
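To remove the four listed words, a sketch can join them into one alternation wrapped in word boundaries, so "are" inside "careful" or "Aren't" is left alone. Matching case-insensitively (so a sentence-initial "But" is also removed) is an assumption, not something the question specifies; drop `re.IGNORECASE` for exact lowercase matching. The name `remove_stop_words` is hypothetical.

```python
import re

STOP_WORDS = ["are", "but", "very", "could"]
# \b on both sides keeps the pattern from matching inside longer words
STOP_PATTERN = re.compile(r"\b(?:" + "|".join(STOP_WORDS) + r")\b", re.IGNORECASE)


def remove_stop_words(text: str) -> str:
    return re.sub(r"[ \t]+", " ", STOP_PATTERN.sub("", text)).strip()


# Hypothetical sample text
print(remove_stop_words("These results are very promising, but more data could help. Aren't they?"))
```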
