Question: As a data scientist working in the higher education field, you aim to extract useful information from an article about at-risk calculus students. To achieve this, you plan to perform the following tasks on the 'AIED2021_at_risk_pred.pdf' file.
Q1. Extract all text from the given PDF file.
Q2. Use a regular expression to locate any years mentioned in the document.
Q3. Use a regular expression to identify all words that start with a capital letter, which might be useful for finding key concepts.
Q4. Use a regular expression to find specific machine learning algorithms (e.g., Logistic Regression, Support Vector Machines, Random Forest) mentioned in the document.
Q5. Remove all special characters and punctuation from the document using a regular expression.
Q6. Remove any hyperlink URLs from the document using a regular expression.
Q7. Remove all words containing at most two characters, such as "a", "an", "in", "on", "if", using a regular expression.
Q8. Remove the following four words: "are", "but", "very", "could" using a regular expression.
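
One possible way to approach Q1 to Q8 is sketched below. It is a minimal sketch, not a definitive solution: the function names are hypothetical, Q1 assumes the third-party pypdf package is installed (the question does not name a PDF library), the year pattern assumes years between 1900 and 2099, and the algorithm list contains only the three examples given in Q4.

```python
import re


def extract_text(pdf_path):
    """Q1: return the text of every page, joined by newlines."""
    # Assumption: pypdf is installed. It is imported here, not at module
    # level, so the regex helpers below still work without it.
    from pypdf import PdfReader
    reader = PdfReader(pdf_path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def find_years(text):
    """Q2: four-digit years from 1900 to 2099 (the range is an assumption)."""
    return re.findall(r"\b(?:19|20)\d{2}\b", text)


def find_capitalized(text):
    """Q3: words that begin with an uppercase ASCII letter."""
    return re.findall(r"\b[A-Z][A-Za-z]*\b", text)


# Q4: only the three algorithms named in the question; extend as needed.
# \s+ tolerates line breaks that PDF extraction may put between words.
ALGORITHM_PATTERNS = [
    r"Logistic\s+Regression",
    r"Support\s+Vector\s+Machines?",
    r"Random\s+Forests?",
]


def find_algorithms(text):
    """Q4: case-insensitive matches of the algorithm names above."""
    pattern = r"\b(?:" + "|".join(ALGORITHM_PATTERNS) + r")\b"
    return re.findall(pattern, text, flags=re.IGNORECASE)


def remove_punctuation(text):
    """Q5: drop everything that is not a letter, digit, or whitespace."""
    return re.sub(r"[^\w\s]|_", "", text)


def remove_urls(text):
    """Q6: drop tokens that start with http://, https://, or www."""
    return re.sub(r"(?:https?://|www\.)\S+", "", text)


def remove_short_words(text):
    """Q7: drop words of one or two characters."""
    return re.sub(r"\b\w{1,2}\b", "", text)


def remove_listed_words(text):
    """Q8: drop the four listed words, ignoring case."""
    return re.sub(r"\b(?:are|but|very|could)\b", "", text, flags=re.IGNORECASE)
```

If the cleaning steps are chained, it is probably safest to call remove_urls before remove_punctuation: stripping punctuation first deletes the "://" and "." that the URL pattern relies on. The removal functions leave extra spaces behind, which can be collapsed afterwards with " ".join(text.split()).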
