Question: Text Data Preprocessing. Here is a legal case we collected from Westlaw. Please follow the steps covered in Lesson 5 to clean the data:
1.1 Basic feature extraction using text data
- Number of sentences
- Number of words
- Number of characters
- Average word length
- Number of stopwords
- Number of special characters
- Number of numerics
- Number of uppercase words
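The features listed in 1.1 could be computed roughly as follows. This is a minimal sketch using only the standard library, not the Lesson 5 method: `basic_features` and the tiny `STOPWORDS` set are hypothetical names introduced for illustration (a full list such as NLTK's English stopwords would normally be used), and the regex sentence splitter is a simplification that will over-split legal abbreviations such as "v." or "U.S.".

```python
import re
import string

# Assumption: a small illustrative stopword list, not the full list a course would use.
STOPWORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "was", "for", "on", "that", "by"}

def basic_features(text):
    """Return the eight basic counts from section 1.1 for one text string."""
    # Naive sentence split: break after '.', '!' or '?' when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Whitespace tokenization; punctuation stays attached to words at this stage.
    words = text.split()
    return {
        "num_sentences": len(sentences),
        "num_words": len(words),
        "num_chars": len(text),  # includes spaces
        "avg_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
        "num_stopwords": sum(
            1 for w in words if w.lower().strip(string.punctuation) in STOPWORDS
        ),
        # "Special characters" is taken here to mean ASCII punctuation.
        "num_special_chars": sum(1 for c in text if c in string.punctuation),
        "num_numerics": sum(1 for w in words if w.strip(string.punctuation).isdigit()),
        # Single letters such as "A" are skipped so they do not count as uppercase words.
        "num_uppercase_words": sum(1 for w in words if w.isupper() and len(w) > 1),
    }

if __name__ == "__main__":
    print(basic_features("The COURT ruled in 1999. Appeal was denied!"))
```

Whether spaces count as characters, and what counts as a "special character", are choices the assignment leaves open; the comments mark the choices made here.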
1.2 Basic pre-processing of text data
- Lower casing
- Punctuation removal
- Stopwords removal
- Frequent words removal
- Rare words removal
- Spelling correction
- Tokenization
- Stemming
- Lemmatization
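Most of the steps in 1.2 can be chained in one pass over the sentences. The sketch below is an illustration under assumptions, not the course solution: `clean_sentences`, its parameters, and the small `STOPWORDS` set are hypothetical. Spelling correction and lemmatization are deliberately left out because they depend on external libraries (for example TextBlob or NLTK with its WordNet data); stemming is exposed only as an optional `stem` callable, so a stemmer such as `nltk.stem.PorterStemmer().stem` could be passed in if NLTK is installed.

```python
import string
from collections import Counter

# Assumption: a small illustrative stopword list.
STOPWORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "was", "for", "on", "that", "by"}

def clean_sentences(sentences, n_frequent=10, rare_max_count=1, stem=None):
    """Lower-case, strip punctuation, tokenize, and drop stop, frequent, and rare words."""
    table = str.maketrans("", "", string.punctuation)
    # Lower casing, punctuation removal, and whitespace tokenization.
    tokenized = [s.lower().translate(table).split() for s in sentences]
    # Stopword removal.
    tokenized = [[t for t in toks if t not in STOPWORDS] for toks in tokenized]

    # Corpus-wide counts decide which words are "frequent" or "rare".
    counts = Counter(t for toks in tokenized for t in toks)
    frequent = {w for w, _ in counts.most_common(n_frequent)}
    rare = {w for w, c in counts.items() if c <= rare_max_count}
    drop = frequent | rare

    cleaned = []
    for toks in tokenized:
        kept = [t for t in toks if t not in drop]
        if stem is not None:
            # Optional stemming hook; any callable str -> str works.
            kept = [stem(t) for t in kept]
        if kept:  # skip sentences that became empty
            cleaned.append(" ".join(kept))
    return cleaned

if __name__ == "__main__":
    raw = [
        "The court ruled for the plaintiff.",
        "The court denied the appeal!",
        "Plaintiff filed an appeal.",
    ]
    print(clean_sentences(raw, n_frequent=0, rare_max_count=1))
```

The defaults (drop the 10 most frequent words and words seen only once) are arbitrary; on a short document they can remove nearly everything, so the thresholds should be tuned to the size of the case text.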
1.3 Save all the clean sentences to a CSV file (one column, each row is a sentence) after finishing all the steps above. (4 points)
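One way to write the one-column file is with Python's built-in `csv` module. In this sketch the function name `save_sentences`, the default file name `clean_sentences.csv`, and the `sentence` header are assumptions; the assignment does not specify them.

```python
import csv

def save_sentences(sentences, path="clean_sentences.csv"):
    """Write one sentence per row into a single-column CSV file."""
    # newline="" lets the csv module control line endings on every platform.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["sentence"])  # header row (an assumption)
        for s in sentences:
            writer.writerow([s])

if __name__ == "__main__":
    save_sentences(["court plaintiff", "court appeal"])
```

Using `csv.writer` instead of joining strings by hand means a sentence that happens to contain a comma or a quote is still stored in one cell.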
1.4 Advanced Text Processing
- Calculate the term frequency of all the terms.
- Print out the top 10 1-grams, top 10 2-grams, and top 10 3-grams as features.
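Term frequency and the top n-grams in 1.4 can both be derived from simple counts. The sketch below assumes the input is the list of cleaned, space-separated sentences from the earlier steps, and it takes "term frequency" to mean a term's count divided by the total number of terms in the whole document; the lesson may instead define it per sentence. The names `ngrams`, `term_frequency`, and `top_ngrams` are hypothetical.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-token sequences as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def term_frequency(sentences):
    """Map each term to count / total terms across all sentences."""
    counts = Counter(t for s in sentences for t in s.split())
    total = sum(counts.values())
    return {term: c / total for term, c in counts.items()} if total else {}

def top_ngrams(sentences, n, k=10):
    """Return the k most common n-grams; n-grams never cross sentence boundaries."""
    counts = Counter()
    for s in sentences:
        counts.update(ngrams(s.split(), n))
    return counts.most_common(k)

if __name__ == "__main__":
    cleaned = ["court denied appeal", "court denied motion"]
    print(term_frequency(cleaned))
    for n in (1, 2, 3):
        print(n, top_ngrams(cleaned, n))
```

Counting n-grams within each sentence, rather than over one long token stream, avoids features that span the end of one sentence and the start of the next.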
# Write your code here
