Question: Text Data Preprocessing

Here is a legal case we collected from Westlaw. Please follow the steps covered in lesson 5 to clean the data:

1.1 Basic feature extraction using text data

  • Number of sentences
  • Number of words
  • Number of characters
  • Average word length
  • Number of stopwords
  • Number of special characters
  • Number of numerics
  • Number of uppercase words
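
The features listed in 1.1 can be sketched with the standard library alone. This is a minimal illustration, not the lesson's reference solution: `SAMPLE` is a made-up stand-in for the Westlaw case, `STOPWORDS` is a tiny hypothetical list (a library list such as NLTK's would be much longer), and sentences are split naively on `.`, `!`, or `?`, which will mis-split legal abbreviations such as "v." or "U.S.".

```python
import re
import string

# Hypothetical sample text; the actual Westlaw case is not reproduced here.
SAMPLE = "The Court held in 1998 that the USA statute applied. Costs of $500 were awarded!"

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "in", "that", "of", "were", "a", "an", "and", "to", "is"}

def basic_features(text):
    """Return simple count-based features for a piece of text."""
    # Naive sentence split: break after ., ! or ? when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = text.split()
    # Strip surrounding punctuation so "applied." is treated as "applied".
    stripped = [w.strip(string.punctuation) for w in words]
    stripped = [w for w in stripped if w]
    return {
        "sentences": len(sentences),
        "words": len(words),
        "characters": len(text),
        "avg_word_length": sum(len(w) for w in stripped) / len(stripped) if stripped else 0.0,
        "stopwords": sum(w.lower() in STOPWORDS for w in stripped),
        "special_characters": sum(c in string.punctuation for c in text),
        "numerics": sum(w.isdigit() for w in stripped),
        # Single-letter words such as "I" or "A" are not counted as uppercase words.
        "uppercase_words": sum(w.isupper() and len(w) > 1 for w in stripped),
    }

features = basic_features(SAMPLE)
for name, value in features.items():
    print(f"{name}: {value}")
```

Returning a dictionary keeps each feature named, so the same function can be applied per sentence or to the whole document.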

1.2 Basic pre-processing of text data

  • Lower casing
  • Punctuation removal
  • Stopwords removal
  • Frequent words removal
  • Rare words removal
  • Spelling correction
  • Tokenization
  • Stemming
  • Lemmatization
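
Several of the 1.2 steps can be sketched with the standard library: lower casing, punctuation removal, whitespace tokenization, stopword removal, and frequent/rare word removal. The sentences, the stopword list, and the function names `normalize` and `drop_frequent_and_rare` are all hypothetical. Spelling correction, stemming, and lemmatization are deliberately left out of this sketch because they normally rely on a library such as NLTK or TextBlob and its language data.

```python
import string
from collections import Counter

# Hypothetical sentences standing in for the case text.
SENTENCES = [
    "The court held that the statute applied.",
    "The court denied the motion!",
    "An appeal of the ruling followed.",
]

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "that", "an", "of", "a", "and", "to", "in"}

def normalize(sentence):
    """Lower-case, strip punctuation, tokenize on whitespace, drop stopwords."""
    lowered = sentence.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    tokens = no_punct.split()
    return [t for t in tokens if t not in STOPWORDS]

def drop_frequent_and_rare(token_lists, n_frequent=1, min_count=1):
    """Remove the n_frequent most common tokens and any token seen fewer than min_count times."""
    counts = Counter(t for tokens in token_lists for t in tokens)
    frequent = {w for w, _ in counts.most_common(n_frequent)}
    rare = {w for w, c in counts.items() if c < min_count}
    banned = frequent | rare
    return [[t for t in tokens if t not in banned] for tokens in token_lists]

tokenized = [normalize(s) for s in SENTENCES]
cleaned = drop_frequent_and_rare(tokenized, n_frequent=1, min_count=1)
print(cleaned)
```

The thresholds `n_frequent` and `min_count` are tuning choices. On a short document an aggressive rare-word cutoff can remove most of the text, so it is worth inspecting the counts before picking values.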

1.3 Save all the clean sentences to a CSV file (one column, each row is a sentence) after finishing all the steps above. (4 points)
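
One possible way to write the one-column file uses Python's built-in `csv` module. The function name `save_sentences` and the header `sentence` are assumptions for illustration; the assignment does not name the column.

```python
import csv

def save_sentences(sentences, path):
    """Write one sentence per row under a single column."""
    # newline="" lets the csv module control line endings, as its docs recommend.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["sentence"])  # header row; the column name is an assumption
        for sentence in sentences:
            writer.writerow([sentence])  # each row holds exactly one field
```

If the cleaned data is held as lists of tokens, join each list back into a string first, for example `save_sentences([" ".join(tokens) for tokens in token_lists], "clean_sentences.csv")`, where `token_lists` and the file name are hypothetical. The writer quotes any sentence that contains a comma, so the file still reads back as a single column.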

1.4 Advanced Text Processing

  • Calculate the term frequency of all the terms.
  • Print the top 10 1-grams, top 10 2-grams, and top 10 3-grams as features.
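
A minimal sketch of 1.4 using `collections.Counter`. It assumes "term frequency" means the raw count of each term (divide by the total token count if the lesson uses relative frequency), and it builds n-grams within each sentence so that no n-gram spans a sentence boundary. `DOCS` is hypothetical pre-cleaned data.

```python
from collections import Counter

# Hypothetical cleaned sentences, already tokenized.
DOCS = [
    ["court", "held", "statute", "applied"],
    ["court", "denied", "motion"],
    ["court", "held", "motion", "moot"],
]

def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def term_frequency(token_lists):
    """Raw count of every term across all sentences."""
    return Counter(t for tokens in token_lists for t in tokens)

def top_ngrams(token_lists, n, k=10):
    """The k most common n-grams, counted sentence by sentence."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(ngrams(tokens, n))
    return counts.most_common(k)

print(term_frequency(DOCS))
for n in (1, 2, 3):
    print(f"top {n}-grams:", top_ngrams(DOCS, n))
```

With fewer than ten distinct n-grams, `most_common(10)` simply returns all of them, so the printed lists may be shorter than ten on small inputs.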

# Write your code here
