Question: 20.1 Tokenization. Consider the following text version of a post to an online learning forum in a statistics course:

Thanks John! "Illustrations and demos will be provided for students to work through on their own". Do we need that to finish project? If yes, where to find the illustration and demos? Thanks for your help. \
a. Identify 10 non-word tokens in the passage.
b. Suppose that this passage constitutes a document to be classified, but you are not certain of the business goal of the classification task. Identify material (at least 20% of the terms) that, in your judgment, could be discarded fairly safely without knowing that goal.
c. Suppose that the classification task is to predict whether this post requires the attention of the instructor, or whether a teaching assistant might suffice. Identify the 20% of the terms that you think might be most helpful in that task.
d. What aspect of the passage is most problematic from the standpoint of simply using a bag-of-words approach, as opposed to an approach in which meaning is extracted?
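The mechanics behind parts a, b and d can be tried out on the passage with a few lines of Python. This is a minimal sketch, not a model answer. It assumes a simple regex tokenizer (runs of word characters, plus each punctuation mark as its own token), treats a token as a "word" only if it is purely alphabetic, and uses a small hand-picked stop list that is purely illustrative and not taken from any library. How many non-word tokens it reports depends on those rules and on the plain-text version of the post used here.

```python
import re
from collections import Counter

# Plain-text version of the forum post as reproduced in the question.
POST = (
    'Thanks John! "Illustrations and demos will be provided for students '
    'to work through on their own". Do we need that to finish project? '
    "If yes, where to find the illustration and demos? "
    "Thanks for your help. \\"
)

# Illustrative stop list (an assumption for this sketch, not from any library).
STOP_WORDS = {
    "and", "be", "do", "for", "if", "on", "own", "that", "the",
    "their", "through", "to", "we", "where", "will", "your",
}


def tokenize(text: str) -> list[str]:
    """Split text into runs of word characters and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def is_word(token: str) -> bool:
    """Treat a token as a word only if it is purely alphabetic."""
    return token.isalpha()


def bag_of_words(tokens: list[str]) -> Counter:
    """Lowercased counts of word tokens with stop words dropped; order is lost."""
    return Counter(
        t.lower() for t in tokens if is_word(t) and t.lower() not in STOP_WORDS
    )


tokens = tokenize(POST)
non_word = [t for t in tokens if not is_word(t)]  # punctuation, quote marks, backslash
bag = bag_of_words(tokens)

print("non-word tokens:", non_word)
print("bag of words:", bag.most_common())

# Reversing the word order yields the same bag: the representation cannot tell
# which words belong to the quoted course text and which are the student's own.
shuffled = " ".join(reversed([t for t in tokens if is_word(t)]))
print(bag_of_words(tokenize(shuffled)) == bag)
```

The last check illustrates one plausible concern behind part d: once the text becomes a bag of counts, word order and the boundary of the quotation are gone. Adding a stemmer would also merge "illustration" and "illustrations", which this sketch keeps as separate terms.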
