20.1 Tokenization. Consider the following text version of a post to an online learning forum in a statistics course:

Thanks John!
"Illustrations and demos will be provided for students to work through on their own". Do we need that to finish project? If yes, where to find the illustration and demos? Thanks for your help. \

a. Identify 10 non-word tokens in the passage.
b. Suppose that this passage constitutes a document to be classified, but you are not certain of the business goal of the classification task. Identify material (at least 20% of the terms) that, in your judgment, could be discarded fairly safely without knowing that goal.
c. Suppose that the classification task is to predict whether this post requires the attention of the instructor, or whether a teaching assistant might suffice. Identify the 20% of the terms that you think might be most helpful in that task.
d. What aspect of the passage is most problematic from the standpoint of simply using a bag-of-words approach, as opposed to an approach in which meaning is extracted?
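As a starting point for parts a and d, here is a minimal Python sketch of one way to split the passage into word and non-word tokens and build a bag of words. The token definition (a run of letters is a word; any other non-whitespace character is its own non-word token) is an assumption made for illustration, not a rule given in the exercise, and whitespace delimiters are not counted as tokens here.

```python
import re
from collections import Counter

# The forum post as it appears in the exercise, including the trailing backslash.
post = (
    'Thanks John! "Illustrations and demos will be provided for students '
    'to work through on their own". Do we need that to finish project? '
    'If yes, where to find the illustration and demos? Thanks for your help. \\'
)

# Assumption: a word token is a run of ASCII letters; every other
# non-whitespace character is emitted as a separate non-word token.
TOKEN_RE = re.compile(r"[A-Za-z]+|[^A-Za-z\s]")


def tokenize(text):
    """Return all tokens in order of appearance."""
    return TOKEN_RE.findall(text)


tokens = tokenize(post)
word_tokens = [t.lower() for t in tokens if t.isalpha()]
non_word_tokens = [t for t in tokens if not t.isalpha()]

# Bag of words: lowercase term counts, with word order discarded.
bag_of_words = Counter(word_tokens)

print(non_word_tokens)
print(bag_of_words.most_common(5))
```

Under this token definition the plain-text passage yields fewer than 10 non-word tokens, so reaching 10 would require a broader definition (for example, counting whitespace delimiters). The counts also keep "illustrations" and "illustration" as separate terms and do not record which words fell inside the quotation marks.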
