Question: Classifying Internet Discussion Posts. In this problem, you will use the data and scenario described in this chapter’s example, in which the task is to develop a model to classify documents as either auto-related or electronics-related.
a. Load the zipped file into Python and create a label vector.
b. Following the example in this chapter, preprocess the documents. Explain what would be different if you did not perform the “stemming” step.
c. Use latent semantic analysis (LSA) to create 10 concepts. Explain how the concept matrix differs from the TF-IDF matrix.
d. Using the concept matrix, fit a predictive model (different from the model presented in the chapter illustration) to classify documents as autos or electronics. Compare its performance to that of the model presented in the chapter illustration.
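
Parts (a) through (d) can be sketched roughly as follows. This is a minimal outline under stated assumptions, not the chapter's own code. It assumes the archive stores each post under a folder whose name contains `autos` for the auto class, uses scikit-learn's `TfidfVectorizer` and `TruncatedSVD` for the TF-IDF and LSA steps, and picks a random forest purely as one example of a model to compare against the chapter's. The names `load_posts` and `build_lsa_classifier` are hypothetical. Stemming is left out by default; passing a stemming function through the `tokenizer` argument lets you compare the vocabulary with and without it.

```python
import zipfile


def load_posts(zip_path):
    """Part (a): read every post in the archive and build a 0/1 label vector.

    Assumption: a member whose path contains 'autos' is an auto post (label 1);
    every other file is treated as an electronics post (label 0).
    """
    documents, labels = [], []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip folder entries, keep only the posts
            text = archive.read(info).decode("utf-8", errors="ignore")
            documents.append(text)
            labels.append(1 if "autos" in info.filename else 0)
    return documents, labels


def build_lsa_classifier(n_concepts=10, tokenizer=None):
    """Parts (b)-(d): TF-IDF, then LSA concepts, then a classifier.

    tokenizer: optional callable mapping a string to a list of tokens, for
    example one that stems each word. None keeps scikit-learn's default.
    """
    # Third-party imports live here so load_posts() needs only the standard library.
    from sklearn.decomposition import TruncatedSVD
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import Normalizer

    return make_pipeline(
        # Sparse term-document weights, one column per term in the vocabulary.
        TfidfVectorizer(tokenizer=tokenizer, stop_words="english"),
        # LSA: project the TF-IDF matrix onto n_concepts dense columns.
        TruncatedSVD(n_components=n_concepts, random_state=1),
        # Rescale each document's concept vector to unit length.
        Normalizer(copy=False),
        # Assumed alternative model; any classifier not used in the chapter works.
        RandomForestClassifier(n_estimators=200, random_state=1),
    )
```

To use the sketch, call `load_posts` on the chapter's zip file, split the documents and labels into training and holdout sets, call `fit` on the training portion, and score the holdout portion for comparison with the chapter's model. Note the contrast relevant to part (c): the TF-IDF matrix is sparse with one column per term, while the concept matrix is dense with only 10 columns, each a weighted combination of terms.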
