Question: Classifying Internet Discussion Posts. In this problem, you will use the data and scenario described in this chapter's example, in which the task is to develop a model to classify documents as either auto-related or electronics-related.

a. Load the zipped file into R and create a label vector. Following the example in this chapter, preprocess the documents.

b. Explain what would be different if you did not perform the stemming step.

c. Use the lsa package to create 10 concepts. Explain what is different about the concept matrix, as opposed to the TF-IDF matrix.

d. Using this matrix, fit a predictive model (different from the model presented in the chapter illustration) to classify documents as autos or electronics. Compare its performance to that of the model presented in the chapter illustration.
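The workflow in parts a through d can be sketched in R roughly as follows. This is a minimal illustration under stated assumptions, not the chapter's own code: the `posts` vector is a hypothetical toy stand-in for the zipped corpus, only 3 concepts are extracted because the toy data is too small for 10, and k-nearest neighbors is used purely as one example of an alternative classifier (confirm it differs from the model in the chapter illustration). It assumes the tm, SnowballC, lsa, and class packages are installed.

```r
library(tm)        # corpus handling and TF-IDF weighting
library(SnowballC) # stemmer used by tm's stemDocument
library(lsa)       # latent semantic analysis
library(class)     # k-nearest neighbors

# (a) Hypothetical toy posts standing in for the chapter's zipped corpus.
# With the real data, the corpus would instead be read from the zip file,
# for example through tm's ZipSource().
posts <- c(
  "The car engine stalls when braking hard",
  "New tires improved the handling of my truck",
  "Changing the oil and brakes on an old sedan",
  "The dealer replaced the transmission in my car",
  "This amplifier circuit needs a bigger capacitor",
  "Soldering resistors onto the circuit board",
  "The battery voltage drops under load",
  "Replacing a blown capacitor in the power supply"
)
label <- c(rep(1, 4), rep(0, 4))  # 1 = autos, 0 = electronics

# Preprocess: lowercase, then strip whitespace, punctuation, numbers, stopwords
corp <- VCorpus(VectorSource(posts))
corp <- tm_map(corp, content_transformer(tolower))
corp <- tm_map(corp, stripWhitespace)
corp <- tm_map(corp, removePunctuation)
corp <- tm_map(corp, removeNumbers)
corp <- tm_map(corp, removeWords, stopwords("english"))

# (b) Build the term-document matrix with and without stemming and
# compare vocabulary sizes; stemming merges inflected forms of a word.
corp_stem <- tm_map(corp, stemDocument)
tdm_raw <- TermDocumentMatrix(corp)
tdm <- TermDocumentMatrix(corp_stem)
print(c(unstemmed = nrow(tdm_raw), stemmed = nrow(tdm)))

# (c) TF-IDF weighting, then LSA. The exercise asks for dim = 10 on the
# full corpus; dim = 3 is an assumption forced by the 8-document toy set.
tfidf <- weightTfIdf(tdm)
space <- lsa(as.matrix(tfidf), dim = 3)
concepts <- as.data.frame(as.matrix(space$dk))  # one row per document

# (d) Leave-one-out k-NN on the concept matrix (illustrative choice only)
pred <- knn.cv(train = concepts, cl = factor(label), k = 3)
print(table(predicted = pred, actual = label))
```

Comparing `dim(as.matrix(tfidf))` with `dim(concepts)` shows the structural difference asked about in part c: the TF-IDF matrix has one row per term and one column per document, while the concept matrix has one row per document and one column per concept. On the full corpus, a holdout split would give a fairer performance comparison than leave-one-out on a toy set.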
