Question: gastext.csv using RStudio / Rscript. A fuel company has 250+ gas stations in the US. It captures customers' comments via phone, which are merged with numeric variables by matching them with the company's loyalty card number. All data were provided in the gastext.csv data file. Some of the text comments, variable names, and descriptions were disguised to protect the identity of the client company.
The target variable is identified by the column name.
CustID and LoyalStatus are nominal variables, and all other variables are binary.
The Comment column contains the text information.
Variable and model naming requirements:
Please include your name initials in the data frame names as well as the model names in your code.
For instance, in my code, I would name the data frames dfKZ.train and dfKZ.valid. I would also name the models treeKZ, etc.
Canvas submission. You need to submit two separate documents via the Canvas HW submission link:
Word document: please provide your answers in the Word document, and copy and paste your code at the end of the document.
Coding file: your R code file.
Grading Criteria:
pts for minor errors
pts for major errors
Questions:
Provide the word cloud after all necessary preprocessing. pts
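A minimal base-R sketch of the preprocessing behind a word cloud, assuming lowercase conversion, punctuation and number removal, and stop-word filtering are the "necessary preprocessing". The three comments and the short stop-word list are hypothetical stand-ins for the Comment column; a course solution would more likely use the tm or tidytext packages.

```r
# Minimal base-R preprocessing sketch. The comments and the stop-word list
# are hypothetical stand-ins for the Comment column of gastext.csv.
comments <- c("Great price and friendly service!",
              "Service was slow, price too high.",
              "Clean restrooms; the price is fair.")
stop_words <- c("and", "was", "too", "the", "is", "a")  # illustrative only

preprocess <- function(text, stop_words) {
  text <- tolower(text)                            # normalise case
  text <- gsub("[[:punct:]]", " ", text)           # replace punctuation with spaces
  text <- gsub("[[:digit:]]+", " ", text)          # replace numbers with spaces
  tokens <- strsplit(trimws(text), "[[:space:]]+") # split on whitespace
  # drop stop words and any empty strings
  lapply(tokens, function(tok) tok[!(tok %in% stop_words) & nchar(tok) > 0])
}

tokens <- preprocess(comments, stop_words)

# Term frequencies across all comments: the input a word cloud needs.
freq <- sort(table(unlist(tokens)), decreasing = TRUE)
print(freq)
```

The resulting frequencies could then be drawn with, for example, wordcloud::wordcloud(words = names(freq), freq = as.integer(freq)) from the wordcloud package.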
What are the top terms that are most related to "price"? Please specify your similarity measurement method and detailed results. pts
What are the top terms that are most related to "service"? Please specify your similarity measurement method and detailed results. pts
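One possible similarity measure is cosine similarity between the term columns of a document-term matrix: two terms score high when they tend to appear in the same comments. This base-R sketch uses hypothetical tokenised comments; note that tm::findAssocs() reports correlations instead, so its rankings can differ from cosine rankings.

```r
# Hypothetical tokenised comments (in practice: the preprocessed Comment column).
tokens <- list(c("great", "price", "friendly", "service"),
               c("service", "slow", "price", "high"),
               c("clean", "restrooms", "price", "fair"),
               c("friendly", "service", "staff"))

# Document-term matrix: rows = comments, columns = terms, cells = counts.
vocab <- sort(unique(unlist(tokens)))
dtm <- t(vapply(tokens,
                function(tok) as.numeric(table(factor(tok, levels = vocab))),
                numeric(length(vocab))))
colnames(dtm) <- vocab

# Cosine similarity between one term's column and every other term's column,
# sorted from most to least similar.
cosine_to <- function(dtm, term) {
  target <- dtm[, term]
  sims <- apply(dtm, 2, function(col)
    sum(col * target) / (sqrt(sum(col^2)) * sqrt(sum(target^2))))
  sort(sims[names(sims) != term], decreasing = TRUE)
}

print(round(cosine_to(dtm, "price"), 3))
print(round(cosine_to(dtm, "service"), 3))
```

Reporting the top few terms with their similarity scores for both "price" and "service" covers the "detailed results" part of the question.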
Perform topic modeling with four topics:
Further remove some common words, such as "shower" and "point".
You might encounter all-zero rows (comments with no remaining terms), and you need to remove those rows. Here is some sample code for your reference.
Provide the term-beta plots for the four topics. pts
Please summarize those four topics based on your best effort. pts
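A sketch of the two cleanup steps before fitting LDA, on a hypothetical count matrix: drop the extra common words, then drop documents left with no terms. topicmodels::LDA() stops with an error when any row of the document-term matrix is all zeros, which is why the second step matters. After cleanup, one would fit something like topicmodels::LDA(dtm, k = 4) on the tm DocumentTermMatrix and extract per-topic term probabilities with tidytext::tidy(model, matrix = "beta") for the term-beta plots; those calls are not run here.

```r
# Hypothetical document-term counts; rows = comments, columns = terms.
dtm <- matrix(c(1, 0, 2, 0,
                0, 1, 0, 0,
                1, 1, 0, 3,
                0, 0, 1, 0),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("doc", 1:4),
                              c("price", "shower", "service", "point")))

# Step 1: drop the extra common words named in the assignment.
extra_stop <- c("shower", "point")
dtm <- dtm[, !(colnames(dtm) %in% extra_stop), drop = FALSE]

# Step 2: drop documents whose remaining counts are all zero
# (doc2 only contained "shower", so it becomes an all-zero row).
dtm <- dtm[rowSums(dtm) > 0, , drop = FALSE]
print(dtm)
```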
Please run two decision tree models
Do we need to remove any column from predictive modeling? pts
Model 1 uses only non-text information, i.e., all columns except the Comment column
Please show the tree plot
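A sketch of Model 1 using rpart, which ships with standard R installations. The data frame is synthetic: Clean, FastPay, and Target are hypothetical placeholder columns, since the real target column name is not given above. The sketch answers the column-removal question one plausible way, by excluding CustID (a unique identifier with no predictive meaning) along with Comment.

```r
library(rpart)  # recommended package bundled with standard R installations

set.seed(42)
n <- 300
# Hypothetical stand-in for gastext.csv. Clean, FastPay, and Target are
# placeholder names; the real target column name is not given above.
dfKZ <- data.frame(
  CustID      = sprintf("C%04d", 1:n),
  LoyalStatus = sample(c("Gold", "Silver", "Bronze"), n, replace = TRUE),
  Clean       = rbinom(n, 1, 0.5),
  FastPay     = rbinom(n, 1, 0.5),
  Comment     = "placeholder comment",
  stringsAsFactors = FALSE
)
# Make the target depend mostly on Clean so the tree has a split to find.
dfKZ$Target <- ifelse(runif(n) < ifelse(dfKZ$Clean == 1, 0.85, 0.2), 1, 0)

# Convert nominal and binary columns to factors (the lapply hint).
cols <- c("LoyalStatus", "Clean", "FastPay", "Target")
dfKZ[cols] <- lapply(dfKZ[cols], factor)

# 70/30 train/validation split.
idx <- sample(seq_len(n), size = round(0.7 * n))
dfKZ.train <- dfKZ[idx, ]
dfKZ.valid <- dfKZ[-idx, ]

# Model 1: non-text columns only; CustID is an identifier, Comment is text.
nontext_cols <- setdiff(names(dfKZ), c("CustID", "Comment"))
treeKZ1 <- rpart(Target ~ ., data = dfKZ.train[, nontext_cols], method = "class")

if (nrow(treeKZ1$frame) > 1) {  # plot() fails on a root-only tree
  plot(treeKZ1, uniform = TRUE, margin = 0.1)
  text(treeKZ1, use.n = TRUE)
}
```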
Model 2 combines both non-text and text information:
Text mine the Comment column
Apply SVD to extract text information from the Comment column
Keep the number of SVD as
Combine SVD with all other columns except the Comment column
Please show the tree plot
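A sketch of Model 2 under stated assumptions: a hypothetical count matrix stands in for the document-term matrix of the Comment column, k = 5 is a placeholder for the number of SVD components the assignment specifies, and the SVD is computed on all rows before the train/validation split for simplicity. The document scores (left singular vectors scaled by the singular values) become numeric columns SVD1 to SVDk that are bound to the non-text columns.

```r
library(rpart)  # recommended package bundled with standard R installations

set.seed(7)
n <- 300
# Hypothetical non-text predictors and binary target (placeholders for the
# real gastext.csv columns).
target <- rbinom(n, 1, 0.4)
dfKZ <- data.frame(Clean   = factor(rbinom(n, 1, 0.5)),
                   FastPay = factor(rbinom(n, 1, 0.5)),
                   Target  = factor(target))

# Hypothetical document-term counts for 12 terms; terms 1-3 are boosted in
# comments from customers with Target = 1 so the text carries some signal.
dtm <- matrix(rpois(n * 12, lambda = 0.3), nrow = n,
              dimnames = list(NULL, paste0("term", 1:12)))
pos <- target == 1
dtm[pos, 1:3] <- dtm[pos, 1:3] + rpois(sum(pos) * 3, lambda = 1.5)

# Truncated SVD: keep k components (k = 5 is a placeholder; use the number
# the assignment specifies).
k <- 5
s <- svd(dtm, nu = k, nv = k)
svdKZ <- s$u %*% diag(s$d[1:k], nrow = k)  # document scores U_k D_k
colnames(svdKZ) <- paste0("SVD", 1:k)

# Combine SVD scores with the non-text columns (Comment is not included).
dfKZ.combined <- cbind(dfKZ, svdKZ)

# 70/30 train/validation split.
idx <- sample(seq_len(n), size = round(0.7 * n))
dfKZ.train2 <- dfKZ.combined[idx, ]
dfKZ.valid2 <- dfKZ.combined[-idx, ]

treeKZ2 <- rpart(Target ~ ., data = dfKZ.train2, method = "class")

if (nrow(treeKZ2$frame) > 1) {  # plot() fails on a root-only tree
  plot(treeKZ2, uniform = TRUE, margin = 0.1)
  text(treeKZ2, use.n = TRUE)
}
```

Using s$u alone, without scaling by the singular values, is also common; because tree splits are unaffected by multiplying a column by a positive constant, the fitted tree should have the same structure either way.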
Please compare the performance of the two models based on their confusion matrices on the validation dataset.
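A base-R sketch for comparing the two models on the validation set: build each confusion matrix with table(), then derive accuracy, recall, and precision, treating "1" as the positive class (an assumption). The label and prediction vectors are hypothetical; in practice they would come from predict(treeKZ1, newdata = dfKZ.valid, type = "class") and the corresponding call for the second model.

```r
# Confusion matrix plus summary metrics for a binary classifier.
# `positive` names the positive class level (assumed to be "1").
evaluate <- function(actual, pred, positive = "1") {
  lv <- union(levels(factor(actual)), levels(factor(pred)))
  cm <- table(Actual = factor(actual, levels = lv),
              Predicted = factor(pred, levels = lv))
  tp <- cm[positive, positive]
  fn <- sum(cm[positive, ]) - tp
  fp <- sum(cm[, positive]) - tp
  list(confusion = cm,
       accuracy  = sum(diag(cm)) / sum(cm),
       recall    = tp / (tp + fn),
       precision = tp / (tp + fp))
}

# Hypothetical validation labels and predictions for the two models.
actual <- c(1, 1, 1, 0, 0, 0, 0, 1, 0, 1)
predA  <- c(1, 0, 1, 0, 0, 1, 0, 0, 0, 1)  # e.g. Model 1 (non-text only)
predB  <- c(1, 1, 1, 0, 0, 0, 1, 0, 0, 1)  # e.g. Model 2 (non-text + SVD)

resA <- evaluate(actual, predA)
resB <- evaluate(actual, predB)
print(resA$confusion)
print(resB$confusion)
cat("Accuracy  A:", resA$accuracy, " B:", resB$accuracy, "\n")
```

Looking at recall and precision alongside accuracy shows whether one model's gain comes from catching more positives or from fewer false alarms.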
Please copy and paste your R code into your Word submission. pts
Hints:
Sample code to convert multiple columns into factors (cols is a placeholder for the names or positions of the columns to convert):
df[cols] <- lapply(df[cols], factor)
