Question: PLEASE WRITE IN THE R LANGUAGE USING RSTUDIO. Start after the data has been loaded. Week 7: R Programming PCA and tSNE

Week 7: R Programming PCA and tSNE (75 points) (LO2)(LO3)(LO4)
For this assignment you will write an R program to complete the tasks given below. You will hand in two files for this assignment.
A file with your R program.
A PDF or DOC file with the output of your code.
Use the following files:
R Data Set: HMEQ_Scrubbed.csv (in the attached zip file).
The data dictionary (also in the zip file).
Note: HMEQ_Scrubbed.csv is a simple scrubbed file from the previous week's homework. If you did more advanced data scrubbing last week, you may use your own data file instead; you might get better accuracy! If you use your own version of HMEQ_Scrubbed.csv, please hand it in along with the other deliverables.
This assignment extends the Week 6 assignment by incorporating PCA and tSNE analysis.
Step 1: Use the Decision Tree / Random Forest / Regression Code from Week 6 as a Starting Point
This assignment does not repeat all of the earlier analysis, but much of the Week 6 code can be used as a starting point. Do not be concerned with splitting the data into training and test sets. In the real world you would, but for this exercise it would only be an unnecessary complication.
Step 2: PCA Analysis
Use only the input variables. Do not use either of the target variables.
Use only the continuous variables. Do not use any of the flag variables.
Do a Principal Component Analysis (PCA) on the continuous variables.
Display the Scree Plot of the PCA analysis.
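The variable selection, PCA, and scree plot described above can be sketched as follows. This is a minimal sketch, not the required solution: `hmeq` is a small simulated stand-in for the loaded HMEQ_Scrubbed.csv, and the column names (`TARGET_BAD_FLAG`, `TARGET_LOSS_AMT`, `IMP_*`, `M_*`) are assumptions to check against the data dictionary. Targets are dropped by their `TARGET` prefix, and flags are detected as numeric columns containing only 0 and 1. Because the inputs are on very different scales (dollar amounts versus a debt-to-income ratio), `prcomp()` is called with `scale. = TRUE`.

```r
# Hypothetical stand-in for the loaded HMEQ_Scrubbed.csv; column names are assumptions.
set.seed(42)
n <- 400
value   <- rnorm(n, 100000, 30000)
debtinc <- rnorm(n, 34, 8)
bad     <- rbinom(n, 1, plogis(-1.5 + 0.15 * (debtinc - 34)))
hmeq <- data.frame(
  TARGET_BAD_FLAG = bad,
  TARGET_LOSS_AMT = bad * rexp(n, 1 / 5000),
  LOAN            = rnorm(n, 18000, 6000),
  IMP_VALUE       = value,
  IMP_MORTDUE     = 0.7 * value + rnorm(n, 0, 15000),
  IMP_DEBTINC     = debtinc,
  M_DEBTINC       = rbinom(n, 1, 0.2)   # a 0/1 flag variable
)

# Inputs: drop the TARGET_* columns. Continuous: numeric columns that are not 0/1 flags.
inputs <- hmeq[, !grepl("^TARGET", names(hmeq))]
cont   <- inputs[, sapply(inputs, function(x) is.numeric(x) && !all(x %in% c(0, 1)))]

# Standardize so large dollar amounts do not dominate the components.
pca <- prcomp(cont, center = TRUE, scale. = TRUE)
summary(pca)   # standard deviation and proportion of variance per component
screeplot(pca, type = "lines", main = "Scree Plot")
```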
Using the Scree Plot, determine how many Principal Components you wish to use. You must use at least two, and you may decide to use more. Justify your decision. There is no wrong answer: you will be graded on your reasoning, not your decision.
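To back up the choice with numbers, one option is to print each component's eigenvalue and cumulative share of variance alongside the scree plot. The sketch below uses the same hypothetical stand-in data; the eigenvalue-greater-than-one (Kaiser) rule and the 80% threshold are common heuristics, not requirements of the assignment.

```r
# Hypothetical stand-in for the loaded HMEQ_Scrubbed.csv; column names are assumptions.
set.seed(42)
n <- 400
value   <- rnorm(n, 100000, 30000)
debtinc <- rnorm(n, 34, 8)
bad     <- rbinom(n, 1, plogis(-1.5 + 0.15 * (debtinc - 34)))
hmeq <- data.frame(
  TARGET_BAD_FLAG = bad,
  TARGET_LOSS_AMT = bad * rexp(n, 1 / 5000),
  LOAN            = rnorm(n, 18000, 6000),
  IMP_VALUE       = value,
  IMP_MORTDUE     = 0.7 * value + rnorm(n, 0, 15000),
  IMP_DEBTINC     = debtinc,
  M_DEBTINC       = rbinom(n, 1, 0.2)
)
inputs <- hmeq[, !grepl("^TARGET", names(hmeq))]
cont   <- inputs[, sapply(inputs, function(x) is.numeric(x) && !all(x %in% c(0, 1)))]
pca    <- prcomp(cont, center = TRUE, scale. = TRUE)

# Eigenvalues of the standardized data and the share of variance they explain.
eigenvalues   <- pca$sdev^2
var_explained <- eigenvalues / sum(eigenvalues)
cum_explained <- cumsum(var_explained)
tab <- rbind(eigenvalue = eigenvalues, proportion = var_explained,
             cumulative = cum_explained)
colnames(tab) <- colnames(pca$rotation)
print(round(tab, 3))

# Two common heuristics; the assignment requires at least two components.
n_kaiser <- sum(eigenvalues > 1)
n_80     <- which(cum_explained >= 0.80)[1]
n_pc     <- max(2, n_kaiser)
cat("Kaiser rule:", n_kaiser, "| 80% rule:", n_80, "| chosen:", n_pc, "\n")
```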
Print the weights of the Principal Components, and use the weights to tell a story about what the Principal Components represent.
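The weights (loadings) are stored in the `rotation` matrix returned by `prcomp()`. A sketch on the same hypothetical stand-in data: it prints the loadings of the first two components and lists the variables by the absolute size of their weight, which is usually the easiest way to name a component. The sign of a component is arbitrary (flipping every weight describes the same axis), so interpret the pattern of signs within a component rather than the signs themselves.

```r
# Hypothetical stand-in for the loaded HMEQ_Scrubbed.csv; column names are assumptions.
set.seed(42)
n <- 400
value   <- rnorm(n, 100000, 30000)
debtinc <- rnorm(n, 34, 8)
bad     <- rbinom(n, 1, plogis(-1.5 + 0.15 * (debtinc - 34)))
hmeq <- data.frame(
  TARGET_BAD_FLAG = bad,
  TARGET_LOSS_AMT = bad * rexp(n, 1 / 5000),
  LOAN            = rnorm(n, 18000, 6000),
  IMP_VALUE       = value,
  IMP_MORTDUE     = 0.7 * value + rnorm(n, 0, 15000),
  IMP_DEBTINC     = debtinc,
  M_DEBTINC       = rbinom(n, 1, 0.2)
)
inputs <- hmeq[, !grepl("^TARGET", names(hmeq))]
cont   <- inputs[, sapply(inputs, function(x) is.numeric(x) && !all(x %in% c(0, 1)))]
pca    <- prcomp(cont, center = TRUE, scale. = TRUE)

weights <- round(pca$rotation[, 1:2], 3)   # loadings of PC1 and PC2
print(weights)

# For each component, list variables from largest to smallest absolute weight.
for (pc in colnames(weights)) {
  ranked <- names(sort(abs(weights[, pc]), decreasing = TRUE))
  cat(pc, ":", paste(ranked, collapse = ", "), "\n")
}
```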
Create a scatter plot of the first two Principal Components. Color the dots by the Target Flag: one color will represent "defaults" and the other will represent "non-defaults". Comment on whether you consider the first two Principal Components to be predictive. If the graph is too cluttered, you may plot a random sample of the data to make it more readable; that is up to you.
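A possible version of this scatter plot in base R graphics, using the component scores in `pca$x`. The stand-in data, the red/blue colors, and the 1,000-row sample cap are illustrative assumptions; with the stand-in's 400 rows, the sample simply keeps every row.

```r
# Hypothetical stand-in for the loaded HMEQ_Scrubbed.csv; column names are assumptions.
set.seed(42)
n <- 400
value   <- rnorm(n, 100000, 30000)
debtinc <- rnorm(n, 34, 8)
bad     <- rbinom(n, 1, plogis(-1.5 + 0.15 * (debtinc - 34)))
hmeq <- data.frame(
  TARGET_BAD_FLAG = bad,
  TARGET_LOSS_AMT = bad * rexp(n, 1 / 5000),
  LOAN            = rnorm(n, 18000, 6000),
  IMP_VALUE       = value,
  IMP_MORTDUE     = 0.7 * value + rnorm(n, 0, 15000),
  IMP_DEBTINC     = debtinc,
  M_DEBTINC       = rbinom(n, 1, 0.2)
)
inputs <- hmeq[, !grepl("^TARGET", names(hmeq))]
cont   <- inputs[, sapply(inputs, function(x) is.numeric(x) && !all(x %in% c(0, 1)))]
pca    <- prcomp(cont, center = TRUE, scale. = TRUE)

# Optional random sample to reduce clutter on the full data set.
set.seed(1)
idx    <- sample(nrow(cont), min(nrow(cont), 1000))
target <- hmeq$TARGET_BAD_FLAG[idx]
plot(pca$x[idx, "PC1"], pca$x[idx, "PC2"],
     col = ifelse(target == 1, "red", "blue"), pch = 16, cex = 0.6,
     xlab = "PC1", ylab = "PC2", main = "First two principal components")
legend("topright", legend = c("Default", "Non-default"),
       col = c("red", "blue"), pch = 16)
```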
Step 3: tSNE Analysis
Use only the input variables. Do not use either of the target variables.
Use only the continuous variables. Do not use any of the flag variables.
Do a tSNE analysis on the data. Set the dimensions to 2.
Run two tSNE analyses with Perplexity=30. Color the scatter plot dots using the Target Flag: one color will represent "defaults" and the other will represent "non-defaults". Comment on whether you consider the tSNE values to be predictive.
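A sketch of one tSNE run, assuming the CRAN package `Rtsne` is installed. `check_duplicates = FALSE` avoids an error if imputed rows happen to be identical, and `pca = FALSE` skips Rtsne's internal PCA step because there are only a handful of continuous inputs. tSNE is stochastic, so `set.seed()` makes the layout repeatable. The data are the same hypothetical stand-in used for the PCA sketches.

```r
library(Rtsne)   # CRAN package, assumed installed

# Hypothetical stand-in for the loaded HMEQ_Scrubbed.csv; column names are assumptions.
set.seed(42)
n <- 400
value   <- rnorm(n, 100000, 30000)
debtinc <- rnorm(n, 34, 8)
bad     <- rbinom(n, 1, plogis(-1.5 + 0.15 * (debtinc - 34)))
hmeq <- data.frame(
  TARGET_BAD_FLAG = bad,
  TARGET_LOSS_AMT = bad * rexp(n, 1 / 5000),
  LOAN            = rnorm(n, 18000, 6000),
  IMP_VALUE       = value,
  IMP_MORTDUE     = 0.7 * value + rnorm(n, 0, 15000),
  IMP_DEBTINC     = debtinc,
  M_DEBTINC       = rbinom(n, 1, 0.2)
)
inputs <- hmeq[, !grepl("^TARGET", names(hmeq))]
cont   <- inputs[, sapply(inputs, function(x) is.numeric(x) && !all(x %in% c(0, 1)))]

set.seed(1)   # tSNE uses random initialization
tsne30 <- Rtsne(scale(cont), dims = 2, perplexity = 30,
                check_duplicates = FALSE, pca = FALSE)
plot(tsne30$Y, col = ifelse(hmeq$TARGET_BAD_FLAG == 1, "red", "blue"),
     pch = 16, cex = 0.6, xlab = "tSNE 1", ylab = "tSNE 2",
     main = "tSNE, perplexity = 30")
legend("topright", legend = c("Default", "Non-default"),
       col = c("red", "blue"), pch = 16)
```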
Repeat the previous step with a Perplexity greater than 30 (try a value much higher than 30).
Repeat the previous step with a Perplexity less than 30 (try a value much lower than 30).
Decide on which value of Perplexity best predicts the Target Flag.
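The perplexity comparison can be sketched as a loop. Besides side-by-side plots, this sketch computes a rough separability score of its own choosing: a leave-one-out 15-nearest-neighbor vote on the two tSNE coordinates (via `class::knn.cv`, from a package that normally ships with R), summarized as an AUC. This is one heuristic for deciding which perplexity best separates defaults, not a method the assignment prescribes. Rtsne stops with an error when the number of rows minus one is less than three times the perplexity, which is why the stand-in uses 400 rows with a maximum perplexity of 100; on the full data set each run takes noticeably longer.

```r
library(Rtsne)   # CRAN package, assumed installed
library(class)   # knn.cv()

# Hypothetical stand-in for the loaded HMEQ_Scrubbed.csv; column names are assumptions.
set.seed(42)
n <- 400
value   <- rnorm(n, 100000, 30000)
debtinc <- rnorm(n, 34, 8)
bad     <- rbinom(n, 1, plogis(-1.5 + 0.15 * (debtinc - 34)))
hmeq <- data.frame(
  TARGET_BAD_FLAG = bad,
  TARGET_LOSS_AMT = bad * rexp(n, 1 / 5000),
  LOAN            = rnorm(n, 18000, 6000),
  IMP_VALUE       = value,
  IMP_MORTDUE     = 0.7 * value + rnorm(n, 0, 15000),
  IMP_DEBTINC     = debtinc,
  M_DEBTINC       = rbinom(n, 1, 0.2)
)
inputs <- hmeq[, !grepl("^TARGET", names(hmeq))]
cont   <- inputs[, sapply(inputs, function(x) is.numeric(x) && !all(x %in% c(0, 1)))]

# Rank-based AUC: P(score of a random default > score of a random non-default).
auc <- function(y, score) {
  r <- rank(score); n1 <- sum(y == 1); n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

perplexities <- c(5, 30, 100)   # low, default, high (illustrative choices)
y      <- hmeq$TARGET_BAD_FLAG
scores <- numeric(length(perplexities))
op <- par(mfrow = c(1, length(perplexities)))
for (i in seq_along(perplexities)) {
  set.seed(1)
  emb <- Rtsne(scale(cont), dims = 2, perplexity = perplexities[i],
               check_duplicates = FALSE, pca = FALSE)$Y
  # Red = default, blue = non-default.
  plot(emb, col = ifelse(y == 1, "red", "blue"), pch = 16, cex = 0.5,
       xlab = "tSNE 1", ylab = "tSNE 2",
       main = paste("perplexity =", perplexities[i]))
  # Leave-one-out 15-NN vote share for class "1" on the tSNE coordinates.
  pred <- knn.cv(emb, cl = factor(y), k = 15, prob = TRUE)
  p1   <- ifelse(pred == "1", attr(pred, "prob"), 1 - attr(pred, "prob"))
  scores[i] <- auc(y, p1)
}
par(op)
print(data.frame(perplexity = perplexities, knn_auc = round(scores, 3)))
```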
Train two Random Forest models, one to predict each of the two tSNE values.
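One way to train the two models is the `x`/`y` interface of the `randomForest` package (assumed installed), with the continuous inputs as predictors and one tSNE coordinate as the response of each model. Printing a fitted regression forest reports the out-of-bag "% Var explained", which indicates how well each tSNE axis can be reproduced from the inputs. The stand-in data, perplexity, and `ntree = 200` are assumptions.

```r
library(Rtsne)          # CRAN packages, assumed installed
library(randomForest)

# Hypothetical stand-in for the loaded HMEQ_Scrubbed.csv; column names are assumptions.
set.seed(42)
n <- 400
value   <- rnorm(n, 100000, 30000)
debtinc <- rnorm(n, 34, 8)
bad     <- rbinom(n, 1, plogis(-1.5 + 0.15 * (debtinc - 34)))
hmeq <- data.frame(
  TARGET_BAD_FLAG = bad,
  TARGET_LOSS_AMT = bad * rexp(n, 1 / 5000),
  LOAN            = rnorm(n, 18000, 6000),
  IMP_VALUE       = value,
  IMP_MORTDUE     = 0.7 * value + rnorm(n, 0, 15000),
  IMP_DEBTINC     = debtinc,
  M_DEBTINC       = rbinom(n, 1, 0.2)
)
inputs <- hmeq[, !grepl("^TARGET", names(hmeq))]
cont   <- inputs[, sapply(inputs, function(x) is.numeric(x) && !all(x %in% c(0, 1)))]

set.seed(1)
tsne <- Rtsne(scale(cont), dims = 2, perplexity = 30,
              check_duplicates = FALSE, pca = FALSE)

# One regression forest per tSNE coordinate.
set.seed(2)
rf_tsne1 <- randomForest(x = cont, y = tsne$Y[, 1], ntree = 200)
rf_tsne2 <- randomForest(x = cont, y = tsne$Y[, 2], ntree = 200)
print(rf_tsne1)
print(rf_tsne2)

pred1 <- predict(rf_tsne1, newdata = cont)
pred2 <- predict(rf_tsne2, newdata = cont)
```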
Step 4: Tree and Regression Analysis on the Original Data
Create a Decision Tree to predict Loan Default (Target Flag=1). Comment on the variables that were included in the model.
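A minimal decision-tree sketch with `rpart` (a recommended package that normally ships with R). `TARGET_LOSS_AMT` is removed because it is only known after a default and would leak the answer, and the target is converted to a factor so `rpart` builds a classification tree. The stand-in data and `maxdepth = 3` are assumptions chosen to keep the tree readable. The last lines list the variables that actually appear in splits.

```r
library(rpart)

# Hypothetical stand-in for the loaded HMEQ_Scrubbed.csv; column names are assumptions.
set.seed(42)
n <- 400
value   <- rnorm(n, 100000, 30000)
debtinc <- rnorm(n, 34, 8)
bad     <- rbinom(n, 1, plogis(-1.5 + 0.15 * (debtinc - 34)))
hmeq <- data.frame(
  TARGET_BAD_FLAG = bad,
  TARGET_LOSS_AMT = bad * rexp(n, 1 / 5000),
  LOAN            = rnorm(n, 18000, 6000),
  IMP_VALUE       = value,
  IMP_MORTDUE     = 0.7 * value + rnorm(n, 0, 15000),
  IMP_DEBTINC     = debtinc,
  M_DEBTINC       = rbinom(n, 1, 0.2)
)

tree_data <- hmeq
tree_data$TARGET_LOSS_AMT <- NULL   # known only after a default: exclude
tree_data$TARGET_BAD_FLAG <- factor(tree_data$TARGET_BAD_FLAG)

tree <- rpart(TARGET_BAD_FLAG ~ ., data = tree_data, method = "class",
              control = rpart.control(maxdepth = 3))
if (nrow(tree$frame) > 1) {   # plot only if the tree has at least one split
  plot(tree, uniform = TRUE, margin = 0.1)
  text(tree, use.n = TRUE)
}

# Variables used in at least one split.
vars_used <- setdiff(unique(as.character(tree$frame$var)), "<leaf>")
print(vars_used)
```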
Create a Logistic Regression model to predict Loan Default (Target Flag=1). Use either Forward, Backward, or Stepwise variable selection. Comment on the variables that were included in the model.
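Stepwise selection can be done with base R's `step()`, starting from an intercept-only model and letting variables from the full model enter or leave (`direction = "both"`). For forward selection use `direction = "forward"` with the same call; for backward selection start from `full_model` with `direction = "backward"`. The stand-in data are hypothetical, as in the earlier sketches.

```r
# Hypothetical stand-in for the loaded HMEQ_Scrubbed.csv; column names are assumptions.
set.seed(42)
n <- 400
value   <- rnorm(n, 100000, 30000)
debtinc <- rnorm(n, 34, 8)
bad     <- rbinom(n, 1, plogis(-1.5 + 0.15 * (debtinc - 34)))
hmeq <- data.frame(
  TARGET_BAD_FLAG = bad,
  TARGET_LOSS_AMT = bad * rexp(n, 1 / 5000),
  LOAN            = rnorm(n, 18000, 6000),
  IMP_VALUE       = value,
  IMP_MORTDUE     = 0.7 * value + rnorm(n, 0, 15000),
  IMP_DEBTINC     = debtinc,
  M_DEBTINC       = rbinom(n, 1, 0.2)
)

lr_data <- hmeq
lr_data$TARGET_LOSS_AMT <- NULL   # would leak the outcome

null_model <- glm(TARGET_BAD_FLAG ~ 1, family = binomial, data = lr_data)
full_model <- glm(TARGET_BAD_FLAG ~ ., family = binomial, data = lr_data)

# Stepwise (both directions) AIC-based selection between the two models.
step_model <- step(null_model,
                   scope = list(lower = formula(null_model),
                                upper = formula(full_model)),
                   direction = "both", trace = 0)
summary(step_model)

selected <- names(coef(step_model))[-1]   # drop the intercept
print(selected)
```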
Create a ROC curve showing the accuracy of the model.
Calculate and display the Area Under the ROC Curve (AUC).
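The ROC curve and AUC can be computed with a few lines of base R, sketched below. The AUC uses the rank (Mann-Whitney) formulation: the probability that a randomly chosen default gets a higher score than a randomly chosen non-default, with ties counting one half. The toy vectors are made up so the result can be checked by hand (3 of the 4 default/non-default pairs are ordered correctly). For a real model, use the 0/1 target as `y` and the fitted probabilities from `predict(<your glm>, type = "response")` as `score`. The `pROC` package's `roc()` and `auc()` functions are a packaged alternative.

```r
# ROC points: for each threshold, the share of defaults (TPR) and
# non-defaults (FPR) scored at or above it.
roc_points <- function(y, score) {
  thr <- sort(unique(score), decreasing = TRUE)
  tpr <- sapply(thr, function(t) mean(score[y == 1] >= t))
  fpr <- sapply(thr, function(t) mean(score[y == 0] >= t))
  data.frame(fpr = c(0, fpr), tpr = c(0, tpr))
}

# Rank-based AUC (average ranks handle ties).
auc <- function(y, score) {
  r  <- rank(score)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Toy inputs (made up for illustration).
y     <- c(0, 0, 1, 1)
score <- c(0.1, 0.4, 0.35, 0.8)

roc <- roc_points(y, score)
plot(roc$fpr, roc$tpr, type = "l", xlab = "False positive rate",
     ylab = "True positive rate", main = "ROC curve")
abline(0, 1, lty = 2)   # chance line
auc(y, score)           # 0.75
```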
Step 5: Tree and Regression Analysis on the PCA/tSNE Data
Append the Principal Component values from Step 2 to your data set.
Using the Random Forest models from Step 3, append the two tSNE values to the data set.
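A sketch of appending the component scores and the Random Forest predictions of the tSNE coordinates, assuming `Rtsne` and `randomForest` are installed. `n_pc <- 2` stands in for the number of components chosen from the scree plot, and the column names `PC1`, `PC2`, `TSNE1`, and `TSNE2` are illustrative. Standard tSNE has no built-in way to map new rows onto an existing layout, which is one reason the Random Forest predictions are useful here.

```r
library(Rtsne)          # CRAN packages, assumed installed
library(randomForest)

# Hypothetical stand-in for the loaded HMEQ_Scrubbed.csv; column names are assumptions.
set.seed(42)
n <- 400
value   <- rnorm(n, 100000, 30000)
debtinc <- rnorm(n, 34, 8)
bad     <- rbinom(n, 1, plogis(-1.5 + 0.15 * (debtinc - 34)))
hmeq <- data.frame(
  TARGET_BAD_FLAG = bad,
  TARGET_LOSS_AMT = bad * rexp(n, 1 / 5000),
  LOAN            = rnorm(n, 18000, 6000),
  IMP_VALUE       = value,
  IMP_MORTDUE     = 0.7 * value + rnorm(n, 0, 15000),
  IMP_DEBTINC     = debtinc,
  M_DEBTINC       = rbinom(n, 1, 0.2)
)
inputs <- hmeq[, !grepl("^TARGET", names(hmeq))]
cont   <- inputs[, sapply(inputs, function(x) is.numeric(x) && !all(x %in% c(0, 1)))]

n_pc <- 2   # assumption: replace with the number chosen from your scree plot
pca  <- prcomp(cont, center = TRUE, scale. = TRUE)
set.seed(1)
tsne <- Rtsne(scale(cont), dims = 2, perplexity = 30,
              check_duplicates = FALSE, pca = FALSE)
set.seed(2)
rf_tsne1 <- randomForest(x = cont, y = tsne$Y[, 1], ntree = 200)
rf_tsne2 <- randomForest(x = cont, y = tsne$Y[, 2], ntree = 200)

# Append PC scores (PC1..PCk) and the Random Forest tSNE predictions.
hmeq_pt <- cbind(hmeq, as.data.frame(pca$x[, 1:n_pc, drop = FALSE]))
hmeq_pt$TSNE1 <- predict(rf_tsne1, newdata = cont)
hmeq_pt$TSNE2 <- predict(rf_tsne2, newdata = cont)
head(hmeq_pt)
```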
Remove all of the continuous variables from the data set (set them to NULL). Keep the flag variables in the data set.
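Assigning `NULL` to several columns at once removes them from a data frame. A tiny hypothetical example: save the names of the original continuous inputs before appending the PCA and tSNE columns, so that only those variables are dropped while the flags and the new columns remain.

```r
# Hypothetical miniature of the Step 5 data set (values are made up).
df <- data.frame(
  TARGET_BAD_FLAG = c(0, 1, 0, 1),
  LOAN            = c(1000, 2500, 4000, 1800),
  IMP_DEBTINC     = c(30.1, 41.2, 28.7, 39.9),
  M_DEBTINC       = c(0, 1, 0, 0),
  PC1             = c(-1.2, 0.8, -0.3, 0.7),
  TSNE1           = c(3.1, -2.4, 1.9, -1.7)
)

cont_vars <- c("LOAN", "IMP_DEBTINC")   # original continuous inputs
df[cont_vars] <- NULL                   # drop them; flags and PC/tSNE columns stay
names(df)
```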
Create a Decision Tree to predict Loan Default (Target Flag=1). Comment on the variables that were included in the model. Did any of the Principal Components or tSNE values make it into the model? Discuss why or why not.
Create a Logistic Regression model to predict Loan Default (Target Flag=1). Use either Forward, Backward, or Stepwise variable selection. Comment on the variables that were included in the model. Did any of the Principal Components or tSNE values make it into the model? Discuss why or why not.
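To answer "did any Principal Components or tSNE values make it into the model?" directly, intersect the variables each model uses with the names of the engineered columns. The data frame below is a simulated stand-in for the Step 5 data set: only `PC1` is made related to the target, so the printout says nothing about HMEQ itself, and the column names are assumptions.

```r
library(rpart)

# Simulated stand-in for the Step 5 data set (hypothetical columns).
set.seed(42)
n   <- 400
pc1 <- rnorm(n)
df5 <- data.frame(
  TARGET_BAD_FLAG = rbinom(n, 1, plogis(-1.5 + 1.2 * pc1)),
  M_DEBTINC       = rbinom(n, 1, 0.2),
  PC1 = pc1, PC2 = rnorm(n),
  TSNE1 = rnorm(n, 0, 10), TSNE2 = rnorm(n, 0, 10)
)
engineered <- c("PC1", "PC2", "TSNE1", "TSNE2")

# Decision tree (classification) on the reduced data.
tree_df <- df5
tree_df$TARGET_BAD_FLAG <- factor(tree_df$TARGET_BAD_FLAG)
tree5     <- rpart(TARGET_BAD_FLAG ~ ., data = tree_df, method = "class")
tree_vars <- setdiff(unique(as.character(tree5$frame$var)), "<leaf>")

# Stepwise logistic regression on the reduced data.
null5 <- glm(TARGET_BAD_FLAG ~ 1, family = binomial, data = df5)
full5 <- glm(TARGET_BAD_FLAG ~ ., family = binomial, data = df5)
step5 <- step(null5, scope = list(lower = formula(null5), upper = formula(full5)),
              direction = "both", trace = 0)
glm_vars <- names(coef(step5))[-1]

cat("Engineered variables in tree:", intersect(tree_vars, engineered), "\n")
cat("Engineered variables in logistic model:", intersect(glm_vars, engineered), "\n")
```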
Create a ROC curve showing the accuracy of the model.
Calculate and display the Area Under the ROC Curve (AUC).
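For the Step 6 comparison it can help to overlay the two ROC curves and report both AUCs in the legend. In this sketch the two probability vectors are simulated placeholders, not model output; replace them with the fitted probabilities from the Step 4 and Step 5 logistic regressions, evaluated against the same target vector.

```r
roc_points <- function(y, score) {
  thr <- sort(unique(score), decreasing = TRUE)
  tpr <- sapply(thr, function(t) mean(score[y == 1] >= t))
  fpr <- sapply(thr, function(t) mean(score[y == 0] >= t))
  data.frame(fpr = c(0, fpr), tpr = c(0, tpr))
}
auc <- function(y, score) {
  r <- rank(score); n1 <- sum(y == 1); n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Simulated placeholders for the two models' predicted probabilities.
set.seed(7)
y          <- rbinom(300, 1, 0.2)
p_original <- plogis(-1.5 + 1.5 * y + rnorm(300))
p_pca_tsne <- plogis(-1.5 + 1.0 * y + rnorm(300))

r1 <- roc_points(y, p_original)
r2 <- roc_points(y, p_pca_tsne)
plot(r1$fpr, r1$tpr, type = "l", col = "blue", xlab = "False positive rate",
     ylab = "True positive rate", main = "ROC comparison")
lines(r2$fpr, r2$tpr, col = "red")
abline(0, 1, lty = 2)
legend("bottomright",
       legend = c(sprintf("Original data (AUC = %.3f)", auc(y, p_original)),
                  sprintf("PCA/tSNE data (AUC = %.3f)", auc(y, p_pca_tsne))),
       col = c("blue", "red"), lty = 1)
```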
Step 6: Comment
Discuss how the PCA / tSNE values performed when compared to the original data set.
