Question: R language

In this assignment, download the train.csv from https://www.kaggle.com/code/gadigevishalsai/credit-score-classification-eda-classification/data (you do NOT need the test.csv file).

1. Coding:

(a) Read the .csv file into R.

(b) Use the str() function to obtain variable types.

(c) Create a new object that is a subset of this dataset. The subset includes only sample units with the Credit_Score variable equal to Poor or Good.

(d) Create a new object that includes 80% of the sample units from 5.1c; this 80% should be randomly sampled.

(e) For a non-numerical variable that you identified in 5.1b, count the number of unique values in this variable.
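One way the coding steps (a) to (e) could be approached in base R is sketched below. Only the file name train.csv and the Credit_Score column with its Poor and Good values come from the question. The toy data frame, the other column names (Customer_ID, Occupation, Age), the Standard value, and the object names are assumptions made so the sketch runs without the download.

```r
# (a) Read the file. Assumes train.csv is in the working directory.
# credit <- read.csv("train.csv", stringsAsFactors = FALSE)

# Toy stand-in with made-up values (hypothetical columns except Credit_Score),
# so the remaining steps can run without the Kaggle file.
credit <- data.frame(
  Customer_ID  = c("C1", "C2", "C3", "C4", "C5", "C6", "C7", "C8", "C9", "C10"),
  Occupation   = c("Lawyer", "Teacher", "Lawyer", "Doctor", "Teacher",
                   "Doctor", "Lawyer", "Teacher", "Doctor", "Lawyer"),
  Age          = c(23, 35, 41, 29, 52, 38, 47, 31, 26, 44),
  Credit_Score = c("Good", "Poor", "Standard", "Good", "Poor",
                   "Standard", "Good", "Poor", "Good", "Poor"),
  stringsAsFactors = FALSE
)

# (b) Show each variable's type (chr, num, int, ...).
str(credit)

# (c) Keep only the rows whose Credit_Score is Poor or Good.
poor_good <- credit[credit$Credit_Score %in% c("Poor", "Good"), ]

# (d) Randomly sample 80% of those rows, without replacement.
set.seed(1)  # fixed seed only so the sample is reproducible
n_keep  <- floor(0.8 * nrow(poor_good))
train80 <- poor_good[sample(nrow(poor_good), n_keep), ]

# (e) Count the unique values of a non-numerical variable.
n_occupations <- length(unique(credit$Occupation))
```

With the real file, the commented read.csv() line would replace the toy data frame, and the variable used in (e) would be whichever non-numerical column str() reports.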

Answer the following questions:

(a) If you were to use all variables in this dataset to explain customers' credit score (the Credit_Score variable), which variables should NOT be included in the analysis, and why?

(b) Based on the output from 5.1b, which variable(s) have types that went against your expectation (e.g., you thought a variable was categorical, but based on str(), R had it as numerical)?

(c) From a modeling perspective, what is the potential consequence of treating a numerical variable as a categorical variable? What about treating a categorical variable as a numerical variable?
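For question (c), a small illustration may help. The sketch below uses a made-up toy data frame (the names toy, age, occupation, and occupation_code are hypothetical, not taken from train.csv) and counts the columns of the design matrix that a linear model would build under each treatment.

```r
# Hypothetical toy data, not the Kaggle file.
toy <- data.frame(
  age        = c(21, 25, 30, 34, 40, 45, 51, 58),
  occupation = c("Doctor", "Lawyer", "Teacher", "Doctor",
                 "Lawyer", "Teacher", "Doctor", "Lawyer")
)

# Numerical variable kept numeric: intercept plus a single slope.
cols_age_numeric <- ncol(model.matrix(~ age, data = toy))

# Same variable treated as categorical: intercept plus one dummy for every
# distinct value beyond the first, so the parameter count grows with the data.
cols_age_factor <- ncol(model.matrix(~ factor(age), data = toy))

# Categorical variable treated as categorical: intercept plus one dummy per
# extra level, with no ordering assumed between occupations.
cols_occ_factor <- ncol(model.matrix(~ occupation, data = toy))

# Categorical variable coded as integers: a single slope, which silently
# assumes the levels are ordered and equally spaced (Doctor < Lawyer < Teacher).
toy$occupation_code <- as.integer(factor(toy$occupation))
cols_occ_numeric <- ncol(model.matrix(~ occupation_code, data = toy))

c(cols_age_numeric, cols_age_factor, cols_occ_factor, cols_occ_numeric)
```

In this toy case, treating age as a factor uses as many parameters as there are rows, so the model can fit the sample perfectly and has no way to predict for an unseen age. Coding occupation as 1, 2, 3 goes the other way: it forces a linear trend across categories that have no natural order.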
