Question: Home Equity Loan Customer Profiling (K-means clustering in R)

USING R

---
title: 'Home Equity Loan Customer Profiling'
subtitle: 'BUA684 Module 1'
author:
- FirstName LastName
date: "`r format(Sys.time(), '%d %B %Y')`"
output:
  html_document: default
  pdf_document: default
---

### Problem

The aim of this assignment is to use a K-means clustering analysis to build profiles of a large group of home equity loan customers at one bank by dividing the customers into segments based on their financial and career data. This customer profiling is expected to give the bank's senior management a better understanding of the distinct features of each customer segment.

*Note: you can always ask your ChatGPT programming assistant to explain any of the following R code chunks.*

### Data

```{r message=FALSE, warning=FALSE}
# import data
library(readr)
hmeq_profile <- read_csv("hmeq_profile.csv")
head(hmeq_profile)
ncol(hmeq_profile)
nrow(hmeq_profile)
```

*Problem 1: How many attributes and data records are in the raw dataset `hmeq_profile`?*

**Your answer:( )**

```{r message=FALSE, warning=FALSE}
# wrangle data
# standardize selected attributes in this dataset
library(dplyr)
hmeq_profile.std <- hmeq_profile %>%
  mutate_at(vars(-REASON, -JOB, -DEROG, -DELINQ, -NINQ), scale)
head(hmeq_profile.std)

# create the scatterplot matrix showing the relationships within different
# pairs of two attributes.
pairs(select(hmeq_profile.std, LOAN, MORTDUE, VALUE, YOJ, CLAGE, CLNO, DEBTINC),
      main = "Home Equity Loan Customer")

# remove those records with DEBTINC>10 or CLAGE>10
hmeq_profile.std.filtered <- hmeq_profile.std %>%
  filter(DEBTINC < 10 & CLAGE < 10)

# check whether the records are removed.
max(hmeq_profile.std.filtered$DEBTINC)
max(hmeq_profile.std.filtered$CLAGE)
```
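As a rough illustration of what `scale` does to each selected attribute in the chunk above, the sketch below standardizes a small vector by hand and compares the result with `scale()`. The values in `toy_loan` are invented for illustration and are not taken from `hmeq_profile.csv`.

```r
# toy_loan is a made-up vector, not data from hmeq_profile.csv
toy_loan <- c(5000, 12000, 18000, 25000, 90000)

# z-score by hand: subtract the mean, divide by the sample standard deviation
z_manual <- (toy_loan - mean(toy_loan)) / sd(toy_loan)

# scale() returns a one-column matrix; as.numeric() drops it to a plain vector
z_scale <- as.numeric(scale(toy_loan))

# the two approaches should agree
all.equal(z_manual, z_scale)

# values are now expressed in standard-deviation units
round(z_manual, 2)
```

After standardization every attribute has mean 0 and standard deviation 1, so a cutoff such as `DEBTINC < 10` in the filter above is measured in standard deviations from the mean rather than in the attribute's original units.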

*Problem 2: Why do we need to standardize those selected attributes before attempting the K-means clustering?*

**Your answer:( )**

*Problem 3: Based on the scatterplot matrix, explain why we want to remove those specific data records before attempting the K-means clustering.*

**Your answer:( )**

### Analysis

```{r message=FALSE, warning=FALSE}
# select optimal number of clusters
library(factoextra)
set.seed(2020)
fviz_nbclust(select(hmeq_profile.std.filtered, LOAN, MORTDUE, VALUE, YOJ, CLAGE, CLNO, DEBTINC),
             kmeans, method = "wss")
```
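To make the `method = "wss"` plot easier to read, here is a minimal sketch of the quantity it shows: the total within-cluster sum of squares for each candidate number of clusters. It uses only base R and a synthetic matrix `toy_data` (an assumption for illustration, not the loan data), so the numbers it prints say nothing about the right answer for `hmeq_profile`.

```r
set.seed(2020)

# toy_data: made-up two-attribute data with three loose groups by construction
toy_data <- rbind(
  matrix(rnorm(100, mean = -2, sd = 0.5), ncol = 2),
  matrix(rnorm(100, mean =  0, sd = 0.5), ncol = 2),
  matrix(rnorm(100, mean =  2, sd = 0.5), ncol = 2)
)

# total within-cluster sum of squares for k = 1, ..., 8
wss <- sapply(1:8, function(k) {
  kmeans(toy_data, centers = k, nstart = 10)$tot.withinss
})

# the "elbow" is the k after which adding a cluster reduces wss only slightly
data.frame(k = 1:8, wss = round(wss, 1))
```

Calling `plot(1:8, wss, type = "b")` on this result draws the same kind of elbow curve that `fviz_nbclust` produces.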

*Problem 4: Based on the above R output, what is the optimal number of clusters you choose to produce in the following clustering analysis? Why does this number make sense to you?*

**Your answer:( )**

*Problem 5: Complete the following R commands to conduct a K-means clustering analysis.*

```{r message=FALSE, warning=FALSE, error=FALSE}
# complete the following R commands for K-means clustering
hmeq_profile.std.filtered.selected <- hmeq_profile.std.filtered %>%
  select(LOAN, MORTDUE, VALUE, YOJ, CLAGE, CLNO, DEBTINC)
set.seed(2020)
hmeq_profile_kmeans <- kmeans(hmeq_profile.std.filtered.selected, centers =____)
hmeq_profile_kmeans$centers
hmeq_profile_kmeans$size
```
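For orientation on how to read `$centers` and `$size`, the sketch below runs `kmeans` on a small synthetic data frame. The data frame `toy_std` and its column names `LOAN_z` and `YOJ_z` are hypothetical, and `centers = 2` is used only because the toy data has two groups by construction; it is not a suggested value for the blank above.

```r
set.seed(2020)

# toy_std: made-up attributes on a roughly standardized scale (hypothetical names)
toy_std <- data.frame(
  LOAN_z = c(rnorm(30, -1.0, 0.3), rnorm(20,  1.5, 0.3)),
  YOJ_z  = c(rnorm(30,  0.8, 0.3), rnorm(20, -1.2, 0.3))
)

# centers = 2 matches how the toy data was generated, nothing more
toy_km <- kmeans(toy_std, centers = 2, nstart = 10)

toy_km$centers         # one row per cluster: the mean of each attribute in that cluster
toy_km$size            # number of records assigned to each cluster
table(toy_km$cluster)  # the same counts, derived from the per-record assignments
```

With standardized inputs, a center value well above 0 means the cluster sits above the overall average on that attribute, and a value well below 0 means it sits below.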

*Problem 6: Why don't we use all the attributes in the dataset to run the K-means analysis?*

**Your answer:( )**

*Problem 7: Based on the above R output, describe interesting features of some of the clusters produced by the analysis.*

**Your answer:( )**

```{r message=FALSE, warning=FALSE, error=FALSE}
# read all cluster assignments into a vector called ClusterID
ClusterID <- hmeq_profile_kmeans$cluster

# merge data records in hmeq_profile.std.filtered with their ClusterID
hmeq_profile.std.filtered.K <- cbind(hmeq_profile.std.filtered, ClusterID)
head(hmeq_profile.std.filtered.K)

# the "janitor" package has some nice functions for creating summary tables
# that play nicely with the %>% pipe in dplyr
library(janitor)
hmeq_profile.std.filtered.K %>%
  tabyl(ClusterID, REASON, JOB)
```
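As a hedged companion to the three-way `tabyl` call above, which should return one ClusterID-by-REASON table per `JOB` level, the sketch below builds a comparable cross-tabulation with base R's `table`. The data frame `toy_k` and its `REASON` and `JOB` labels are placeholders invented for illustration, not values from the loan data.

```r
# toy_k: made-up records; the REASON and JOB labels are placeholders
toy_k <- data.frame(
  ClusterID = c(1, 1, 1, 2, 2, 2, 2, 1),
  REASON    = c("A", "A", "B", "B", "B", "A", "B", "A"),
  JOB       = c("X", "Y", "X", "X", "Y", "Y", "Y", "X")
)

# counts of ClusterID by REASON, one two-way slice per JOB level
counts <- table(toy_k$ClusterID, toy_k$REASON, toy_k$JOB,
                dnn = c("ClusterID", "REASON", "JOB"))
counts

# row proportions within the JOB == "X" slice:
# how each cluster splits across REASON for that job category
prop.table(counts[, , "X"], margin = 1)
```

Row proportions are often easier to compare across clusters than raw counts, because cluster sizes can differ widely.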

*Problem 8: Based on the above R output, describe interesting features of some of the clusters in terms of REASON and JOB.*

**Your answer:( )**

### Discussion

*Consider the potential benefits that K-means analysis results can provide for senior management in a bank when making well-informed decisions related to credit risk management. Although this discussion won't be elaborated here, you will have the opportunity to collaborate with your project team to explore this topic further and collectively develop effective strategies.*
