
USING R

---
output:
  html_document: default
  pdf_document: default
---

---
title: 'Home Equity Loan Customer Profiling'
subtitle: ' BUA684 Module 1'
author:
- FirstName LastName
date: "`r format(Sys.time(), '%d %B %Y')`"
output: pdf_document
---

### Problem

The aim of this assignment is to use a K-means clustering analysis to create portraits of a large group of home equity loan customers at one bank by dividing the customers into segments based on their financial and career data. Such customer profiling is expected to give the bank's senior management a better understanding of the distinct features of each customer segment.

*Note: you can always ask your ChatGPT programming assistant to explain any of the following R code.*

### Data

```{r message=FALSE, warning=FALSE}
# import data
library(readr)
hmeq_profile <- read_csv("hmeq_profile.csv")
head(hmeq_profile)
ncol(hmeq_profile)
nrow(hmeq_profile)
```

*Problem 1: How many attributes and data records are in the raw dataset `hmeq_profile`?*

**Your answer:( )**

```{r message=FALSE, warning=FALSE}
# wrangle data
# standardize selected attributes in this dataset
library(dplyr)
hmeq_profile.std <- hmeq_profile %>%
  mutate_at(vars(-REASON, -JOB, -DEROG, -DELINQ, -NINQ), scale)
head(hmeq_profile.std)

# create the scatterplot matrix showing the relationships within different pairs of two
# attributes.
pairs(select(hmeq_profile.std, LOAN, MORTDUE, VALUE, YOJ, CLAGE, CLNO, DEBTINC),
      main = "Home Equity Loan Customer")

# remove those records with DEBTINC>10 or CLAGE>10
hmeq_profile.std.filtered <- hmeq_profile.std %>%
  filter(DEBTINC < 10 & CLAGE < 10)

# check whether the records are removed.
max(hmeq_profile.std.filtered$DEBTINC)
max(hmeq_profile.std.filtered$CLAGE)
```
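As a point of reference for the chunk above, here is a minimal sketch of what `scale` computes, using made-up toy values rather than anything from `hmeq_profile.csv`. The names `toy`, `toy_std`, and `loan_z` are hypothetical and exist only for this illustration. It also shows how to read the `< 10` threshold in the filter step: after standardization, the values are in units of standard deviations from the column mean.

```r
# Toy data (hypothetical values, not taken from hmeq_profile.csv)
toy <- data.frame(
  LOAN    = c(5000, 12000, 25000, 40000),
  DEBTINC = c(20, 30, 35, 45)
)

# scale() centers each column on its mean and divides by its standard deviation
toy_std <- as.data.frame(scale(toy))

# the same z-scores computed by hand for one column
loan_z <- (toy$LOAN - mean(toy$LOAN)) / sd(toy$LOAN)

colMeans(toy_std)       # approximately 0 for every column
apply(toy_std, 2, sd)   # 1 for every column
all.equal(toy_std$LOAN, loan_z)

# A filter such as DEBTINC < 10 on standardized data therefore keeps rows
# that lie less than 10 standard deviations above the column mean.
```

Because every standardized column has mean 0 and standard deviation 1, the columns can be compared on a common scale regardless of their original units.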

*Problem 2: Why do we need to standardize those selected attributes before attempting the K-means clustering?*

**Your answer:( )**

*Problem 3: Based on the scatterplot matrix, explain why we want to remove those specific data records before attempting the K-means clustering.*

**Your answer:( )**

### Analysis

```{r message=FALSE, warning=FALSE}
# select optimal number of clusters
library(factoextra)
set.seed(2020)
fviz_nbclust(select(hmeq_profile.std.filtered, LOAN, MORTDUE, VALUE, YOJ, CLAGE, CLNO, DEBTINC),
             kmeans, method = "wss")
```
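To make the `method = "wss"` plot easier to read, the following is a rough base-R sketch of the idea behind it: run K-means for several values of k and record the total within-cluster sum of squares each time. It uses simulated toy data with three obvious groups, not the loan data, and the names `toy`, `k_values`, and `wss` are hypothetical. It is an illustration of the elbow idea under those assumptions, not a reproduction of the internals of `fviz_nbclust`.

```r
# Toy data: three well-separated groups in two dimensions (simulated, hypothetical)
set.seed(2020)
toy <- rbind(
  matrix(rnorm(60, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(60, mean = 4, sd = 0.5), ncol = 2),
  matrix(rnorm(60, mean = 8, sd = 0.5), ncol = 2)
)

# total within-cluster sum of squares for k = 1, ..., 6
k_values <- 1:6
wss <- sapply(k_values, function(k) {
  kmeans(toy, centers = k, nstart = 10)$tot.withinss
})

# the "elbow" is the k after which adding another cluster
# no longer reduces the within-cluster sum of squares by much
plot(k_values, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

With k = 1 the value equals the total sum of squares around the overall mean, so the curve always starts at its highest point and the question is where it flattens out.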

*Problem 4: Based on the above R output, what is the optimal number of clusters you choose to produce in the following clustering analysis, and why does that number make sense to you?*

**Your answer:( )**

*Problem 5: Complete the following R commands to conduct a K-means clustering analysis.*

```{r message=FALSE, warning=FALSE, error=FALSE}
# complete the following R commands for K-means clustering
hmeq_profile.std.filtered.selected <- hmeq_profile.std.filtered %>%
  select(LOAN, MORTDUE, VALUE, YOJ, CLAGE, CLNO, DEBTINC)
set.seed(2020)
hmeq_profile_kmeans <- kmeans(hmeq_profile.std.filtered.selected, centers =____)
hmeq_profile_kmeans$centers
hmeq_profile_kmeans$size
```
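For orientation on what `$centers` and `$size` contain, here is a small sketch on simulated toy data. The names `toy` and `toy_kmeans` are hypothetical, and `centers = 2` is chosen for the toy data only; it says nothing about the value to fill in above.

```r
# Toy data: two simulated groups, standardized (hypothetical, not the loan data)
set.seed(2020)
toy <- scale(rbind(
  matrix(rnorm(40, mean = 0), ncol = 2),
  matrix(rnorm(40, mean = 5), ncol = 2)
))
colnames(toy) <- c("x1", "x2")

# k = 2 applies to this toy example only
toy_kmeans <- kmeans(toy, centers = 2)

toy_kmeans$centers          # one row per cluster, one column per attribute (standardized units)
toy_kmeans$size             # number of records assigned to each cluster
table(toy_kmeans$cluster)   # the same counts, tallied from the per-record assignments
```

Since the inputs are standardized, a center value above 0 means the cluster sits above the overall average on that attribute, and a value below 0 means it sits below.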

*Problem 6: Why don't we use all the attributes in the dataset to run the K-means analysis?*

**Your answer:( )**

*Problem 7: Based on the above R output, describe interesting features of some of the clusters produced by the analysis.*

**Your answer:( )**

```{r message=FALSE, warning=FALSE, error=FALSE}
# read all cluster assignments into a vector called ClusterID
ClusterID <- hmeq_profile_kmeans$cluster

# merge data records in hmeq_profile.std.filtered with their ClusterID
hmeq_profile.std.filtered.K <- cbind(hmeq_profile.std.filtered, ClusterID)
head(hmeq_profile.std.filtered.K)

# the "janitor" package has some nice functions for creating table summaries that
# play nicely with the %>% pipe in dplyr
library(janitor)
hmeq_profile.std.filtered.K %>%
  tabyl(ClusterID, REASON, JOB)
```

*Problem 8: Based on the above R output, describe interesting features of some of the clusters in terms of REASON and JOB.*

**Your answer:( )**

### Discussion

*Consider the potential benefits that K-means analysis results can provide for senior management in a bank when making well-informed decisions related to credit risk management. Although this discussion won't be elaborated here, you will have the opportunity to collaborate with your project team to explore this topic further and collectively develop effective strategies.*
