Question: You will be using Churn-Modeling.csv for this one. Here is a description of the variables. Some of the Variables are self-explanatory Variable Description Surname Last

You will be using Churn-Modeling.csv for this one. Here is a description of the variables. Some of the Variables are self-explanatory

Variable

Description

Surname

Last Name of the customer

Tenure

Number of Years the customer has been with the bank

Balance

Account Balance in USD

NumOfProducts

Number of Banks products to which the customer subscribed

HasCrCard

1 if the customer has a credit card with the bank

0 If the customer doesnt have a card with the bank

IsActiveMember

Whether customer made a transaction with the bank in the last 1 year

EstimatedSalary

The annual salary of the customer in USD

Exited

1 If the customer decided to quit the bank

0 If the customer decided to stay with the bank

Load and attach the Churn-Modeling.csv dataset into a variable mydata. Conduct an exploration of what variables it contains and what the values look like. How many rows does the dataset have? Convert the variables HasCrCard, IsActiveMember ,Gender and Exited into factors. What is the Surname of the customer with the highest EstimatedSalary? (1 Point)

Use CreditScore, Age and Balance to perform K-means clustering on the trees. You need to use the preProcess() and predict() functions in caret library to normalize the data in the range 0 to 1. Use K=3, 4 and 5. Which one is better (Use the ratio of inter cluster to intra cluster distance for this one)? (2 Points)

Plot the clusters using these variables for K=3, 4 and 5 i.e., three separate plots (Using CreditScore and Balance on X and Y axes and coloring them by the cluster number to which they belong). Which value of K do you think is better now? Use the Silhouette Method for this one (1.5 Points)

Use the table() command to tabulate the relationship between cluster assignment and the variable Exited for the best choice of K from the previous question. Do you see any clusters that contain predominantly Exited clusters. If so, describe the characteristics of customers in that cluster using the values of the variables used in clustering (Hint : Compare the centroid values of this cluster to the other clusters). (1.5 Points)

Use the variables Tenure, NumOfProducts and EstimatedSalary this time to run a k-means clustering algorithm (Use the scale() function instead of preprocess() and predict() this time). Run it for K=3,4, 5 and 6. Use the elbow method to determine the appropriate number of clusters this time (3 Points)

Use the table() command to tabulate the relationship between cluster assignment and the variable Exited for the best choice of K from the previous question. Do you see any clusters that contain predominantly Exited clusters. If so, describe the characteristics of customers in that cluster using the values of the variables used in clustering (Hint : Compare the centroid values of this cluster to the other clusters). (1 Point)

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Databases Questions!