Question: In this problem, you are going to build predictive models for a bank telemarketing problem. The data relates to the direct marketing campaigns of a Portuguese banking institution. The marketing campaigns were based on phone calls. Often, more than one contact with the same client was required in order to assess whether the product (a bank term deposit) would be subscribed ('yes') or not ('no'). You are given the file bank-additional-full.csv containing the data set.
Here is the information on the data attributes:
Input variables:
# bank client data:
age (numeric)
job: type of job (categorical: 'admin.','blue-collar','entrepreneur','housemaid','management','retired','self-employed','services','student','technician','unemployed','unknown')
marital: marital status (categorical: 'divorced','married','single','unknown'; note: 'divorced' means divorced or widowed)
education (categorical: 'basic.4y','basic.6y','basic.9y','high.school','illiterate','professional.course','university.degree','unknown')
default: has credit in default? (categorical: 'no','yes','unknown')
housing: has housing loan? (categorical: 'no','yes','unknown')
loan: has personal loan? (categorical: 'no','yes','unknown')
# related with the last contact of the current campaign:
contact: contact communication type (categorical: 'cellular','telephone')
month: last contact month of year (categorical: 'jan', 'feb', 'mar', ..., 'nov', 'dec')
day_of_week: last contact day of the week (categorical: 'mon','tue','wed','thu','fri')
duration: last contact duration, in seconds (numeric). Important note: this attribute highly affects the output target (e.g., if duration=0 then y='no'). Yet, the duration is not known before a call is performed. Also, after the end of the call, y is obviously known. Thus, this input should only be included for benchmark purposes and should be discarded if the intention is to have a realistic predictive model.
# other attributes:
campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)
pdays: number of days that passed after the client was last contacted in a previous campaign (numeric; 999 means the client was not previously contacted)
previous: number of contacts performed before this campaign and for this client (numeric)
poutcome: outcome of the previous marketing campaign (categorical: 'failure','nonexistent','success')
# social and economic context attributes
emp.var.rate: employment variation rate, quarterly indicator (numeric)
cons.price.idx: consumer price index, monthly indicator (numeric)
cons.conf.idx: consumer confidence index, monthly indicator (numeric)
euribor3m: euribor 3 month rate, daily indicator (numeric)
nr.employed: number of employees, quarterly indicator (numeric)
Output variable (desired target):
y: has the client subscribed to a term deposit? (binary: 'yes','no')
You are going to build predictive models to predict the output, i.e., whether a given client will subscribe to a term deposit or not. You will use the data in the bank-additional-full.csv file. Some attributes have 'unknown' or 'nonexistent' categories. Don't bother cleaning this data; you can treat these values as ordinary categories of that attribute. Note that you may not get exactly the results given in the assignment. Slightly different results are fine.
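The preparation described above (keep 'unknown' and 'nonexistent' as ordinary categories, optionally discard duration) could be sketched as follows. This is a minimal sketch, not the assignment's reference solution: the helper name `prepare_features`, the `drop_duration` flag, and the tiny hand-made sample are all hypothetical, and one-hot encoding with `pd.get_dummies` is one possible encoding choice among several.

```python
import pandas as pd


def prepare_features(df, drop_duration=True):
    """Split a bank-marketing style DataFrame into encoded features X and a 0/1 target y.

    Hypothetical helper for illustration only.
    """
    # Map the target: 'yes' -> 1, anything else ('no') -> 0
    y = (df["y"] == "yes").astype(int)
    X = df.drop(columns=["y"])
    # duration is only known after the call, so drop it for a realistic model
    if drop_duration and "duration" in X.columns:
        X = X.drop(columns=["duration"])
    # One-hot encode every text column; 'unknown' and 'nonexistent' simply
    # become dummy columns of their own, so no cleaning is needed
    X = pd.get_dummies(X)
    return X, y


# For the real file one would load the data first, for example:
# df = pd.read_csv("bank-additional-full.csv", sep=";")  # assumption: semicolon-separated
# Here a tiny made-up sample stands in for it.
sample = pd.DataFrame({
    "age": [30, 45, 52, 38],
    "job": ["admin.", "unknown", "services", "admin."],
    "poutcome": ["nonexistent", "success", "failure", "nonexistent"],
    "duration": [120, 300, 0, 95],
    "y": ["no", "yes", "no", "no"],
})
X_sample, y_sample = prepare_features(sample)
print(list(X_sample.columns))
```

Passing `drop_duration=False` keeps the duration column, which matches the benchmark-only use mentioned in the attribute notes.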
You are asked to perform the following tasks:
Build a Random Forest model:
Follow the instructions given when building the model.
Using grid search, try to find the best score and the best combination of the following hyperparameters:
Number of estimators (n_estimators):
max_depth:
min_samples_split:
min_samples_leaf:
For cross-validation in the grid search, use k-fold cross-validation with repetitions.
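A grid search of this shape could be set up in scikit-learn roughly as follows. The candidate values in `rf_param_grid` and the 3 folds with 2 repetitions are placeholders chosen only to keep the sketch fast, since the actual values are not given here; substitute the ones from the assignment. The sketch uses `RepeatedStratifiedKFold`, which is an assumption (the unstratified counterpart is `RepeatedKFold`), scores with scikit-learn's "roc_auc" scorer, and runs on synthetic data from `make_classification` rather than the bank data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold

# Synthetic, imbalanced stand-in for the encoded bank data
X_demo, y_demo = make_classification(
    n_samples=200, n_features=8, weights=[0.8, 0.2], random_state=0
)

# Placeholder grid: illustrative values, NOT the ones from the assignment
rf_param_grid = {
    "n_estimators": [20, 40],
    "max_depth": [3, None],
    "min_samples_split": [2, 10],
    "min_samples_leaf": [1, 5],
}

# Assumed 3 folds x 2 repetitions; replace with the assignment's numbers
rf_cv = RepeatedStratifiedKFold(n_splits=3, n_repeats=2, random_state=0)

rf_search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid=rf_param_grid,
    scoring="roc_auc",  # area under the ROC curve
    cv=rf_cv,
)
rf_search.fit(X_demo, y_demo)

# Best hyperparameter set and its mean cross-validated AUC
print(rf_search.best_params_)
print(rf_search.best_score_)
```

With the real data, `X_demo` and `y_demo` would be replaced by the encoded feature matrix and the 0/1 target.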
Report the hyperparameter set yielding the best score. For scoring, use the AUC score.
Build a neural network model:
Follow the instructions given when building the model.
First, scale your input data so that it has zero mean and unit standard deviation. This is important because neural network models are sensitive to input scaling.
Then, using grid search, try to find the best score and the best combination of the following hyperparameters:
hidden_layer_sizes:
alpha:
In the grid search, use the AUC score for scoring. For cross-validation in the grid search, use k-fold cross-validation with repetitions.
Report the hyperparameter set yielding the best score.
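For the neural network task, the scaling step and the grid search could be combined as in the sketch below. Several things here are assumptions rather than part of the assignment: the candidate values in `nn_param_grid`, the 3 folds with 2 repetitions, the use of `RepeatedStratifiedKFold`, `max_iter=500`, and the synthetic data. Placing `StandardScaler` inside a `Pipeline` is also a design choice of this sketch: the scaler is then refit on the training part of every fold, whereas scaling the whole data set once up front, as the instructions literally read, would work too.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the encoded bank data
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)

# Scale to zero mean / unit standard deviation, then fit the network
nn_pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("mlp", MLPClassifier(max_iter=500, random_state=0)),
])

# Placeholder grid: illustrative values, NOT the ones from the assignment.
# The "mlp__" prefix routes each parameter to the pipeline step named "mlp".
nn_param_grid = {
    "mlp__hidden_layer_sizes": [(5,), (10, 5)],
    "mlp__alpha": [1e-4, 1e-2],
}

# Assumed 3 folds x 2 repetitions; replace with the assignment's numbers
nn_cv = RepeatedStratifiedKFold(n_splits=3, n_repeats=2, random_state=0)

nn_search = GridSearchCV(nn_pipeline, nn_param_grid, scoring="roc_auc", cv=nn_cv)
nn_search.fit(X_demo, y_demo)

# Best hyperparameter set and its mean cross-validated AUC
print(nn_search.best_params_)
print(nn_search.best_score_)
```

scikit-learn may emit a `ConvergenceWarning` for some grid points if the network has not converged within `max_iter` iterations; raising `max_iter` is the usual remedy.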
