Dataset Description You will get two datasets. train.csv is for training your model, and test.csv contains the

Question:

Dataset Description

You will get two datasets. train.csv is for training your model, and test.csv contains the information to predict. The submission has to be strictly in the format indicated in the sample_submission.csv.

Dataset description

Files

train.csv - the training set
test.csv - the test set

Download dataset

https://drive.google.com/drive/folders/105jPIlN8sK-lprLibpC6iEMkfv2K135x?usp=sharing

(Note that the predicted outcome must be a class probability, not a class label.)

Columns

Client information

id - client id (numeric)
age - age of client (numeric)
job - type of job (categorical: "admin.", "artisan", "entrepreneur", "housemaid", "management", "retired", "self-employed", "services", "student", "technician", "unemployed", "unknown")
civil - marital status of client (categorical: "divorced", "married", "single", "unknown"; note: "divorced" means divorced or widowed)
education - education of client (categorical: "4K", "6K", "K9", "K12", "illiterate", "apprenticeship", "university", "unknown")
credit - has credit in default? (categorical: "no", "yes", "unknown")
hloan - has housing loan? (categorical: "no", "yes", "unknown")
ploan - has personal loan? (categorical: "no", "yes", "unknown")

Campaign details

ctype - contact communication type (categorical: "cellular", "telephone")
month - last contact month of the year (categorical: "jan", "feb", "mar", ..., "nov", "dec")
day - last contact day of the week (categorical: "mon", "tue", "wed", "thu", "fri")
ccontact - number of contacts performed during this campaign for this client (numeric, includes last contact)
lcdays - number of days since the client was last contacted in a previous campaign (numeric; 999 means the client was not previously contacted)
pcontact - number of contacts performed before this campaign for this client (numeric)
presult - outcome of the previous marketing campaign (categorical: "failure", "nonexistent", "success")
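
The 999 value in lcdays is a sentinel, not a real number of days, so a model that reads it as numeric may be misled. One possible way to handle it is sketched below, assuming the data is held in a data frame with the column names listed above. The helper name recode_lcdays and the new column previously_contacted are illustrative and not part of the dataset.

```r
# Hypothetical helper: separates the 999 sentinel in lcdays from real day counts.
recode_lcdays <- function(df) {
  # TRUE when the client was reached by a previous campaign
  df$previously_contacted <- df$lcdays != 999
  # Replace the sentinel with NA so it is not treated as 999 days
  df$lcdays[which(df$lcdays == 999)] <- NA
  df
}

# Toy rows for illustration only (not taken from train.csv)
toy <- data.frame(id = 1:3, lcdays = c(999, 6, 999))
recoded <- recode_lcdays(toy)
print(recoded)
```

Whether NA is the right replacement depends on the model: glm, for instance, drops rows with missing values by default, so keeping only the flag may be preferable there.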

Socioeconomic indicators

employment - employment variation rate - quarterly indicator (numeric)
cprice - consumer price index - monthly indicator (numeric)
cconf - consumer confidence index - monthly indicator (numeric)
euri3 - euribor 3 month rate - daily indicator (numeric)
employees - number of employees - quarterly indicator (numeric)

Outcome variable (target)

outcome - has the client opened a savings account? (binary: 1 = "yes", 0 = "no")
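
Because the target is binary and the submission must contain probabilities, a simple baseline could look like the sketch below. Logistic regression is only an assumed starting point here, not a method the task prescribes, and the data frames are simulated stand-ins so that the snippet runs without the real files. Only the column names age, euri3, id and outcome come from the description above.

```r
# Simulated stand-ins for train.csv / test.csv; the real files have more columns.
set.seed(1)
n <- 200
train <- data.frame(age = sample(18:80, n, replace = TRUE),
                    euri3 = runif(n, 0.5, 5))
train$outcome <- rbinom(n, 1, plogis(1 - 0.6 * train$euri3))
test <- data.frame(id = 1:5,
                   age = c(25, 40, 55, 33, 61),
                   euri3 = c(0.7, 4.9, 1.3, 4.0, 0.9))

# Logistic regression as a simple baseline (an assumption, not a required model)
fit <- glm(outcome ~ age + euri3, data = train, family = binomial)

# type = "response" returns P(outcome = 1) rather than log-odds or class labels
my.prediction <- predict(fit, newdata = test, type = "response")
print(my.prediction)
```

With the real data, train and test would instead come from read.csv("train.csv") and read.csv("test.csv"), and the formula would use whichever predictors prove useful.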

Model Evaluation

We will evaluate models using the area under the ROC curve (AUC).

AUC is commonly used to compare the performance of classifiers. The maximum value that can be achieved is 1 (perfect model/classifier). An AUC of 0.5 means the model performs no better than a random classifier. An AUC below 0.5 means the model performs worse than a random one. You can see the details of the grading evaluation in the grading tab.
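
To check a model locally before submitting, AUC can be computed from held-out labels and predicted probabilities. The sketch below uses the rank-based (Mann-Whitney) formulation in base R. The function name auc_rank is illustrative, and the grader's exact implementation is not stated in the task, so small differences are possible.

```r
# Hypothetical helper: AUC via the rank-sum (Mann-Whitney U) identity.
# labels: 0/1 vector of true outcomes; scores: predicted probabilities.
auc_rank <- function(labels, scores) {
  r <- rank(scores)                 # ties receive average ranks
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  # U statistic of the positive class, scaled to [0, 1]
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy example for illustration only
labels <- c(0, 0, 1, 1)
scores <- c(0.1, 0.4, 0.35, 0.8)
auc_rank(labels, scores)  # 0.75
```

A constant score for every client gives 0.5 under this formula, which matches the random-classifier reference point described above.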

Submission for Models

Submission files must be .csv files. Every customer in the given dataset has a unique customer ID under the Id column, which you can obtain from the test.csv file.

The file should contain a header and have the following format:

Id, outcome

103024, \hat{y}

where outcome is the predicted probability of being class 1 (opened savings account) and id is the customer ID. You can combine your prediction with the test set id values, for example, using the command

submission <- data.frame(Id = test$id, outcome = my.prediction)

write.csv(submission, file = "submission.csv", row.names = FALSE)

Kaggle will match each prediction to its id. This way, Kaggle can calculate the error correctly even if you change the order of the test set. There is a submission example in the data section.
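
Since the format is strict, it may be worth validating the data frame before writing it out. The check below is a sketch: the helper name check_submission is hypothetical, and it assumes the header is exactly Id, outcome as shown above. The sample_submission.csv file remains the authoritative reference.

```r
# Hypothetical helper: basic sanity checks on a submission data frame.
check_submission <- function(submission, test_ids) {
  stopifnot(
    identical(names(submission), c("Id", "outcome")),  # header as shown above
    !anyDuplicated(submission$Id),                     # one row per client
    setequal(submission$Id, test_ids),                 # same ids as test.csv
    is.numeric(submission$outcome),
    all(submission$outcome >= 0 & submission$outcome <= 1)  # probabilities
  )
  invisible(TRUE)
}

# Toy example for illustration only; row order does not matter
toy_submission <- data.frame(Id = c(3, 1, 2), outcome = c(0.2, 0.9, 0.5))
check_submission(toy_submission, test_ids = 1:3)
```

The function stops with an error on the first failed condition and returns TRUE invisibly otherwise.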

Posted Date: May 19, 2024 12:06 PM
