Question: I need only the conclusions.

…industry, advances in computer technology, and the exponential growth of large databases. In this report we analysed the problem of credit scoring. Credit scoring algorithms, which estimate the probability of default, are the method banks use to decide whether or not to grant a loan, and they are generally based on statistical pattern-recognition techniques. While researching the models used in the Kaggle competition "Give Me Some Credit", we noticed that, for the classification problem, other competitors used blended models, decision trees and different selections of attributes. They also used a variety of tools, such as R, Viscovery, SAS and SQL. We built our model in RapidMiner and tried its W-Logistic, SVM, AdaBoost, Decision Tree, Naive Bayes and Random Forest operators. The best results were achieved with the Random Forest operator, which is based on the algorithm developed by Breiman and Cutler.

2. CASE STUDY

We obtained the data for this problem from the Kaggle challenge website (http://www.kaggle.com/c/GiveMeSomeCredit/). The website provides two data files, one for training and one for testing. The training file contains 150,000 cases and the test file contains 101,503 cases. The intention is to apply our classifier to the test data and submit the predictions through Kaggle's online submission process. Each sample contains 11 attributes and one class attribute (Table 1).
The data set has missing values and is class-imbalanced: there are 139,974 cases with "0" as the class value and 10,026 cases with "1" as the class value.

Table 1: Data set
Variable name | Description | Type
SeriousDlqin2yrs | Person experienced 90 days past due delinquency or worse | Y/N
RevolvingUtilizationOfUnsecuredLines | Total balance on credit cards and personal lines of credit, except real estate and no installment debt like car loans, divided by the sum of credit limits | percentage
age | Age of borrower in years | integer
NumberOfTime30-59DaysPastDueNotWorse | Number of times borrower has been 30-59 days past due but no worse in the last 2 years | integer
DebtRatio | Monthly debt payments, alimony and living costs divided by monthly gross income | percentage
MonthlyIncome | Monthly income | real
NumberOfOpenCreditLinesAndLoans | Number of open loans (installment like car loan or mortgage) and lines of credit (e.g. credit cards) | integer
NumberOfTimes90DaysLate | Number of times borrower has been 90 days or more past due | integer
NumberRealEstateLoansOrLines | Number of mortgage and real estate loans including home equity lines of credit | integer
NumberOfTime60-89DaysPastDueNotWorse | Number of times borrower has been 60-89 days past due but no worse in the last 2 years | integer
NumberOfDependents | Number of dependents in family excluding themselves (spouse, children etc.) | integer

2.1 DATA PREPARATION

Reviewing the data set, we found 20,730 cases with missing values for certain attributes. When testing the model, we found that removing the cases with missing values gave better performance than replacing the missing values with the attribute mean.

Figure 1. Data Preparation

Next, comparing the number of cases with "1" as the class value to the total number of cases (10,026 of 150,000, about 6.7%), we concluded that the data set is imbalanced. To address this, we used clustering.
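The two data-preparation checks described above (dropping incomplete cases versus mean imputation, and measuring the share of "1" cases) can be sketched in plain Python. This is a minimal illustration, not the RapidMiner process the report used: it assumes each row is a dict keyed by Table 1 variable names with None marking a missing value, and the sample rows and function names are hypothetical.

```python
# Minimal sketch (assumption: rows are dicts keyed by Table 1 variable names,
# with None marking a missing value). The sample rows are invented for illustration.
from statistics import mean


def drop_missing(rows):
    """Listwise deletion: keep only rows in which every attribute is present."""
    return [r for r in rows if all(v is not None for v in r.values())]


def impute_mean(rows):
    """Return new rows where each missing value is replaced by its column mean."""
    columns = list(rows[0].keys())
    means = {}
    for c in columns:
        observed = [r[c] for r in rows if r[c] is not None]
        means[c] = mean(observed) if observed else None
    return [{c: means[c] if r[c] is None else r[c] for c in columns} for r in rows]


def positive_rate(rows, label="SeriousDlqin2yrs"):
    """Share of rows whose class value is 1 (the minority class here)."""
    return sum(1 for r in rows if r[label] == 1) / len(rows)


# Hypothetical rows using a subset of the Table 1 attributes.
sample = [
    {"SeriousDlqin2yrs": 0, "age": 45, "MonthlyIncome": 5000.0, "NumberOfDependents": 2},
    {"SeriousDlqin2yrs": 1, "age": 40, "MonthlyIncome": None, "NumberOfDependents": 1},
    {"SeriousDlqin2yrs": 0, "age": 38, "MonthlyIncome": 3000.0, "NumberOfDependents": None},
    {"SeriousDlqin2yrs": 0, "age": 30, "MonthlyIncome": 4000.0, "NumberOfDependents": 0},
]

print(len(drop_missing(sample)))                # 2 complete rows remain
print(impute_mean(sample)[1]["MonthlyIncome"])  # 4000.0
print(positive_rate(sample))                    # 0.25
print(round(10_026 / 150_000, 4))               # 0.0668, share of "1" cases in the training file
```

Comparing a classifier trained on `drop_missing(rows)` with one trained on `impute_mean(rows)` is the kind of test the report describes; note that mean imputation keeps every case, whereas listwise deletion shrinks the data set.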
We applied the k-means algorithm to the cases with "0" as the class value to reduce their number to 6,580. To do this, we divided the data set into two sets, i.e., one set with "0" as the class value and the other with "1" as the class value. Because the clustering algorithm had trouble creating 6,580 clusters, we reduced the data to 10% to obtain better performance. We then extracted the cluster centroids into a new data set and merged it with the set containing the cases with "1" as the class value, using the Append operator in RapidMiner.
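The cluster-based undersampling step above can be sketched as follows: split the cases by class, optionally subsample the "0" cases, run k-means on them, and append the resulting centroids (labelled "0") to the "1" cases. This is a minimal sketch under stated assumptions, not RapidMiner's k-means or Append operators: it uses plain Lloyd's-algorithm k-means with Euclidean distance, seeds the centroids with randomly chosen data points, and assumes numeric, already scaled features. The function names and the `sample_fraction` parameter are hypothetical.

```python
# Minimal sketch of cluster-based undersampling (assumptions: numeric, already
# scaled features; plain Lloyd's-algorithm k-means seeded with random data points).
# Function names and the sample_fraction parameter are hypothetical; the report
# used RapidMiner's clustering and Append operators instead.
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20, seed=0):
    """Return k centroids found by Lloyd's algorithm."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def undersample_with_centroids(majority, minority, k, sample_fraction=1.0, seed=0):
    """Replace the majority ("0") cases by k centroids and append the minority ("1") cases."""
    if sample_fraction < 1.0:
        rng = random.Random(seed)
        size = max(k, int(len(majority) * sample_fraction))
        majority = rng.sample(majority, size)
    centroids = kmeans(majority, k, seed=seed)
    return [(c, 0) for c in centroids] + [(list(p), 1) for p in minority]


# Toy usage: 100 majority points on a grid, 3 minority points far away.
majority = [(float(x), float(y)) for x in range(10) for y in range(10)]
minority = [(20.0, 20.0), (21.0, 19.0), (22.0, 21.0)]
balanced = undersample_with_centroids(majority, minority, k=3)
print(sum(1 for _, label in balanced if label == 0), sum(1 for _, label in balanced if label == 1))
# prints: 3 3
```

Using centroids rather than a random subset means each retained "0" case summarises a group of similar borrowers; the trade-off is that centroids are synthetic points, not real records from the data set.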
