Question: need only conclusions industry, advances in computer technology, and the exponential growth of large databases In this report we analysed the problem of credit scoring.

need only conclusions industry, advances in

need only conclusions

industry, advances in computer technology, and the exponential growth of large databases In this report we analysed the problem of credit scoring. Credit scoring algorithms, which make a guess at the probability of default, are the method which banks use to determine whether or not a loan should be granted and they are generally based on statistical pattern-recognition techniques. As we did a research regarding models that were used for the competition on Kaggle's "Give me some credit data set, we noticed that for the problem of classification other competitors used Blended model decision three different usage of attributes. Also, it is important to mention that they used different programs such as R. Viscovery, SAS, SOL etc. We created a model using the RapidMiner program, and its operators: W-Logistics SVM, AdaBoost Decision Tree, Naive Bayes and Random Forest, but the best results were achieved with Random Forest operator invented by Breiman and Cutler 2. CASE STUDY We obtained the data for this problem from the Kaggle challenge website (http://www.kaggle.com/c/GiveMeSomeCredit/). The website provides two data files - one for training and one for testing. The training data file consists of 150,000 cases. The test file contains 101,503 cases. The intention is that we test our classifier on the test data and submit our predictions via Kaggle's online submission process Each sample contains 11 attributes and one class attribute(Table 1). The data set has missing values and is class imbalanced (there are 139,974 cases with "0" as a class value and 10,026 cases with "1' as a class value) ) Table 1: Data set Variable Name SeriousDlqin2yrs Description Type Person experienced 90 days past due Y/N delinquency or worse Revolving UtilizationOfUnsecured Lines Total balance on credit cards and personal lines of percentage credit except real estate and no installment debt like car loans divided by the sum of credit limits age Age of borrower in years Integer NumberOfTime 30 Number of times borrower has been 30-59 days past integer 59Days PastDueNotWorse due but no worse in the last 2 years DebtRatio Monthly debt payments, alimony, living costs divided percentage by monthly gross income MonthlyIncome Monthly income real NumberOfOpenCreditLinesAnd Loans Number of Open loans (installment like car loan or integer mortgago) and Lines of credit (0.9. credit cards) NumberOfTimes90Days Late Number of times borrower has been 90 days or integer more past due Number RealEstate Loans OrLines Number of mortgage and real estate loans including integer home equity lines of credit NumberOfTime60- Number of times borrower has been 60-89 days past integer 89DaysPastDueNot Worse due but no worse in the last 2 years NumberOfDependents Number of dependents in family excluding integer themselves (spouse, children etc.) 25.1A PREPARATION 4 OWA PREPARATION Reviewing the data set, we found that there are 20.730 cases with missing values for certain attributes Testing the performance of the model, we found out that the elimination of cases with missing values achieved better performance compared to replacing the missing values with the average. CON . Figure 1. Data Preparation Afterwards, examining the number of cases with "1" as a class value compared to the total number, we came to the conclusion that the data set is imbalanced. To solve this problem, we used clustering. We applied K means algorithm on the cases with "O as a class value to reduce their number to 6580.We divided the data set into two setsie, one set with the "o" as a class value and the other with "1' as a class value. In order to get better performance of our model, we have reduced the data on 10%, because the algorithm for clustering indicated a problem in creating 6580 clusters. Then, we extracted centroids into the new data set and merged it with the set containing the cases with "1" as a class value using the operator Append in RapidMiner

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related General Management Questions!