Question: need only conclusions

values, and as for the second, we used the operator Replace Missing Values, which replaced the missing instances with the average value. After the completion of these steps, we got the final data set of 120,269 instances with no missing values.

From the original attributes shown in Table 1 we derived several new attributes, in order to inspect whether some aggregated attributes can provide more information to the algorithms than the original ones. These are listed in Table 2.

Table 2. Derived attributes

Variable name        Description
Debt                 MonthlyIncome * DebtRatio
CumulativeLateness   Sum of all lateness attributes
WagesPerCapita       MonthlyIncome / NumberOfDependents
AgeDiscretized       Age is discretized into three categories: Young (under 35), Middle age (between 35 and 60), Old (above 60 years of age)

2.2. Algorithms

Because of the imbalanced nature of the data set we evaluated several ensemble algorithms, since they often show good performance on this kind of data. Additionally, we compared their performance with single algorithms. Brief descriptions of the algorithms used in this study are given in Table 3.

Table 3. Types of used algorithms

Type      Algorithm          Description
Ensemble  Random Forest      Learns a set of random trees, i.e. for each split only a random subset of attributes is available. The resulting model is a voting model of all trees.
Ensemble  Bayesian Boosting  Trains an ensemble of classifiers for boolean target attributes. In each iteration the training set is reweighted so that previously discovered patterns and other kinds of prior knowledge are "sampled out".
Ensemble  MetaCost           Uses a given cost matrix to compute label predictions according to classification costs.
Ensemble  Bagging            Bagging operator allowing all learners.
Single    Naive Bayes        Returns a classification model using estimated normal distributions.
Single    Decision Tree      Generates decision trees to classify nominal data.
Single    Perceptron         A single perceptron finding a separating hyperplane if one exists.

2.3. Evaluation

Because of the high class imbalance and the binary class type, classification accuracy is not suitable for measuring the performance of the algorithms, since it gives misleading results on this type of data. In our case, classification accuracy would be 93.05% for the majority algorithm, i.e. if the algorithm always predicts the negative case (the person will return the credit). Even though this accuracy is high, it does not reflect the success of the model, since in this case it is more important to predict the positive cases (identify persons who will not return the credit).

This is the reason we used the Area Under the Curve (AUC) measure for algorithm evaluation. It represents the area under the Receiver Operating Characteristic (ROC) curve, a graphical plot which illustrates the performance of a binary classifier as its discrimination threshold is varied. The ROC curve is created by plotting the fraction of true positives out of the total actual positives (TPR, the true positive rate) against the fraction of false positives out of the total actual negatives (FPR, the false positive rate) at various threshold settings. The true positive rate, or recall, is calculated as the number of true positives divided by the total number of positives. The false positive rate is calculated as the number of false positives divided by the total number of negatives.
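As a side illustration (not part of the original study), the accuracy figure and the evaluation measures above can be reproduced from the class counts that appear later in Table 4. The following minimal Python sketch assumes scikit-learn is available and uses a toy constant prediction purely for illustration.

# Minimal sketch (assumption: scikit-learn; the constant prediction is illustrative only).
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

# Class counts of the cleaned data set, taken from the sample column in Table 4.
n_neg, n_pos = 111911, 8357
y_true = np.array([0] * n_neg + [1] * n_pos)

# A "majority" model that always predicts the negative class (credit will be returned).
y_majority = np.zeros_like(y_true)
print("Majority-class accuracy:", round((y_majority == y_true).mean(), 4))  # ~0.9305

# The same constant score gives an uninformative AUC of 0.5.
print("Majority-class AUC:", roc_auc_score(y_true, np.zeros(len(y_true))))

# TPR = TP / (TP + FN) and FPR = FP / (FP + TN); here both are 0, i.e. no defaulter is ever identified.
tn, fp, fn, tp = confusion_matrix(y_true, y_majority).ravel()
print("TPR:", tp / (tp + fn), "FPR:", fp / (fp + tn))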
2.4. Experiments and results

In the first process, we evaluated the performance of all algorithms on all available attributes. AUC is measured based on 10-fold cross-validation with stratified sampling. Stratified sampling builds random subsets and ensures that the class distribution in the subsets is the same as in the whole data set; in our case of binominal classification, each subset contains roughly the same proportions of the two class label values. Stratification is the process of dividing the members of the population into homogeneous subgroups before sampling; afterwards, simple random sampling or systematic sampling is applied within each stratum. This often improves the representativeness of the sample by reducing sampling error, and it enables a correct interpretation of the AUC values.

Imbalanced class distributions can lead the algorithms to learn overly complex models that overfit the data and have little relevance. It is known that the class imbalance problem can be successfully addressed with under-sampling techniques, so in this experiment we evaluated every algorithm on different sub-samples.

Table 4. Evaluation of algorithms on the basic data set (AUC per sub-sample; each sub-sample is given as class 0 count / class 1 count)

Algorithm                            111911/8357   50000/8357   30000/8357   8357/8357       5000/8357   2000/8357
Random Forest                        0.792         0.5          0.500        0.575 (0.667)   0.642       0.0
Bayesian Boosting (Decision Stump)   0.5           0.570        0.570        0.771           0.715       0.5
MetaCost (Decision Stump)            0.511         0.571        0.648        0.688           0.775       0.697
Bagging (Decision Stump)             0.615         0.505        0.570        0.739           0.626       0.748
Naive Bayes                          0.801         0.797        0.797        0.795           0.797       0.789
Decision Tree                        0.719         0.500        0.500        0.773           0.715       0.701
Perceptron                           0.812         0.813        0.814        0.812           0.737       0.775

After the experiment with the basic attributes we started another analysis, this time utilizing the derived attributes, hoping to get better results. These results are presented in Table 5.

Table 5. Evaluation of algorithms on the data set that contained the derived attributes (AUC per sub-sample; each sub-sample is given as class 0 count / class 1 count)

Algorithm           111911/8357   50000/8357   30000/8357   8357/8357       5000/8357   2000/8357
Random Forest       0.803         0.5          0.5          0.593           0.794       0.5 (0.737, 0.810)
Bayesian Boosting   0.502         0.631        0.664        0.789 (0.708)   0.737       0.5
MetaCost            0.54          0.587        0.657        0.758           0.757       0.776
Bagging             0.513         0.579        0.625        0.692           0.743       0.579
Naive Bayes         0.70          0.789        0.782        0.782 (0.68)    0.784       0.789
Decision Tree       0.5           0.5          0.546        0.753           0.753       0.753
Perceptron          0.809         0.809        0.807        0.809           0.811       0.81 (0.756, 0.803)

It can be seen from Table 4 that the different samples influence algorithm performance and that the Perceptron gives the best results in all cases (around 0.81). The Perceptron is a type of artificial neural network; it can be seen as the simplest kind of feed-forward neural network, a linear classifier. From this we can conclude that the Perceptron successfully handles the imbalanced data problem on all sub-samples, which indicates that it could be trained on smaller sub-samples in order to reduce the time cost without worsening the performance.

It can also be noticed from Table 4 that the ensemble algorithms showed their best performance when the data is completely balanced (8,357 negative and 8,357 positive cases). As expected, the Decision Tree algorithm showed the worst results, especially when the data is imbalanced (111,911 negative and 8,357 positive cases).

In the second experiment we used feature weighting techniques (Evolutionary and PSO) in order to identify
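The sub-sampling and stratified 10-fold cross-validation setup behind Tables 4 and 5 could be sketched roughly as follows. This is an illustrative reconstruction in Python rather than the authors' RapidMiner process: the DataFrame layout and the label column name SeriousDlqin2yrs are assumptions, and the Perceptron is used here only as one of the evaluated learners.

# Illustrative reconstruction (not the original RapidMiner process). Assumes a numeric
# pandas DataFrame with a binary label column; the column name is a guess.
import pandas as pd
from sklearn.linear_model import Perceptron
from sklearn.model_selection import StratifiedKFold, cross_val_score

def auc_on_subsample(data, n_negative, label="SeriousDlqin2yrs", seed=42):
    """Under-sample the majority (negative) class to n_negative cases, keep all positive
    cases, and return the mean AUC from stratified 10-fold cross-validation."""
    positives = data[data[label] == 1]
    negatives = data[data[label] == 0].sample(n=n_negative, random_state=seed)
    sample = pd.concat([positives, negatives])
    X, y = sample.drop(columns=[label]), sample[label]
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    return cross_val_score(Perceptron(), X, y, scoring="roc_auc", cv=cv).mean()

# Negative-class sizes as in the table headers; all 8,357 positive cases are always kept.
# for n in [111911, 50000, 30000, 8357, 5000, 2000]:
#     print(n, auc_on_subsample(credit_data, n))   # credit_data is a hypothetical DataFrame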
