Question: Loan Default Prediction Exercise
1. Create a new process in RapidMiner. Save your work at this point and remember to do so often throughout the process.

Figure 2: New Process

2. Import the Excel CreditSet data set to be used for building the model. Perform this task with a Read Excel operator. Type Read Excel into the Operator search box in the top-left corner. Drag the operator into the process area.

Figure 3: Read Excel operator

3. In the Parameters area, click on the Import Configuration Wizard button and navigate to the location of the CreditSet data on your computer. Follow these steps to load the data correctly.

3.1 With the cells selected, click on Next.

Figure 4.1: Variable selection

3.2 Click on the drop-down arrow at the top right of default_TenYear and choose Change Role.

Figure 4.2: Change variable parameters

3.3 In the next screen, change the role of default_TenYear to label.

Figure 4.3: Change variable parameters

3.4 For subsequent steps, exclude the default_TenYear, clientid and LTI attributes.

Figure 4.4: Change variable parameters
Figure 4.5: Change variable parameters
Figure 4.6: Change variable parameters

4. Search for the Split Validation operator and drag it into the process area. Use a relative split and a split ratio of 0.7 in the Parameters area. Explain the meaning of Split Ratio. Discuss how it impacts data validation in data mining. Connect the operators together as shown in Figure 5.

Figure 5: Validation operator

5. Double-click to open the Split Validation nested operator. Two sub-processes are presented: a Training process and a Testing process. You will build your model in the Training process and test it in the Testing process.

6. You are now ready to build your decision tree (DT) model. Search for the Decision Tree, Apply Model and Performance operators and drag them into the process areas. Connect the operators as shown in Figure 6. In the DT operator, choose a maximal depth of 10 and the gain_ratio criterion. Leave all other DT parameters at their defaults. Mention and explain one other criterion parameter.

Figure 6: Validation operator sub-processes

7. Click on the Run button.

8. Include a snapshot of the Accuracy criterion (in the Performance tab) in your report. Explain the parameters and how they relate to the prediction of individuals who are likely to default on their loan payments.

9. Include a snapshot of the DT graph and Description in your report. Explain the graph and how it relates to the prediction of individuals who are likely to default on their loan payments.

10. Assume you want to prune the tree to a depth of 5. Edit the necessary parameter(s) in the DT operator and run the model again.

11. Include a snapshot of your new DT graph and explain how it relates to the prediction of individuals who are likely to default on their loan payments.
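
For the split ratio question in step 4, a small sketch outside RapidMiner may help. With a relative split and a ratio of 0.7, 70% of the examples are used for training and the remaining 30% are held back for testing, and accuracy is the share of held-back examples whose predicted label matches the actual label. The Python below is only an illustration under stated assumptions: the function names `relative_split` and `accuracy`, the shuffling with a fixed seed, and the placeholder records and labels are all hypothetical and are not RapidMiner's implementation or the CreditSet data.

```python
import random


def relative_split(rows, split_ratio=0.7, seed=42):
    """Shuffle the rows and return (training, testing) partitions.

    split_ratio is the fraction of rows assigned to the training set.
    Shuffling with a fixed seed is an assumption made for this sketch.
    """
    if not 0.0 < split_ratio < 1.0:
        raise ValueError("split_ratio must be strictly between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * split_ratio))
    return shuffled[:cut], shuffled[cut:]


def accuracy(actual, predicted):
    """Fraction of examples whose predicted label equals the actual label."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Hypothetical placeholder records standing in for rows of a loan data set.
rows = list(range(2000))
train, test = relative_split(rows, split_ratio=0.7)
print(len(train), len(test))  # 1400 600

# Hypothetical labels: 1 = default, 0 = no default.
actual = [1, 0, 0, 1, 0]
predicted = [1, 0, 1, 1, 0]
print(accuracy(actual, predicted))  # 0.8
```

A larger split ratio gives the model more examples to learn from but leaves fewer for testing, so the accuracy estimate becomes noisier. A smaller ratio does the opposite. Because the test examples are never seen during training, the accuracy measured on them indicates how the model might behave on new loan applicants.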
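
For the criterion question in step 6, it can help to compare gain_ratio with information_gain, which is another option of the Decision Tree operator's criterion parameter (gini_index is a third). The sketch below uses the standard textbook definitions: information gain is the drop in label entropy after a split, and gain ratio divides that gain by the entropy of the branch sizes, which penalizes splits into many small branches. The function names and the toy labels are hypothetical, and this is not RapidMiner's internal code.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(groups):
    """Entropy of all labels minus the size-weighted entropy of each branch.

    groups: one list of labels per branch of a candidate split.
    """
    parent = [label for group in groups for label in group]
    total = len(parent)
    remainder = sum(len(g) / total * entropy(g) for g in groups if g)
    return entropy(parent) - remainder


def gain_ratio(groups):
    """Information gain divided by the entropy of the branch sizes."""
    total = sum(len(g) for g in groups)
    split_info = -sum(len(g) / total * log2(len(g) / total) for g in groups if g)
    if split_info == 0:
        return 0.0  # a single branch does not separate anything
    return information_gain(groups) / split_info


# Hypothetical toy labels: four defaulters and four non-defaulters.
labels = ["yes"] * 4 + ["no"] * 4

# A two-way split that separates the classes perfectly.
two_way = [["yes"] * 4, ["no"] * 4]

# An identifier-like split that puts every record in its own branch.
per_record = [[label] for label in labels]

print(information_gain(two_way), gain_ratio(two_way))
print(information_gain(per_record), gain_ratio(per_record))
```

Both splits reach the same information gain of 1 bit, but the per-record split has a gain ratio of only one third because its eight branches carry 3 bits of split information. This shows why gain ratio is less inclined than plain information gain to favor attributes with many distinct values.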
