Question: Instructions: The file BaseDecisionTree.java provides an implementation for a decision node. Each node specifies an index (featureIndex) in the feature vector and a threshold (threshold)
Instructions: The file BaseDecisionTree.java provides an implementation for a decision node. Each node specifies an index (featureIndex) in the feature vector and a threshold (threshold) to be used in the comparison. The leaves of the tree are dummy nodes without comparison, which return the class with the highest probability for that node. These probabilities are obtained either by specifying them directly or by accumulating the results obtained by using pre-classified samples (classFreq and total).
The BaseDecisionTreeTester.java builds the decision tree is shown in the above figure and tests its results with two sets of titanic data:
A set of simulated data for probabilities are used to: > build the tree, probabilities for each class output (in percentage) for each external(leaf) node are given. > test this tree with 5 samples.
Real data in which you use to: o complete the unknown probabilities in the above tree to build your own tree, the first probability is given to you in the tree (27% for female survivals). > Test your work. A set of tests is given and commented. You can comment them out once you finish your code to test your tree.
To complete the unknow probabilities in the figure and test your tree, you are given the following files in the zip folder:
The source code BaseDecisionTree.java in which you should fill the incomplete methods with your own code.
A file (input.txt) containing the real titanic input to calculate the probabilities for your decision tree.
A file (output.txt), which you should use to calculate the accuracy of your tree.
Please use the BaseDecisionTree.java and 3 text files to complete BaseDecisionTree class by adding the requested operations:
1. To build your tree, you are required to use the input data file input.txt to calculate the probabilities for your decision tree for the real titanic data. The file has four columns: 3 for input features (gender, age and #siblings) while the last column represents the correct answer survived/not survived,
2. To test your tree, calculate the accuracy for your decision tree by using use the files, output.txt (the file has three columns: gender, age and #siblings) that has samples of the titanic data and output_answers.txt (has one column, the output class) which has the correct answers for each sample.
BaseDecisionTree.java
package assignment2;
public class BaseDecisionTree { private int featureIndex; // feature index on which private double threshold; // threshold is applied
public static int dim; // input feature vector dimension public static int nClasses; // number of classes
private int[] classFreq; // number of time instance of this class has reached this node private int total; // number of trained samples having reached this node
private BaseDecisionTree smallerBranch; // if feature value < threshold private BaseDecisionTree greaterBranch; // if feature value >= threshold private BaseDecisionTree parent; // a reference to the parent node
/** * constructor for a leaf */ public BaseDecisionTree() {
this.featureIndex = -1; this.threshold = 0.0;
classFreq = new int[nClasses]; }
/** * constructor for a node */ public BaseDecisionTree(int featureIndex, double threshold) {
this.featureIndex = featureIndex; this.threshold = threshold;
classFreq = new int[nClasses]; }
/** * get the class associated with this decision tree */ public int getDecision(double[] vector) { // write your code here. return 0; } /** * return the class having highest probability */ public int getMaxClass() { // write your code here. return 0; }
/** *returns the number of samples that hit this node */ public int getTotal() { return total; }
/** *return the highest probability */ public double getMaxProb() { // write your code here. return 0; } /** * manually set the probability of class c (in %) * to use instead of updateProbCount */ public void setProb(int c, int percentage) {
classFreq[c] = percentage; total = 100; }
/** In this method you should calculate the probability for different input features *(gender, age and siblings) and use them to build your tree. This function takes an *input file input.txt. The file is provided in the zip folder. *For each feature (column) calculate the probability of survived/non survived(last column). *For example for feature 1 (male/female) we have female with 27% survived, count how many samples of female that has survived/total number of females. */ public int[] CalcProb(String inputFeatureWLabel) { /* * write your code here. */ }
/* * adds a new branch on the left */ public void setSmallerBranch(BaseDecisionTree ds) {
smallerBranch = ds; ds.parent = this; }
/* * adds a new branch on the right */ public void setGreaterBranch(BaseDecisionTree ds) { /* * write your code here. */ }
/* * returns the parent of this node */ public BaseDecisionTree getParent() {
return parent; }
/* * returns the left child of this node */ public BaseDecisionTree getSmallerBranch() {
return smallerBranch; }
/* * returns the right child of this node */ public BaseDecisionTree getGreaterBranch() {
return greaterBranch; }
/* * checks if it is a leaf */ public boolean isExternal() {
return featureIndex == -1; } /** * 1.Calculate the accuracy of your results. * After building your tree and testing with sample data in the tester class. *You should try it with real test data (output.txt) and compare your results with actual test results (last column in the file). *The accuracy will be your right decisions divided by the number of samples in output data. */ public int[] CalcAccuracy(String inputFeatureWLabel) {
// write your code here.
}
/** * In this function you should * be able to print all the nodes of decision tree in a pre-Order Traversal. */ public void print() {
// write your code here. } /** * toString */ public String toString() {
StringBuilder out = new StringBuilder("(" + featureIndex + " of " + dim + " ," + threshold + " , [" + classFreq[0]); for (int i = 1; i < nClasses; i++) { out.append(",").append(classFreq[i]); } out.append("] of ").append(total).append(")");
return out.toString(); }
}
---------------------------------------------------
BaseDecisionTreeTester.java
package assignment2; import java.util.*;
public class BaseDecisionTreeTester {
/* * Build a decision tree * feature 0 is gender (0: female 1:male) * feature 1 is age * feature 2 is sibling count * class 0 is not survived * class 1 is survived * read the train file */ public static void main( String [] args ) { BaseDecisionTree.nClasses = 2; BaseDecisionTree.dim = 3; /* * is man? */ BaseDecisionTree s1 = new BaseDecisionTree(0, 0.99); BaseDecisionTree s2 = new BaseDecisionTree(); // leaf s2.setProb(1, 85); // survived at 36% probability
/* * is age < 9.5 */ BaseDecisionTree s3 = new BaseDecisionTree(1, 9.9);// BaseDecisionTree s4 = new BaseDecisionTree(); // leaf s2.setProb(0, 61); // not survived at 61% /* * is Siblings>2.5 */ BaseDecisionTree s5 = new BaseDecisionTree(2,2.9); // leaf BaseDecisionTree s6 = new BaseDecisionTree(); // leaf s6.setProb(0, 15); // not survived at 2% BaseDecisionTree s7 = new BaseDecisionTree(); // leaf s7.setProb(1, 25); // survived at 20%
s1.setSmallerBranch(s2); s1.setGreaterBranch(s3); s3.setSmallerBranch(s4); s3.setGreaterBranch(s5); s5.setSmallerBranch(s6); s5.setGreaterBranch(s7); /* * this part is just for test your code. we simulated the tree and you can see * the right answers if you want o check them with yours. "1" means survived and "0" * means not survived. */ double[] test1 = {1,2,4}; System.out.println("Man, 2 years old with 4 siblings = " + s1.getDecision(test1));//answer is:0 double[] test2 = {1,10,5}; System.out.println("Man 10 years old with 5 siblings = " + s1.getDecision(test2));//answer is:1 double[] test3 = {0,2,0}; System.out.println("Woman 2 years old = " + s1.getDecision(test3));//answer is 1 double[] test4 = {1,20,0}; System.out.println("Woman 40 years old = " + s1.getDecision(test4));//answer is:1 double[] test5 = {1,34,1}; System.out.println("Man, 10 years old with 1 siblings = " + s1.getDecision(test5));//answer is:0 /* Please comment out this part after you calculate the probabilities from * input data. These inputs are based on output data and the answer for them is * in front of each one. you can test your decision tree with these.*/
/* double[] test1 = {0,39,1}; System.out.println("Man, 2 years old with 4 siblings = " + s1.getDecision(test1));//answer is 1 double[] test2 = {1,24,2}; System.out.println("Man 10 years old with 5 siblings = " + s1.getDecision(test2));//answer is 0 double[] test3 = {0,2,4}; System.out.println("Woman 2 years old = " + s1.getDecision(test3));//answer is 0 double[] test4 = {0, 40,4}; System.out.println("Woman 40 years old = " + s1.getDecision(test4));//answer is 0 double[] test5 = {1, 10, 1}; System.out.println("Man, 10 years old with 1 siblings = " + s1.getDecision(test5));//answer is 0 */ } }
Step by Step Solution
There are 3 Steps involved in it
Get step-by-step solutions from verified subject matter experts
