Question: Checkpoint 1 (HW 8) TestGeneration - Due one week from release. Checkpoint 2 (HW 9) TestGP - Due two weeks from release.
The next step in the genetic programming project is to read and store a data file, such as the wine quality data set. Each of the trees in a generation of trees will be tested against the selected data file, and its "fitness" thus measured. Here is a simple example. Suppose there is just a single independent variable, x. A very short data file might look like this:
y x
Now suppose the tree is x x. We want to find out how close the tree value, for each x value, is to the given y value in the data. One standard way of doing this is to add up, over the rows of data, the square of the "deviation". For this example, the result is
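Fitness = Math.pow(t1 - y1, 2)
        + Math.pow(t2 - y2, 2)
        + Math.pow(t3 - y3, 2)
where ti is the tree's value on row i and yi is the y value given in that row.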
The work you've done so far allows you to evaluate any tree on a double[] (an array of x values) corresponding to a single data row. The next step is to be able to evaluate over multiple rows (subtracting and squaring for each row, as above) and sum the results over all rows.
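The row-by-row sum described above can be sketched as follows. This is only an illustration under stated assumptions: the class name FitnessSketch, the helper sumSquaredDeviations, the three data rows, and the stand-in tree x * x are all made up for this example, and a ToDoubleFunction is used in place of a real GPTree.

```java
import java.util.function.ToDoubleFunction;

// Hypothetical helper class; not part of the assignment's code base.
class FitnessSketch {

    // Adds up, over all rows, the squared deviation between the tree's
    // value on that row's x values and the row's observed y value.
    static double sumSquaredDeviations(ToDoubleFunction<double[]> tree,
                                       double[][] xs, double[] ys) {
        double sum = 0.0;
        for (int i = 0; i < ys.length; i++) {
            double deviation = tree.applyAsDouble(xs[i]) - ys[i];
            sum += Math.pow(deviation, 2);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Made-up data: each row has one x value and one y value.
        double[][] xs = {{1.0}, {2.0}, {3.0}};
        double[] ys = {1.0, 4.0, 10.0};

        // Stand-in for a tree: computes x * x from the row's x array.
        ToDoubleFunction<double[]> tree = x -> x[0] * x[0];

        // Deviations are 0, 0 and -1, so the sum of squares is 1.0.
        System.out.println(sumSquaredDeviations(tree, xs, ys)); // prints 1.0
    }
}
```

A lower sum means the tree's values sit closer to the data, so lower fitness is better.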
To do this, use the two classes DataRow and DataSet, from the Linear Regression project.
GPTree
Once the tabular.DataRow and tabular.DataSet classes are working correctly, the next step is to modify the GPTree class to implement Comparable and Cloneable and to have the following methods:
public void evalFitness(DataSet dataSet): accepts a DataSet object as its argument. Since you already have an eval method that takes a double[], it shouldn't be too hard to extract each DataRow's array of x values and feed it to your existing method (code reuse at work). The evalFitness method should run through each of the DataRows, evaluate the tree, subtract the y value, and square the result, all the while keeping a running sum of the squared differences. The final sum is the GPTree's fitness value.
public double getFitness(): returns the fitness computed when evalFitness was called.
public int compareTo(GPTree t): compares the fitness values and returns -1 for less than, 1 for greater than, and 0 when the values are equal.
public boolean equals(Object o): returns true when compareTo((GPTree) o) is 0 and false otherwise. Check first that the object is not null and is a GPTree; if it is null or not a GPTree, return false.
public Object clone(): in addition to calling clone on super (similar to clone in Node), make sure to clone root, since it is a Cloneable Object. If you are using the Algebra implementation from an earlier HW, you can use the copy constructor to copy root instead.
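The comparison, equality, and cloning methods listed above might look roughly like the sketch below. It is not the real GPTree: GPTreeSketch and SketchNode are hypothetical stand-ins, the fitness is passed to the constructor instead of being computed by evalFitness, and the hashCode override is an addition that Java expects whenever equals is overridden. Double.compare is used for compareTo; it returns a negative, zero, or positive value rather than promising exactly -1, 0, or 1.

```java
// Hypothetical stand-in for GPTree, reduced to the three methods discussed.
class GPTreeSketch implements Comparable<GPTreeSketch>, Cloneable {
    private SketchNode root;
    private final double fitness; // assumed already computed

    GPTreeSketch(SketchNode root, double fitness) {
        this.root = root;
        this.fitness = fitness;
    }

    public double getFitness() { return fitness; }

    SketchNode getRoot() { return root; }

    // Orders trees by fitness: negative if this tree's fitness is lower.
    @Override
    public int compareTo(GPTreeSketch t) {
        return Double.compare(fitness, t.fitness);
    }

    // instanceof is false for null, so one check covers both cases.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof GPTreeSketch)) {
            return false;
        }
        return compareTo((GPTreeSketch) o) == 0;
    }

    // Kept consistent with equals, which depends only on fitness.
    @Override
    public int hashCode() {
        return Double.hashCode(fitness);
    }

    // Shallow copy from super.clone(), then a separate copy of root.
    @Override
    public Object clone() {
        try {
            GPTreeSketch copy = (GPTreeSketch) super.clone();
            copy.root = (SketchNode) root.clone();
            return copy;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: we are Cloneable
        }
    }
}

// Hypothetical stand-in for the project's Node class.
class SketchNode implements Cloneable {
    double value;

    SketchNode(double value) { this.value = value; }

    @Override
    public Object clone() {
        try {
            return super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e);
        }
    }
}
```

Because equals here compares fitness only, two structurally different trees with the same fitness count as equal, which is what the description above asks for.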
Generation
The last step is the creation of the Generation class. The Generation class should probably have the following constructor and methods:
Generation(int size, int maxDepth, String fileName): creates a DataSet using the fileName, then creates the factories and random number generator necessary to construct a GPTree. It then creates an array of size GPTrees, each with a maximum depth of maxDepth.
public void evalAll(): evaluates the current generation of GPTrees by evaluating the fitness of each tree against the current DataSet, and then sorts the array in place using Arrays.sort.
public ArrayList<GPTree> getTopTen(): returns an ArrayList of the top ten GPTrees, i.e. the trees with the lowest fitness, in increasing order of fitness. Only works after evaluating all.
public void printBestFitness(): prints the best fitness value. Only works after evaluating all.
public void printBestTree(): prints the best tree. Only works after evaluating all.
public void evolve(): for Checkpoint 2, select from the more fit trees at random, clone each selected tree, and then call crossover. Add these to the new array of children. Repeat until the new array holds as many trees as the current generation; it becomes the next generation.
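The sort-then-take-ten pattern behind evalAll and getTopTen can be sketched as below. This is a simplified illustration, not the required Generation class: GenerationSketch and its nested ScoredTree are hypothetical, fitness values are supplied directly instead of being computed against a DataSet, and the constructor takes a ready-made array instead of a size, depth, and file name.

```java
import java.util.ArrayList;
import java.util.Arrays;

// Hypothetical, reduced version of a Generation.
class GenerationSketch {

    // Stand-in for a GPTree whose fitness has already been computed.
    static class ScoredTree implements Comparable<ScoredTree> {
        final String label;
        final double fitness;

        ScoredTree(String label, double fitness) {
            this.label = label;
            this.fitness = fitness;
        }

        @Override
        public int compareTo(ScoredTree other) {
            return Double.compare(fitness, other.fitness);
        }
    }

    private final ScoredTree[] trees;

    GenerationSketch(ScoredTree[] trees) {
        this.trees = trees;
    }

    // In the real class each tree's fitness would be evaluated first;
    // here only the in-place sort by compareTo is shown.
    void evalAll() {
        Arrays.sort(trees);
    }

    // Returns up to ten trees from the front of the sorted array,
    // so the result is in increasing order of fitness.
    ArrayList<ScoredTree> getTopTen() {
        ArrayList<ScoredTree> top = new ArrayList<>();
        int count = Math.min(10, trees.length);
        for (int i = 0; i < count; i++) {
            top.add(trees[i]);
        }
        return top;
    }

    public static void main(String[] args) {
        ScoredTree[] trees = new ScoredTree[12];
        for (int i = 0; i < trees.length; i++) {
            trees[i] = new ScoredTree("tree" + i, 12.0 - i); // made-up fitnesses
        }
        GenerationSketch generation = new GenerationSketch(trees);
        generation.evalAll();
        for (ScoredTree t : generation.getTopTen()) {
            System.out.println(t.label + " " + t.fitness);
        }
    }
}
```

Arrays.sort on an object array relies on compareTo, which is why GPTree needs to implement Comparable before evalAll can sort it.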
Checkpoint 1: TestGeneration
As in the last homework, write a test class that demonstrates your stuff. Call it TestGeneration. Have this class's main method prompt the user for a data file. Then create a generation of GPTrees. Get the data into a DataSet object, and evaluate each GPTree. Print out the GPTree with the smallest fitness. After all, this is the tree that best fits the data. Then print the fitnesses of each of the top ten GPTrees and make sure that they are in increasing order. The output for the top ten fitnesses should start:
