Question: This project is to read and store a data file (such as the wine quality data set). When considering a regression or classification task, the data are often stored in CSV files. These files have one dependent variable, y, which is what you are typically trying to predict, and the rest of the values are independent variables.
For example, suppose there is just a single independent variable, x. A very short data file might look like this:
y x
Now suppose we start with a prediction of y from x. We want to find out how close the predicted value, for each x value, is to the given y value in the data. One standard way of doing this is to add up, over the rows of data, the square of the "deviation". For this example, the result is
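sumSqError = Math.pow(y1 - p1, 2)
           + Math.pow(y2 - p2, 2)
           + Math.pow(y3 - p3, 2)
where yi is the given y value in row i and pi is the predicted value for that row.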
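The sum of squared deviations described above can be sketched as a small helper. This is only an illustration: the class name SquaredError, the method name sumSqError, and the use of double arrays for the given and predicted values are assumptions, not part of the assignment.

```java
/** Hypothetical helper (not one of the four required classes). */
class SquaredError {

    /**
     * Adds up, over all rows, the square of the deviation between
     * the given y value and the predicted value for that row.
     */
    static double sumSqError(double[] y, double[] predicted) {
        if (y.length != predicted.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < y.length; i++) {
            sum += Math.pow(y[i] - predicted[i], 2);
        }
        return sum;
    }
}
```

With made-up values y = {3, 5, 7} and predictions {2, 5, 9}, the deviations are 1, 0 and -2, so the method returns 1 + 0 + 4 = 5.0.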
For now, we are breaking this up into steps: making a data row class that stores each row by separating the dependent variable from an array of independent variables; making a data set class that can read a file and produce and save a list of data rows for later processing; and making a linear regression class that can perform multiple linear regression and make a prediction based on the regression.
To do this, create four new classes, DataRow, DataSet, Model and LinearModel.
DataRow
A DataRow object will hold a y value and an array of x values. For the example above, this array would only be of length 1, but other data sets may have more than one x variable, so we really do need an array. A DataSet will hold an ArrayList of DataRow objects, that is, an ArrayList<DataRow>. The getDependentVariable method gets y, and getIndependentVariables gets the array of independent variables x. The constructor for DataRow takes the y value and the array of x values as separate arguments to make it easier to store them.
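A minimal sketch of DataRow, following the description above, might look like this. The method names come from the assignment; the use of double for the values, the private field names, and the defensive copying of the array are assumptions made for illustration.

```java
import java.util.Arrays;

/** One row of data: a dependent value y and its independent values x. */
class DataRow {
    private final double y;     // dependent variable (assumed to be a double)
    private final double[] x;   // independent variables

    /** Takes y and the array of x values as separate arguments. */
    public DataRow(double y, double[] x) {
        this.y = y;
        // Copy the array so later changes by the caller do not affect this row
        // (a design choice, not something the assignment requires).
        this.x = Arrays.copyOf(x, x.length);
    }

    public double getDependentVariable() {
        return y;
    }

    public double[] getIndependentVariables() {
        return Arrays.copyOf(x, x.length);
    }
}
```

For a file with a single x column, each row would be built with an array of length 1, for example new DataRow(y, new double[] { x }).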
DataSet
The DataSet class has the responsibility of reading the data from a data file. Write a constructor that takes a String as its argument. The String holds the name of the data file. To keep things simple, assume that the data file contains, as its first row, the names of each variable. Following this, the file contains the first y value followed by the x values on a line by itself, then the next y value and the corresponding x values on the next line and so forth. The file for the example data above might look like this:
quality, acidity
The only real requirement is that the data file has commas between the entries. You can use the Scanner class for this job: use nextLine to get each row, then use String's split method to split the cells by comma into a String[]. Assume for now that the data file will be correct, but you can have basic error handling to ignore rows that have fewer columns than the number of columns in the first line.
The DataSet class has methods to access the number of independent variables (getNumIndependantVariables) and the ArrayList of rows (getRows).
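The reading steps described for DataSet could be sketched as follows. The constructor argument and the two accessor names follow the assignment. Everything else is an assumption made for illustration: parsing every cell as a double, trimming whitespace around cells, letting the constructor throw FileNotFoundException, and the compact DataRow included only so the example compiles on its own.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Scanner;

/** Compact stand-in for DataRow so this sketch is self-contained. */
class DataRow {
    private final double y;
    private final double[] x;

    public DataRow(double y, double[] x) {
        this.y = y;
        this.x = x.clone();
    }

    public double getDependentVariable() { return y; }

    public double[] getIndependentVariables() { return x.clone(); }
}

class DataSet {
    private final ArrayList<DataRow> rows = new ArrayList<>();
    private int numIndependentVariables; // stays 0 if the file is empty

    /** Reads the named file: a header line, then one "y, x1, x2, ..." line per row. */
    public DataSet(String filename) throws FileNotFoundException {
        try (Scanner in = new Scanner(new File(filename))) {
            if (!in.hasNextLine()) {
                return; // empty file: no header and no rows
            }
            // First line holds the variable names; the first name belongs to y.
            String[] header = in.nextLine().split(",");
            numIndependentVariables = header.length - 1;

            while (in.hasNextLine()) {
                String[] cells = in.nextLine().split(",");
                if (cells.length < header.length) {
                    continue; // ignore rows with fewer columns than the header
                }
                double y = Double.parseDouble(cells[0].trim());
                double[] x = new double[numIndependentVariables];
                for (int i = 0; i < x.length; i++) {
                    x[i] = Double.parseDouble(cells[i + 1].trim());
                }
                rows.add(new DataRow(y, x));
            }
        }
    }

    // Spelling of the method name follows the assignment text.
    public int getNumIndependantVariables() {
        return numIndependentVariables;
    }

    public ArrayList<DataRow> getRows() {
        return rows;
    }
}
```

In this sketch, short rows (including blank lines) are skipped silently, while a cell that is not a number makes Double.parseDouble throw NumberFormatException, which matches the instruction to assume the file is otherwise correct.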
