This project is to read and store a data file (such as the wine quality data set). When considering a regression or classification task, the data are often stored in CSV files. These files have one dependent variable, y, which is what you are typically trying to predict; the rest of the values are independent variables.
For example, suppose there is just a single independent variable x0. A (very short) data file might look like this:
y , x0
1.2,1.0
4.1,2.0
8.8,3.0
Now suppose we start with the prediction 2*x0 + 0.5. We want to find out how close the predicted value for each x0 is to the given y value in the data. One standard way of doing this is to add up, over the rows of data, the square of the "deviation" (predicted value minus actual y). For this example, the result is
sumSqError = Math.pow(((2*1.0)+0.5) - 1.2, 2)
           + Math.pow(((2*2.0)+0.5) - 4.1, 2)
           + Math.pow(((2*3.0)+0.5) - 8.8, 2)
           = 1.69 + 0.16 + 5.29
           = 7.14
(The square root of this sum is about 2.672.)
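The same calculation can be written as a loop over the rows, which is how it will eventually run over a whole data set. This is a minimal sketch; the class name SumSqError and the helpers predict and sumSqError are hypothetical, and the data are hard-coded from the three-row example rather than read from a file.

```java
// Sum of squared errors for the hypothetical prediction 2*x0 + 0.5 on the example data.
class SumSqError {

    // Hypothetical helper: the starting prediction y-hat = 2*x0 + 0.5
    static double predict(double x0) {
        return 2 * x0 + 0.5;
    }

    // Adds up (predicted - actual)^2 over all rows.
    static double sumSqError(double[] ys, double[] xs) {
        double sum = 0.0;
        for (int i = 0; i < ys.length; i++) {
            double deviation = predict(xs[i]) - ys[i];
            sum += deviation * deviation;
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] ys = {1.2, 4.1, 8.8};
        double[] xs = {1.0, 2.0, 3.0};
        double sse = sumSqError(ys, xs);
        System.out.println("sumSqError = " + sse);             // about 7.14
        System.out.println("sqrt(sumSqError) = " + Math.sqrt(sse)); // about 2.672
    }
}
```

Because of floating-point rounding, the printed sum may differ from 7.14 in the last few digits, so compare results with a small tolerance rather than with ==.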
For now, we are breaking this up into three steps:
1. Make a data row class that stores each row, separating the dependent variable from an array of independent variables.
2. Make a data set class that can read a file and produce and save a list of data rows for later processing.
3. Make a linear regression class that can perform multiple linear regression and make a prediction based on the regression.
To do this, create four new classes: DataRow, DataSet, Model, and LinearModel.
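The assignment names Model and LinearModel but does not say what they contain. The sketch below is one possible shape, assuming a hypothetical Model interface with train and predict methods. LinearModel fits ordinary least squares with an intercept by solving the normal equations (XᵀX)b = Xᵀy with Gaussian elimination. It works on plain arrays so that it compiles on its own; in the full project the arrays would come from DataRow objects.

```java
// Hypothetical contract: the assignment names Model but does not specify its methods.
interface Model {
    void train(double[] ys, double[][] xs);
    double predict(double[] x);
}

// Multiple linear regression y ≈ b0 + b1*x1 + ... + bk*xk, fitted by ordinary least squares.
class LinearModel implements Model {
    private double[] coefficients; // coefficients[0] is the intercept b0

    @Override
    public void train(double[] ys, double[][] xs) {
        int p = xs[0].length + 1;            // one extra column for the intercept
        double[][] a = new double[p][p + 1]; // augmented matrix [X^T X | X^T y]
        for (int r = 0; r < ys.length; r++) {
            double[] row = withIntercept(xs[r]);
            for (int i = 0; i < p; i++) {
                for (int j = 0; j < p; j++) {
                    a[i][j] += row[i] * row[j];
                }
                a[i][p] += row[i] * ys[r];
            }
        }
        coefficients = solve(a, p);
    }

    @Override
    public double predict(double[] x) {
        double[] row = withIntercept(x);
        double sum = 0.0;
        for (int i = 0; i < row.length; i++) {
            sum += coefficients[i] * row[i];
        }
        return sum;
    }

    public double[] getCoefficients() {
        return coefficients.clone();
    }

    // Prepends a constant 1.0 so that coefficients[0] acts as the intercept.
    private static double[] withIntercept(double[] x) {
        double[] row = new double[x.length + 1];
        row[0] = 1.0;
        System.arraycopy(x, 0, row, 1, x.length);
        return row;
    }

    // Gaussian elimination with partial pivoting on a p x (p+1) augmented matrix, then back-substitution.
    private static double[] solve(double[][] a, int p) {
        for (int col = 0; col < p; col++) {
            int pivot = col;
            for (int r = col + 1; r < p; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) pivot = r;
            }
            if (Math.abs(a[pivot][col]) < 1e-12) {
                throw new IllegalArgumentException("X^T X is singular; coefficients are not unique");
            }
            double[] tmp = a[col]; a[col] = a[pivot]; a[pivot] = tmp;
            for (int r = col + 1; r < p; r++) {
                double factor = a[r][col] / a[col][col];
                for (int c = col; c <= p; c++) {
                    a[r][c] -= factor * a[col][c];
                }
            }
        }
        double[] b = new double[p];
        for (int i = p - 1; i >= 0; i--) {
            double sum = a[i][p];
            for (int j = i + 1; j < p; j++) sum -= a[i][j] * b[j];
            b[i] = sum / a[i][i];
        }
        return b;
    }
}
```

For the three-row example, training on ys = {1.2, 4.1, 8.8} and xs = {{1.0}, {2.0}, {3.0}} gives coefficients of about -2.9 and 3.8, that is, y ≈ -2.9 + 3.8*x0. Its sum of squared errors is 0.54, well below the 7.14 of the starting guess 2*x0 + 0.5. Solving the normal equations is the simplest approach to write by hand; a QR decomposition is numerically more robust when the independent variables are nearly collinear.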
DataRow
A DataRow object holds a y value and an array of x values. (For the example above, this array would only have length 1, but other data sets may have more than one x variable, so we really do need an array.) A DataSet will hold an ArrayList of DataRow objects. The getDependentVariable() method returns y, and getIndependentVariables() returns the array of independent variables x. The DataRow constructor takes the y value and the array of x values as separate arguments to make them easy to store.
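A minimal DataRow matching this description might look like the sketch below. The defensive copies and the toString method are additions not required by the assignment. The class is left package-private here so the snippet compiles under any file name; in the project it would be declared public in its own DataRow.java file.

```java
import java.util.Arrays;

// Stores one row of data: the dependent variable y and the independent variables x.
class DataRow {
    private final double y;
    private final double[] x;

    public DataRow(double y, double[] x) {
        this.y = y;
        this.x = x.clone(); // copy so later changes to the caller's array do not affect this row
    }

    public double getDependentVariable() {
        return y;
    }

    public double[] getIndependentVariables() {
        return x.clone(); // copy so callers cannot modify the stored values
    }

    @Override
    public String toString() {
        return "DataRow[y=" + y + ", x=" + Arrays.toString(x) + "]";
    }
}
```

For the first row of the example, new DataRow(1.2, new double[]{1.0}) stores y = 1.2 and x = [1.0].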
DataSet
The DataSet class is responsible for reading the data from a data file. Write a constructor that takes a String as its argument; the String holds the name of the data file. To keep things simple, assume that the first row of the data file contains the names of the variables. Each following line contains one y value followed by its x value(s), separated by commas. The file for the example data above might look like this:
quality, acidity
1.2,1.0
4.1,2.0
8.8,3.0
The only real requirement is that the entries in the data file are separated by commas. You can use the Scanner class for this job: use nextLine() to get each row, then use String's split() method to split the row at each comma into a String[]. Assume for now that the data file is correct, but you can add basic error handling that ignores rows with fewer columns than the first line.
The DataSet class has methods to access the number of independent variables, getNumIndependantVariables(), and the ArrayList of rows, getRows().
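The reading loop described above can be sketched as follows. This assumes the constructor passes a FileNotFoundException on to the caller (the assignment does not say how a missing file should be handled) and keeps the method name getNumIndependantVariables() exactly as the assignment spells it. A minimal DataRow is repeated so the snippet compiles on its own. split(",") does not handle quoted fields that contain commas, which is acceptable under the assumption that the file is well formed.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Scanner;

// Minimal DataRow, repeated so this sketch compiles on its own.
class DataRow {
    private final double y;
    private final double[] x;
    DataRow(double y, double[] x) { this.y = y; this.x = x.clone(); }
    double getDependentVariable() { return y; }
    double[] getIndependentVariables() { return x.clone(); }
}

// Reads a CSV file: a header line of variable names, then y followed by the x values on each line.
class DataSet {
    private final ArrayList<DataRow> rows = new ArrayList<>();
    private int numIndependentVariables;

    DataSet(String fileName) throws FileNotFoundException {
        try (Scanner in = new Scanner(new File(fileName))) {
            if (!in.hasNextLine()) {
                return; // empty file: no header and no rows
            }
            String[] header = in.nextLine().split(",");
            numIndependentVariables = header.length - 1; // the first column is y
            while (in.hasNextLine()) {
                String[] cells = in.nextLine().split(",");
                if (cells.length < header.length) {
                    continue; // basic error handling: skip short rows, including blank lines
                }
                double y = Double.parseDouble(cells[0].trim());
                double[] x = new double[numIndependentVariables];
                for (int i = 0; i < numIndependentVariables; i++) {
                    x[i] = Double.parseDouble(cells[i + 1].trim());
                }
                rows.add(new DataRow(y, x));
            }
        }
    }

    int getNumIndependantVariables() {
        return numIndependentVariables;
    }

    ArrayList<DataRow> getRows() {
        return rows; // the stored list itself, for later processing
    }
}
```

With the four-line example file (header "quality, acidity" plus three data lines), getNumIndependantVariables() returns 1 and getRows() holds three DataRow objects.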