Question:

unknown length containing x and y values and then computes a line that best fits the data set. Remember, a
line is defined by the equation y=mx+b, where m is the slope and b is the y-intercept of the line. Such a
linear fit is commonly used to visualize linear trends of data. You will then plot the data set using a scatter
plot and a line representing your linear regression of the data. An example of this is shown by Figure 1,
where the blue dots are the (x,y) points from a data set, and the black line is a linear fit to the data.
Figure 1. Example of a linear fit (black line) of a data set (blue dots) on an x-y plot.
Algorithm:
Recall, the slope of a line, m, can be computed using any two points on the line, (x1,y1) and (x2,y2)
using the ratio of the "change in y" divided by the "change in x". That is:
$$m = \frac{y_2 - y_1}{x_2 - x_1}$$
Once the slope is known, the y-intercept can also be computed using the formula: b = y1 - m*x1.
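As a minimal sketch of the two-point case, the snippet below computes the slope and then the intercept as b = y1 - m*x1, which follows from substituting (x1,y1) into y=mx+b. The names `Line` and `lineThroughPoints` are hypothetical and are not part of the assignment.

```cpp
#include <cassert>

// Hypothetical helper type: a line y = m*x + b.
struct Line {
    double m;  // slope
    double b;  // y-intercept
};

// Line through two points (x1,y1) and (x2,y2). Requires x1 != x2,
// because a vertical line has no finite slope.
Line lineThroughPoints(double x1, double y1, double x2, double y2) {
    assert(x1 != x2);
    const double m = (y2 - y1) / (x2 - x1);  // "change in y" over "change in x"
    const double b = y1 - m * x1;            // solve y1 = m*x1 + b for b
    return {m, b};
}
```

For instance, the points (1,3) and (3,7) give m = 2 and b = 1, that is, the line y = 2x + 1.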
For a data set such as the blue points in Figure 1, we are not fitting a line between any two points, since
no two points are guaranteed to be on the line. Rather, a line is constructed in an average sense. The
following explains the algorithm on how to do this.
Let x be the array of x-values in the data set, and y be the array of y-values of the data set, where one of
the blue data points on the graph in Figure 1 is (x[i], y[i]). Next, let mean x, written $\bar{x}$, represent the mean value (or
the average value) of the array x and mean y, written $\bar{y}$, be the mean (or average) value of the array y. The
average value of x can be written as an equation:
$$\bar{x} = \frac{1}{N} \sum_{i=0}^{N-1} x[i]$$
where x[i] are the array values of x, N is the number of data points, and $\sum_{i=0}^{N-1} x[i]$ is the sum of all the array values. A similar expression
can be written for mean y.
After computing mean x and mean y, the slope of the fit line can be computed. This is done in an average
sense using the ratio of the sum of the differences of all y-values from the y-mean, times the differences of
all the x-values from the x-mean, divided by the sum of the squared differences of all the x-values from the x-mean. This can
be written as an equation as:
$$m_{fit} = \frac{\sum_{i=0}^{N-1} (x[i] - \bar{x})(y[i] - \bar{y})}{\sum_{i=0}^{N-1} (x[i] - \bar{x})^2}$$
where $\bar{x}$ and $\bar{y}$ denote mean x and mean y, and the numerator and denominator are separate sums.
Given the slope, the y-intercept is simply:
$$b_{fit} = \bar{y} - m_{fit}\,\bar{x}$$
Using this algorithm, each point on the black line in Figure 1 is simply computed as:
yfit[j] = mfit*x[j] + bfit.
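The algorithm can be sketched in C++ as follows, assuming the standard least-squares expressions: the slope is the sum of the products of the x- and y-deviations from their means divided by the sum of the squared x-deviations, and the intercept is mean y minus the slope times mean x. The names `LinearFit`, `fitLine`, and `evaluateFit` are hypothetical, and `std::vector` is used here only for brevity; the assignment itself asks for dynamically allocated arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical result type holding the fitted slope and y-intercept.
struct LinearFit {
    double m;  // mfit
    double b;  // bfit
};

// Least-squares fit of y = m*x + b to the points (x[i], y[i]).
LinearFit fitLine(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() >= 2);
    const double n = static_cast<double>(x.size());
    const double meanX = std::accumulate(x.begin(), x.end(), 0.0) / n;
    const double meanY = std::accumulate(y.begin(), y.end(), 0.0) / n;

    double num = 0.0;  // sum of (x[i] - meanX) * (y[i] - meanY)
    double den = 0.0;  // sum of (x[i] - meanX)^2
    for (std::size_t i = 0; i < x.size(); ++i) {
        num += (x[i] - meanX) * (y[i] - meanY);
        den += (x[i] - meanX) * (x[i] - meanX);
    }
    assert(den != 0.0);  // all x-values identical: slope is undefined

    const double m = num / den;
    return {m, meanY - m * meanX};
}

// Evaluates yfit[j] = mfit * x[j] + bfit for every x-value.
std::vector<double> evaluateFit(const LinearFit& fit, const std::vector<double>& x) {
    std::vector<double> yfit(x.size());
    for (std::size_t j = 0; j < x.size(); ++j) {
        yfit[j] = fit.m * x[j] + fit.b;
    }
    return yfit;
}
```

For points that lie exactly on a line, such as (0,1), (1,3), (2,5), (3,7), this sketch recovers that line (m = 2, b = 1), which makes a convenient sanity check before trying noisy data.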
Functions needed for this project:
As part of this C++ project, you will need to develop at least five functions.
The first function is to read in a data set of unknown length that contains two columns of
data. You should store this in a single (one-dimensional) array that stores the two values
by column. That is, if there are N data points, the array will be of length 2N. The first N
values will be your x-array, and the second N-values will be your y-array. The function
should be passed the name of a file it will read in the (x,y) values from, and the function
will dynamically create an array to store the (x,y) values as just described, and then return
a pointer to the array and the length of the array.
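One possible shape for this first function is sketched below. It makes several assumptions that the assignment does not state: the file holds whitespace-separated x y pairs, one pair per row; the length is handed back through a reference parameter; and the file is read twice (once to count the pairs, once to fill the array). The name `readData` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Reads (x, y) pairs from `filename` into a new[]-allocated array of
// length 2N: x-values in [0, N), y-values in [N, 2N).
// Returns nullptr (and sets length to 0) if the file cannot be opened
// or holds no complete pair. The caller must delete[] the result.
double* readData(const std::string& filename, std::size_t& length) {
    length = 0;
    std::ifstream in(filename);
    if (!in) {
        return nullptr;
    }

    // First pass: count complete (x, y) pairs.
    std::size_t n = 0;
    double x = 0.0, y = 0.0;
    while (in >> x >> y) {
        ++n;
    }
    if (n == 0) {
        return nullptr;
    }

    // Rewind: clear the EOF/fail flags, then seek back to the start.
    in.clear();
    in.seekg(0, std::ios::beg);

    // Second pass: store x in the first half and y in the second half.
    double* data = new double[2 * n];
    for (std::size_t i = 0; i < n; ++i) {
        in >> data[i] >> data[n + i];
    }
    assert(!in.fail());  // the second pass rereads the same n pairs

    length = 2 * n;
    return data;
}
```

With this layout, `data` points at the x-array and `data + length / 2` points at the y-array, so later functions can treat the two halves as separate arrays of length N.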
The second function computes and returns the average (or mean) value of an array of
some length N.
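A minimal sketch of such a mean function is shown here. The name `mean` and its signature are assumptions; the assignment only says that the function computes and returns the average of an array of length N.

```cpp
#include <cassert>
#include <cstddef>

// Returns the arithmetic mean of the first n entries of `values`.
// Requires a non-null pointer and n > 0.
double mean(const double* values, std::size_t n) {
    assert(values != nullptr && n > 0);
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += values[i];
    }
    return sum / static_cast<double>(n);
}
```

Because the function takes a pointer and a length, it works on either half of the combined 2N array: `mean(data, N)` would give the mean of the x-values and `mean(data + N, N)` the mean of the y-values.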
The third function is passed the array storing the
