Question: allocate space for the data array, making sure that everything is properly initialized. The data will be pulled from a file specified as the input

allocate space for the data array, making sure that everything is properly initialized. The data will be pulled from a file specified as the input to the constructor. The first line of this file will contain a header with

2

numbers: the first specifies the number of datapoints in the file, and the second specifies the dimension of each datapoint. The file will thus have a number of rows specified by the first number in the header, and a number of columns specified by the second. Once the data array has been initialized consistent with the header information in the file, the KMC constructor can pull all the data from the file into this array, and store the number and dimension in the numdata and dim fields.

The train function will receive the three inputs discussed above, specifying how the training is to be accomplished. The first input is the number of training cycles to use, and the third is the number of centers to use in the clustering

/

classification analysis

note that train will have to allocate and properly initialize the c and nc fields in the KMC class consistent with the specified number of centers and the dimension of the datapoints. For the starting values of the centers, use the first nc points in the data array. Finally, the second input to train is a float specifying the fraction of the data in data that should be used in the training

(

the rest will be reserved to test the accuracy of the learning when it has been completed

) .

If we call this second input frac, then only the first frac

*

numdata points in data should be used for the training.

The classify function identifies which center a specified datapoint is closest to

,

using the vector norm measurement. The output should be the index in the c array of the closest center to the input vector. The centers function is just a simple accessor that returns the c array as output.

When all this code is written properly, with the iris

2 .

dat file providing the data, the

following output should appear in the file results.dat.

0

[5 3.40244 1.47805 0.243902]

1

[5.825 2.73636 4.34545 1.41591]

2

[6.925 3.1 5.75714 2.06786]

Finally now to consider the scoring of how well the trained model performs at classifying data. This will be accomplished by the final member function in KMC called score, which outputs its results in a learningScore structure. The details of how this scoring is accomplished are provided in the following section.

Note: The provided kmc

.

cpp file already contains all the code needed for the Vector class and its overloaded operator. All you have to do is write the code for the new KMC class to implement the training process described using the vector arithmetic enabled by the Vector class. Your solution is expected to use these features of the Vector class to accomplish its calculations.

Note: Your submission will be graded first using the provided iris

2 .

dat file, then with a different data file. The number of data points, the dimension of each data point, and the number of clusters

/

types

,

will be different in the new dataset.

Scoring

There are several metrics used to evaluate the quality of the model developed by a machine learning algorithm, including a

precision

score

,

recall

score

,

and a composite metric called

,

which is defined in terms of

,

The

and

metrics in turn are calculated by comparing the predictions the trained model makes with the

(

assumed known

)

type that each data point actually corresponds to

.

In particular we need to count the number of

true positive

(

)

results, where the model correctly predicts the type represented by a data point, the number of

false negative

(

)

results, where the model incorrectly predicts a different type from that represented by the data, and the number of

false positive

(

)

results which counts the actual type predicted by the model when its prediction is incorrect. In terms of

(

,

,

)

the

and

scores for each type are computed as

There is a separate

score for each type in the dataset, and it is common to average the scores for the different types to produce a composite, overall score.

The score function for the KMC class must compute these metrics and store them in

a learningScore structure which is the output from the function. The identification of

the

(

,

,

)

score for each type can be accomplished as follows:

For each point in the dataset, determine how the model classifies it

(

using the classify function

) .

Suppose that the model classifies the point as type j while the actual classification of the point is type k

.

If j and k match, increase the tp score for type k by one. Otherwise, increase the fn score for type k by one, and increase the fp score for type j by one.

Once every data point has been processed according to

1), 2)

above, you will have the

(

,

,

)

scores for each type, and can compute the corresponding

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer

Step: 1 Unlock blur-text-image

Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock

Step: 3 Unlock

Students Have Also Explored These Related Finance Questions!

Its about malloc function and I need your help desperately..:( I know there are a lot to do.. I am working on my own but not sure I can finish them. Please help me.. Input files: 1001 1591.63 M 1011...

Directions: ( CODE IN C ONLY!!!) Complete the following homework assignment using the description given in each section. Do not email code to TAs for help. They cannot help fix compile errors via...

Purpose: Use command line arguments for input Use malloc() and free()function. Collect input from a text file. Write output to a text file. Creating integer, double and character arrays. To create...

It's time for some data mining! For this assignment you will write a C program in a file called calls.c and a Makefile which creates the executable calls that will open up a set of files (usually...

Programming Assignment #2: Smart Arrays COP 3502, Fall 2017 Due: Sunday, September 24, before 11:59 PM Table of Contents...

Table of Contents Abstract........................................................................................................2 1....

Programming Assignment #2: Smart Arrays COP 3502, Fall 2017 Due: Sunday, September 24, before 11:59 PM Table of Contents...

(JAVA - DATA STRUCTURES) Hi, THIS IS THE FOURTH TIME I HAVE POSTED THIS QUESTION AND NOBODY WANTS TO HELP ME. PLEASE, I NEED SOMEONE TO HELP ME. I need help with the program CountryDisplayer.java and...

Try to draw the pressure angle of Gam mechanism's follower shown in Fig.4 A b) Fig 4

Because of the outbreak of Covid-19, the S&P 500 stock index fell over 30% from its recent high level. At the same time, the volatility index of the S&P 500 (a measure of the market's...

What is the definition of project operating cash flow? How does this differ from net income?

Do you think that outsourcsing computer software to LCC is going to be a problem in 2023 and 2024, given the tension between the US and China, and Russia