Question:
Needs to be in C++
Given the training data set $D_{train}$ specified by TrainData.csv and the test data set $D_{test}$ specified by TestData.csv, where $D = \{\langle x^{[i]}, y^{[i]} \rangle \mid i = 1, \dots, n\}$, s.t. each $x = \langle x_1, x_2, \dots, x_L \rangle$ has a class label $y \in \{-1, 1\}$. Moreover, each of the $x_i$, $i = 1, \dots, L$, may be either discrete or continuous valued, s.t.
Features with continuous values: x1, x3, x5, x11, x12, x13
Features with discrete values: x2, x4, x6, x7, x8, x9, x10, x14
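The feature split above can be encoded once and reused by both training and prediction. The following is only a sketch: the names `kNumFeatures`, `kContinuous`, `isContinuous`, and `splitCsvLine` are hypothetical, and it assumes each CSV row is comma-separated with no quoted fields, which the question does not state.

```cpp
#include <array>
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Number of features L, taken from the feature lists in the question (x1..x14).
constexpr int kNumFeatures = 14;

// 1-based indices of the continuous features, as listed in the question.
constexpr std::array<int, 6> kContinuous = {1, 3, 5, 11, 12, 13};

// Returns true if feature x_i (1-based index) is continuous-valued.
bool isContinuous(int i) {
    assert(i >= 1 && i <= kNumFeatures);
    for (int c : kContinuous) {
        if (c == i) return true;
    }
    return false;
}

// Splits one CSV line on commas and trims surrounding whitespace.
// Assumption: fields are never quoted and never contain commas.
std::vector<std::string> splitCsvLine(const std::string& line) {
    std::vector<std::string> fields;
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, ',')) {
        const auto first = field.find_first_not_of(" \t\r");
        const auto last = field.find_last_not_of(" \t\r");
        fields.push_back(first == std::string::npos
                             ? std::string()
                             : field.substr(first, last - first + 1));
    }
    return fields;
}
```

A loader would call `splitCsvLine` on every row and then convert a field with `std::stod` when `isContinuous` is true for its column, keeping it as a string category otherwise.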
Please complete the following tasks.
1. (2.5 points) Build a naive Bayes classifier by returning the sets of parameters ($P(y)$, $P(x_i \mid y)$, $i = 1, \dots, L$) based on the Naive Bayes Algorithm.
$$P(y \mid x) \propto P(y) P(x \mid y) = P(y) \prod_{i=1}^{L} P(x_i \mid y)$$
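The class prior P(y) in this factorization can be estimated as the relative frequency of each label in the training set. A minimal sketch, assuming labels have already been parsed into integers; the function name `estimatePriors` is hypothetical:

```cpp
#include <cassert>
#include <map>
#include <vector>

// Estimates P(y = k) as the fraction of training labels equal to k.
std::map<int, double> estimatePriors(const std::vector<int>& labels) {
    assert(!labels.empty());
    std::map<int, double> priors;
    // First pass: count the occurrences of each label.
    for (int y : labels) priors[y] += 1.0;
    // Second pass: turn the counts into relative frequencies.
    for (auto& kv : priors) kv.second /= static_cast<double>(labels.size());
    return priors;
}
```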
If a feature $x_i$ takes $m$ discrete values in class $y_k$, a discrete distribution $P(x = x_i \mid y = y_k)$ can be learned by parameters $\theta_{i,k,1}, \theta_{i,k,2}, \dots, \theta_{i,k,m}$, s.t. $\sum_{j=1}^{m} \theta_{i,k,j} = 1$.
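For a discrete feature, the parameters can be estimated by counting how often each value occurs within each class. The sketch below handles one feature column; `DiscreteParams` and `estimateDiscrete` are hypothetical names. The optional `alpha` argument adds Laplace smoothing, which the question does not ask for; with the default `alpha = 0` the result is the plain relative frequency, so a value never seen in a class gets probability 0.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// theta[k][v] = estimated P(x_i = v | y = k) for one discrete feature x_i.
using DiscreteParams = std::map<int, std::map<std::string, double>>;

DiscreteParams estimateDiscrete(const std::vector<std::string>& values,
                                const std::vector<int>& labels,
                                double alpha = 0.0) {
    assert(values.size() == labels.size());
    // All distinct values of this feature across the whole column.
    const std::set<std::string> distinct(values.begin(), values.end());
    std::map<int, double> classCount;
    DiscreteParams theta;
    for (std::size_t n = 0; n < values.size(); ++n) {
        classCount[labels[n]] += 1.0;
        theta[labels[n]][values[n]] += 1.0;  // raw count for now
    }
    const double m = static_cast<double>(distinct.size());
    for (auto& kv : theta) {
        const double nk = classCount[kv.first];
        for (const auto& v : distinct) {
            // Normalize the count (plus optional smoothing) within class k.
            kv.second[v] = (kv.second[v] + alpha) / (nk + alpha * m);
        }
    }
    return theta;
}
```

Within each class the returned values sum to 1, matching the constraint on the parameters stated above.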
If a feature $x_i$ takes continuous values in class $y_k$, we can learn a one-dimensional Gaussian for that feature: $P(x = x_i \mid y = y_k) = \mathcal{N}(\mu_{i,k}, \sigma^2_{i,k})$, where $\mu_{i,k}$ and $\sigma^2_{i,k}$ are respectively the mean and variance of feature $x_i$ for the instances in class $y_k$.
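The Gaussian parameters for one continuous feature in one class can be fitted from the values of that feature among the instances of that class. This is a sketch with hypothetical names (`Gaussian`, `fitGaussian`, `logGaussianPdf`). It assumes the maximum-likelihood variance (dividing by the sample count) and a small variance floor to avoid division by zero; neither choice is specified in the question.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Gaussian {
    double mean = 0.0;
    double variance = 1.0;
};

// Fits the mean and variance of one feature from the instances of one class.
Gaussian fitGaussian(const std::vector<double>& xs) {
    assert(!xs.empty());
    Gaussian g;
    double sum = 0.0;
    for (double x : xs) sum += x;
    g.mean = sum / xs.size();
    double sq = 0.0;
    for (double x : xs) sq += (x - g.mean) * (x - g.mean);
    g.variance = sq / xs.size();  // maximum-likelihood estimate (divide by n)
    const double kMinVariance = 1e-9;  // assumption: floor for constant features
    if (g.variance < kMinVariance) g.variance = kMinVariance;
    return g;
}

// Log of the normal density N(mean, variance) evaluated at x.
double logGaussianPdf(double x, const Gaussian& g) {
    const double kPi = 3.14159265358979323846;
    const double d = x - g.mean;
    return -0.5 * std::log(2.0 * kPi * g.variance) - d * d / (2.0 * g.variance);
}
```

Returning the log density rather than the density itself makes it easy to add the per-feature terms later instead of multiplying them.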
2. (2.5 points) Apply your classifier to assign a class label to each new instance $x = \langle x_1, x_2, \dots, x_L \rangle$ in the testing set by
$$\hat{y} \leftarrow \arg\max_{y_k} P(y = y_k) \prod_{i=1}^{L} P(x_i \mid y = y_k)$$
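A product of many small probabilities can underflow in floating point, so a common approach is to compare sums of logarithms instead; because the logarithm is monotonic, the maximizing class is the same. The sketch below assumes the per-class log prior and the per-feature log likelihoods of one test instance have already been computed; `predictLabel` and its argument layout are hypothetical.

```cpp
#include <cassert>
#include <limits>
#include <map>
#include <vector>

// Returns argmax_k [ log P(y = k) + sum_i log P(x_i | y = k) ].
// logLikelihoods[k] holds one log P(x_i | y = k) entry per feature.
int predictLabel(const std::map<int, double>& logPrior,
                 const std::map<int, std::vector<double>>& logLikelihoods) {
    assert(!logPrior.empty());
    int best = logPrior.begin()->first;
    double bestScore = -std::numeric_limits<double>::infinity();
    for (const auto& kv : logPrior) {
        double score = kv.second;  // start from the log prior of class k
        const auto it = logLikelihoods.find(kv.first);
        if (it != logLikelihoods.end()) {
            for (double ll : it->second) score += ll;
        }
        if (score > bestScore) {
            bestScore = score;
            best = kv.first;
        }
    }
    return best;
}
```

If a discrete value has estimated probability 0 in some class, its log is negative infinity and that class can never win; smoothing the estimates is one way to avoid this.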
Evaluate your classifier by accuracy
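$$\mathrm{ACC}(x^{[i]})_{D_{test}} = \frac{1}{|D_{test}|} \sum_{i=1}^{n} L(\hat{y}^{i}, y^{i})$$
where $\hat{y}^{i}$ is the class label assigned by the classifier and $y^{i}$ is the true class label.
$$L(\hat{y}^{i}, y^{i}) = \begin{cases} 1 & \text{if } \hat{y}^{i} = y^{i} \\ 0 & \text{if } \hat{y}^{i} \neq y^{i} \end{cases}$$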
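The accuracy definition translates directly into code: count the positions where the assigned and true labels agree, then divide by the size of the test set. A minimal sketch, with the hypothetical name `accuracy` and integer labels assumed:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fraction of test instances whose predicted label equals the true label.
double accuracy(const std::vector<int>& predicted, const std::vector<int>& truth) {
    assert(predicted.size() == truth.size() && !truth.empty());
    std::size_t correct = 0;
    for (std::size_t i = 0; i < truth.size(); ++i) {
        if (predicted[i] == truth[i]) ++correct;  // the 0/1 function L
    }
    return static_cast<double>(correct) / static_cast<double>(truth.size());
}
```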
Submission:
Submit a report that contains the screenshots for each of the questions.
Show the detailed parameters
Show the accuracy value
Submit the program source code file.
