Question: Implement in C++11 a Naive Bayes Classifier that classifies individuals as Democrats or Republicans, using the 16 attributes and two

Implement in C++11 a Naive Bayes Classifier that classifies individuals as Democrats or Republicans, using the 16 attributes and two classes from the Congressional Voting Records dataset.
Some of the features contain the value "?", which typically represents a missing value. In this case, the data explicitly states that this symbol means neither "yes" nor "no." It represents a third value, "abstained." Solve the task in two ways: first, by treating the "?" value as the third option (abstained), and second, by filling the missing values with an approach of your choice; justify why you selected that approach. Analyze the results.
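One possible choice for the second variant is to replace each "?" with the most frequent known vote for that attribute, learned from the training set only. The sketch below assumes this approach; the `Record` type and the function names `attributeModes` and `fillMissing` are hypothetical and not part of the task. The mode is simple and keeps the attributes binary, but it discards whatever signal an abstention may carry, which is worth discussing when comparing the two variants.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record type: a party label plus one vote per attribute
// ('y', 'n', or '?').
struct Record {
    std::string party;
    std::vector<char> votes;
};

// Most frequent known vote per attribute in the training data.
// Ties are resolved in favour of 'n' (an arbitrary assumption).
std::vector<char> attributeModes(const std::vector<Record>& train) {
    assert(!train.empty());
    const std::size_t numAttrs = train[0].votes.size();
    std::vector<int> yes(numAttrs, 0), no(numAttrs, 0);
    for (const Record& r : train) {
        for (std::size_t a = 0; a < numAttrs && a < r.votes.size(); ++a) {
            if (r.votes[a] == 'y') ++yes[a];
            else if (r.votes[a] == 'n') ++no[a];
        }
    }
    std::vector<char> modes(numAttrs);
    for (std::size_t a = 0; a < numAttrs; ++a) {
        modes[a] = (yes[a] > no[a]) ? 'y' : 'n';
    }
    return modes;
}

// Replace every '?' with the mode of its attribute. The same modes can be
// applied to the test set, so no test-set statistics or labels are used.
void fillMissing(std::vector<Record>& records, const std::vector<char>& modes) {
    for (Record& r : records) {
        for (std::size_t a = 0; a < r.votes.size() && a < modes.size(); ++a) {
            if (r.votes[a] == '?') r.votes[a] = modes[a];
        }
    }
}
```

Computing the modes once and reusing them for every cross-validation fold's held-out part would leak a little information; recomputing them per fold is the stricter option.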
For the Naive Bayes classifier, zero probabilities may occur, leading to inaccurate classification. To address this issue, apply the appropriate solutions: "Laplace Smoothing" and logarithms. When applying Laplace smoothing, test with different values of the parameter that controls the degree of smoothing. Analyze the results.
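A minimal sketch of how smoothing and logarithms fit together, assuming the usual additive form P(value | class) = (count + lambda) / (classCount + lambda * numValues). The names `smoothedLogProb`, `classLogScore`, and `lambda` are hypothetical. Summing logarithms instead of multiplying probabilities avoids numeric underflow, and a positive `lambda` ensures the argument of the logarithm is never zero.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Smoothed log-likelihood log P(value | class).
//   valueCount: training records of the class with this attribute value
//   classCount: training records of the class
//   numValues:  distinct values of the attribute
//               (3 when "?" is kept as "abstained", 2 after filling)
//   lambda:     smoothing strength; must be > 0 to rule out log(0)
double smoothedLogProb(int valueCount, int classCount, int numValues,
                       double lambda) {
    assert(lambda > 0.0 && numValues > 0);
    return std::log((valueCount + lambda) /
                    (classCount + lambda * numValues));
}

// Log-score of one class for one record: the log prior plus the sum of the
// smoothed log-likelihoods. valueCounts[a] is the count, within the class,
// of the value the record has for attribute a.
double classLogScore(double logPrior, const std::vector<int>& valueCounts,
                     int classCount, int numValues, double lambda) {
    double score = logPrior;
    for (int count : valueCounts) {
        score += smoothedLogProb(count, classCount, numValues, lambda);
    }
    return score;
}
```

The predicted class is the one with the larger `classLogScore`. Running the evaluation with several values of `lambda` (for example 0.1, 1, and 10) gives the comparison the task asks for.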
To test the algorithm, split the data into training and testing sets in an 80:20 ratio, ensuring the data is shuffled first. The split should be stratified to preserve the class distribution (267 Democrats, 168 Republicans) in the resulting training and test sets.
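The stratified split can be sketched as follows, assuming the records are addressed by index and the labels are available as strings. The function name `stratifiedSplit` and the `seed` parameter are hypothetical. Each class is shuffled and cut separately, so both parts keep roughly the original class ratio; with rounding to the nearest integer, 267 Democrats and 168 Republicans yield 214 + 134 training records and 53 + 34 test records.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <random>
#include <string>
#include <utility>
#include <vector>

// Returns (trainIndices, testIndices) into `labels`.
std::pair<std::vector<std::size_t>, std::vector<std::size_t> >
stratifiedSplit(const std::vector<std::string>& labels, double trainFraction,
                unsigned seed) {
    assert(trainFraction > 0.0 && trainFraction < 1.0);

    // Group record indices by class label.
    std::map<std::string, std::vector<std::size_t> > byClass;
    for (std::size_t i = 0; i < labels.size(); ++i) {
        byClass[labels[i]].push_back(i);
    }

    std::mt19937 rng(seed);
    std::vector<std::size_t> train, test;
    for (auto& entry : byClass) {
        std::vector<std::size_t>& idx = entry.second;
        std::shuffle(idx.begin(), idx.end(), rng);
        // Round to the nearest integer so each class is cut at ~trainFraction.
        const std::ptrdiff_t cut =
            static_cast<std::ptrdiff_t>(idx.size() * trainFraction + 0.5);
        train.insert(train.end(), idx.begin(), idx.begin() + cut);
        test.insert(test.end(), idx.begin() + cut, idx.end());
    }

    // Shuffle again so the classes are interleaved within each part.
    std::shuffle(train.begin(), train.end(), rng);
    std::shuffle(test.begin(), test.end(), rng);
    return std::make_pair(train, test);
}
```

The same grouping idea can be reused to build stratified folds for the cross-validation, although the task only requires stratification for the 80:20 split.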
The input should accept two possible values, 0 and 1:
0 means to process the data by treating the "?" symbols as a third value, "abstained."
1 means to fill the missing values (marked with "?") using an approach of your choice.
The output should display the accuracy of the model on the training set (trained and tested on it), the accuracy and standard deviation of the model during 10-fold cross-validation on the training set, and the accuracy of the model on the test set.
Example input: 0
Example output:
Train Set Accuracy:
Accuracy: 92.80%
10-Fold Cross-Validation Results:
Accuracy Fold 1: 92.00%
Accuracy Fold 2: 91.50%
Accuracy Fold 3: 90.00%
Accuracy Fold 4: 93.00%
Accuracy Fold 5: 91.00%
Accuracy Fold 6: 92.50%
Accuracy Fold 7: 93.00%
Accuracy Fold 8: 91.50%
Accuracy Fold 9: 92.00%
Accuracy Fold 10: 93.50%
Average Accuracy: 92.10%
Standard Deviation: 1.10%
Test Set Accuracy:
Accuracy: 92.50%
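The average and standard deviation over the folds can be computed as in the sketch below. The task does not say whether the population or the sample standard deviation is expected; this sketch assumes the population form (division by N), and the function names are hypothetical. The figures in the example output are illustrative only: for the ten fold values listed there, this computation gives a mean of 92.00% and a standard deviation of 1.00%.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Arithmetic mean of the fold accuracies (in percent).
double mean(const std::vector<double>& xs) {
    assert(!xs.empty());
    double sum = 0.0;
    for (double x : xs) sum += x;
    return sum / xs.size();
}

// Population standard deviation (divides by N, not N - 1).
// Assumption: the task statement does not specify which form to use.
double stdDev(const std::vector<double>& xs) {
    const double m = mean(xs);
    double squares = 0.0;
    for (double x : xs) squares += (x - m) * (x - m);
    return std::sqrt(squares / xs.size());
}
```

Printing with `std::fixed` and `std::setprecision(2)` from `<iomanip>` produces the two-decimal format shown in the example output.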
For solving the task, you are allowed to use data structures such as DataFrame.
It is required to implement the algorithm from scratch! The use of external libraries is not allowed, except for standard libraries and those specifically related to data structures needed for implementing the algorithm. Please ensure that your solution reflects your understanding of the algorithm and does not rely on pre-existing implementations.
The Congressional Voting Records dataset is a zipped folder that contains:
1. house-votes-84.data file, with the following information format:
republican,n,y,n,y,y,y,n,n,n,y,?,y,y,y,n,y
...
republican,n,y,n,y,y,y,n,n,n,y,n,y,y,y,?,n
2. house-votes-84.names file, with the following information format:
1. Title: 1984 United States Congressional Voting Records Database
2. Source Information:
(a) Source: Congressional Quarterly Almanac, 98th Congress,
2nd session 1984, Volume XL: Congressional Quarterly Inc.
Washington, D.C., 1985.
(b) Donor: Jeff Schlimmer (Jeffrey.Schlimmer@a.gp.cs.cmu.edu)
(c) Date: 27 April 1987
3. Past Usage
- Publications
1. Schlimmer, J. C. (1987). Concept acquisition through
representational adjustment. Doctoral dissertation, Department of
Information and Computer Science, University of California, Irvine, CA.
-- Results: about 90%-95% accuracy appears to be STAGGER's asymptote
- Predicted attribute: party affiliation (2 classes)
4. Relevant Information:
This data set includes votes for each of the U.S. House of
Representatives Congressmen on the 16 key votes identified by the
CQA. The CQA lists nine different types of votes: voted for, paired
for, and announced for (these three simplified to yea), voted
against, paired against, and announced against (these three
simplified to nay), voted present, voted present to avoid conflict
of interest, and did not vote or otherwise make a position known
(these three simplified to an unknown disposition).
5. Number of Instances: 435 (267 democrats, 168 republicans)
6. Number of Attributes: 16 + class name = 17 (all Boolean valued)
7. Attribute Information:
1. Class Name: 2 (democrat, republican)
2. handicapped-infants: 2 (y,n)
3. water-project-cost-sharing: 2 (y,n)
4. adoption-of-the-budget-resolution: 2 (y,n)
5. physician-fee-freeze: 2 (y,n)
6. el-salvador-aid: 2 (y,n)
7. religious-groups-in-schools: 2 (y,n)
8. anti-satellite-test-ban: 2 (y,n)
9. aid-to-nicaraguan-contras: 2 (y,n)
10. mx-missile: 2 (y,n)
...
3. index file:
Index of voting-records
02 Dec 1996      135 Index
30 Jun 1993     6868 house-votes-84.names
30 May 1989    18171 house-votes-84.data
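Given the comma-separated layout of house-votes-84.data shown above (party first, then 16 votes), reading one line can be sketched as follows. The `Record` type and the functions `parseLine` and `voteIndex` are hypothetical names introduced for illustration; the sketch assumes every valid line has exactly 16 vote fields, each a single character.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type: a party label plus 16 votes ('y', 'n', or '?').
struct Record {
    std::string party;
    std::vector<char> votes;
};

// Parses a line such as "republican,n,y,n,y,y,y,n,n,n,y,?,y,y,y,n,y".
// Returns false if the line is malformed.
bool parseLine(std::string line, Record& out) {
    // Tolerate Windows line endings.
    if (!line.empty() && line[line.size() - 1] == '\r') {
        line.erase(line.size() - 1);
    }
    std::istringstream in(line);
    std::string field;
    if (!std::getline(in, field, ',') || field.empty()) return false;
    out.party = field;
    out.votes.clear();
    while (std::getline(in, field, ',')) {
        if (field.size() != 1) return false;
        const char v = field[0];
        if (v != 'y' && v != 'n' && v != '?') return false;
        out.votes.push_back(v);
    }
    return out.votes.size() == 16;
}

// Maps a vote to a column of a count table: 'n' -> 0, 'y' -> 1, '?' -> 2.
// With input 0 all three columns are used; after filling, only the first two.
int voteIndex(char v) {
    assert(v == 'n' || v == 'y' || v == '?');
    return v == 'n' ? 0 : (v == 'y' ? 1 : 2);
}
```

Reading the whole file is then a loop over `std::getline` on a `std::ifstream`, keeping only the lines for which `parseLine` returns true.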
