Question:

UC Irvine BANA 273 Assignment 2

Q1. Classification using Naive Bayes (7 points)

For this question, you should be working with the data set called affiliation. This data set records how each member of the U.S. House of Representatives voted on 16 key votes, which appear as 16 different attributes in the data set. We are going to use a portion of this data set (marked for training) to train our classification model. The goal of the classification is simple: given the stands (votes) of an individual member of Congress, can we predict his or her party affiliation? As is to be expected with any real-world data set, there are several records with NULL values. However, we have taken a subset of the data set with only those records that do not have a NULL value. The table training-no-NULL should be used to train (build) classification models, and the table testing-no-NULL should be used to test the results. The appropriate data files are provided on Canvas. You can use Excel for parts (a), (b) and (c). Use Weka for part (d).

(a) Prepare a contingency table or frequency (count) chart for the data set and populate it based on the training data. See the examples covered in class. A frequency chart shows a cross-tabulation of the class variable (i.e., party affiliation) with each of the other attributes. Use Excel pivot tables for this; you may need several.

(b) Prepare a populated probability chart (conditional probabilities) from the frequency chart in part (a). Again, see the examples covered in class.

(c) Based on the probability chart, apply Naive Bayes classification to predict the party affiliation of the following two members of Congress based on their voting records:

(i) y, n, y, n, n, n, y, y, y, n, n, n, n, n, y, y

(ii) n, y, n, y, y, y, n, n, n, n, n, y, y, y, n, y

(d) Use WEKA to run the Naive Bayes classifier on training-no-NULL.ARFF and set the test file to testing-no-NULL.ARFF (use the Supplied test set option to upload the test set). Report the confusion matrix output by WEKA.
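
As an illustration of how parts (a) through (c) fit together, here is a minimal Python sketch. It assumes a tiny, made-up training set with only three votes per record; the rows, the function names (frequency_chart, probability_chart, classify) and the results are hypothetical and do not come from the affiliation data. The sketch also applies no smoothing, so a zero count drives a class score to zero; the approach covered in class may handle that case differently.

```python
from collections import Counter, defaultdict

# Hypothetical toy rows: (party, vote_1, vote_2, vote_3).
# These are NOT taken from the assignment's training-no-NULL table.
training = [
    ("democrat", "y", "n", "y"),
    ("democrat", "y", "y", "y"),
    ("democrat", "n", "n", "y"),
    ("republican", "n", "y", "n"),
    ("republican", "n", "y", "y"),
]


def frequency_chart(rows):
    """Part (a): count each vote value per attribute and per class."""
    class_counts = Counter(row[0] for row in rows)
    counts = defaultdict(Counter)  # (attribute index, party) -> vote counts
    for party, *votes in rows:
        for i, vote in enumerate(votes):
            counts[(i, party)][vote] += 1
    return class_counts, counts


def probability_chart(class_counts, counts):
    """Part (b): turn counts into P(vote | party), with no smoothing."""
    return {
        (i, party): {v: counts[(i, party)][v] / class_counts[party] for v in ("y", "n")}
        for (i, party) in counts
    }


def classify(record, class_counts, cond_probs):
    """Part (c): score each party as prior * product of conditionals."""
    total = sum(class_counts.values())
    scores = {}
    for party, n in class_counts.items():
        score = n / total  # prior P(party)
        for i, vote in enumerate(record):
            score *= cond_probs[(i, party)][vote]
        scores[party] = score
    return max(scores, key=scores.get), scores


class_counts, counts = frequency_chart(training)
cond_probs = probability_chart(class_counts, counts)
prediction, scores = classify(("y", "n", "y"), class_counts, cond_probs)
print(prediction, scores)
```

With the real data, the same three steps apply to all 16 attributes, and the two records in part (c) take the place of the toy record passed to classify.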

Q2. Model Testing and Evaluation (3 points)

After running a classifier in WEKA on some dataset, the following confusion matrix was obtained:

=== Confusion Matrix ===

   a   b   <-- classified as
 921  28 |   a = yes
  17 374 |   b = no

(a) Based on this confusion matrix, estimate the overall accuracy of the classifier.

(b) Estimate the stratified accuracies of the classifier.

(c) Consider the following cost/benefit scenario: The company gains $80 from a correctly classified class-a instance, but loses $5 from an incorrectly classified class-a instance (i.e., a class-b instance incorrectly classified as class-a). The company incurs no benefit or loss from an instance classified as class b. What expected value per instance (e.g., per customer) would this classifier create for the company?
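
The arithmetic behind parts (a) through (c) can be sketched in Python as follows. This is one reading of the question, not an official solution. It assumes that rows of the WEKA matrix are actual classes and columns are predicted classes, that "stratified accuracies" means the accuracy within each actual class, and that only instances predicted as class a carry a gain or a loss. The variable names are illustrative.

```python
# Confusion matrix from the question, keyed as (actual, predicted).
matrix = {
    ("a", "a"): 921, ("a", "b"): 28,
    ("b", "a"): 17, ("b", "b"): 374,
}
total = sum(matrix.values())  # 1340 instances in all

# (a) Overall accuracy: correctly classified instances over all instances.
overall_accuracy = (matrix[("a", "a")] + matrix[("b", "b")]) / total

# (b) Per-class accuracy, assuming this is what "stratified" means here:
# correct predictions within each actual class.
actual_a = matrix[("a", "a")] + matrix[("a", "b")]  # 949
actual_b = matrix[("b", "a")] + matrix[("b", "b")]  # 391
accuracy_a = matrix[("a", "a")] / actual_a
accuracy_b = matrix[("b", "b")] / actual_b

# (c) Expected value per instance: +$80 for each class-a instance predicted
# as a, -$5 for each class-b instance predicted as a, $0 for anything
# predicted as b.
expected_value = (80 * matrix[("a", "a")] - 5 * matrix[("b", "a")]) / total

print(f"overall accuracy: {overall_accuracy:.4f}")
print(f"class a accuracy: {accuracy_a:.4f}")
print(f"class b accuracy: {accuracy_b:.4f}")
print(f"expected value per instance: ${expected_value:.2f}")
```

Under these assumptions the overall accuracy comes to about 96.6%, the per-class accuracies to about 97.0% (class a) and 95.7% (class b), and the expected value to roughly $54.92 per instance.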
