Question: Please implement a Naive Bayes classifier in Python 3 and apply it to the mushroom data sets provided: mushroom-training.data and mushroom-testing.data.

Please implement a Naive Bayes classifier in Python 3 and apply it to the mushroom data sets provided: mushroom-training.data and mushroom-testing.data. You are to train using just the training set, but provide classification accuracy for both the training and testing data sets after learning has occurred. Apply the classifier to these datasets first using no m-estimate, then consider how using m = 1 affects the accuracy.
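The task above can be sketched roughly as follows. This is a minimal illustration, not the instructor's reference solution: it assumes instances are plain dictionaries keyed by attribute name (as described later in the assignment) and that the m-estimate uses a uniform prior over each attribute's allowable values. The names `train`, `classify`, `accuracy`, `TOY`, and `FEATURES` are hypothetical, and the toy data is invented for illustration.

```python
import math
from collections import Counter, defaultdict

def train(instances, attributes, m=0.0):
    """Estimate class priors and conditional probabilities P(attr=value | class).

    instances:  list of dicts keyed by attribute name, each with a "class" key
    attributes: dict mapping attribute name -> list of allowable values
    m:          equivalent sample size; m=0 gives plain relative frequencies
    """
    features = {a: vals for a, vals in attributes.items() if a != "class"}
    class_counts = Counter(inst["class"] for inst in instances)
    priors = {c: n / len(instances) for c, n in class_counts.items()}

    # counts[c][attr][value] = number of class-c instances with attr == value
    counts = defaultdict(lambda: defaultdict(Counter))
    for inst in instances:
        for attr in features:
            counts[inst["class"]][attr][inst[attr]] += 1

    cond = {}
    for c, n_c in class_counts.items():
        cond[c] = {}
        for attr, values in features.items():
            p = 1.0 / len(values)  # assumed uniform prior over allowable values
            cond[c][attr] = {
                v: (counts[c][attr][v] + m * p) / (n_c + m) for v in values
            }
    return priors, cond

def classify(inst, priors, cond):
    """Return the class with the highest log-posterior score."""
    best, best_score = None, -math.inf
    for c, prior in priors.items():
        score = math.log(prior)
        for attr, table in cond[c].items():
            p = table.get(inst[attr], 0.0)
            # A zero probability rules the class out entirely.
            score += math.log(p) if p > 0 else -math.inf
        if best is None or score > best_score:
            best, best_score = c, score
    return best

def accuracy(instances, priors, cond):
    """Fraction of instances whose predicted class equals the true class."""
    correct = sum(classify(i, priors, cond) == i["class"] for i in instances)
    return correct / len(instances)

# Invented toy data, standing in for the mushroom files.
FEATURES = {"cap-shape": ["b", "x"], "odor": ["a", "f"]}
TOY = [
    {"cap-shape": "b", "odor": "a", "class": "e"},
    {"cap-shape": "b", "odor": "a", "class": "e"},
    {"cap-shape": "x", "odor": "f", "class": "p"},
    {"cap-shape": "x", "odor": "f", "class": "p"},
]

for m in (0, 1):
    priors, cond = train(TOY, FEATURES, m=m)
    print("m =", m, "training accuracy:", accuracy(TOY, priors, cond))
```

Summing log-probabilities instead of multiplying raw probabilities avoids numeric underflow when there are many attributes. For the actual assignment, the same `train` call would be made once on the training set and `accuracy` evaluated on both the training and testing sets.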

In addition to providing the code and results just described, please provide thorough and well-considered answers to the following questions:

Is the naive Bayes assumption valid and correct here? If not, does it matter here?

Did using an m-estimate help improve the classifier performance? Why or why not?
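For reference, the m-estimate mentioned in these questions is commonly written as below. The symbols are not defined in the assignment text, so treat them as assumptions: n_c is the number of training instances of class c, n_{c,v} is the number of those with attribute value v, p is a prior estimate of the probability (often 1/k for an attribute with k allowable values), and m is the equivalent sample size.

```latex
\hat{P}(a = v \mid c) \;=\; \frac{n_{c,v} + m\,p}{n_c + m}
% With m = 0 this reduces to the relative frequency n_{c,v} / n_c.
% Worked example (invented numbers): n_{c,v} = 0, n_c = 50, k = 6, m = 1
%   \hat{P} = (0 + 1 \cdot 1/6) / (50 + 1) = 1/306 \approx 0.0033
```

The worked example shows the practical effect: a value never seen with a class gets a small nonzero probability instead of zero, so a single unseen value no longer eliminates that class outright.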

You may use and extend the Python code I provide to you if you choose, or you may implement your own code from scratch to complete this assignment. Please submit your answers in the BlackBoard assignment submission field text box, but I will grade your source code by pulling from your GitHub repository. So make sure it is pushed by the due date.

The file format is a very simple one, intended for a variety of classification (supervised learning) and clustering (unsupervised learning) tasks. It can represent categorical, ordinal (integer), or numeric (real) data. Attributes are named, and for classification tasks, one attribute is typically called "class". In such cases, the dataset will create a new attribute for each instance called "assigned-class", though no value is bound for that attribute for any instance by default. The "class" attribute is the true concept class value as given in the data file, while the "assigned-class" attribute is intended to facilitate the assignment of a class value by some classifier.
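To illustrate how the "class" and "assigned-class" attributes might work together, here is a small sketch that fills in "assigned-class" with some classifier and then measures accuracy by comparing the two. The function names `assign_classes` and `assigned_accuracy`, the rule-based classifier, and the toy instances are all hypothetical; none of them come from dataset.py.

```python
def assign_classes(instances, classify_fn):
    """Bind "assigned-class" on every instance using the given classifier."""
    for inst in instances:
        inst["assigned-class"] = classify_fn(inst)

def assigned_accuracy(instances):
    """Fraction of instances whose "assigned-class" equals the true "class".

    Instances with no assigned class yet are counted as incorrect.
    """
    if not instances:
        raise ValueError("cannot compute accuracy of an empty data set")
    correct = sum(
        inst.get("assigned-class") == inst["class"] for inst in instances
    )
    return correct / len(instances)

# Invented toy instances and a trivial rule standing in for a real classifier.
toy_instances = [
    {"odor": "a", "class": "e"},
    {"odor": "f", "class": "p"},
    {"odor": "a", "class": "p"},
]
assign_classes(toy_instances, lambda inst: "e" if inst["odor"] == "a" else "p")
print("accuracy:", assigned_accuracy(toy_instances))
```

Keeping the prediction in a separate attribute means the true label is never overwritten, so the same data set can be re-scored after changing the classifier.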

There are two types of lines expected in a data file:

Lines specifying attribute information;

Lines specifying instance data.

In the first case, lines must be in the form:

<attribute name> : <attribute type> : <attribute values>

The attribute name can be any string that does not contain a " " character, but each name must be unique. Attribute type can be "cat", "ord", or "num" for categorical, ordinal, or numeric data, respectively. In the case of categorical data, the attribute values specify all the allowable values in that category. In the case of ordinal or numeric data, the Dataset file reader will establish a range based on the minimal and maximal values in the list provided.

In the second case, lines must be in the form:

<value 1>, <value 2>, . . ., <value n>

There must be the same number of values in the line as there are attribute lines in the file. The position of the lines does not matter, but the order does. That is, the code will assume that the values are ordered in the same order as attributes are defined in the file. The instance data is stored in the data set as a list of dictionaries. Each instance is a dictionary keyed by attribute name.
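The format rules above can be sketched as a small reader. This is not the code in dataset.py, and several details are assumptions because the assignment text does not pin them down: attribute lines are taken to have three colon-separated fields, attribute values are taken to be separated by whitespace or commas, and any line without a colon is treated as instance data. The function name `parse_dataset` and the sample lines are hypothetical.

```python
import re

def parse_dataset(lines):
    """Parse attribute lines and instance lines into (attributes, instances)."""
    attributes = {}  # name -> (type, allowed values or (min, max)); keeps file order
    raw_rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if ":" in line:  # assumed marker of an attribute line
            name, kind, values = (part.strip() for part in line.split(":", 2))
            tokens = [t for t in re.split(r"[,\s]+", values) if t]
            if kind == "cat":
                attributes[name] = (kind, tokens)
            elif kind in ("ord", "num"):
                cast = int if kind == "ord" else float
                nums = [cast(t) for t in tokens]
                attributes[name] = (kind, (min(nums), max(nums)))
            else:
                raise ValueError("unknown attribute type: " + kind)
        else:
            raw_rows.append([v.strip() for v in line.split(",")])

    # Instances are converted only after every attribute line has been read,
    # since line position does not matter but attribute order does.
    names = list(attributes)
    casts = {"cat": str, "ord": int, "num": float}
    instances = []
    for row in raw_rows:
        if len(row) != len(names):
            raise ValueError("expected %d values, got %d" % (len(names), len(row)))
        instances.append(
            {n: casts[attributes[n][0]](v) for n, v in zip(names, row)}
        )
    return attributes, instances

# Invented sample in the assumed layout.
sample = """class : cat : e p
cap-shape : cat : b x
weight : num : 0.5 3.0 1.2
p, b, 1.5
e, x, 2.0
""".splitlines()

attributes, instances = parse_dataset(sample)
print(attributes)
print(instances)
```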

I have provided a Python module called dataset.py. This program implements a simple class for reading in data in the format I have provided, and it also includes some convenience functions for helping with counting. As a hint, I suggest you look at the method of the Dataset class called selectSubset().

The module is already fairly capable as written. Indeed, assuming both dataset.py and mushroom-training.data are in the current directory, you can enter a Python 3 command interpreter and type the following to get a sense for what is possible:

    import dataset

    # "d" is a placeholder name; the original variable name is not shown.
    d = dataset.Dataset("mushroom-training.data")
    print("Attributes in the data set are: ", d.attributes.keys())

    selectionCriteria = {"cap-shape": "b", "class": "p"}
    print("There are", len(d.instances), "instances in total")
    print("There are", len(d.selectSubset(selectionCriteria)), \
          "poisonous examples with a bell-shaped cap")
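Counting with selection criteria, as in the session above, is the core of estimating the conditional probabilities a Naive Bayes classifier needs. The sketch below uses a hypothetical stand-in function, `select_subset`, over a plain list of dictionaries. It is not the `selectSubset` method from dataset.py, whose exact behavior should be checked in that module's documentation, and the toy data is invented.

```python
def select_subset(instances, criteria):
    """Return the instances that match every attribute/value pair in criteria."""
    return [
        inst for inst in instances
        if all(inst.get(attr) == value for attr, value in criteria.items())
    ]

# Invented toy data: three poisonous mushrooms, one of them bell-shaped.
toy = [
    {"cap-shape": "b", "class": "p"},
    {"cap-shape": "x", "class": "p"},
    {"cap-shape": "x", "class": "p"},
    {"cap-shape": "b", "class": "e"},
]

n_poisonous = len(select_subset(toy, {"class": "p"}))
n_bell_and_poisonous = len(select_subset(toy, {"cap-shape": "b", "class": "p"}))

# Relative-frequency estimate of P(cap-shape = b | class = p), with no m-estimate.
p_bell_given_poisonous = n_bell_and_poisonous / n_poisonous
print("P(cap-shape=b | class=p) is approximately", p_bell_given_poisonous)
```

The ratio of the two counts is the unsmoothed estimate; an m-estimate would adjust the numerator and denominator before dividing.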

Read through the internal documentation in dataset.py to understand its use. Remember that you can always type help(dataset.Dataset) in the interpreter if the module has been imported. Please report any problems with the code to me.
