Question: The data can be accessed at https://www3.nd.edu/~busiforc/problems/DataMining/Accidents.xls

The file Accidents.xls contains information on 42,183 actual automobile accidents in 2001 in the US that involved one of three levels of injury: "no injury", "injury", or "fatality". For each accident, additional information is recorded, such as day of week, weather conditions, and road type. A firm might be interested in developing a system for quickly classifying the severity of an accident, based upon initial reports and associated data in the system (some of which rely on GPS-assisted reporting).

Our goal here is to predict whether a new accident just reported will involve an injury (MAX_SEV_IR = 1 or 2) or not (MAX_SEV_IR = 0). For this purpose, create a new variable called INJURY that takes the value 1, meaning "with injury", if MAX_SEV_IR = 1 or 2, and otherwise the value 0, meaning "no injury". Partition the data into training (60%) and validation (40%) sets.

a) Compute the accuracy rate of each class for the validation set based on the Naive Rule. You can present the accuracy rates using a matrix (called a confusion matrix) as shown in the example below:

                     Predicted
                  Class-1   Class-2
Actual  Class-1      10         3
        Class-2       2        12

Here 10 out of 13 data points in Class-1 are correctly predicted and 12 out of 14 data points in Class-2 are correctly predicted. The overall accuracy rate is 22/27 ≈ 81%. The accuracy rate in Class-1 is 10/13 ≈ 77%, and the accuracy rate in Class-2 is 12/14 ≈ 86%.

b) Assume that no information or initial reports about the accident itself are available at the time of prediction (only location characteristics, weather conditions, road conditions, etc.). Which predictors can we include in the analysis? (Please read the Data_Codes sheet.)

c) Run a Naive Bayes classifier on the complete training set using the relevant predictors (continuing from part b), with INJURY as the response variable. Notice that all predictors are categorical. Show the classification matrix (confusion matrix) for the training and validation data.

d) Is there any percent improvement relative to the Naive Rule?

e) Run a Naive Bayes classifier using all predictors and INJURY as the response variable. Report your error rates again for both the training and validation sets using a confusion matrix.

f) Which analysis, the one in part b or the one in part e, would be appropriate if you consider applying the Naive Bayes model that you created to future accidents? Please explain your reasoning.

g) Run a Naive Bayes classifier with the variables in part b and response variable INJURY after partitioning the data into training (60%) and validation (40%) sets. Is there any effect of different partitioning on the accuracy results? If you observe a change, please explain the possible reason.

Note: I have posted a guideline for the usage of Naive Bayes in XLMiner. It might be helpful while using XLMiner.
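The question expects XLMiner, but the mechanics of the setup, part a, and part c can be sketched in plain Python. The block below is a minimal illustration, not the posted guideline's method: it assumes the spreadsheet rows have already been loaded as a list of dictionaries keyed by column name, and every function name is hypothetical. The add-one (Laplace) smoothing in the Naive Bayes part is also an assumption; XLMiner may handle unseen category levels differently.

```python
import math
import random
from collections import Counter, defaultdict

def add_injury(records):
    """Add INJURY = 1 if MAX_SEV_IR is 1 or 2, otherwise 0."""
    return [dict(r, INJURY=1 if r["MAX_SEV_IR"] in (1, 2) else 0) for r in records]

def partition(records, train_frac=0.6, seed=1):
    """Shuffle with a fixed seed and split into (training, validation)."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_frac * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

def naive_rule(train, target="INJURY"):
    """Naive Rule: always predict the majority class of the training set."""
    return Counter(r[target] for r in train).most_common(1)[0][0]

def confusion_matrix(actual, predicted, classes):
    """matrix[a][p] counts records of actual class a predicted as class p."""
    matrix = {a: {p: 0 for p in classes} for a in classes}
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix

def accuracy_rates(matrix):
    """Return (overall accuracy, {class: accuracy within that actual class})."""
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(matrix[c][c] for c in matrix)
    per_class = {}
    for c, row in matrix.items():
        n_c = sum(row.values())
        per_class[c] = row[c] / n_c if n_c else float("nan")
    return correct / total, per_class

def fit_naive_bayes(train, predictors, target="INJURY"):
    """Tabulate class counts and per-class counts of each predictor value."""
    class_counts = Counter(r[target] for r in train)
    value_counts = defaultdict(Counter)  # (predictor, class) -> value counts
    levels = defaultdict(set)            # predictor -> values seen in training
    for r in train:
        for x in predictors:
            value_counts[(x, r[target])][r[x]] += 1
            levels[x].add(r[x])
    return class_counts, value_counts, levels

def predict_naive_bayes(model, record, predictors):
    """Pick the class with the largest log prior plus summed log likelihoods."""
    class_counts, value_counts, levels = model
    n = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = math.log(n_c / n)
        for x in predictors:
            # Add-one smoothing (an assumption) avoids log(0) for unseen values.
            count = value_counts[(x, c)][record[x]]
            score += math.log((count + 1) / (n_c + len(levels[x])))
        scores[c] = score
    return max(scores, key=scores.get)

# The example matrix from part a: 13 actual Class-1 and 14 actual Class-2 points.
actual = [1] * 13 + [2] * 14
predicted = [1] * 10 + [2] * 3 + [1] * 2 + [2] * 12
example_matrix = confusion_matrix(actual, predicted, classes=[1, 2])
example_overall, example_per_class = accuracy_rates(example_matrix)
```

For part a, the Naive Rule prediction is the single class returned by `naive_rule(train)`, so its validation confusion matrix has one all-zero predicted column: one class scores 100% and the other 0%. For parts c and e, the same `confusion_matrix` and `accuracy_rates` helpers can be applied to the predictions from `predict_naive_bayes` on the training and validation records, changing only the list of predictor columns passed in.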

