Question: This is RStudio code.
Hand in an R script file for this assignment. You are asked some questions below that amount to "is one model better than another", so add comments to your R file that answer those questions. In the text you write to answer those questions, you need to include the metric(s) and reasoning you're using to decide one model should be preferred over another.
In Chapter 5 of the textbook, the author builds decision trees using the German credit data and a decision table or rule list using the mushroom dataset. For this assignment, let's flip that around.
Build a C5.0 decision tree for the mushroom dataset. Compare a regular tree (trials = 1, the default value) with a boosted tree (trials = 10 or so), and then compare the best of these two trees with the decision table from the book for this dataset. Be sure to include plots and summaries of each tree. Do the regular and boosted trees agree on which feature is most important?
Build a decision table using JRip for the German credit dataset. Is it a better or worse model than the best decision tree the author builds in the text? Be sure to print out the rule list from JRip in your report.
I have been fighting with this code. My teacher says it's a problem with class and -ncol(mushrooms) but won't tell me how to fix it. Please help.
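The two things the teacher points at can be reproduced on a tiny data frame. The sketch below is a toy stand-in, not the real data: the column names `type`, `odor` and `habitat` are assumed to mirror the mushroom file (the commented-out column list in the script suggests `type` comes first), and `class_col` is a name introduced only for illustration.

```r
# Toy stand-in for the mushroom data: the class column ("type") comes first.
toy <- data.frame(
  type    = factor(c("edible", "poisonous", "edible", "poisonous")),
  odor    = factor(c("none", "foul", "almond", "foul")),
  habitat = factor(c("woods", "urban", "grasses", "woods"))
)

# Problem 1: there is no column called "class", so `$` quietly returns NULL.
is.null(toy$class)

# Problem 2: -ncol(toy) drops the LAST column (habitat), so the class
# column stays among the predictors.
names(toy[, -ncol(toy)])

# Possible fix: pick the class column by name and drop that same name
# from the predictors.
class_col <- "type"
x <- toy[, setdiff(names(toy), class_col), drop = FALSE]
y <- toy[[class_col]]

names(x)
levels(y)
```

Selecting by name rather than by position keeps the split correct even if the column order changes.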
#install.packages(c("mlbench", "C50", "OneR"))
#credit_df <- read.table("mushrooms.csv")
#read.csv("mushrooms.csv", stringsAsFactors = TRUE)
#install.packages("arules")

library(C50)
library(mlbench)
library(RWeka)

# Load the mushroom dataset
#data(mushrooms)
mushrooms <- read.csv("https://raw.githubusercontent.com/PacktPublishing/Machine-Learning-with-R-Third-Edition/master/Chapter05/mushrooms.csv")

#str(mushrooms)
summary(mushrooms)
# Convert every column to a factor so the models treat the values as categories.
mushrooms[] <- lapply(mushrooms, as.factor)

#mushrooms('type', 'cap_shape', 'cap_surface', 'cap_color', 'bruises', 'odor', 'gill_attachment')
#mushrooms <- as.factor(mushrooms)

# The class feature is the column named "type" (the first column in this file),
# not the last one. mushrooms$class does not exist, so it returns NULL, and
# -ncol(mushrooms) drops the last predictor instead of the class.
# Select the class by name, drop it from x, and call it y.
class_col <- "type"
predictor_cols <- setdiff(names(mushrooms), class_col)

set.seed(123)  # makes the random 70/30 split reproducible
training_inds <- sort(sample(nrow(mushrooms), nrow(mushrooms) * 0.7))
x_train <- mushrooms[training_inds, predictor_cols]
y_train <- mushrooms[[class_col]][training_inds]
x_test <- mushrooms[-training_inds, predictor_cols]
y_test <- mushrooms[[class_col]][-training_inds]
# C5.0 decision trees
library(C50)

# Regular tree (trials = 1, the default)
c50_model <- C5.0(x_train, y_train)
c50_preds <- predict(c50_model, x_test)

table(c50_preds, y_test)
mean(c50_preds == y_test)
summary(c50_model)

# Boosted tree with 10 trials
c50_model <- C5.0(x_train, y_train, trials = 10)
c50_preds <- predict(c50_model, x_test)

table(c50_preds, y_test)
mean(c50_preds == y_test)

# FYI, this is supposed to work but doesn't. So it's commented out.
#plot(c50_model)

# Boosted tree with 100 trials
c50_model <- C5.0(x_train, y_train, trials = 100)
c50_preds <- predict(c50_model, x_test)

table(c50_preds, y_test)
mean(c50_preds == y_test)
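For the assignment's question about whether the regular and boosted trees agree on the most important feature, one option is `C5imp()` from the C50 package. The sketch below is only an illustration: it uses the built-in `iris` data as a stand-in so it runs without a download, and the names `single_tree`, `boosted_tree`, `top_single` and `top_boosted` are made up for this example.

```r
library(C50)

# iris is a built-in stand-in here, not the mushroom data.
x <- iris[, names(iris) != "Species"]
y <- iris$Species

single_tree  <- C5.0(x, y)               # trials = 1 is the default
boosted_tree <- C5.0(x, y, trials = 10)

# C5imp() returns a data frame with one row per predictor and an
# "Overall" column. With metric = "usage" the score reflects how much
# of the training data is covered by splits on that predictor.
imp_single  <- C5imp(single_tree,  metric = "usage")
imp_boosted <- C5imp(boosted_tree, metric = "usage")

# Predictor with the highest importance score in each model.
top_single  <- rownames(imp_single)[which.max(imp_single$Overall)]
top_boosted <- rownames(imp_boosted)[which.max(imp_boosted$Overall)]

print(imp_single)
print(imp_boosted)
identical(top_single, top_boosted)  # TRUE if both models rank the same feature first
```

The same calls should carry over to the mushroom models, provided the regular and boosted trees are stored under different names instead of overwriting `c50_model`.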
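library("OneR")

training_data <- mushrooms[training_inds, ]
testing_data <- mushrooms[-training_inds, ]

# The class column is named "type", so the formula is type ~ .
oner_model <- OneR(type ~ ., data = training_data)
oner_preds <- predict(oner_model, testing_data)

table(oner_preds, testing_data$type)
# Compare as character so differing factor levels cannot cause an error.
mean(as.character(oner_preds) == as.character(testing_data$type))

# let's see the rule!
summary(oner_model)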
# Weka is a Java library (and nice standalone data mining software).
# It includes a _different_ OneR model, so note the warning message
# when we load it.
library("RWeka")

jrip_model <- JRip(type ~ ., data = training_data)
jrip_preds <- predict(jrip_model, testing_data)

table(jrip_preds, y_test)
mean(jrip_preds == y_test)

# let's see the decision table (JRip's rule list)
jrip_model

# Just FYI - Weka has a decision tree model, too. We can actually plot it.
# The book tells us how C4.8 is a predecessor of C5.0.
j50_model <- J48(type ~ ., data = training_data)
plot(j50_model)
j50_model
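Because the assignment asks for the metric(s) behind each "which model is better" answer, it may help to report more than raw accuracy. The helper below is a hypothetical sketch, not part of any package: `score_model()` and the two made-up prediction vectors exist only for illustration, and the labels "edible" and "poisonous" are an assumption about how the classes are coded.

```r
# Hypothetical helper: summarise predictions with accuracy plus the two
# kinds of mistakes, since calling a poisonous mushroom edible is worse
# than the reverse.
score_model <- function(preds, actual, positive = "poisonous") {
  preds  <- as.character(preds)
  actual <- as.character(actual)
  c(
    accuracy        = mean(preds == actual),
    # poisonous cases the model failed to flag (the dangerous error)
    missed_positive = sum(actual == positive & preds != positive),
    # edible cases the model flagged as poisonous
    false_alarm     = sum(actual != positive & preds == positive)
  )
}

# Made-up labels and predictions for two imaginary models.
actual  <- c("edible", "poisonous", "poisonous", "edible", "poisonous")
model_a <- c("edible", "poisonous", "edible", "edible", "poisonous")
model_b <- c("poisonous", "poisonous", "poisonous", "edible", "poisonous")

rbind(model_a = score_model(model_a, actual),
      model_b = score_model(model_b, actual))
```

Both imaginary models reach the same accuracy, but only one of them lets a poisonous mushroom through, which is the kind of reasoning the written comments can cite. With the real script, something like `score_model(jrip_preds, y_test)` would give the same summary for each fitted model.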
