Q. Even though we do not have a severe class imbalance in our data, let's try addressing our moderate class imbalance to see if it improves our model accuracy. Using the training set you generated in Step (2), create a new training subset using the oversampling method.

    #### Step 2 #####
    library(caret)
    set.seed(42)

    # Convert categorical variables to factors with levels and labels; 0 represents No and 1 represents Yes.
    insurance$CLAIM <- factor(insurance$CLAIM, levels = c(0, 1), labels = c("No", "Yes"))
    insurance$KIDSDRIV <- factor(insurance$KIDSDRIV, levels = c(0, 1), labels = c("No", "Yes"))
    insurance$HOMEKIDS <- factor(insurance$HOMEKIDS, levels = c(0, 1), labels = c("No", "Yes"))
    insurance$HOMEOWN <- factor(insurance$HOMEOWN, levels = c(0, 1), labels = c("No", "Yes"))
    insurance$MSTATUS <- factor(insurance$MSTATUS, levels = c(0, 1), labels = c("No", "Yes"))
    insurance$GENDER <- factor(insurance$GENDER, levels = c(0, 1), labels = c("Male", "Female"))
    insurance$EDUCATION <- factor(insurance$EDUCATION, levels = c(0, 1), labels = c("High School only", "College or beyond"))
    insurance$CAR_USE <- factor(insurance$CAR_USE, levels = c(0, 1), labels = c("Private", "Commercial"))
    insurance$RED_CAR <- factor(insurance$RED_CAR, levels = c(0, 1), labels = c("No", "Yes"))
    insurance$CLM_BEF <- factor(insurance$CLM_BEF, levels = c(0, 1), labels = c("No", "Yes"))
    insurance$REVOKED <- factor(insurance$REVOKED, levels = c(0, 1), labels = c("No", "Yes"))
    insurance$MVR_PTS <- factor(insurance$MVR_PTS, levels = c(0, 1), labels = c("No", "Yes"))
    insurance$URBANICITY <- factor(insurance$URBANICITY, levels = c(0, 1), labels = c("Rural", "Urban"))

    # Partition the insurance data into training, validation, and test sets.
    # seq(1, 3) gives the three partition labels 1, 2, and 3.
    Samples <- sample(seq(1, 3), size = nrow(insurance), replace = TRUE, prob = c(0.6, 0.2, 0.2))
    Train <- insurance[Samples == 1, ]
    Validate <- insurance[Samples == 2, ]
    Test <- insurance[Samples == 3, ]
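One possible way to build the oversampled training subset is sketched below. It is not a verified solution to the exercise. Because the insurance data is not shown, the `Train` frame here is a hypothetical toy stand-in with an invented `AGE` column and a 70/30 `CLAIM` split, and the name `Train_over` is likewise an assumption. The sketch uses base R only, duplicating randomly chosen minority-class rows until both classes match the majority count.

```r
set.seed(42)

# Hypothetical stand-in for the Train set from Step 2. The real insurance
# data is not available here, so this toy frame is for illustration only.
Train <- data.frame(
  AGE   = sample(18:80, 100, replace = TRUE),
  CLAIM = factor(rep(c("No", "Yes"), times = c(70, 30)),
                 levels = c("No", "Yes"))
)

# Class counts before oversampling: No = 70, Yes = 30
table(Train$CLAIM)

# Size of the largest class; every class is grown to this many rows
n_max <- max(table(Train$CLAIM))

# Row indices grouped by class label
rows_by_class <- split(seq_len(nrow(Train)), Train$CLAIM)

# Keep all original rows of each class and add random duplicates
# (sampled with replacement) until the class reaches n_max rows
oversampled_rows <- unlist(lapply(rows_by_class, function(idx) {
  extra <- idx[sample.int(length(idx), n_max - length(idx), replace = TRUE)]
  c(idx, extra)
}), use.names = FALSE)

Train_over <- Train[oversampled_rows, ]
rownames(Train_over) <- NULL

# Class counts after oversampling: No = 70, Yes = 70
table(Train_over$CLAIM)
```

Since Step 2 already loads caret, the same result can likely be obtained with `upSample(x = Train[, names(Train) != "CLAIM"], y = Train$CLAIM, yname = "CLAIM")`, which also resamples the minority class with replacement. In either case only the training set is resampled; `Validate` and `Test` keep their original class proportions so that accuracy is measured on realistic data.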
