Question: Overview: In this analysis you will develop a logistic regression model, based on the data set provided, to predict whether or not the specimens are genuine.
Data Set Information: The data (A6DATA.csv) were extracted from images taken of genuine and forged banknote-like specimens. For digitization, an industrial camera usually used for print inspection was used. The final images have 400 x 400 pixels. Due to the object lens and the distance to the investigated object, gray-scale pictures with a resolution of about 660 dpi were obtained. A Wavelet Transform tool was used to extract features from the images.
Attribute Information:
V1: variance of Wavelet Transformed image (continuous)
V2: skewness of Wavelet Transformed image (continuous)
V3: kurtosis of Wavelet Transformed image (continuous)
V4: entropy of image (continuous)
V5: class (0-forged, 1-genuine)
PART 1
Read the A6DATA.csv data file into RStudio. Run set.seed(222) for partitioning of the dataset into training (50%) and testing (50%). Report on the number of forged and genuine banknote-like specimens in the training and testing data.
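One possible way to approach Part 1 is sketched below. Because A6DATA.csv is not reproduced here, the sketch simulates a stand-in data frame with the same column names (V1 to V5); with the real file, the simulated block would be replaced by read.csv("A6DATA.csv"). The name mydata and the use of sample(2, ...) for partitioning are assumptions, and this form of sampling gives an approximately (not exactly) 50%/50% split.

```r
# Synthetic stand-in for: mydata <- read.csv("A6DATA.csv")
# The values below are simulated and only illustrate the workflow.
set.seed(1)  # seed for the synthetic data only
n <- 1000
cls <- rbinom(n, 1, 0.45)
mydata <- data.frame(
  V1 = rnorm(n, mean = ifelse(cls == 1, 1, -1), sd = 2),
  V2 = rnorm(n, mean = ifelse(cls == 1, 0.5, -0.5), sd = 3),
  V3 = rnorm(n, sd = 3),
  V4 = rnorm(n, sd = 2),
  V5 = cls
)

# Label the class so the counts are easy to read (0 = forged, 1 = genuine).
mydata$V5 <- factor(mydata$V5, levels = c(0, 1), labels = c("forged", "genuine"))

# Partition into training and testing sets with the required seed.
set.seed(222)
ind <- sample(2, nrow(mydata), replace = TRUE, prob = c(0.5, 0.5))
train <- mydata[ind == 1, ]
test  <- mydata[ind == 2, ]

# Number of forged and genuine specimens in each partition.
train_counts <- table(train$V5)
test_counts  <- table(test$V5)
print(train_counts)
print(test_counts)
```

If an exact 50%/50% split is wanted instead, sampling row indices with sample(nrow(mydata), floor(0.5 * nrow(mydata))) is a common alternative.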
PART 2
Develop a logistic regression model using the training data. Provide the final logistic regression model (with only significant variables), the equation for calculating the probability that a specimen is genuine, the confusion matrix for both training and testing data, the misclassification error for both training and testing data, and comment on the performance of the model.
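A minimal sketch of Part 2 follows, again on simulated stand-in data rather than the real A6DATA.csv. For a fitted model with intercept b0 and coefficients b1, b2, ..., the probability that a specimen is genuine is p = 1 / (1 + exp(-(b0 + b1*V1 + b2*V2 + ...))), which is what predict(..., type = "response") returns. The 0.05 significance level, the 0.5 cutoff, the single-pass removal of non-significant terms, and the helper name confusion are all assumptions made for illustration.

```r
# Synthetic stand-in for: mydata <- read.csv("A6DATA.csv")
set.seed(1)  # seed for the synthetic data only
n <- 1000
cls <- rbinom(n, 1, 0.45)
mydata <- data.frame(
  V1 = rnorm(n, mean = ifelse(cls == 1, 1, -1), sd = 2),
  V2 = rnorm(n, mean = ifelse(cls == 1, 0.5, -0.5), sd = 3),
  V3 = rnorm(n, sd = 3),
  V4 = rnorm(n, sd = 2),
  V5 = cls  # 0 = forged, 1 = genuine
)

set.seed(222)
ind <- sample(2, nrow(mydata), replace = TRUE, prob = c(0.5, 0.5))
train <- mydata[ind == 1, ]
test  <- mydata[ind == 2, ]

# Full model first; inspect the p-values in the summary.
full <- glm(V5 ~ V1 + V2 + V3 + V4, data = train, family = binomial)
print(summary(full))

# Keep predictors with p < 0.05 and refit (a simple one-pass rule, assumed here).
pvals <- coef(summary(full))[-1, "Pr(>|z|)"]
keep  <- names(pvals)[pvals < 0.05]
final <- glm(reformulate(keep, response = "V5"), data = train, family = binomial)
print(coef(final))  # b0, b1, ... for the probability equation

# Confusion matrix at a given cutoff (hypothetical helper).
confusion <- function(model, data, cutoff = 0.5) {
  p <- predict(model, newdata = data, type = "response")
  pred <- factor(ifelse(p > cutoff, 1, 0), levels = c(0, 1))
  table(Predicted = pred, Actual = factor(data$V5, levels = c(0, 1)))
}

tab_train <- confusion(final, train)
tab_test  <- confusion(final, test)

# Misclassification error = 1 - accuracy.
err_train <- 1 - sum(diag(tab_train)) / sum(tab_train)
err_test  <- 1 - sum(diag(tab_test)) / sum(tab_test)
print(tab_train); print(err_train)
print(tab_test);  print(err_test)
```

A testing error close to the training error suggests the model generalizes; a much larger testing error would point to overfitting.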
PART 3
Develop logistic regression models with 60%/40%, 70%/30%, and 80%/20% partitioning into training and testing data sets using set.seed(222). Summarize training and testing accuracy, sensitivity and specificity for each and compare with 50%/50% performance using the table below. Recommend and comment on the best model for future use.
| Partitioning | Accuracy % | Sensitivity % | Specificity % |
| --- | --- | --- | --- |
| Training - 50% | | | |
| Testing - 50% | | | |
| Training - 60% | | | |
| Testing - 40% | | | |
| Training - 70% | | | |
| Testing - 30% | | | |
| Training - 80% | | | |
| Testing - 20% | | | |
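The table for Part 3 could be filled in with a loop over the four partitioning ratios, as in the sketch below. It runs on simulated stand-in data, so the numbers it prints are not answers for A6DATA.csv. The helper names metrics and run_split are hypothetical, the cutoff of 0.5 is assumed, and "genuine" (class 1) is treated as the positive class for sensitivity. For brevity each split fits all four predictors; in the actual assignment the significant-variable selection from Part 2 would be repeated per split.

```r
# Synthetic stand-in for: mydata <- read.csv("A6DATA.csv")
set.seed(1)  # seed for the synthetic data only
n <- 1000
cls <- rbinom(n, 1, 0.45)
mydata <- data.frame(
  V1 = rnorm(n, mean = ifelse(cls == 1, 1, -1), sd = 2),
  V2 = rnorm(n, mean = ifelse(cls == 1, 0.5, -0.5), sd = 3),
  V3 = rnorm(n, sd = 3),
  V4 = rnorm(n, sd = 2),
  V5 = cls  # 0 = forged, 1 = genuine
)

# Accuracy, sensitivity and specificity (in %) with class 1 as "positive".
metrics <- function(model, data, cutoff = 0.5) {
  p <- predict(model, newdata = data, type = "response")
  pred <- ifelse(p > cutoff, 1, 0)
  tp <- sum(pred == 1 & data$V5 == 1)
  tn <- sum(pred == 0 & data$V5 == 0)
  fp <- sum(pred == 1 & data$V5 == 0)
  fn <- sum(pred == 0 & data$V5 == 1)
  c(Accuracy    = 100 * (tp + tn) / nrow(data),
    Sensitivity = 100 * tp / (tp + fn),
    Specificity = 100 * tn / (tn + fp))
}

# Partition with set.seed(222), fit on training data, score both partitions.
run_split <- function(data, p_train) {
  set.seed(222)
  ind <- sample(2, nrow(data), replace = TRUE, prob = c(p_train, 1 - p_train))
  train <- data[ind == 1, ]
  test  <- data[ind == 2, ]
  model <- glm(V5 ~ V1 + V2 + V3 + V4, data = train, family = binomial)
  rbind(Training = metrics(model, train), Testing = metrics(model, test))
}

splits <- c("50/50" = 0.5, "60/40" = 0.6, "70/30" = 0.7, "80/20" = 0.8)
results <- lapply(splits, function(p) run_split(mydata, p))
print(lapply(results, round, 2))
```

Each element of results is a 2 x 3 matrix whose rows map directly onto one Training/Testing pair in the table.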
PART 4
Compare the best and the worst logistic regression models from the previous part using the ROC curve, AUC, and best threshold values based on testing data. Discuss your results.
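For Part 4, the ROC curve, AUC and a "best" threshold can be computed in base R, as sketched below for a single model on simulated stand-in data; the same steps would be repeated for the best and the worst split and the two curves compared. The helper names roc_points and auc_rank are hypothetical, and taking the best threshold as the one that maximizes Youden's J (sensitivity + specificity - 1) is an assumption, since the assignment does not say which criterion to use. Packages such as pROC offer similar functionality.

```r
# Synthetic stand-in for: mydata <- read.csv("A6DATA.csv")
set.seed(1)  # seed for the synthetic data only
n <- 1000
cls <- rbinom(n, 1, 0.45)
mydata <- data.frame(
  V1 = rnorm(n, mean = ifelse(cls == 1, 1, -1), sd = 2),
  V2 = rnorm(n, mean = ifelse(cls == 1, 0.5, -0.5), sd = 3),
  V3 = rnorm(n, sd = 3),
  V4 = rnorm(n, sd = 2),
  V5 = cls  # 0 = forged, 1 = genuine
)

set.seed(222)
ind <- sample(2, nrow(mydata), replace = TRUE, prob = c(0.5, 0.5))
train <- mydata[ind == 1, ]
test  <- mydata[ind == 2, ]
model <- glm(V5 ~ V1 + V2 + V3 + V4, data = train, family = binomial)
p_test <- predict(model, newdata = test, type = "response")

# True and false positive rates at every observed probability threshold.
roc_points <- function(prob, actual) {
  thr <- sort(unique(prob), decreasing = TRUE)
  tpr <- sapply(thr, function(t) mean(prob[actual == 1] >= t))
  fpr <- sapply(thr, function(t) mean(prob[actual == 0] >= t))
  data.frame(threshold = thr, tpr = tpr, fpr = fpr)
}

# AUC via the rank-sum (Mann-Whitney) identity.
auc_rank <- function(prob, actual) {
  r  <- rank(prob)
  n1 <- sum(actual == 1)
  n0 <- sum(actual == 0)
  (sum(r[actual == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

roc  <- roc_points(p_test, test$V5)
auc  <- auc_rank(p_test, test$V5)
best <- roc[which.max(roc$tpr - roc$fpr), ]  # Youden's J criterion

print(auc)
print(best)

# ROC curve on testing data with the chance diagonal for reference.
plot(c(0, roc$fpr), c(0, roc$tpr), type = "l",
     xlab = "False positive rate (1 - specificity)",
     ylab = "True positive rate (sensitivity)",
     main = "ROC curve, testing data")
abline(0, 1, lty = 2)
```

A higher AUC on testing data indicates better ranking of genuine over forged specimens regardless of cutoff, while the best threshold shows where the trade-off between sensitivity and specificity is most favorable under the chosen criterion.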
