Question: Use the R language for this problem. In the data set fractures.txt, Myers (1990) presents data on the number of fractures (y) that occur in the upper seams of coal mines in the Appalachian region of western Virginia. Four regressors were reported:

x1 = inner burden thickness (feet), the shortest distance between seam floor and the lower seam;

x2 = percent extraction of the lower previously mined seam;

x3 = lower seam height (feet); and

x4 = time (years) that the mine has been in operation.

y   x1   x2   x3   x4
2   50   70   52   1
1   230  65   42   6.9
0   125  70   45   1
4   75   65   68   0.5
1   70   65   53   0.5
2   65   70   46   3
0   65   60   62   1
0   350  60   54   0.5
4   350  90   54   0.5
4   160  80   38   0
1   145  65   38   10
4   145  85   38   0
1   180  70   42   2
5   43   80   40   0
2   42   85   51   12
5   42   85   51   0
5   45   85   42   0
5   83   85   48   10
0   300  65   68   10
5   190  90   84   6
1   145  90   54   12
1   510  80   57   10
3   65   75   68   5
3   470  90   90   9
2   300  80   165  9
2   275  90   40   4
0   420  50   44   17
1   65   80   48   15
5   40   75   51   15
2   900  90   48   35
3   95   88   36   20
3   40   85   57   10
3   140  90   38   7
0   150  50   44   5
0   80   60   96   5
0   80   85   96   5
0   145  65   72   9
0   100  65   72   9
3   150  80   48   3
2   150  80   48   0
3   210  75   42   2
5   11   75   42   0
0   100  65   60   25
3   50   88   60   20

(a) Read in the data as a data frame. Add a new column called indicator to the data frame. Set indicator = 1 if an observation's y is above the median of y, and indicator = 0 otherwise.
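
One possible way to approach part (a) is sketched below. It assumes fractures.txt sits in the working directory as a whitespace-separated file with the header row y x1 x2 x3 x4; the helper name add_indicator is made up for illustration and is not part of the question.

```r
# Hypothetical helper: adds a 0/1 column that flags y strictly above its median.
add_indicator <- function(df) {
  df$indicator <- as.integer(df$y > median(df$y))
  df
}

# Assumed file layout: header line followed by one observation per row.
if (file.exists("fractures.txt")) {
  fractures <- add_indicator(read.table("fractures.txt", header = TRUE))
  print(table(fractures$indicator))  # class counts as a quick sanity check
}
```

The comparison is strict (>), so observations whose y equals the median get indicator = 0, which matches the wording "above the median".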

(b) Explore the data graphically in order to investigate the association between indicator and the other features (x1, x2, x3, x4). Which of the other features seem most likely to be useful in predicting indicator? Scatter-plots and boxplots may be useful tools to answer this question. Describe your findings.
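
A rough sketch of the plots suggested in part (b) follows. It assumes a data frame that already has the indicator column from part (a); the function name plot_by_indicator is hypothetical. It draws a scatter-plot matrix coloured by class and one boxplot per regressor, and returns the per-class medians so the plots can be cross-checked numerically. Which features look useful has to be judged from the output; the code does not decide that.

```r
# Hypothetical helper: scatter-plot matrix plus side-by-side boxplots per regressor.
plot_by_indicator <- function(df, vars = c("x1", "x2", "x3", "x4")) {
  # Scatter-plot matrix, points coloured by class (0 = black, 1 = red).
  pairs(df[vars], col = df$indicator + 1, pch = 19)

  # One boxplot per regressor, split by indicator, on a 2 x 2 grid.
  old <- par(mfrow = c(2, 2))
  on.exit(par(old))
  for (v in vars) {
    boxplot(df[[v]] ~ df$indicator, xlab = "indicator", ylab = v,
            main = paste(v, "by indicator"))
  }

  # Median of each regressor within each class (rows: class, columns: regressor).
  invisible(sapply(vars, function(v) tapply(df[[v]], df$indicator, median)))
}

if (file.exists("fractures.txt")) {
  fr <- read.table("fractures.txt", header = TRUE)
  fr$indicator <- as.integer(fr$y > median(fr$y))
  print(plot_by_indicator(fr))
}
```

A regressor whose two boxes barely overlap is a natural candidate for the classifiers in parts (d) to (f).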

(c) Split your data set into a training set and a test set. The training set consists of the first 34 observations.
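
The split in part (c) is by row position rather than random. A minimal sketch, with the hypothetical helper name split_first_n, could look like this; with the 44 observations listed above it leaves 10 rows for the test set.

```r
# Hypothetical helper: first n_train rows for training, the remainder for testing.
split_first_n <- function(df, n_train = 34) {
  stopifnot(nrow(df) > n_train)
  idx <- seq_len(n_train)
  list(train = df[idx, , drop = FALSE],
       test  = df[-idx, , drop = FALSE])
}

if (file.exists("fractures.txt")) {
  fr <- read.table("fractures.txt", header = TRUE)
  fr$indicator <- as.integer(fr$y > median(fr$y))
  parts <- split_first_n(fr)
  print(c(train = nrow(parts$train), test = nrow(parts$test)))
}
```

Note that the indicator is computed from the median of all observations before splitting, following the order of parts (a) and (c).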

(d) Perform LDA on the training data in order to predict indicator using the variables that seemed most associated with indicator in part (b). What is the misclassification rate (test error) of the model obtained?
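
For part (d), one option is lda() from the MASS package. The sketch below wraps the fit and the test-error calculation in a hypothetical function, lda_test_error. The formula indicator ~ x1 + x2 in the usage lines is only a placeholder; it should be replaced by whichever variables stood out in part (b).

```r
library(MASS)  # provides lda()

# Hypothetical helper: fit LDA on train, return the misclassification rate on test.
lda_test_error <- function(train, test, formula) {
  fit  <- lda(formula, data = train)
  pred <- predict(fit, newdata = test)$class
  mean(as.character(pred) != as.character(test$indicator))
}

if (file.exists("fractures.txt")) {
  fr <- read.table("fractures.txt", header = TRUE)
  fr$indicator <- as.integer(fr$y > median(fr$y))
  # Placeholder predictors, not a finding from part (b).
  print(lda_test_error(fr[1:34, ], fr[-(1:34), ], indicator ~ x1 + x2))
}
```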

(e) Perform kNN on the training data in order to predict indicator using the variables that seemed most associated with indicator in part (b). What is the misclassification rate (test error) of the model obtained?
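
For part (e), knn() from the class package is a common choice. The question does not fix k or say whether to standardise, so both choices in this sketch are assumptions: k = 3 by default, and the regressors are scaled with the training means and standard deviations because the variables are on very different scales. The function name knn_test_error and the predictors in the usage lines are placeholders.

```r
library(class)  # provides knn()

# Hypothetical helper: kNN test error on standardised predictors.
knn_test_error <- function(train, test, vars, k = 3) {
  # Centre and scale with training statistics only, then apply them to the test set.
  mu <- colMeans(train[, vars, drop = FALSE])
  s  <- apply(train[, vars, drop = FALSE], 2, sd)
  tr <- scale(train[, vars, drop = FALSE], center = mu, scale = s)
  te <- scale(test[, vars, drop = FALSE], center = mu, scale = s)

  pred <- knn(train = tr, test = te, cl = factor(train$indicator), k = k)
  mean(as.character(pred) != as.character(test$indicator))
}

if (file.exists("fractures.txt")) {
  fr <- read.table("fractures.txt", header = TRUE)
  fr$indicator <- as.integer(fr$y > median(fr$y))
  set.seed(1)  # knn() breaks voting ties at random
  # Placeholder predictors and k, not values given in the question.
  print(knn_test_error(fr[1:34, ], fr[-(1:34), ], vars = c("x1", "x2"), k = 3))
}
```

Trying several values of k and comparing the resulting test errors is a reasonable follow-up, since the result can be sensitive to k with only 34 training rows.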

(f) Perform logistic regression on the training data in order to predict indicator using the variables that seemed most associated with indicator in part (b). What is the misclassification rate (test error) of the model obtained?
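
For part (f), glm() with family = binomial fits the logistic regression. The sketch below assumes a 0.5 probability cutoff for classifying an observation as indicator = 1, which the question does not state. The function name logit_test_error and the formula in the usage lines are placeholders.

```r
# Hypothetical helper: logistic regression test error at a given probability cutoff.
logit_test_error <- function(train, test, formula, cutoff = 0.5) {
  fit  <- glm(formula, data = train, family = binomial)
  prob <- predict(fit, newdata = test, type = "response")  # P(indicator = 1)
  pred <- as.integer(prob > cutoff)
  mean(pred != test$indicator)
}

if (file.exists("fractures.txt")) {
  fr <- read.table("fractures.txt", header = TRUE)
  fr$indicator <- as.integer(fr$y > median(fr$y))
  # Placeholder predictors, not a finding from part (b).
  print(logit_test_error(fr[1:34, ], fr[-(1:34), ], indicator ~ x1 + x2))
}
```

Using the same predictors and the same train/test split as in parts (d) and (e) makes the three test errors directly comparable.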
