DATA MINING - DISCRIMINANT ANALYSIS - PLEASE HELP
Identifying Good System Administrators
A management consultant is studying the roles played by experience and training in a system administrator's ability to complete a set of tasks in a specified amount of time. In particular, she is interested in discriminating between administrators who are able to complete given tasks within a specified time and those who are not. Data are collected on the performance of 75 randomly selected administrators. They are stored in the file SystemAdministrators.xls.
Using these data, the consultant performs a discriminant analysis. The variable Experience measures months of full-time system administrator experience, while Training measures number of relevant training credits. The dependent variable Completed is either Yes or No, according to whether or not the administrator completed the tasks.
a. Create a scatterplot of Experience versus Training using color or symbol to differentiate administrators who completed the tasks from those who did not complete them. See if you can identify a line that separates the two classes with minimum misclassification.
b. Run a discriminant analysis with both predictors using the entire dataset as training data. Among those who completed the tasks, what is the percentage of administrators who are classified incorrectly as failing to complete the tasks?
c. Compute the two classification scores for an administrator with 4 months of experience and 6 credits of training. Based on these, how would you classify this administrator?
d. How much experience must be accumulated by an administrator with 4 training credits before his or her estimated probability of completing the tasks exceeds 50%?
e. Compare the classification accuracy of this model to that resulting from a logistic regression with cutoff 0.5.
f. Compute the correlation between Experience and Training for administrators who completed the tasks and compare it to the correlation for administrators who did not complete the tasks. Does the equal correlation assumption seem reasonable?
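For parts (c) and (d), it may help to write the classification scores out explicitly. The block below is a sketch assuming the standard linear discriminant setup with a pooled covariance matrix. The symbols $x$, $\mu_k$, $\Sigma$, and $\pi_k$ are notation introduced here, not taken from the question, and software may handle the prior term $\ln \pi_k$ differently depending on its settings.

```latex
% x = (Experience, Training)^T, k \in \{\text{Yes}, \text{No}\}
% \mu_k: class mean vector, \Sigma: pooled within-class covariance, \pi_k: prior probability
s_k(x) = \mu_k^{\top} \Sigma^{-1} x \;-\; \tfrac{1}{2}\, \mu_k^{\top} \Sigma^{-1} \mu_k \;+\; \ln \pi_k

% Estimated probability of completing the tasks
P(\text{Yes} \mid x) = \frac{e^{s_{\text{Yes}}(x)}}{e^{s_{\text{Yes}}(x)} + e^{s_{\text{No}}(x)}}

% The probability exceeds 50% exactly when the Yes score is the larger one
P(\text{Yes} \mid x) > 0.5 \iff s_{\text{Yes}}(x) > s_{\text{No}}(x)
```

Under these assumptions, part (d) amounts to fixing Training at 4 and solving $s_{\text{Yes}}(x) = s_{\text{No}}(x)$ for Experience, which is a linear equation in one unknown.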
Here is the dataset (please use Excel/XLMiner):
| Experience | Training | Completed task |
| --- | --- | --- |
| 10.9 | 4 | Yes |
| 9.9 | 4 | Yes |
| 10.4 | 6 | Yes |
| 13.7 | 6 | Yes |
| 9.4 | 8 | Yes |
| 12.4 | 4 | Yes |
| 7.9 | 6 | Yes |
| 8.9 | 4 | Yes |
| 10.2 | 6 | Yes |
| 11.4 | 4 | Yes |
| 8.6 | 4 | Yes |
| 9.2 | 4 | Yes |
| 11.7 | 8 | Yes |
| 7.6 | 4 | Yes |
| 7.0 | 4 | Yes |
| 4.9 | 4 | No |
| 7.1 | 6 | No |
| 5.0 | 4 | No |
| 4.8 | 4 | No |
| 4.4 | 4 | No |
| 4.2 | 4 | No |
| 6.3 | 4 | No |
| 4.7 | 4 | No |
| 5.2 | 4 | No |
| 8.9 | 4 | No |
| 5.5 | 6 | No |
| 5.4 | 4 | No |
| 5.7 | 4 | No |
| 7.7 | 4 | No |
| 2.7 | 4 | No |
| 6.7 | 4 | No |
| 6.2 | 6 | No |
| 6.6 | 6 | No |
| 3.6 | 4 | No |
| 5.0 | 4 | No |
| 5.4 | 4 | No |
| 7.3 | 6 | No |
| 7.6 | 4 | No |
| 8.6 | 6 | No |
| 12.2 | 4 | No |
| 5.8 | 4 | No |
| 6.1 | 4 | No |
| 5.5 | 6 | No |
| 4.1 | 4 | No |
| 4.5 | 8 | No |
| 7.8 | 4 | No |
| 7.0 | 4 | No |
| 4.3 | 4 | No |
| 6.1 | 4 | No |
| 4.9 | 4 | No |
| 6.9 | 4 | No |
| 5.2 | 4 | No |
| 6.8 | 4 | No |
| 5.8 | 4 | No |
| 4.6 | 4 | No |
| 7.2 | 4 | No |
| 5.2 | 4 | No |
| 5.1 | 4 | No |
| 9.2 | 4 | No |
| 4.7 | 4 | No |
| 5.9 | 6 | No |
| 8.0 | 4 | No |
| 4.0 | 4 | No |
| 6.3 | 4 | No |
| 6.6 | 4 | No |
| 6.5 | 4 | No |
| 7.4 | 8 | No |
| 6.5 | 4 | No |
| 8.2 | 4 | No |
| 5.9 | 4 | No |
| 5.6 | 4 | No |
| 5.9 | 8 | No |
| 6.4 | 6 | No |
| 3.8 | 4 | No |
| 5.3 | 4 | No |
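Although the question asks for Excel/XLMiner, the calculations behind parts (b), (c), (d), and (f) can be cross-checked with a short script. What follows is a minimal sketch in plain Python, not XLMiner's procedure: it assumes linear discriminant analysis with a pooled covariance matrix and priors equal to the class proportions in the data, and it takes the administrator in part (c) to have Experience = 4 and Training = 6. All function and variable names (`fit_lda`, `scores`, `threshold_d`, and so on) are illustrative choices, and XLMiner's output may differ if it uses other prior settings.

```python
import math

# (Experience, Training) pairs copied from the table, grouped by "Completed task".
YES = [(10.9, 4), (9.9, 4), (10.4, 6), (13.7, 6), (9.4, 8), (12.4, 4), (7.9, 6), (8.9, 4),
       (10.2, 6), (11.4, 4), (8.6, 4), (9.2, 4), (11.7, 8), (7.6, 4), (7.0, 4)]
NO = [(4.9, 4), (7.1, 6), (5.0, 4), (4.8, 4), (4.4, 4), (4.2, 4), (6.3, 4), (4.7, 4), (5.2, 4), (8.9, 4),
      (5.5, 6), (5.4, 4), (5.7, 4), (7.7, 4), (2.7, 4), (6.7, 4), (6.2, 6), (6.6, 6), (3.6, 4), (5.0, 4),
      (5.4, 4), (7.3, 6), (7.6, 4), (8.6, 6), (12.2, 4), (5.8, 4), (6.1, 4), (5.5, 6), (4.1, 4), (4.5, 8),
      (7.8, 4), (7.0, 4), (4.3, 4), (6.1, 4), (4.9, 4), (6.9, 4), (5.2, 4), (6.8, 4), (5.8, 4), (4.6, 4),
      (7.2, 4), (5.2, 4), (5.1, 4), (9.2, 4), (4.7, 4), (5.9, 6), (8.0, 4), (4.0, 4), (6.3, 4), (6.6, 4),
      (6.5, 4), (7.4, 8), (6.5, 4), (8.2, 4), (5.9, 4), (5.6, 4), (5.9, 8), (6.4, 6), (3.8, 4), (5.3, 4)]
DATA = {"Yes": YES, "No": NO}


def class_stats(rows):
    """Return (n, mean vector, within-class sums of squares and cross-products)."""
    n = len(rows)
    mx = sum(x for x, _ in rows) / n
    mt = sum(t for _, t in rows) / n
    sxx = sum((x - mx) ** 2 for x, _ in rows)
    stt = sum((t - mt) ** 2 for _, t in rows)
    sxt = sum((x - mx) * (t - mt) for x, t in rows)
    return n, (mx, mt), (sxx, sxt, stt)


def fit_lda(data):
    """Fit score_k = c_k + a_k * Experience + b_k * Training for each class k."""
    stats = {k: class_stats(rows) for k, rows in data.items()}
    total = sum(s[0] for s in stats.values())
    dof = total - len(stats)  # pooled covariance divides by N minus number of classes
    sxx = sum(s[2][0] for s in stats.values()) / dof
    sxt = sum(s[2][1] for s in stats.values()) / dof
    stt = sum(s[2][2] for s in stats.values()) / dof
    det = sxx * stt - sxt ** 2  # determinant of the 2x2 pooled covariance matrix
    funcs = {}
    for k, (n, (mx, mt), _) in stats.items():
        a = (stt * mx - sxt * mt) / det  # first component of inverse(Sigma) * mean
        b = (sxx * mt - sxt * mx) / det  # second component of inverse(Sigma) * mean
        # ASSUMPTION: prior for each class = its share of the 75 records.
        c = -0.5 * (a * mx + b * mt) + math.log(n / total)
        funcs[k] = (c, a, b)
    return funcs


def scores(funcs, experience, training):
    """Classification score of each class for one administrator."""
    return {k: c + a * experience + b * training for k, (c, a, b) in funcs.items()}


def classify(funcs, experience, training):
    """Assign the class with the larger classification score."""
    s = scores(funcs, experience, training)
    return max(s, key=s.get)


def correlation(rows):
    """Pearson correlation between Experience and Training within one class."""
    _, _, (sxx, sxt, stt) = class_stats(rows)
    return sxt / math.sqrt(sxx * stt)


funcs = fit_lda(DATA)

# (b) Confusion counts on the training data, indexed as confusion[actual][predicted].
confusion = {actual: {pred: 0 for pred in DATA} for actual in DATA}
for actual, rows in DATA.items():
    for x, t in rows:
        confusion[actual][classify(funcs, x, t)] += 1
yes_error_rate = confusion["Yes"]["No"] / len(YES)

# (c) ASSUMPTION: the administrator has Experience = 4 and Training = 6.
scores_c = scores(funcs, 4, 6)
label_c = classify(funcs, 4, 6)

# (d) With Training = 4, solve score_Yes = score_No for Experience.
(cy, ay, by), (cn, an, bn) = funcs["Yes"], funcs["No"]
threshold_d = ((cn - cy) + (bn - by) * 4) / (ay - an)

# (f) Within-class correlations between the two predictors.
r_yes, r_no = correlation(YES), correlation(NO)

print("classification functions (c, a, b):", funcs)
print("confusion[actual][predicted]:", confusion)
print("share of Yes classified as No:", round(yes_error_rate, 3))
print("scores at (4, 6):", scores_c, "->", label_c)
print("Experience threshold at Training = 4:", round(threshold_d, 2))
print("corr(Yes), corr(No):", round(r_yes, 3), round(r_no, 3))
```

The printed confusion counts answer part (b) directly, and `threshold_d` is the Experience value at which the two scores are equal for an administrator with 4 training credits. Part (e) is not covered here, since it requires fitting a separate logistic regression.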
