Question: Assignment 3 Due Date: Sunday, October 2, 2022 The total number of points for this assignment is 60 points. Please submit your assignment in a

Assignment 3 Due Date: Sunday, October 2, 2022

The total number of points for this assignment is 60 points. Please submit your assignment in a Word file. Use this assignment file as a template to enter and copy-paste your answers for your assignment submission. Keep the problem descriptions and insert your answers after each question. Please name your assignment with this format: Lastname.Firstname.Assignment3.

1. (15 points) Download the BostonHousing2.xls file (which has been used in Assignment 2). The target attribute in this dataset is CATMEDV (which is a binary attribute converted from MEDV in the BostonHousing.xls file).

a. Within Excel, save the FullData sheet (with 506 records) as a CSV file, as you did for Assignment 2. Run Weka's support vector machines algorithm (SMO) on this data file, with 10-fold cross-validation. First, use the default parameter C = 1. Then, change the C value to 10 and 100 in sequence. Show the output screens that display the 10-fold cross-validation error rates in these three cases. How does the error rate change as the C value increases?

b. Based on the results with C = 100, what two attributes are the most important predictors? Explain the impact of these two predictors on classification in terms of how the classification result will change when the value of a predictor increases or decreases.

2. (25 points) Apply (i) decision trees (J48), (ii) Naive Bayes, (iii) kNN (k = 1), and (iv) SVM (SMO) in Weka for classifying the BostonHousing2 data used in Problem 1. Evaluate the performances of these four classification models based on (1) the overall classification accuracy, and (2) the ROC curve and AUC value by considering homes with 'high' value as the positive class. The specific steps and questions for this problem are:

a. Run the four classification models in Weka on the data using the default settings (10-fold cross-validation, etc.). For each model, show two output screens: the first displays the 10-fold cross-validation error rates and the confusion matrix; the second displays the ROC curve (for your reference, see the output screens shown in the "Plotting ROC Curve in Weka" section of the lecture notes titled "Model and Performance Evaluation"). In sum, there are eight output screens, two for each classification model.

b. Based on the overall classification accuracy, rank the four models from the best to the worst.

c. Suppose you are only interested in accurately predicting/identifying high-value homes (so that the 'high' class is the positive class). In this case, how do you rank the four models from the best to the worst? Justify your answers with the relevant results from the Weka output
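
Parts 2b and 2c rank the models by different measures: overall accuracy in one case, positive-class ('high') performance in the other. The following Python sketch is one way to compute those measures by hand from a confusion matrix and from predicted scores. It is not part of the assignment and does not use Weka; the function names `confusion_metrics` and `auc` are hypothetical, and any numbers passed to them should come from your own Weka output.

```python
from typing import Dict, List


def confusion_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Summarize a 2x2 confusion matrix, treating 'high' as the positive class.

    tp: high homes predicted high    fn: high homes predicted low
    fp: low homes predicted high     tn: low homes predicted low
    """
    total = tp + fn + fp + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return {
        "accuracy": (tp + tn) / total,        # used for the ranking in 2b
        "error_rate": (fp + fn) / total,      # 1 - accuracy
        "tp_rate": tp / (tp + fn) if (tp + fn) else 0.0,    # recall on 'high'
        "fp_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }


def auc(scores: List[float], labels: List[int]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive (label 1)
    receives a higher score than a randomly chosen negative (label 0),
    counting ties as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

A model can have high accuracy yet a low TP rate when the 'high' class is rare, which is why the ranking in 2c may differ from the ranking in 2b. The pairwise AUC computation is quadratic in the number of records, which is acceptable for a dataset of this size (506 records).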
