Question:

Consider the training examples shown in the following table for binary classification. The table shows a training set for the problem of predicting whether a loan applicant will repay or default on his/her loan.

Tid   Home Owner   Marital Status   Annual Income   Defaulted Borrower
1     Yes          Single           125K            No
2     No           Married          100K            No
3     No           Single           70K             No
4     Yes          Married          120K            No
5     No           Divorced         95K             Yes
6     No           Married          60K             No
7     Yes          Divorced         220K            No
8     No           Single           85K             Yes
9     No           Married          75K             No
10    No           Single           90K             Yes

Using the KNN approach that we discussed in class, predict the class label for this test example: X = (Home Owner = No, Marital Status = Married, Income = $120K). Assume that k = 3 and that the distance is the L2 norm. You are not allowed to use R or R's distance function (dist()). Use a calculator to compute the distances. MS Excel or another spreadsheet is acceptable as long as you do not use built-in functions; please submit the Excel file in that case.

Design L1 and L2 distance functions to assess the similarity of bank customers. The following attributes characterize each customer:

Age: the customer's age, a real number with a maximum of 90 years and a minimum of 15 years.
Cr ("credit rating"): an ordinal attribute with values 'very good', 'good', 'medium', 'poor', and 'very poor'.
Av_bal: the average account balance, a real number with mean 7000 and standard deviation 4000.

1. Using the L1 distance function, compute the distance between the following two customers: c1 = (55, good, 7000) and c2 = (25, poor, 1000). [15 points]
2. Using the L2 distance function, compute the distance between the two customers mentioned above. [15 points]

Use of R or a spreadsheet is not allowed. Use a calculator for the calculations.

The iris training and test datasets are available in the homework folder. Your task is to apply the KNN algorithm on the training set to detect the Iris species of the test dataset. Specifically, develop an R script to:

1. Read in the training set and the test set.
2. Apply KNN (try multiple K values).
3. Predict the class labels for the test set.
4. Store the item IDs and class labels in a CSV file. The file format is available in the homework folder.
5. Submit the results to Kaggle.
6. Submit the R script to Blackboard.

[Figure: iris flower with petal and sepal labels]
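The loan-default problem asks for hand calculation, so the Python sketch below is meant only as a way to check the arithmetic afterwards. It rests on assumptions that the question does not state: a mismatch on a categorical attribute (Home Owner, Marital Status) contributes 1 to the distance and a match contributes 0, and income is used unscaled, in thousands. The approach discussed in class may handle these attributes differently. The names TRAIN, loan_l2, and knn_predict are hypothetical.

```python
import math
from collections import Counter

# Training set from the table: (Tid, home_owner, marital_status, income_in_K, defaulted)
TRAIN = [
    (1, "Yes", "Single", 125, "No"),
    (2, "No", "Married", 100, "No"),
    (3, "No", "Single", 70, "No"),
    (4, "Yes", "Married", 120, "No"),
    (5, "No", "Divorced", 95, "Yes"),
    (6, "No", "Married", 60, "No"),
    (7, "Yes", "Divorced", 220, "No"),
    (8, "No", "Single", 85, "Yes"),
    (9, "No", "Married", 75, "No"),
    (10, "No", "Single", 90, "Yes"),
]


def loan_l2(a, b):
    """L2 distance between two (home_owner, marital_status, income_in_K) tuples.

    ASSUMPTION: categorical mismatch = 1, match = 0; income is not rescaled.
    """
    home = 0 if a[0] == b[0] else 1
    marital = 0 if a[1] == b[1] else 1
    income = a[2] - b[2]
    return math.sqrt(home ** 2 + marital ** 2 + income ** 2)


def knn_predict(train, x, k=3):
    """Return (majority label, Tids of the k nearest training rows)."""
    ranked = sorted(train, key=lambda row: loan_l2(row[1:4], x))
    neighbours = ranked[:k]
    votes = Counter(row[4] for row in neighbours)
    return votes.most_common(1)[0][0], [row[0] for row in neighbours]


if __name__ == "__main__":
    label, tids = knn_predict(TRAIN, ("No", "Married", 120), k=3)
    print(label, tids)
```

Under these assumptions the three nearest neighbours are Tid 4, Tid 1, and Tid 2 (distances 1, about 5.20, and 20), all labelled No, so the predicted label is No. Because income is unscaled it dominates the distance; rescaling income may select different neighbours.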

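For the bank-customer problem, one possible design (a sketch, not the required answer) is to bring the three attributes onto comparable scales before taking L1 or L2 differences. The choices below are assumptions: Age is min-max scaled using the stated range 15 to 90, Cr is mapped to ranks 0 through 4 and divided by 4, and Av_bal is standardized with the stated mean 7000 and standard deviation 4000. The names CR_RANK, scale_customer, customer_l1, and customer_l2 are hypothetical.

```python
import math

# ASSUMPTION: equally spaced ranks for the ordinal credit rating.
CR_RANK = {"very poor": 0, "poor": 1, "medium": 2, "good": 3, "very good": 4}


def scale_customer(c):
    """Map (age, cr, av_bal) to comparable numeric scales (assumed design)."""
    age, cr, av_bal = c
    return (
        (age - 15) / (90 - 15),   # min-max scaling with the stated age range
        CR_RANK[cr] / 4,          # ordinal rank scaled to [0, 1]
        (av_bal - 7000) / 4000,   # z-score with the stated mean and std dev
    )


def customer_l1(c1, c2):
    """L1 (Manhattan) distance on the scaled attributes."""
    return sum(abs(p - q) for p, q in zip(scale_customer(c1), scale_customer(c2)))


def customer_l2(c1, c2):
    """L2 (Euclidean) distance on the scaled attributes."""
    return math.sqrt(
        sum((p - q) ** 2 for p, q in zip(scale_customer(c1), scale_customer(c2)))
    )


if __name__ == "__main__":
    c1 = (55, "good", 7000)
    c2 = (25, "poor", 1000)
    print(round(customer_l1(c1, c2), 4), round(customer_l2(c1, c2), 4))
```

With this design the per-attribute differences are 0.4 (Age), 0.5 (Cr), and 1.5 (Av_bal), giving L1 = 2.4 and L2 = sqrt(2.66), roughly 1.631. The z-scored balance is not bounded to [0, 1], so it weighs more heavily than the other two attributes; a different normalization would change both results.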