Supervised learning (Classification) and Cross validation using R
This dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. The dataset's objective is to diagnostically predict whether or not a patient has diabetes, based on certain diagnostic measurements included in the dataset. Here, we will use k-nearest neighbours as the classification method, but you may work through this section using any classifier you wish. We have provided code for the initial processing of the data.
library(tidyverse)
library(class)
library(cvTools)
library(randomForest)

pima = read_csv("data/pima.csv")
glimpse(pima)

# convert the outcome to a factor and standardise the numeric predictors
pima_scaled = pima %>%
  mutate(y = factor(y)) %>%
  mutate_if(is.numeric, .funs = scale)

X = pima_scaled %>% select(-y) %>% scale()   # predictor matrix
y = pima_scaled %>% select(y) %>% pull()     # outcome vector
n = length(y)

1.1
(a) Perform k-nearest neighbours on the data with k = 5 and calculate an independent test-set accuracy (see the sketch after part (b)).
(b) Write a function that calculates the estimated accuracy from a vector of the true labels and a vector of the predicted labels.
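One possible approach to parts (a) and (b) is sketched below. It assumes a random 70/30 train/test split; the split proportion, the seed, and the helper name accuracy are arbitrary choices, not specified in the question.

# Part (b): a simple accuracy helper (hypothetical name `accuracy`)
accuracy = function(true, predicted) {
  mean(true == predicted)   # proportion of correct predictions
}

# Part (a): random 70/30 split, then 5-nearest-neighbour prediction
set.seed(1)
train_id = sample(1:n, size = round(0.7 * n))
knn_pred = knn(train = X[train_id, ],
               test  = X[-train_id, ],
               cl    = y[train_id],
               k     = 5)
accuracy(y[-train_id], knn_pred)   # independent test-set accuracy

Because the split is random, the test-set accuracy will change from run to run, which motivates the cross-validation in 1.2.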
1.2
Perform k-nearest neighbours on the data with k = 5, this time estimating the accuracy with your own cross-validation (CV) code. You are encouraged to write the CV loop yourself rather than use a package.
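A minimal sketch of a hand-written CV loop follows, assuming 10 folds and reusing the accuracy helper from 1.1(b); the number of folds and the random fold assignment are assumptions, not part of the question.

set.seed(1)
n_folds = 10
# randomly assign each observation to one of the 10 folds
fold_id = sample(rep(1:n_folds, length.out = n))

fold_acc = numeric(n_folds)
for (j in 1:n_folds) {
  test_idx = which(fold_id == j)
  pred = knn(train = X[-test_idx, ],
             test  = X[test_idx, ],
             cl    = y[-test_idx],
             k     = 5)
  fold_acc[j] = accuracy(y[test_idx], pred)
}
mean(fold_acc)   # CV estimate of the accuracy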
1.3
What happens if we repeat the CV calculation 50 times? In practice we don't expect identical CV estimates, but how different are they? To answer this question, put an additional for loop around your CV loop from 1.2 in order to repeat the CV procedure 50 times, and visualise your results.
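One way to sketch this, assuming the same 10-fold CV loop and accuracy helper as in 1.2; the histogram at the end is just one possible visualisation.

set.seed(1)
n_repeats = 50
n_folds = 10
cv_estimates = numeric(n_repeats)

for (r in 1:n_repeats) {
  # new random fold assignment for every repeat
  fold_id = sample(rep(1:n_folds, length.out = n))
  fold_acc = numeric(n_folds)
  for (j in 1:n_folds) {
    test_idx = which(fold_id == j)
    pred = knn(train = X[-test_idx, ], test = X[test_idx, ],
               cl = y[-test_idx], k = 5)
    fold_acc[j] = accuracy(y[test_idx], pred)
  }
  cv_estimates[r] = mean(fold_acc)
}

# visualise the spread of the 50 CV estimates
ggplot(tibble(cv_accuracy = cv_estimates), aes(x = cv_accuracy)) +
  geom_histogram(bins = 15) +
  labs(x = "10-fold CV accuracy estimate", y = "count")

The spread of these 50 estimates gives a sense of how much the CV estimate varies due to the random fold assignment alone.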
