Question: Test the KNN algorithm for predicting the cancer type on the cancer dataset.

Test the KNN algorithm for predicting the cancer type on the cancer dataset. Two files will be provided to you. One contains the gene expression data for 20,531 genes for 801 patients. The other file contains the class labels for the five types of cancer. Here is the code to read the data and convert the five categories of cancer into numeric values. The data files are available as a zip file on the course web site.

import numpy as np

datafile = "D:/PythonAM2/Data/TCGA-PANCAN-HiSeq-801x20531/data.csv"
labels_file = "D:/PythonAM2/Data/TCGA-PANCAN-HiSeq-801x20531/labels.csv"

# Skip the header row and column 0, then read the 20,531 gene expression columns.
data = np.genfromtxt(datafile, delimiter=",", usecols=range(1, 20532), skip_header=1)

# Read the cancer type name (column 1) for each sample as a string.
true_label_names = np.genfromtxt(labels_file, delimiter=",", usecols=(1,), skip_header=1, dtype="str")

print(data.shape)
print(true_label_names[:5])
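The question mentions converting the five cancer categories into numeric values, but the snippet above stops after reading the label names. One possible way to do the conversion is sketched below using `np.unique` with `return_inverse=True`. The helper name `encode_labels` and the small `example` array are hypothetical stand-ins, not part of the course material.

```python
import numpy as np

def encode_labels(label_names):
    """Map string class names to integer codes 0..k-1.

    np.unique returns the sorted distinct names, and return_inverse
    gives, for every sample, the index of its name in that sorted list.
    """
    classes, codes = np.unique(label_names, return_inverse=True)
    return classes, codes

# Hypothetical stand-in for true_label_names (the real array has 801 entries).
example = np.array(["PRAD", "LUAD", "PRAD", "BRCA", "KIRC", "COAD"])
classes, codes = encode_labels(example)

print(classes)  # distinct class names in alphabetical order
print(codes)    # one integer code per sample
```

With the real data, calling `encode_labels(true_label_names)` should give one integer per patient, and `classes[codes]` recovers the original names.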

# The data variable contains all the gene expression values
# from 20,531 genes. The true_label_names are the cancer
# types for each of the 801 samples.

# BRCA: Breast invasive carcinoma
# COAD: Colon adenocarcinoma
# KIRC: Kidney renal clear cell carcinoma
# LUAD: Lung adenocarcinoma
# PRAD: Prostate adenocarcinoma

Use 80% of the data for training and report the accuracy of detecting the correct cancer type using the KNN algorithm on the remaining 20% of the data.
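The 80/20 evaluation described above could be sketched as follows. This is a minimal NumPy-only version under stated assumptions, not the course's reference solution: the function names (`split_80_20`, `knn_predict`, `knn_accuracy`), the choice of k = 5, Euclidean distance, and the fixed random seed are all illustrative, and the synthetic `X_demo` / `y_demo` arrays merely stand in for `data` and the integer-encoded labels so that the block runs without the course files. In practice, scikit-learn's `train_test_split` and `KNeighborsClassifier` are the more common route to the same result.

```python
import numpy as np

def split_80_20(X, y, train_fraction=0.8, seed=0):
    """Shuffle the samples, then split them into training and test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_train = int(train_fraction * len(X))
    train_idx, test_idx = order[:n_train], order[n_train:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

def knn_predict(X_train, y_train, X_test, k=5):
    """Label each test sample by majority vote among its k nearest
    training samples (Euclidean distance). y_train must hold
    non-negative integer class codes."""
    # Squared distances via ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2
    d2 = (
        (X_test ** 2).sum(axis=1)[:, None]
        - 2.0 * (X_test @ X_train.T)
        + (X_train ** 2).sum(axis=1)[None, :]
    )
    nearest = np.argsort(d2, axis=1)[:, :k]   # indices of the k closest
    neighbor_labels = y_train[nearest]        # shape (n_test, k)
    return np.array([np.bincount(row).argmax() for row in neighbor_labels])

def knn_accuracy(X, y, k=5, seed=0):
    """Train on 80% of the samples and return accuracy on the other 20%."""
    X_train, X_test, y_train, y_test = split_80_20(X, y, seed=seed)
    predictions = knn_predict(X_train, y_train, X_test, k=k)
    return float(np.mean(predictions == y_test))

# Synthetic stand-in data: 5 well-separated classes, 40 samples each.
rng = np.random.default_rng(42)
centers = rng.normal(scale=10.0, size=(5, 50))
y_demo = np.repeat(np.arange(5), 40)
X_demo = centers[y_demo] + rng.normal(size=(200, 50))

acc = knn_accuracy(X_demo, y_demo, k=5)
print(f"Test accuracy: {acc:.3f}")
```

For the actual assignment, the integer codes can be obtained with `np.unique(true_label_names, return_inverse=True)[1]` and passed together with `data` to `knn_accuracy`. With 801 samples this split yields 640 training and 161 test samples. The reported accuracy depends on k and on the random split, so it is worth stating both alongside the result.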
