Task 2: PCA, KPCA, SVD, Classification, and Clustering on Kidney Cancer Methylation Data
Dataset: You will be working with the Kidney.csv dataset, which contains methylation array data for kidney cancer. The labels indicate whether each sample comes from normal tissue or cancer tissue. The data is available in the Files section.
Part 1: Principal Component Analysis (PCA)
1.1 Implement PCA from Scratch:
(a) Write Python code to implement PCA from scratch, including the computation of the covariance matrix, eigenvalues, and eigenvectors.
(b) Apply your PCA implementation to the TrainData (from the split Kidney.csv) to reduce the dimensionality of the methylation data.
(c) Choose the number of principal components needed to retain a large share of the variance (e.g., 95%).
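The steps in 1.1 can be sketched roughly as follows. This is a minimal illustration, not a reference solution: the layout of Kidney.csv is not shown here, so a synthetic matrix `X_train` stands in for the real TrainData, and the helper names `pca_fit` and `pca_transform` are hypothetical.

```python
import numpy as np

def pca_fit(X, var_threshold=0.95):
    """Fit PCA by eigendecomposition of the sample covariance matrix."""
    mean = X.mean(axis=0)
    Xc = X - mean                                  # center each feature
    cov = Xc.T @ Xc / (X.shape[0] - 1)             # features x features
    # eigh is meant for symmetric matrices; eigenvalues come back ascending
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]              # sort descending
    eigvals = np.clip(eigvals[order], 0.0, None)   # remove tiny negative round-off
    eigvecs = eigvecs[:, order]
    ratio = eigvals / eigvals.sum()
    # smallest k whose cumulative explained variance reaches the threshold
    k = int(np.searchsorted(np.cumsum(ratio), var_threshold) + 1)
    k = min(k, len(eigvals))
    return mean, eigvecs[:, :k], ratio[:k]

def pca_transform(X, mean, components):
    """Project data onto the retained eigenvectors."""
    return (X - mean) @ components

# Synthetic stand-in for TrainData (assumption: rows are samples, columns are features)
rng = np.random.default_rng(0)
X_train = rng.normal(size=(60, 20)) * np.linspace(5.0, 0.5, 20)

mean, components, ratio = pca_fit(X_train, var_threshold=0.95)
Z_train = pca_transform(X_train, mean, components)
print(components.shape[1], "components retained")
```

If the real data has many more features than samples, the feature-by-feature covariance matrix can become large, and an SVD of the centered data may be the more practical route to the same components.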
1.2 PCA using scikit-learn:
(a) Import the PCA module from sklearn.
(b) Apply PCA to the TrainData using scikit-learn.
(c) Compare the results of your from-scratch implementation with the scikit-learn PCA in terms of explained variance and reduced feature sets.
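One possible way to run the comparison in 1.2 is sketched below, again on synthetic data rather than the real TrainData. Eigenvectors are only defined up to sign, so the sketch compares absolute values of the projected scores; the choice `k = 5` is arbitrary.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for TrainData (assumption)
rng = np.random.default_rng(0)
X_train = rng.normal(size=(60, 20)) * np.linspace(5.0, 0.5, 20)

# From-scratch PCA via the covariance matrix
mean = X_train.mean(axis=0)
Xc = X_train - mean
eigvals, eigvecs = np.linalg.eigh(Xc.T @ Xc / (len(X_train) - 1))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
ratio_all = eigvals / eigvals.sum()

k = 5
Z_scratch = Xc @ eigvecs[:, :k]
ratio_scratch = ratio_all[:k]

# scikit-learn PCA with the same number of components
pca = PCA(n_components=k, svd_solver="full")
Z_sklearn = pca.fit_transform(X_train)

# Explained variance ratios should agree closely
max_ratio_diff = np.max(np.abs(ratio_scratch - pca.explained_variance_ratio_))
# Scores should agree up to the sign of each component
max_score_diff = np.max(np.abs(np.abs(Z_scratch) - np.abs(Z_sklearn)))

# A float n_components asks scikit-learn to pick the count for that variance share
pca95 = PCA(n_components=0.95, svd_solver="full").fit(X_train)
k95_scratch = int(np.searchsorted(np.cumsum(ratio_all), 0.95) + 1)

print(max_ratio_diff, max_score_diff, pca95.n_components_, k95_scratch)
```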
Part 2: Kernel PCA (KPCA)
2.1 KPCA with RBF Kernel:
(a) Implement Kernel PCA with the Radial Basis Function (RBF) kernel from scratch.
(b) Apply your KPCA implementation to the TrainData.
2.2 KPCA with Polynomial Kernel:
(a) Implement Kernel PCA with a Polynomial kernel from scratch.
(b) Apply your KPCA implementation to the TrainData.
2.3 KPCA with Linear Kernel:
(a) Implement Kernel PCA with a Linear kernel from scratch.
(b) Apply your KPCA implementation to the TrainData.
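The three kernels in 2.1 to 2.3 can share one Kernel PCA routine, since only the kernel function changes. The sketch below is one way to arrange that; the class name `KernelPCAScratch`, the kernel parameter defaults, and the choice `gamma = 1 / n_features` are assumptions made for illustration, and synthetic data stands in for TrainData.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    # squared Euclidean distances between rows of A and rows of B
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.clip(sq, 0.0, None))

def poly_kernel(A, B, degree=3, gamma=1.0, coef0=1.0):
    return (gamma * A @ B.T + coef0) ** degree

def linear_kernel(A, B):
    return A @ B.T

class KernelPCAScratch:
    def __init__(self, kernel, n_components):
        self.kernel = kernel
        self.n_components = n_components

    def fit(self, X):
        self.X_fit = X
        K = self.kernel(X, X)
        self.K_col_mean = K.mean(axis=0)
        self.K_mean = K.mean()
        # center the kernel matrix in feature space (double centering)
        Kc = K - self.K_col_mean[None, :] - K.mean(axis=1)[:, None] + self.K_mean
        eigvals, eigvecs = np.linalg.eigh(Kc)
        order = np.argsort(eigvals)[::-1][: self.n_components]
        self.eigvals = eigvals[order]   # assumed strictly positive for the kept components
        # scale eigenvectors so the feature-space axes have unit norm
        self.alphas = eigvecs[:, order] / np.sqrt(self.eigvals)
        return self

    def transform(self, X_new):
        K_new = self.kernel(X_new, self.X_fit)
        # center new rows using the training kernel statistics
        Kc_new = (K_new - self.K_col_mean[None, :]
                  - K_new.mean(axis=1)[:, None] + self.K_mean)
        return Kc_new @ self.alphas

# Synthetic stand-in for TrainData (assumption)
rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 10))
gamma = 1.0 / X_train.shape[1]

models = {
    "rbf": KernelPCAScratch(lambda A, B: rbf_kernel(A, B, gamma), 5),
    "poly": KernelPCAScratch(lambda A, B: poly_kernel(A, B, degree=3, gamma=gamma), 5),
    "linear": KernelPCAScratch(linear_kernel, 5),
}
Z = {name: m.fit(X_train).transform(X_train) for name, m in models.items()}
```

With the linear kernel, the projections should match ordinary PCA scores up to sign, which gives a convenient sanity check for the implementation.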
2.4 Applying SVD for Dimensionality Reduction:
(a) Implement Singular Value Decomposition (SVD) for dimensionality reduction.
(b) Apply SVD to the TrainData and compare the results with PCA and KPCA.
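For 2.4, a minimal sketch of SVD-based reduction is given below. It assumes the data is centered before the decomposition, in which case the result coincides with PCA; the task does not say whether centering is intended, and SVD on uncentered data would give different components. Synthetic data stands in for TrainData.

```python
import numpy as np

# Synthetic stand-in for TrainData (assumption)
rng = np.random.default_rng(0)
X_train = rng.normal(size=(60, 20)) * np.linspace(5.0, 0.5, 20)
n = X_train.shape[0]

mean = X_train.mean(axis=0)
Xc = X_train - mean

# Thin SVD: Xc = U @ diag(S) @ Vt, singular values in descending order
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Singular values relate to covariance eigenvalues by lambda_i = S_i^2 / (n - 1)
explained_variance = S**2 / (n - 1)
explained_ratio = explained_variance / explained_variance.sum()

k = 5
Z_svd = Xc @ Vt[:k].T            # same as U[:, :k] * S[:k]

# Cross-check against the covariance eigenvalues used in PCA
eigvals = np.sort(np.linalg.eigvalsh(Xc.T @ Xc / (n - 1)))[::-1]
print(np.allclose(eigvals, explained_variance))
```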
Part 3: Testing and Evaluation
3.1 Applying PCA, KPCA, and SVD to the Test Dataset:
(a) Split Kidney.csv into TrainData and TestData (e.g., 80% for training, 20% for testing).
(b) Use the PCA, KPCA (with RBF, Polynomial, and Linear kernels), and SVD models trained on the TrainData to transform the TestData.
(c) Ensure that the dimensionality reduction is consistent with the training data.
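The key point in 3.1 is that every statistic used to transform TestData (the mean, the components, the kernel centering terms) comes from TrainData only. A rough sketch with PCA is shown below; the helper `train_test_split_np` is hypothetical, and synthetic arrays replace Kidney.csv because its column layout is not given here.

```python
import numpy as np

def train_test_split_np(X, y, test_frac=0.2, seed=0):
    """Shuffle the rows and hold out a fraction for testing."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(round(test_frac * len(X)))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Synthetic stand-in for the features and labels in Kidney.csv (assumption)
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 30))
y = rng.integers(0, 2, size=100)

X_train, X_test, y_train, y_test = train_test_split_np(X, y, test_frac=0.2)

# Fit on the training split only
mean = X_train.mean(axis=0)
_, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
k = 10
W = Vt[:k].T

# Apply the same mean and components to both splits
Z_train = (X_train - mean) @ W
Z_test = (X_test - mean) @ W     # train mean, not the test mean

print(Z_train.shape, Z_test.shape)
```

If scikit-learn is available, `sklearn.model_selection.train_test_split` with `stratify=y` is an alternative that keeps the class proportions similar in both splits.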
3.2 Classification Experiment:
(a) Choose a minimum distance classifier (code provided below) to classify the observations in the TestData.
(b) Evaluate classification performance on the TestData using accuracy metrics.
(c) For each case, find the number of PCs that gives the highest accuracy.
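The classifier code referred to in 3.2 is not reproduced here, so the sketch below assumes that "minimum distance classifier" means assigning each sample to the class with the nearest centroid in Euclidean distance. The function names, the synthetic two-class data, and the search range of 1 to 10 components are all illustrative assumptions.

```python
import numpy as np

def min_distance_fit(Z, y):
    """Store one centroid per class."""
    classes = np.unique(y)
    centroids = np.stack([Z[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def min_distance_predict(Z, classes, centroids):
    """Assign each row to the class of its nearest centroid."""
    d = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(d, axis=1)]

# Synthetic two-class data standing in for the methylation features (assumption)
rng = np.random.default_rng(0)
n, p = 120, 30
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[y == 1, :3] += 3.0                      # shift class 1 along three features

idx = rng.permutation(n)
train, test = idx[:96], idx[96:]          # 80% / 20% split
X_train, y_train = X[train], y[train]
X_test, y_test = X[test], y[test]

# PCA fitted on the training split
mean = X_train.mean(axis=0)
_, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)

accuracy = {}
for k in range(1, 11):
    W = Vt[:k].T
    Z_tr = (X_train - mean) @ W
    Z_te = (X_test - mean) @ W
    classes, centroids = min_distance_fit(Z_tr, y_train)
    pred = min_distance_predict(Z_te, classes, centroids)
    accuracy[k] = float(np.mean(pred == y_test))

best_k = max(accuracy, key=accuracy.get)  # first k reaching the top accuracy
print(best_k, accuracy[best_k])
```

The same loop can be repeated for each KPCA kernel and for SVD by swapping the projection step. Choosing the number of components by test accuracy follows the task as stated; a separate validation split would give a less optimistic estimate.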
3.3 Visualization and Pair Plots:
(a) Visualize PC1 vs. the first 10 principal components (PCs) for each class (normal vs. cancer tissues).
(b) Plot pair plots between PC1 and the first 10 principal components (both for PCA and KPCA).
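One reading of 3.3 is a grid of scatter plots with PC1 on the x-axis and each of PC2 through PC10 on the y-axis, colored by class. The sketch below follows that reading with matplotlib on synthetic scores; the label encoding (0 for normal, 1 for cancer) is an assumption, and the same code would apply to KPCA scores.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")   # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt

# Synthetic data standing in for the methylation features (assumption)
rng = np.random.default_rng(0)
n, p = 100, 30
y = np.repeat([0, 1], n // 2)             # assumed encoding: 0 = normal, 1 = cancer
X = rng.normal(size=(n, p))
X[y == 1, :3] += 3.0

# First 10 principal component scores
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:10].T

fig, axes = plt.subplots(3, 3, figsize=(12, 10))
for j, ax in zip(range(1, 10), axes.ravel()):
    for label, name in [(0, "normal"), (1, "cancer")]:
        mask = y == label
        ax.scatter(Z[mask, 0], Z[mask, j], s=12, alpha=0.7, label=name)
    ax.set_xlabel("PC1")
    ax.set_ylabel(f"PC{j + 1}")
axes[0, 0].legend()
fig.tight_layout()
```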
This assignment will help you gain practical experience with PCA and KPCA and their applications in dimensionality reduction and classification tasks.
Submit your work as a notebook named firstname_lastname.ipynb.
