Question: Task 2: PCA, KPCA, SVD, Classification, and Clustering on Kidney Cancer Methylation Data
Dataset: You will be working with the Kidney.csv dataset, which contains methylation array data for kidney cancer. The labels indicate whether each sample comes from normal tissue or cancer tissue. The data is available in the files section.
Part: Principal Component Analysis (PCA)
Implement PCA from Scratch:
a) Write Python code to implement PCA from scratch, including the computation of the covariance matrix, eigenvalues, and eigenvectors.
b) Apply your PCA implementation to the TrainData from the split of Kidney.csv to reduce the dimensionality of the methylation data.
c) Choose an appropriate number of principal components to retain a significant amount of variance (e.g., …).
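Here is a minimal from-scratch sketch of steps a) to c) in NumPy. It assumes the TrainData is already loaded as a numeric array with samples in rows and methylation features in columns; the loading code and the column layout of Kidney.csv are not shown here. The 95% variance threshold is an assumed default, because the example value in the task text is missing. Forming the d-by-d covariance matrix can be impractical when there are very many features, and an SVD of the centered data is the usual workaround.

```python
import numpy as np

def pca_fit(X, n_components=None, var_threshold=0.95):
    """PCA via eigendecomposition of the sample covariance matrix.

    X has shape (n_samples, n_features). If n_components is None, keep the
    smallest number of components whose cumulative explained variance ratio
    reaches var_threshold (0.95 is an assumed default).
    """
    mean = X.mean(axis=0)
    Xc = X - mean                                  # center each feature
    cov = (Xc.T @ Xc) / (X.shape[0] - 1)           # (d, d) sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)         # eigh: symmetric input, ascending order
    order = np.argsort(eigvals)[::-1]              # re-sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = eigvals / eigvals.sum()                # explained variance ratio per component
    if n_components is None:
        n_components = int(np.searchsorted(np.cumsum(ratio), var_threshold) + 1)
    return {
        "mean": mean,
        "components": eigvecs[:, :n_components],   # columns are the principal axes
        "explained_variance": eigvals[:n_components],
        "explained_variance_ratio": ratio[:n_components],
    }

def pca_transform(model, X):
    # Center with the TRAINING mean, then project onto the retained eigenvectors.
    return (X - model["mean"]) @ model["components"]

# Usage on synthetic data standing in for TrainData.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 6)) @ rng.normal(size=(6, 6))
model = pca_fit(X_train)
Z_train = pca_transform(model, X_train)
print(Z_train.shape, model["explained_variance_ratio"].sum())
```

The variance of each projected column equals the corresponding eigenvalue. This makes a quick correctness check for any from-scratch version.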
PCA using scikit-learn:
a) Import the PCA class from sklearn.decomposition.
b) Apply PCA to the TrainData using scikit-learn.
c) Compare the results of your from-scratch implementation with scikit-learn's PCA in terms of explained variance and the reduced feature sets.
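Below is one way to carry out the comparison in c), using synthetic data as a stand-in for TrainData. It relies on two properties of scikit-learn's PCA. First, explained_variance_ uses the n - 1 denominator, the same as np.cov. Second, each component is defined only up to its sign, so the scores are sign-aligned before they are compared.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
X_train = rng.normal(size=(60, 8)) @ rng.normal(size=(8, 8))   # synthetic stand-in
k = 3

# From scratch: eigendecomposition of the sample covariance (ddof=1).
mean = X_train.mean(axis=0)
Xc = X_train - mean
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
Z_scratch = Xc @ eigvecs[:, :k]

# scikit-learn
skl = PCA(n_components=k).fit(X_train)
Z_skl = skl.transform(X_train)

# 1) Explained variance and its ratio should agree to numerical precision.
var_match = np.allclose(eigvals[:k], skl.explained_variance_)
ratio_match = np.allclose(eigvals[:k] / eigvals.sum(), skl.explained_variance_ratio_)

# 2) Reduced feature sets agree once each column's arbitrary sign is aligned.
signs = np.sign(np.sum(Z_scratch * Z_skl, axis=0))
scores_match = np.allclose(Z_scratch * signs, Z_skl)
print(var_match, ratio_match, scores_match)
```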
Part: Kernel PCA (KPCA)
KPCA with RBF Kernel:
a) Implement Kernel PCA with the Radial Basis Function (RBF) kernel from scratch.
b) Apply your KPCA implementation to the TrainData.
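Here is a from-scratch sketch of RBF kernel PCA in NumPy. It assumes the TrainData is a numeric array with samples in rows. The kernel matrix is centered in feature space before the eigendecomposition. Each eigenvector is then divided by the square root of its eigenvalue, so the implicit feature-space axes have unit length. The bandwidth gamma = 1/n_features is only an assumed starting value (it matches scikit-learn's KernelPCA default) and would normally be tuned.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    # k(a, b) = exp(-gamma * ||a - b||^2) for every pair of rows in A and B.
    sq = (np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))   # clip tiny negatives from round-off

def kpca_fit(K, n_components):
    """Kernel PCA on a precomputed (n, n) training kernel matrix K."""
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n   # center in feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)
    idx = np.argsort(eigvals)[::-1][:n_components]       # largest eigenvalues first
    lambdas, vecs = eigvals[idx], eigvecs[:, idx]
    alphas = vecs / np.sqrt(lambdas)                     # unit-norm feature-space axes
    Z_train = Kc @ alphas                                # training-set projections
    return {"alphas": alphas, "lambdas": lambdas, "Z_train": Z_train}

# Usage on synthetic data standing in for TrainData.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(30, 5))
gamma = 1.0 / X_train.shape[1]
model = kpca_fit(rbf_kernel(X_train, X_train, gamma), n_components=2)
print(model["Z_train"].shape)
```

The alphas are kept because projecting new samples later needs them, not just the training scores.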
KPCA with Polynomial Kernel:
a) Implement Kernel PCA with a polynomial kernel from scratch.
b) Apply your KPCA implementation to the TrainData.
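For a polynomial kernel, only the kernel function changes. The sketch below runs on synthetic stand-in data and uses k(a, b) = (gamma * <a, b> + coef0)^degree. The values degree=3, coef0=1 and gamma = 1/n_features are assumptions. The centering uses H = I - (1/n) 11^T, and H K H is algebraically the same as the double-centering K - 1K - K1 + 1K1. Scaling gamma down keeps the kernel values in a reasonable numeric range when there are many features.

```python
import numpy as np

def poly_kernel(A, B, degree=3, gamma=1.0, coef0=1.0):
    # k(a, b) = (gamma * <a, b> + coef0) ** degree
    return (gamma * (A @ B.T) + coef0) ** degree

def kernel_pca(K, n_components):
    # Double-center K with H = I - 1/n, eigendecompose, keep the top components.
    n = K.shape[0]
    H = np.eye(n) - np.full((n, n), 1.0 / n)
    Kc = H @ K @ H
    eigvals, eigvecs = np.linalg.eigh(Kc)
    idx = np.argsort(eigvals)[::-1][:n_components]
    lambdas, vecs = eigvals[idx], eigvecs[:, idx]
    alphas = vecs / np.sqrt(lambdas)           # needed later to project new samples
    return Kc @ alphas, alphas, lambdas        # training projections = vecs * sqrt(lambdas)

# Usage on synthetic data standing in for TrainData.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(30, 5))
K = poly_kernel(X_train, X_train, degree=3, gamma=1.0 / X_train.shape[1], coef0=1.0)
Z_train, alphas, lambdas = kernel_pca(K, n_components=2)
print(Z_train.shape, lambdas)
```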
KPCA with Linear Kernel:
a) Implement Kernel PCA with a linear kernel from scratch.
b) Apply your KPCA implementation to the TrainData.
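With a linear kernel k(a, b) = <a, b>, the centered kernel matrix is Xc Xc^T. Linear KPCA should therefore reproduce ordinary PCA scores up to the sign of each component. Its kernel eigenvalues should equal (n - 1) times the covariance eigenvalues. The sketch below checks both properties on synthetic stand-in data, which makes it a useful sanity test for a from-scratch implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
X_train = rng.normal(size=(25, 6))        # synthetic stand-in for TrainData
n, k = X_train.shape[0], 3

# Linear-kernel KPCA: K = X X^T, double-centered so that Kc = Xc Xc^T.
K = X_train @ X_train.T
H = np.eye(n) - np.full((n, n), 1.0 / n)
eigvals, eigvecs = np.linalg.eigh(H @ K @ H)
idx = np.argsort(eigvals)[::-1][:k]
Z_kpca = eigvecs[:, idx] * np.sqrt(eigvals[idx])

# Ordinary covariance-based PCA scores for comparison.
Xc = X_train - X_train.mean(axis=0)
cov_vals, cov_vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(cov_vals)[::-1][:k]
Z_pca = Xc @ cov_vecs[:, order]

# Columns agree up to sign; eigenvalues differ by the factor (n - 1).
signs = np.sign(np.sum(Z_kpca * Z_pca, axis=0))
same_scores = np.allclose(Z_kpca * signs, Z_pca)
same_spectrum = np.allclose(eigvals[idx] / (n - 1), cov_vals[order])
print(same_scores, same_spectrum)
```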
Applying SVD for Dimensionality Reduction:
a) Implement Singular Value Decomposition (SVD) for dimensionality reduction.
b) Apply SVD to the TrainData and compare the results with PCA and KPCA.
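Here is a sketch of SVD-based reduction with NumPy on synthetic stand-in data. Centering first is what makes the result coincide with PCA: the right singular vectors are then the principal axes, and S**2 / (n - 1) gives the PCA eigenvalues. This route also avoids building the d-by-d covariance matrix, which helps when there are far more methylation features than samples. Without centering, which is the behavior of scikit-learn's TruncatedSVD, the leading component tends to capture the feature means instead.

```python
import numpy as np

rng = np.random.default_rng(2)
X_train = rng.normal(size=(40, 7)) @ rng.normal(size=(7, 7))   # synthetic stand-in
k = 3

mean = X_train.mean(axis=0)
Xc = X_train - mean                                   # center so SVD matches PCA
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)     # Xc = U @ diag(S) @ Vt
Z_svd = U[:, :k] * S[:k]                              # same as Xc @ Vt[:k].T
explained_variance = S**2 / (X_train.shape[0] - 1)    # equals the PCA eigenvalues
explained_variance_ratio = explained_variance / explained_variance.sum()

def svd_transform(X_new):
    # New samples reuse the training mean and the top-k right singular vectors.
    return (X_new - mean) @ Vt[:k].T

print(Z_svd.shape, explained_variance_ratio[:k])
```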
Part: Testing and Evaluation
Applying PCA, KPCA, and SVD to the Test Dataset:
a) Split Kidney.csv into TrainData and TestData (e.g., … for training, … for testing).
b) Use the PCA, KPCA (with RBF, polynomial, and linear kernels), and SVD models trained on the TrainData to transform the TestData.
c) Ensure that the dimensionality reduction is consistent with the training data, i.e., the TestData is projected using the parameters fitted on the TrainData.
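The key point of c) is that nothing is re-fitted on the TestData. PCA and SVD reuse the training mean and axes. KPCA evaluates the kernel between test and training samples and centers it with training statistics: K_test_c = K_test - 1' K - K_test 1_n + 1' K 1_n. Here 1_n is the n-by-n matrix and 1' is the m-by-n matrix, both with all entries 1/n. The same centering applies to the polynomial and linear kernels.

The sketch makes several assumptions. It uses synthetic data and a hypothetical stratified_split helper. It assumes a 70/30 split, because the ratio in the task text is missing, and a label encoding of 0 = normal, 1 = cancer.

```python
import numpy as np

def stratified_split(y, test_frac=0.3, seed=0):
    # Hypothetical helper: split indices per class so both sets keep class proportions.
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for c in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == c))
        n_test = int(round(test_frac * len(idx)))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(train_idx), np.array(test_idx)

def rbf_kernel(A, B, gamma):
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

# Synthetic stand-in for Kidney.csv (0 = normal, 1 = cancer is an assumed encoding).
rng = np.random.default_rng(3)
X = rng.normal(size=(60, 8))
y = np.repeat([0, 1], 30)
tr, te = stratified_split(y)
X_train, X_test = X[tr], X[te]
k = 3

# PCA: fit on train only, then reuse the train mean and eigenvectors for test.
mean = X_train.mean(axis=0)
vals, vecs = np.linalg.eigh(np.cov(X_train - mean, rowvar=False))
W = vecs[:, np.argsort(vals)[::-1][:k]]
Z_train_pca = (X_train - mean) @ W
Z_test_pca = (X_test - mean) @ W

# RBF KPCA: fit on the train kernel, then center new kernels with train statistics.
gamma = 1.0 / X_train.shape[1]
K = rbf_kernel(X_train, X_train, gamma)
n = K.shape[0]
H = np.eye(n) - np.full((n, n), 1.0 / n)
lam, V = np.linalg.eigh(H @ K @ H)
idx = np.argsort(lam)[::-1][:k]
alphas = V[:, idx] / np.sqrt(lam[idx])

def kpca_transform(X_new):
    K_new = rbf_kernel(X_new, X_train, gamma)       # (m, n) kernel against TRAIN points
    K_new_c = (K_new - K.mean(axis=0)[None, :]       # subtract train column means
               - K_new.mean(axis=1)[:, None]         # subtract each new row's mean
               + K.mean())                           # add back the train grand mean
    return K_new_c @ alphas

Z_train_kpca = kpca_transform(X_train)
Z_test_kpca = kpca_transform(X_test)
print(Z_test_pca.shape, Z_test_kpca.shape)
```

Passing the training samples through kpca_transform should reproduce the training projections exactly. That is a convenient check that the test-time centering is right.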
Classification Experiment:
a) Use a minimum-distance classifier (code provided below) to classify the observations in the TestData.
b) Evaluate classification performance on the TestData using accuracy metrics.
c) For each case, find the number of PCs that gives the highest accuracy.
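The assignment's own classifier code is not included in this copy. The following is therefore a hypothetical nearest-centroid stand-in: each class is represented by its mean in the reduced space, and a sample goes to the closest mean by Euclidean distance. The function names are illustrative, and the data are synthetic projections.

```python
import numpy as np

def min_distance_fit(Z, y):
    # One centroid (class mean) per label, computed on the training projections.
    classes = np.unique(y)
    centroids = np.stack([Z[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def min_distance_predict(classes, centroids, Z):
    # Assign each sample to the class whose centroid is nearest (Euclidean distance).
    dist = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dist, axis=1)]

def best_num_pcs(Z_train, y_train, Z_test, y_test, max_pcs):
    # Accuracy using only the first k components, for k = 1..max_pcs.
    scores = {}
    for k in range(1, max_pcs + 1):
        classes, centroids = min_distance_fit(Z_train[:, :k], y_train)
        pred = min_distance_predict(classes, centroids, Z_test[:, :k])
        scores[k] = float(np.mean(pred == y_test))
    best_k = max(scores, key=scores.get)   # ties resolve to the smallest k
    return best_k, scores

# Usage with synthetic projections standing in for reduced TrainData/TestData.
rng = np.random.default_rng(0)
Z_tr = np.vstack([rng.normal(0, 1, (20, 4)), rng.normal(3, 1, (20, 4))])
y_tr = np.repeat([0, 1], 20)
Z_te = np.vstack([rng.normal(0, 1, (10, 4)), rng.normal(3, 1, (10, 4))])
y_te = np.repeat([0, 1], 10)
best_k, scores = best_num_pcs(Z_tr, y_tr, Z_te, y_te, max_pcs=4)
print(best_k, scores)
```

The same loop can be run on the PCA, each KPCA variant, and the SVD projections. Choosing k by test accuracy, as c) describes, gives an optimistic estimate. Selecting k by cross-validation on the TrainData avoids that bias.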
Visualization and Pair Plots:
a) Visualize PC… vs. the first … principal components (PCs) for each class (normal vs. cancer tissues).
b) Plot pair plots between PC… and the first … principal components for both PCA and KPCA.
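Here is a matplotlib sketch that covers both a) and b). It draws a grid of scatter plots for each pair of the first n_pcs components, colored by class, with per-class histograms on the diagonal. The first column shows PC1 against each of the other components. The sketch assumes labels 0 = normal and 1 = cancer and uses synthetic scores. The same function can be called on the PCA and on the KPCA projections. If the scores are in a pandas DataFrame, seaborn.pairplot with the hue argument is a common alternative.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")                  # non-interactive backend; omit inside a notebook
import matplotlib.pyplot as plt

def pair_plot(Z, y, labels, n_pcs, title):
    # Full grid: histograms on the diagonal, PC j (x) vs PC i (y) scatter elsewhere.
    fig, axes = plt.subplots(n_pcs, n_pcs, figsize=(2.5 * n_pcs, 2.5 * n_pcs),
                             squeeze=False)
    for i in range(n_pcs):
        for j in range(n_pcs):
            ax = axes[i, j]
            for c, name in labels.items():
                if i == j:
                    ax.hist(Z[y == c, i], bins=15, alpha=0.5, label=name)
                else:
                    ax.scatter(Z[y == c, j], Z[y == c, i], s=8, alpha=0.7, label=name)
            if i == n_pcs - 1:
                ax.set_xlabel(f"PC{j + 1}")
            if j == 0:
                ax.set_ylabel(f"PC{i + 1}")
    axes[0, 0].legend(fontsize=7)
    fig.suptitle(title)
    fig.tight_layout()
    return fig

# Usage with synthetic scores standing in for PCA or KPCA projections.
rng = np.random.default_rng(0)
Z = rng.normal(size=(40, 4))
y = np.repeat([0, 1], 20)
fig = pair_plot(Z, y, {0: "normal", 1: "cancer"}, n_pcs=4, title="PCA scores (synthetic)")
```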
This assignment will help you gain practical experience with PCA and KPCA and their applications in dimensionality reduction and classification tasks.
Submit your notebook as firstnamelastname.ipynb.
