

Question: Task 2: PCA, KPCA, SVD, Classification, and Clustering on Kidney Cancer Methylation Data

Dataset: You will be working with the Kidney.csv dataset, which contains methylation array data for kidney cancer. The labels indicate whether the samples are from normal tissues or cancer tissues. The data is available in the files section.

Part 1: Principal Component Analysis (PCA)

1.1 Implement PCA from Scratch:

(a) Write Python code to implement PCA from scratch, including the computation of the covariance matrix, eigenvalues, and eigenvectors.

(b) Apply your PCA implementation to the TrainData (from the split Kidney.csv) to reduce the dimensionality of the methylation data.

(c) Choose an appropriate number of principal components to retain a significant amount of variance (e.g., 95%).
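The steps in 1.1 could be sketched roughly as follows. This is a minimal sketch, not a reference solution: the helper names `pca_fit` and `pca_transform` are hypothetical, and a small synthetic matrix stands in for TrainData because the contents of Kidney.csv are not shown here.

```python
import numpy as np

def pca_fit(X, var_threshold=0.95):
    """Fit PCA by eigendecomposition of the covariance matrix (hypothetical helper)."""
    mean = X.mean(axis=0)
    Xc = X - mean                                # center each feature
    cov = np.cov(Xc, rowvar=False)               # (d, d) covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh: symmetric input, ascending order
    order = np.argsort(eigvals)[::-1]            # re-sort by descending variance
    eigvals = np.clip(eigvals[order], 0, None)   # remove tiny negative round-off
    eigvecs = eigvecs[:, order]
    ratio = eigvals / eigvals.sum()              # explained variance ratio per component
    # smallest k whose cumulative explained variance reaches the threshold
    k = int(np.searchsorted(np.cumsum(ratio), var_threshold) + 1)
    return mean, eigvecs[:, :k], ratio[:k]

def pca_transform(X, mean, components):
    """Project data onto the retained components using the training mean."""
    return (X - mean) @ components

# Synthetic stand-in for TrainData: 80 samples, 20 features, roughly rank 5.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(80, 5)) @ rng.normal(size=(5, 20))
X_train += 0.01 * rng.normal(size=X_train.shape)

mean, components, ratio = pca_fit(X_train, var_threshold=0.95)
Z_train = pca_transform(X_train, mean, components)
print(Z_train.shape, ratio.sum())
```

For methylation arrays the number of features is typically far larger than the number of samples, so the (d, d) covariance matrix can become expensive to build; in that case an SVD of the centered data gives the same components without forming it.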

1.2 PCA using scikit-learn:

(a) Import the PCA module from sklearn.

(b) Apply PCA to the TrainData using scikit-learn.

(c) Compare the results of your from-scratch implementation with the scikit-learn PCA in terms of explained variance and reduced feature sets.
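One way to carry out the comparison in 1.2 is sketched below, again on synthetic data standing in for TrainData. The choice of `k = 3` is an arbitrary assumption for illustration. Eigenvector signs are arbitrary, so the projected features are compared in absolute value.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for TrainData (assumption; replace with the real split).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(80, 5)) @ rng.normal(size=(5, 20))
X_train += 0.01 * rng.normal(size=X_train.shape)

k = 3  # number of components to compare (illustrative choice)

# From-scratch PCA via the covariance matrix
Xc = X_train - X_train.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
ratio_scratch = eigvals[:k] / eigvals.sum()
Z_scratch = Xc @ eigvecs[:, :k]

# scikit-learn PCA on the same data
pca = PCA(n_components=k, svd_solver="full")
Z_sklearn = pca.fit_transform(X_train)

# Explained variance ratios should agree; scores should agree up to sign.
ratio_match = np.allclose(ratio_scratch, pca.explained_variance_ratio_)
scores_match = np.allclose(np.abs(Z_scratch), np.abs(Z_sklearn), atol=1e-6)
print(ratio_match, scores_match)
```

Passing a float such as `n_components=0.95` to `PCA` asks scikit-learn to pick the number of components needed to exceed that fraction of explained variance, which is convenient for checking the count chosen in 1.1 (c).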

Part 2: Kernel PCA (KPCA)

2.1 KPCA with RBF Kernel:

(a) Implement Kernel PCA with the Radial Basis Function (RBF) kernel from scratch.

(b) Apply your KPCA implementation to the TrainData.
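A from-scratch RBF kernel PCA might look like the sketch below. The function names `rbf_kernel` and `kpca_fit`, the synthetic data, and the default `gamma = 1 / n_features` are all assumptions for illustration; `gamma` in particular would need tuning on the real data.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.clip(sq, 0, None))   # clip guards against round-off

def kpca_fit(K, n_components):
    """Center an (n, n) training kernel matrix and project the training data."""
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    # Centering in feature space, expressed purely in terms of K
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    eigvals, eigvecs = np.linalg.eigh(Kc)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    alphas = eigvecs / np.sqrt(eigvals)   # scale so feature-space axes have unit norm
    return Kc @ alphas, alphas, eigvals

# Synthetic stand-in for TrainData (assumption).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(60, 30))

gamma = 1.0 / X_train.shape[1]            # assumed starting value, not a tuned one
K = rbf_kernel(X_train, X_train, gamma)
Z_train, alphas, eigvals = kpca_fit(K, n_components=5)
print(Z_train.shape)
```

The returned `alphas` are what a later transform of unseen samples would need, together with the training kernel matrix used for centering.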

2.2 KPCA with Polynomial Kernel:

(a) Implement Kernel PCA with a Polynomial kernel from scratch.

(b) Apply your KPCA implementation to the TrainData.
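For the polynomial case only the kernel function changes. The sketch below assumes the common parameterisation `(gamma * <x, y> + coef0) ** degree`; the parameter names, their defaults, and the synthetic data are illustrative assumptions rather than values given by the assignment.

```python
import numpy as np

def poly_kernel(A, B, degree=3, gamma=None, coef0=1.0):
    """K[i, j] = (gamma * <A[i], B[j]> + coef0) ** degree."""
    gamma = 1.0 / A.shape[1] if gamma is None else gamma
    return (gamma * A @ B.T + coef0) ** degree

def center_kernel(K):
    """Center a square training kernel matrix in feature space."""
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    return K - one_n @ K - K @ one_n + one_n @ K @ one_n

# Synthetic stand-in for TrainData (assumption).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(60, 30))

Kc = center_kernel(poly_kernel(X_train, X_train, degree=2))
eigvals, eigvecs = np.linalg.eigh(Kc)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]    # descending order

k = 5
Z_train = eigvecs[:, :k] * np.sqrt(eigvals[:k])       # projections onto the top-k kernel PCs
print(Z_train.shape)
```

With thousands of features, unscaled inner products raised to a power can become very large, which is why the sketch scales by `gamma`; standardising the features first is another option.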

2.3 KPCA with Linear Kernel:

(a) Implement Kernel PCA with a Linear kernel from scratch.

(b) Apply your KPCA implementation to the TrainData.
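With a linear kernel, the centered kernel matrix equals the Gram matrix of the centered data, so kernel PCA should reproduce ordinary PCA scores up to sign. The sketch below checks this on synthetic data (an assumption standing in for TrainData), which makes it a useful sanity test for a KPCA implementation.

```python
import numpy as np

# Synthetic stand-in for TrainData (assumption).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 12))
k = 4

# Linear kernel PCA
K = X_train @ X_train.T
n = K.shape[0]
one_n = np.full((n, n), 1.0 / n)
Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n
w, v = np.linalg.eigh(Kc)
w, v = w[::-1][:k], v[:, ::-1][:, :k]        # top-k eigenpairs, descending
Z_kpca = v * np.sqrt(w)

# Ordinary PCA on the same data (eigenvectors of the scatter matrix)
Xc = X_train - X_train.mean(axis=0)
_, cv = np.linalg.eigh(Xc.T @ Xc)
Z_pca = Xc @ cv[:, ::-1][:, :k]

# Scores agree up to the arbitrary sign of each component.
same = np.allclose(np.abs(Z_kpca), np.abs(Z_pca), atol=1e-6)
print(same)
```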

2.4 Applying SVD for Dimensionality Reduction:

(a) Implement Singular Value Decomposition (SVD) for dimensionality reduction.

(b) Apply SVD to the TrainData and compare the results with PCA and KPCA.
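A sketch of SVD-based reduction and its link to PCA follows. It assumes the data are centered before the decomposition, in which case the squared singular values divided by n - 1 equal the covariance eigenvalues; skipping the centering instead gives a plain truncated SVD, which is also a reasonable reading of the task. The data are synthetic.

```python
import numpy as np

# Synthetic stand-in for TrainData (assumption).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 12))
n = X_train.shape[0]
k = 4

mean = X_train.mean(axis=0)
U, s, Vt = np.linalg.svd(X_train - mean, full_matrices=False)

Z_svd = U[:, :k] * s[:k]                     # same as (X_train - mean) @ Vt[:k].T
explained_ratio = s**2 / np.sum(s**2)        # variance share of each component

# Relation to PCA: covariance eigenvalues equal s^2 / (n - 1)
cov_eigvals = np.linalg.eigvalsh(np.cov(X_train, rowvar=False))[::-1]
print(np.allclose(s**2 / (n - 1), cov_eigvals))
```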

Part 3: Testing and Evaluation

3.1 Applying PCA, KPCA, and SVD to the Test Dataset:

(a) Split the Kidney.csv into TrainData and TestData (e.g., 80% for training, 20% for testing).

(b) Use the PCA, KPCA (with RBF, Polynomial, and Linear kernels), and SVD models trained on the TrainData to transform the TestData.

(c) Ensure that the dimensionality reduction is consistent with the training data.
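Because Parts 1 and 2 operate on TrainData, the split in 3.1 (a) has to happen before any model is fitted. The sketch below shows one way to keep the test transform consistent: the training mean and components are reused for PCA, and the test kernel is centered with training statistics for KPCA. The data, the `gamma` value, and the helper name `kpca_transform` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 30))        # synthetic stand-in for the Kidney.csv features
y = rng.integers(0, 2, size=100)      # synthetic stand-in for normal/cancer labels

# 80/20 shuffled split
idx = rng.permutation(len(X))
n_train = int(0.8 * len(X))
train_idx, test_idx = idx[:n_train], idx[n_train:]
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

k = 5

# PCA / centered SVD: fit on the training set only
mean = X_train.mean(axis=0)
_, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
Z_train_pca = (X_train - mean) @ Vt[:k].T
Z_test_pca = (X_test - mean) @ Vt[:k].T     # training mean and training components

# RBF kernel PCA: fit on the training set only
def rbf_kernel(A, B, gamma):
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.clip(sq, 0, None))

gamma = 1.0 / X_train.shape[1]
K = rbf_kernel(X_train, X_train, gamma)
n = K.shape[0]
one_n = np.full((n, n), 1.0 / n)
Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n
w, v = np.linalg.eigh(Kc)
w, v = w[::-1][:k], v[:, ::-1][:, :k]
alphas = v / np.sqrt(w)

def kpca_transform(X_new):
    """Project new samples using the training kernel statistics."""
    K_new = rbf_kernel(X_new, X_train, gamma)          # (m, n) kernel against training set
    one_m = np.full((K_new.shape[0], n), 1.0 / n)
    K_new_c = K_new - one_m @ K - K_new @ one_n + one_m @ K @ one_n
    return K_new_c @ alphas

Z_train_kpca = kpca_transform(X_train)
Z_test_kpca = kpca_transform(X_test)
print(Z_test_pca.shape, Z_test_kpca.shape)
```

The same pattern applies to the polynomial and linear kernels by swapping the kernel function. If the classes are imbalanced, a stratified split may be preferable to the plain shuffle used here.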

3.2 Classification Experiment:

(a) Choose a minimum distance classifier (code provided below) to classify the observations in the TestData.

(b) Evaluate classification performance on the TestData using accuracy metrics.

(c) For each case, find the best number of PCs to get high accuracy.
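The classifier code that the task refers to is not reproduced in this question, so the sketch below substitutes a simple nearest-centroid rule with Euclidean distance as a stand-in; the provided classifier may differ. The function name `min_distance_predict`, the synthetic two-class data, and the range of 1 to 10 components are assumptions for illustration.

```python
import numpy as np

def min_distance_predict(Z_train, y_train, Z_test):
    """Assign each test point to the class with the nearest training centroid."""
    classes = np.unique(y_train)
    centroids = np.stack([Z_train[y_train == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(Z_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]

# Synthetic two-class data standing in for the kidney samples (assumption).
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)
X = rng.normal(size=(100, 30))
X[y == 1, :3] += 3.0                       # shift class 1 along three features

idx = rng.permutation(100)
tr, te = idx[:80], idx[80:]
X_train, X_test, y_train, y_test = X[tr], X[te], y[tr], y[te]

# PCA fitted on the training set only
mean = X_train.mean(axis=0)
_, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)

# Sweep the number of retained PCs and record test accuracy for each
accuracy = {}
for k in range(1, 11):
    Z_train = (X_train - mean) @ Vt[:k].T
    Z_test = (X_test - mean) @ Vt[:k].T
    y_pred = min_distance_predict(Z_train, y_train, Z_test)
    accuracy[k] = float((y_pred == y_test).mean())

best_k = max(accuracy, key=accuracy.get)   # first maximum, i.e. smallest k among ties
print(best_k, accuracy[best_k])
```

The same sweep can be repeated for each KPCA kernel and for SVD by replacing the projection step. Selecting the number of components on the test set gives an optimistic accuracy estimate; a validation split carved out of TrainData would avoid that, if the assignment allows it.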

3.3 Visualization and Pair Plots:

(a) Visualize PC1 vs. the first 10 principal components (PCs) for each class (normal vs. cancer tissues).

(b) Plot pair plots between PC1 and the first 10 principal components (both for PCA and KPCA).
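One possible layout for the plots in 3.3 is a grid of scatter panels with PC1 on the x-axis and PC2 through PC10 on the y-axes, coloured by class. The sketch below uses matplotlib on synthetic data; the label encoding (0 = normal, 1 = cancer) and the 3 x 3 grid are assumptions, and the same loop would apply to KPCA scores.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")                  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

# Synthetic two-class data standing in for the kidney samples (assumption).
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 40)              # assumed encoding: 0 = normal, 1 = cancer
X = rng.normal(size=(80, 30))
X[y == 1, :3] += 3.0

# First 10 principal component scores via centered SVD
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:10].T

# One panel per pair (PC1, PCj) for j = 2..10
fig, axes = plt.subplots(3, 3, figsize=(12, 12))
for j, ax in zip(range(1, 10), axes.ravel()):
    for label, name in [(0, "normal"), (1, "cancer")]:
        mask = y == label
        ax.scatter(Z[mask, 0], Z[mask, j], s=12, alpha=0.7, label=name)
    ax.set_xlabel("PC1")
    ax.set_ylabel(f"PC{j + 1}")
axes[0, 0].legend()
fig.tight_layout()
```

In a notebook, `plt.show()` displays the figure; a full pair plot across all ten components could instead be produced with a dedicated plotting helper if one is available in the environment.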

This assignment will help you gain practical experience with PCA and KPCA and their applications in dimensionality reduction and classification tasks.

You need to submit the file as follows (firstname_lastname.ipynb).
