Question: Biostatisticians and bioinformaticians are constantly dealing with large data sets that must be
processed either for analysis or graphical visualization. This exercise is a classic example of processing the
data for visualization. The file HWData.csv stores genetic variables as rows, and each column represents
one sample collected from a patient. Some samples are labeled Response while
others are labeled Control. Your task is the following. For each gene in the data set, you must "normalize" the
data to the control group. To do this, for each gene you must calculate the mean and standard deviation for the
control samples. Then, for each value in the data set for that gene, you must standardize it using a z-score:
subtract the control mean from the value, and then divide the difference by the control standard deviation.
For example, the first row's values should be normalized using the mean and standard deviation of the first row's control samples.
Again, you must not manipulate the CSV file before reading it into R. For this exercise your code must also be able
to work on the test data set, HWTestSample.csv, included in the assignment. The point here is that the number
of Response and Control samples could change, as could their ordering in the file itself.
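
The task described above could be sketched in R roughly as follows. This is only a minimal sketch under assumptions the question does not confirm: the first column of the CSV is assumed to hold the gene identifiers, and control samples are assumed to be recognizable by the word "control" in their column header. The function name normalize_to_control is hypothetical.

```r
# Hypothetical helper: z-score every gene (row) against its control samples.
# Assumptions (not confirmed by the assignment): the first column holds the
# gene identifiers, and control columns have "control" in their header.
normalize_to_control <- function(path) {
  # check.names = FALSE keeps the sample labels exactly as written in the file.
  raw <- read.csv(path, row.names = 1, check.names = FALSE)
  values <- as.matrix(raw)

  # Find control columns by name, so their number and order may vary.
  is_control <- grepl("control", colnames(values), ignore.case = TRUE)
  stopifnot(sum(is_control) >= 2)  # sd() needs at least two control samples

  controls  <- values[, is_control, drop = FALSE]
  ctrl_mean <- rowMeans(controls)
  ctrl_sd   <- apply(controls, 1, sd)

  # A vector of length nrow(values) is recycled down the columns, so each
  # row is centred and scaled by its own control mean and standard deviation.
  (values - ctrl_mean) / ctrl_sd
}
```

Under those assumptions, a call such as normalize_to_control("HWTestSample.csv") would return a matrix with the same rows and columns as the input, in which the control samples of each gene have mean 0 and standard deviation 1. Because the control columns are located by name rather than by position, the same code should cope with a different number or ordering of Response and Control samples. If the real files label samples differently (for example, in a separate header row), the grepl() step would need to be adapted.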
