Question: Bash / Linux
The gene expression data is currently split up into several files within various subdirectories of the /rds/homes/u/usernamexxxx/Matrix directory.
Write a Bash script that can be run within the Matrix directory to process the gene expression matrices and produce the following two output files:
A single CSV file containing the gene expression matrix for variety X, including the header line. You should name this file GEMX.csv (GEM stands for Gene Expression Matrix).
A single CSV file containing the gene expression matrix for variety Y, including the header line. You should name this file GEMY.csv.
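One way to combine several part files into a single CSV with one header line is sketched below. It assumes that every part file for a variety carries the same header and holds a different set of gene rows, which the question does not state explicitly. The function name `merge_csv_parts` is made up for illustration.

```shell
#!/usr/bin/env bash

# Hypothetical helper: concatenate CSV part files that share one header.
# Assumption: all files passed in have identical header lines.
merge_csv_parts() {
  # Print the header once, taken from the first file.
  head -n 1 "$1"
  local f
  for f in "$@"; do
    # Print every line after the header from each file.
    tail -n +2 "$f"
  done
}
```

It could be called as `merge_csv_parts */VarX*.csv > GEMX.csv`, where the glob `*/VarX*.csv` is only a guess at how the part files are named inside the subdirectories.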
In the file headers, each sample name for the columns of the matrix currently has the following format:
Variety, followed by a condition code (C or a stress-condition code) and a biological replicate (a, b or c). C is the control; the remaining codes are the stress conditions. For example, VarXCa denotes Variety X, Treatment C and replicate a.
Header for VarX: genename,VarXCa,VarXa,VarXa,VarXCb,VarXb,VarXb,VarXb,VarXCc,VarXc,VarXc,VarXc
Your gene expression matrix files should:
Contain only unique genes located on the chromosomes, with genes sorted by chromosome.
Change the column labels in the header to a more convenient format, replacing a with Rep1, b with Rep2 and c with Rep3. In the above example, VarXCa becomes VarXCRep1.
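The two requirements above can be sketched as a pair of filters that read a merged CSV on stdin. This is a minimal sketch under several assumptions: column 1 holds the gene name, each sample label ends in its replicate letter, the letters a, b and c map to Rep1, Rep2 and Rep3, and gene names begin with a chromosome identifier so that sorting on column 1 groups genes by chromosome. Filtering out genes that are not on a chromosome depends on how the gene names encode location, so it is left out here. Both function names are made up for illustration.

```shell
#!/usr/bin/env bash

# Hypothetical helper: keep the header, drop repeated genes, sort the rest.
# Assumption: column 1 is the gene name and starts with a chromosome ID.
unique_sorted_rows() {
  awk 'BEGIN { FS = "," }
       NR == 1 { print; next }   # header passes through untouched
       !seen[$1]++               # keep only the first row for each gene
      ' | {
    IFS= read -r header          # peel the header off before sorting
    printf '%s\n' "$header"
    LC_ALL=C sort -t, -k1,1      # sort data rows by gene name (column 1)
  }
}

# Hypothetical helper: rewrite the replicate suffix of each sample label.
# Assumption: labels in columns 2..N end in a, b or c.
relabel_header() {
  awk 'BEGIN { FS = OFS = ","
               rep["a"] = "Rep1"; rep["b"] = "Rep2"; rep["c"] = "Rep3" }
       NR == 1 {
         for (i = 2; i <= NF; i++) {
           last = substr($i, length($i))
           if (last in rep) $i = substr($i, 1, length($i) - 1) rep[last]
         }
       }
       { print }'
}
```

The filters can be chained, for example `unique_sorted_rows < merged.csv | relabel_header > GEMX.csv`, where `merged.csv` stands for whatever combined file the earlier steps produce.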
