
Bash/Linux
The gene expression data is currently split up into several files within various subdirectories of the /rds/homes/u/username/xxxx/Matrix directory.
Write a Bash script that can be run within the Matrix/ directory to process the gene expression matrices and produce the following two output files:
A single csv file containing the gene expression matrix for variety X, including the header line. You should name this file GEM_X.csv (GEM stands for Gene Expression Matrix).
A single csv file containing the gene expression matrix for variety Y, including the header line. You should name this file GEM_Y.csv.
In the file headers, each sample name for the columns of the matrix currently has the following format:
Variety, followed by condition code (C, 1, 2 or 3), a hyphen, and biological replicate (a, b or c). C is the control, and 1-3 are the stress conditions. For example, VarXC-a denotes Variety X, condition C and replicate a.
Header for VarX: gene_name,VarXC-a,VarX1-a,VarX2-a,VarXC-b,VarX3-b,VarX1-b,VarX2-b,VarXC-c,VarX3-c,VarX1-c,VarX2-c
Your gene expression matrix files should:
Contain only unique genes located on the 12 chromosomes, with genes sorted by chromosome.
Change the column labels in the header to a more convenient format, replacing -a with Rep.1, -b with Rep.2 and -c with Rep.3. In the above example, VarXC-a becomes VarXCRep.1.
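
One possible way to approach the requirements above is sketched below. It is not a definitive solution, because the question does not say how the input files are named or how a gene name encodes its chromosome. The sketch assumes that: the input files end in .csv and sit in subdirectories of Matrix/; the data is split by rows, so every file of one variety carries the same header; and gene names start with a zero-padded chromosome label such as Chr01_ to Chr12_. That last pattern (CHR_REGEX) is purely hypothetical and must be adapted to the real gene names. The function names rename_header and build_gem are likewise made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: the .csv extension, the row-wise split and the gene ID
# pattern below are assumptions, not facts taken from the question.

# Hypothetical gene ID pattern: names such as Chr01_g0001 ... Chr12_g9999.
# Adjust it to whatever marks the 12 chromosomes in the real gene names.
CHR_REGEX=${CHR_REGEX:-'^Chr(0[1-9]|1[0-2])_'}

# Rewrite replicate suffixes on the first (header) line only:
# -a -> Rep.1, -b -> Rep.2, -c -> Rep.3 (e.g. VarXC-a -> VarXCRep.1).
# A suffix is only replaced at the end of a comma-separated field.
rename_header() {
    sed -E '1 {
        s/-a(,|$)/Rep.1\1/g
        s/-b(,|$)/Rep.2\1/g
        s/-c(,|$)/Rep.3\1/g
    }'
}

# build_gem VARIETY OUTFILE
# Example: build_gem X GEM_X.csv
build_gem() {
    local variety=$1 out=$2 f
    local -a files=()

    # Input files are assumed to live in subdirectories (depth >= 2), which
    # also keeps the GEM_*.csv outputs in Matrix/ from being read back in.
    # A file belongs to a variety if its header line mentions that variety.
    while IFS= read -r -d '' f; do
        if head -n 1 "$f" | grep -q "Var${variety}"; then
            files+=("$f")
        fi
    done < <(find . -mindepth 2 -type f -name '*.csv' -print0)

    if [ "${#files[@]}" -eq 0 ]; then
        echo "No input files found for variety ${variety}" >&2
        return 1
    fi

    {
        # Header: taken from the first file found and relabelled.
        head -n 1 "${files[0]}" | rename_header
        # Body: skip each file's header line, keep only genes on the 12
        # chromosomes, then sort by gene name and keep one row per gene.
        # With zero-padded chromosome labels, sorting by name also sorts
        # by chromosome.
        awk 'FNR > 1' "${files[@]}" |
            grep -E "$CHR_REGEX" |
            LC_ALL=C sort -t, -k1,1 -u
    } > "$out"
}
```

To produce both files, the script would end with the two calls `build_gem X GEM_X.csv` and `build_gem Y GEM_Y.csv` and be run from inside the Matrix/ directory. Note that `sort -u` on the first field keeps one row per gene name; if the same gene appears with different values in different files, which row survives is not defined by this sketch.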
