Question:

R code only.
Two plant varieties (VarX and VarY), each with two conditions (stress and control).
library(tidyverse)  # provides read_csv(), pivot_longer() and ggplot()
var_x_all <- read_csv('all_VarX_TwoTimePoints.csv')
Header: gene_name, VarXCRep.1, VarX1Rep.1, VarX2Rep.1, VarXCRep.2, VarX3Rep.2, VarX1Rep.2, VarX2Rep.2, VarXCRep.3, VarX3Rep.3, VarX1Rep.3, VarX2Rep.3
var_y_all <- read_csv('all_VarY_TwoTimePoints.csv')
Each variety has differentially expressed genes (DEGs)
var_x_degs <- read_csv('Leaf_DEGs_VarX.csv')
Header: gene_name, log2FoldChange, padj, Athaliana_gene_nameID, Gene_Function, VarXCRep.1, VarXCRep.2, VarXCRep.3, VarX1Rep.1, VarX1Rep.2, VarX1Rep.3
var_y_degs <- read_csv('Leaf_DEGs_VarY.csv')
#INVESTIGATE THE DISTRIBUTION OF EXPRESSION VALUES FOR ALL GENES IN EACH SAMPLE (Variety X).
# Pivot every sample column (everything except gene_name) into long format
var_x_all.long <- pivot_longer(var_x_all, cols = -gene_name, names_to = "sample", values_to = "expression")
var_x_plot <- ggplot(var_x_all.long, aes(x = sample, y = expression)) +
  geom_boxplot()
Do the same for VarY.
#INVESTIGATE THE DISTRIBUTION OF EXPRESSION VALUES FOR THE DEGs IN EACH SAMPLE (Variety X).
var_x_degs.long <- pivot_longer(var_x_degs, cols = VarXCRep.1:VarX1Rep.3, names_to = "sample", values_to = "expression")
var_x_degs_plot <- ggplot(var_x_degs.long, aes(x = sample, y = expression)) +
  geom_boxplot()
Do the same for VarY.
#HOW MANY DIFFERENTIALLY EXPRESSED GENES ARE THERE IN EACH VARIETY?
var_x_dup <- CODE HERE
var_y_dup <- CODE HERE
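One possible way to fill in the two placeholders above, as a minimal sketch: if each row of a DEG table describes one gene, the number of DEGs is the number of distinct `gene_name` values. The small data frames below are made-up stand-ins so the snippet runs on its own; with the real data, drop them and use the tables loaded by `read_csv()`.

```r
# Made-up stand-ins for the DEG tables (illustrative only, not real results).
var_x_degs <- data.frame(gene_name = c("g1", "g2", "g3", "g4"))
var_y_degs <- data.frame(gene_name = c("g2", "g3", "g5"))

# Count distinct gene names, so repeated rows (if any) are not counted twice.
var_x_dup <- length(unique(var_x_degs$gene_name))
var_y_dup <- length(unique(var_y_degs$gene_name))

var_x_dup
var_y_dup
```

If every gene is known to appear exactly once, `nrow(var_x_degs)` would give the same number.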
#INVESTIGATE IF THE SAME OR DIFFERENT GENES ARE DIFFERENTIALLY EXPRESSED IN THE TWO VARIETIES.
Create a suitable plot to look at the overlap in the DEGs between the two varieties.
CODE HERE
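A rough sketch of one way to look at the overlap using only base R: split the genes into "VarX only", "shared" and "VarY only" with `setdiff()` and `intersect()`, then draw the three counts as a bar chart. The gene lists are made-up stand-ins for `var_x_degs$gene_name` and `var_y_degs$gene_name`, and the name `overlap_counts` is just for illustration.

```r
# Made-up gene lists standing in for var_x_degs$gene_name and var_y_degs$gene_name.
x_genes <- c("g1", "g2", "g3", "g4")
y_genes <- c("g2", "g3", "g5")

# Split the union of DEGs into three groups and count each one.
overlap_counts <- c(
  "VarX only" = length(setdiff(x_genes, y_genes)),
  "Shared"    = length(intersect(x_genes, y_genes)),
  "VarY only" = length(setdiff(y_genes, x_genes))
)

# Bar chart of the three group sizes.
barplot(overlap_counts,
        ylab = "Number of DEGs",
        main = "Overlap of DEGs between varieties")
```

A Venn diagram shows the same three numbers and is a common choice for this kind of question, but it needs an add-on package, so the bar chart is used here to keep the sketch dependency-free.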
#SEPARATE OUT THE UP- AND DOWN-REGULATED DEGs (BETWEEN STRESS AND CONTROL CONDITION).
By looking at the `var_x_degs` and `var_y_degs` data frames, you can see that some genes have a positive log2 fold change and others have a negative log2 fold change.
* Create a data frame called `var_x_degs.up` containing only genes that are upregulated in the stress treatment compared to control in Variety X.
CODE HERE
* Create a data frame called `var_x_degs.down` containing only genes that are downregulated in the stress treatment compared to control in Variety X.
* Do the same for VarY.
CODE HERE
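The split described above can be sketched with plain logical subsetting. This assumes that `log2FoldChange` is computed as stress relative to control (so positive means upregulated under stress) and that the column has no missing values. The data frames are made-up stand-ins for the real DEG tables.

```r
# Made-up stand-ins for the DEG tables (illustrative values only).
var_x_degs <- data.frame(
  gene_name      = c("g1", "g2", "g3", "g4"),
  log2FoldChange = c(2.5, -1.8, 0.9, -3.1)
)
var_y_degs <- data.frame(
  gene_name      = c("g2", "g3", "g5"),
  log2FoldChange = c(0.4, 1.4, -3.2)
)

# Positive log2 fold change: higher in stress than in control (assumed direction).
var_x_degs.up   <- var_x_degs[var_x_degs$log2FoldChange > 0, ]
var_x_degs.down <- var_x_degs[var_x_degs$log2FoldChange < 0, ]
var_y_degs.up   <- var_y_degs[var_y_degs$log2FoldChange > 0, ]
var_y_degs.down <- var_y_degs[var_y_degs$log2FoldChange < 0, ]
```

The same subsetting works on tibbles returned by `read_csv()`; `dplyr::filter(var_x_degs, log2FoldChange > 0)` is an equivalent tidyverse spelling.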
#INVESTIGATE THE FOLD CHANGE IN GENE EXPRESSION FOR THE DEGs, BETWEEN STRESS AND CONTROL CONDITION.
* Create a box plot to show the distribution of log2 fold change for all DEGs by variety. Hint: the base R boxplot() command and the abs() function could be helpful here.
* Create a box plot to show the distribution of log2 fold change for upregulated DEGs by variety. Hint: the base R boxplot() command could be helpful here.
* Create a box plot to show the distribution of log2 fold change for downregulated DEGs by variety. Hint: the base R boxplot() command could be helpful here.
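Following the hints, the three box plots could be sketched with base R `boxplot()`, which accepts a named list and draws one box per element. The vectors `x_fc` and `y_fc` are made-up stand-ins for `var_x_degs$log2FoldChange` and `var_y_degs$log2FoldChange`; the list names `fc_all`, `fc_up` and `fc_down` are only for illustration.

```r
# Made-up log2 fold changes standing in for the log2FoldChange columns.
x_fc <- c(2.5, -1.8, 0.9, -3.1)
y_fc <- c(0.4, 1.4, -3.2)

# All DEGs: abs() puts up- and downregulated genes on the same scale.
fc_all  <- list(VarX = abs(x_fc), VarY = abs(y_fc))
# Upregulated and downregulated DEGs kept with their sign.
fc_up   <- list(VarX = x_fc[x_fc > 0], VarY = y_fc[y_fc > 0])
fc_down <- list(VarX = x_fc[x_fc < 0], VarY = y_fc[y_fc < 0])

boxplot(fc_all,  ylab = "|log2 fold change|", main = "All DEGs")
boxplot(fc_up,   ylab = "log2 fold change",   main = "Upregulated DEGs")
boxplot(fc_down, ylab = "log2 fold change",   main = "Downregulated DEGs")
```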
#INVESTIGATE THE FUNCTIONS OF THE DIFFERENTIALLY EXPRESSED (UPREGULATED) GENES WITH THE LOWEST FOLD CHANGE.
* Find out the function of the bottom-most upregulated gene in Variety X (lowest fold change) and assign the result to a variable called `bottom_gene.x`.
* Find out the function of the bottom-most upregulated gene in Variety Y (lowest fold change) and assign the result to a variable called `bottom_gene.y`.
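One reading of the task above, sketched under assumptions: among the upregulated genes (positive `log2FoldChange`), take the row with the smallest fold change and return its `Gene_Function`. The helper `lowest_up_function()` is a hypothetical name introduced here, and the data frames are made-up stand-ins.

```r
# Made-up stand-ins for the DEG tables (illustrative values only).
var_x_degs <- data.frame(
  gene_name      = c("g1", "g2", "g3", "g4"),
  log2FoldChange = c(2.5, -1.8, 0.9, -3.1),
  Gene_Function  = c("kinase", "transporter", "unknown protein", "chaperone")
)
var_y_degs <- data.frame(
  gene_name      = c("g2", "g3", "g5"),
  log2FoldChange = c(0.4, 1.4, -3.2),
  Gene_Function  = c("transporter", "unknown protein", "heat shock protein")
)

# Hypothetical helper: function of the upregulated gene with the
# smallest (positive) log2 fold change.
lowest_up_function <- function(degs) {
  up <- degs[degs$log2FoldChange > 0, ]
  up$Gene_Function[which.min(up$log2FoldChange)]
}

bottom_gene.x <- lowest_up_function(var_x_degs)
bottom_gene.y <- lowest_up_function(var_y_degs)
```

`which.min()` returns only the first minimum, so ties would need separate handling.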
#INVESTIGATE THE BEHAVIOUR OF THE BIOLOGICAL REPLICATES FOR THE DEGs IN VARIETY X IN THE TREATMENT TIME POINT.
* Create a set of scatterplots to visually inspect how well the different replicates agree/correlate for the DEGs in Variety X in the treatment time point.
#INVESTIGATE THE BEHAVIOUR OF THE BIOLOGICAL REPLICATES FOR THE DEGs IN VARIETY X IN THE CONTROL TIME POINT.
* Create a set of scatterplots to visually inspect how well the different replicates agree/correlate for the DEGs in Variety X in the control time point.
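A sketch of both replicate checks using base R `pairs()`, which draws every replicate against every other one in a grid of scatterplots. It assumes that the `VarX1Rep.*` columns are the treatment time point and the `VarXCRep.*` columns are the control, as the DEG header suggests. The expression values are made up, and `cor()` is added only as a numeric companion to the plots.

```r
# Made-up expression values standing in for the replicate columns of var_x_degs.
var_x_degs <- data.frame(
  gene_name  = paste0("g", 1:5),
  VarXCRep.1 = c(10, 52, 198, 805, 1490),
  VarXCRep.2 = c(12, 48, 210, 790, 1520),
  VarXCRep.3 = c(9, 55, 190, 820, 1480),
  VarX1Rep.1 = c(40, 20, 400, 300, 3000),
  VarX1Rep.2 = c(44, 18, 420, 310, 2900),
  VarX1Rep.3 = c(38, 23, 390, 280, 3100)
)

# Assumed column roles: VarX1Rep.* = treatment, VarXCRep.* = control.
treatment_cols <- c("VarX1Rep.1", "VarX1Rep.2", "VarX1Rep.3")
control_cols   <- c("VarXCRep.1", "VarXCRep.2", "VarXCRep.3")

# One scatterplot per pair of replicates.
pairs(var_x_degs[, treatment_cols], main = "VarX DEGs: treatment replicates")
pairs(var_x_degs[, control_cols],   main = "VarX DEGs: control replicates")

# Pairwise Pearson correlations between replicates.
treatment_cor <- cor(var_x_degs[, treatment_cols])
control_cor   <- cor(var_x_degs[, control_cols])
```

With real expression data spanning several orders of magnitude, plotting on a log scale may make the agreement between replicates easier to judge.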
#COMPARE THE MEAN EXPRESSION IN TREATMENT VERSUS CONTROL REPLICATES FOR EACH DEG.
* Modify your data frame `var_x_degs` to include two new (additional) columns as follows:
  * The first new column should be named `control_mean` and contain the mean expression value for the three control replicates.
  * The second new column should be named `stress_mean` and contain the mean expression value for the three stress treatment replicates.
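The two mean columns can be sketched with `rowMeans()`, which averages across the selected columns for each gene. As before, this assumes `VarXCRep.*` are the control replicates and `VarX1Rep.*` the stress replicates; the values are made up so that the means are easy to check by hand.

```r
# Made-up expression values standing in for var_x_degs.
var_x_degs <- data.frame(
  gene_name  = c("g1", "g2"),
  VarXCRep.1 = c(10, 100),
  VarXCRep.2 = c(12, 90),
  VarXCRep.3 = c(14, 110),
  VarX1Rep.1 = c(40, 20),
  VarX1Rep.2 = c(44, 25),
  VarX1Rep.3 = c(48, 30)
)

# Assumed column roles: VarXCRep.* = control, VarX1Rep.* = stress.
control_cols <- c("VarXCRep.1", "VarXCRep.2", "VarXCRep.3")
stress_cols  <- c("VarX1Rep.1", "VarX1Rep.2", "VarX1Rep.3")

# Row-wise means: one value per gene.
var_x_degs$control_mean <- rowMeans(var_x_degs[, control_cols])
var_x_degs$stress_mean  <- rowMeans(var_x_degs[, stress_cols])

var_x_degs[, c("gene_name", "control_mean", "stress_mean")]
```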
#PRIORITISE GENES OF INTEREST FOR FURTHER INVESTIGATION.
* Create a data frame called `var_y_degs.up.big` containing only genes in Variety Y that are upregulated in the stress treatment compared to control, have at least a 2-fold absolute change in expression and have a p value less than 1e-06.
* Hint: remember you are dealing with log2 fold change.
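A minimal sketch of the filter described above. A 2-fold change corresponds to a log2 fold change of 1 (since log2(2) = 1), so an upregulated gene with at least a 2-fold change has `log2FoldChange >= 1`. The only p value column listed in the DEG header is `padj`, so this sketch assumes that is the one to threshold and that the VarY table has the same layout as the VarX one. The data frame is a made-up stand-in.

```r
# Made-up stand-in for var_y_degs (illustrative values only).
var_y_degs <- data.frame(
  gene_name      = c("g2", "g3", "g5", "g6"),
  log2FoldChange = c(0.4, 1.4, -3.2, 2.0),
  padj           = c(1e-09, 1e-08, 1e-12, 1e-03)
)

# Upregulated with at least a 2-fold change (log2FC >= 1) and padj < 1e-06.
# Using padj as "the p value" is an assumption.
var_y_degs.up.big <- var_y_degs[
  var_y_degs$log2FoldChange >= 1 & var_y_degs$padj < 1e-06,
]

var_y_degs.up.big
```

In the toy data only `g3` passes both thresholds: `g2` is below 2-fold, `g5` is downregulated and `g6` has too large a `padj`.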
