Question: R code
Read in the data file you created using Python called all_VarX_TwoTimePoints.csv and assign it to a data frame called varxall
Read in the data file you created using Python called all_VarY_TwoTimePoints.csv and assign it to a data frame called varyall
Find out how many genes are in your dataset and assign the result to a variable called numgenes
Read in the data file you created using Python called LeafDEGsVarX.csv and assign it to a data frame called varxdegs
Read in the data file you created using Python called LeafDEGsVarY.csv and assign it to a data frame called varydegs
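A minimal sketch of the reading and counting steps, assuming each CSV has a header row and one row per gene. So that it runs on its own, it first writes a tiny hypothetical table to a temporary directory; with the real files you would call read.csv() directly on the file names listed above. The column names gene, C_Rep1 and S_Rep1 are illustrative assumptions.

```r
# Hypothetical toy table standing in for all_VarX_TwoTimePoints.csv
toy <- data.frame(gene   = c("g1", "g2", "g3"),
                  C_Rep1 = c(5.1, 0.0, 12.3),
                  S_Rep1 = c(6.0, 0.4, 10.9))
path <- file.path(tempdir(), "all_VarX_TwoTimePoints.csv")
write.csv(toy, path, row.names = FALSE)

# Read the CSV into a data frame (a header row is assumed)
varxall <- read.csv(path)

# Assuming one row per gene, the row count is the gene count
numgenes <- nrow(varxall)
print(numgenes)
```

If a gene can appear on more than one row, length(unique(varxall$gene)) counts distinct IDs instead.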
Our data is currently in WIDE FORMAT, with a column for each variable (in this case, each sample). We want our data in LONG FORMAT, with one column for the variable type (the sample name) and one column for the values. Run the following cell:

    library(tidyr)
    varxall.long <- pivot_longer(varxall, cols = VarXCRep:VarXRep, names_to = "sample", values_to = "expression")
    View(varxall.long)
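To make the wide-to-long idea concrete, here is a base-R sketch of what pivot_longer() produces, built by hand on a hypothetical two-gene table. It is an illustration of the reshaping, not a replacement for the tidyr call; the column names are assumptions.

```r
# Hypothetical wide table: one row per gene, one column per sample
wide <- data.frame(gene   = c("g1", "g2"),
                   C_Rep1 = c(1.5, 3.0),
                   C_Rep2 = c(2.5, 4.0))

# Columns holding expression values (everything except the ID column)
sample_cols <- setdiff(names(wide), "gene")

# Gene IDs repeat once per sample column, sample names repeat once per gene,
# and unlist() stacks the value columns one after another
long <- data.frame(
  gene       = rep(wide$gene, times = length(sample_cols)),
  sample     = rep(sample_cols, each = nrow(wide)),
  expression = unlist(wide[sample_cols], use.names = FALSE)
)
print(long)
```

Each gene-sample pair becomes one row. The row order here is column by column, whereas pivot_longer() groups rows by gene; the contents are the same.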
Create a suitable plot to look at the distribution of expression values for all the genes as a function of the sample, for Variety X
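One possible plot is a box plot per sample using base R's formula interface. This sketch assumes the long table has sample and expression columns, as in the pivot_longer() call above, and plots log2(expression + 1) because expression values are typically strongly right-skewed; the transform is a design choice, not something the task requires. The data are hypothetical.

```r
# Hypothetical long-format table with the same shape as varxall.long
varxall.long <- data.frame(
  sample     = rep(c("C_Rep1", "S_Rep1"), each = 4),
  expression = c(0, 2, 15, 120, 0, 5, 30, 90)
)

# log2(x + 1) compresses the long right tail; the +1 keeps zeros finite
b <- boxplot(log2(expression + 1) ~ sample, data = varxall.long,
             xlab = "Sample", ylab = "log2(expression + 1)",
             main = "Variety X: expression by sample")
```

A violin or density plot per sample would show the same comparison in more detail.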
Now you can repeat the above process for Variety Y. Use tidyr to transform your varyall data frame into long format and call the data frame varyall.long
Create a suitable plot to look at the distribution of expression values for all the genes as a function of the sample, for Variety Y
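Because the Variety Y steps mirror Variety X, one option is to wrap the reshape and plot in a small function and call it once per variety. plot_sample_distribution() is a hypothetical helper written in base R. It assumes an ID column plus one numeric column per sample, and the toy tables are invented for illustration.

```r
# Hypothetical helper: reshape a wide table to long format, draw per-sample
# box plots of log2(expression + 1), and return the long table invisibly
plot_sample_distribution <- function(wide, id_col, title) {
  sample_cols <- setdiff(names(wide), id_col)
  long <- data.frame(
    sample     = rep(sample_cols, each = nrow(wide)),
    expression = unlist(wide[sample_cols], use.names = FALSE)
  )
  boxplot(log2(expression + 1) ~ sample, data = long,
          main = title, ylab = "log2(expression + 1)", las = 2)
  invisible(long)
}

# Toy stand-ins for varxall and varyall
varxall <- data.frame(gene = c("g1", "g2", "g3"),
                      C_Rep1 = c(1, 10, 100), S_Rep1 = c(2, 20, 50))
varyall <- data.frame(gene = c("g1", "g2", "g3"),
                      C_Rep1 = c(0, 8, 64), S_Rep1 = c(3, 9, 27))

op <- par(mfrow = c(1, 2))   # two panels side by side for easy comparison
varxall.long <- plot_sample_distribution(varxall, "gene", "Variety X")
varyall.long <- plot_sample_distribution(varyall, "gene", "Variety Y")
par(op)                      # restore the previous layout
```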
INVESTIGATE THE DISTRIBUTION OF EXPRESSION VALUES FOR THE DEGs IN EACH SAMPLE
Variety X
Use tidyr to transform your varxdegs data frame into long format and call the data frame varxdegs.long
Create a suitable plot to look at the distribution of expression values for DEGs as a function of the sample, for Variety X
Variety Y
Use tidyr to transform your varydegs data frame into long format and call the data frame varydegs.long
Create a suitable plot to look at the distribution of expression values for DEGs as a function of the sample, for Variety Y
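DEG tables usually also contain statistics columns (log fold change, p-value), which should not be stacked with the expression values. This sketch keeps only the replicate columns before reshaping. It assumes their names contain "Rep", as the VarXCRep:VarXRep range above suggests; every column name in the toy table is hypothetical.

```r
# Toy stand-in for varydegs: ID, replicate expression, and statistics columns
varydegs <- data.frame(gene      = c("g1", "g2"),
                       VarYCRep1 = c(4, 50),
                       VarYSRep1 = c(40, 5),
                       logFC     = c(3.3, -3.3),
                       pvalue    = c(0.001, 0.002))

# Select only the replicate columns (assumed to contain "Rep" in the name)
rep_cols <- grep("Rep", names(varydegs), value = TRUE)

varydegs.long <- data.frame(
  gene       = rep(varydegs$gene, times = length(rep_cols)),
  sample     = rep(rep_cols, each = nrow(varydegs)),
  expression = unlist(varydegs[rep_cols], use.names = FALSE)
)

boxplot(log2(expression + 1) ~ sample, data = varydegs.long,
        main = "Variety Y DEGs", ylab = "log2(expression + 1)")
```

With tidyr, the same selection can be written as pivot_longer(varydegs, cols = contains("Rep"), ...).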
Find out how many duplicate Soltu gene names there are in the varxdegs data frame and assign the result to a variable called varxdup
Find out how many duplicate Soltu gene names there are in the varydegs data frame and assign the result to a variable called varydup
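A sketch of the duplicate count using duplicated(), assuming the Soltu IDs sit in a column called gene (a hypothetical name). The task can be read two ways, so both counts are shown: surplus rows beyond the first occurrence, and distinct IDs that occur more than once.

```r
# Toy gene IDs; "g2" appears three times
varxdegs <- data.frame(gene = c("g1", "g2", "g2", "g3", "g2"))

# duplicated() flags every occurrence after the first,
# so the sum counts surplus rows (2 here)
varxdup <- sum(duplicated(varxdegs$gene))

# Alternative reading: distinct IDs that occur more than once (1 here)
n_dup_ids <- length(unique(varxdegs$gene[duplicated(varxdegs$gene)]))
```

The same lines with varydegs give varydup.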
Create a suitable plot to look at the overlap in the DEGs between the two varieties.
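A simple base-R way to show overlap is a bar chart of set sizes computed with setdiff() and intersect(). The ID vectors below are hypothetical. A Venn diagram from an add-on package such as VennDiagram is a common alternative.

```r
# Toy DEG ID vectors for the two varieties (hypothetical)
x_ids <- unique(c("g1", "g2", "g3", "g4"))
y_ids <- unique(c("g3", "g4", "g5"))

overlap <- c(
  "Variety X only" = length(setdiff(x_ids, y_ids)),
  "Both"           = length(intersect(x_ids, y_ids)),
  "Variety Y only" = length(setdiff(y_ids, x_ids))
)
print(overlap)   # X only: 2, Both: 2, Y only: 1

barplot(overlap, ylab = "Number of DEGs",
        main = "DEG overlap between varieties")
```

Applying unique() first means duplicate IDs do not inflate the counts.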
By looking at the gene expression data in the varxdegs and varydegs data frames, you can see that some genes have a positive log fold change and others have a negative log fold change. Create a data frame called varxdegs.up containing only genes that are upregulated in Stress Treatment compared to control in Variety X
Create a data frame called varxdegs.down containing only genes that are downregulated in Stress Treatment compared to control in Variety X
Create a data frame called varydegs.up containing only genes that are upregulated in Stress Treatment compared to control in Variety Y
Create a data frame called varydegs.down containing only genes that are downregulated in Stress Treatment compared to control in Variety Y
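A sketch of the up/down split, assuming the log fold change column is called logFC (hypothetical) and is computed as stress relative to control, so a positive value means higher expression under stress. If the comparison were computed the other way round, the signs would flip. subset() is used because it drops rows whose logFC is NA.

```r
# Toy DEG table with a hypothetical logFC column
varxdegs <- data.frame(gene  = c("g1", "g2", "g3", "g4"),
                       logFC = c(2.1, -1.4, 0.8, -3.0))

# Positive log fold change: higher in stress than in control
varxdegs.up   <- subset(varxdegs, logFC > 0)
# Negative log fold change: lower in stress than in control
varxdegs.down <- subset(varxdegs, logFC < 0)
```

The Variety Y data frames follow the same pattern with varydegs.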
Create a box plot to show the distribution of log fold change for all DEGs by variety. Hint: the base R boxplot command and the abs function could be helpful here.
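Following the hint, passing a named list to boxplot() gives one box per variety, and abs() puts up- and downregulated genes on the same magnitude scale. The log fold change values are hypothetical; with the real data you would pass something like abs(varxdegs$logFC), assuming that column name.

```r
# Toy log fold change vectors for each variety (hypothetical values)
x_logfc <- c(2.1, -1.4, 0.8, -3.0)
y_logfc <- c(1.2, -2.2, 4.0)

# A named list gives one box per element, labelled by the list names
b <- boxplot(list("Variety X" = abs(x_logfc), "Variety Y" = abs(y_logfc)),
             ylab = "|log fold change|", main = "All DEGs")
```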
Create a box plot to show the distribution of log fold change for upregulated DEGs by variety. Hint: the base R boxplot command could be helpful here.
Create a box plot to show the distribution of log fold change for downregulated DEGs by variety. Hint: the base R boxplot command could be helpful here.
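The up- and downregulated plots can use the same list interface after filtering by sign. Here they are drawn side by side for comparison, which is a layout choice rather than a requirement. The vectors are hypothetical.

```r
# Toy log fold change vectors (hypothetical values)
x_logfc <- c(2.1, -1.4, 0.8, -3.0)
y_logfc <- c(1.2, -2.2, 4.0, -0.5)

op <- par(mfrow = c(1, 2))
# Upregulated: keep only positive values
up <- boxplot(list("Variety X" = x_logfc[x_logfc > 0],
                   "Variety Y" = y_logfc[y_logfc > 0]),
              main = "Upregulated DEGs", ylab = "log fold change")
# Downregulated: keep only negative values
down <- boxplot(list("Variety X" = x_logfc[x_logfc < 0],
                     "Variety Y" = y_logfc[y_logfc < 0]),
                main = "Downregulated DEGs", ylab = "log fold change")
par(op)
```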
Find out the function of the bottom-most upregulated gene in Variety X (the upregulated gene with the lowest fold change) and assign the result to a variable called bottomgene.x
Find out the function of the bottom-most upregulated gene in Variety Y (the upregulated gene with the lowest fold change) and assign the result to a variable called bottomgene.y
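A sketch using which.min() on the upregulated subset. It assumes the DEG file carries a gene-function annotation column; description is a hypothetical name for it, and the annotations below are placeholders.

```r
# Toy upregulated-DEG table with a hypothetical annotation column
varxdegs.up <- data.frame(gene        = c("g1", "g3", "g7"),
                          logFC       = c(2.1, 0.8, 1.5),
                          description = c("annotation A", "annotation B",
                                          "annotation C"))

# The bottom-most upregulated gene has the smallest positive logFC
bottom_row   <- which.min(varxdegs.up$logFC)
bottomgene.x <- varxdegs.up$description[bottom_row]
print(bottomgene.x)   # "annotation B"
```

which.min() ignores NA values and returns the first position if several genes tie for the minimum.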
Create a set of scatterplots to visually inspect how well the different replicates agree (correlate) for the DEGs in Variety X in the treatment time point.
Create a set of scatterplots to visually inspect how well the different replicates agree (correlate) for the DEGs in Variety X in the control time point.
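pairs() draws every replicate-versus-replicate scatterplot in one call, and cor() gives a numeric summary of the same agreement. This sketch uses three hypothetical treatment replicate columns; for the control time point, you would pass the three control columns instead. The log2(x + 1) scale is an assumption that keeps a few highly expressed genes from dominating the panels.

```r
# Toy DEG expression for three stress replicates (hypothetical column names)
base  <- c(5, 20, 80, 150, 300)
treat <- data.frame(S_Rep1 = base * 1.00,
                    S_Rep2 = base * 1.10,
                    S_Rep3 = base * 0.95)

# Log-transform, then draw all pairwise replicate scatterplots
log_treat <- log2(treat + 1)
pairs(log_treat, main = "Variety X DEGs, treatment replicates")

# Pearson correlation matrix between replicates
r <- cor(log_treat)
print(round(r, 3))
```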
Modify your data frame varxdegs to include two new columns as follows:
The first new column should be named controlmean and contain the mean expression value for the three control replicates.
The second new column should be named stressmean and contain the mean expression value for the three stress treatment replicates.
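rowMeans() averages across the selected columns for each row (gene). The replicate column names in this sketch are hypothetical; substitute the control and stress column names from your own varxdegs.

```r
# Toy DEG table with three control and three stress replicates
varxdegs <- data.frame(gene   = c("g1", "g2"),
                       C_Rep1 = c(10, 2), C_Rep2 = c(12, 4), C_Rep3 = c(11, 3),
                       S_Rep1 = c(40, 1), S_Rep2 = c(44, 1), S_Rep3 = c(42, 1))

# Per-gene means across each group of replicates
varxdegs$controlmean <- rowMeans(varxdegs[, c("C_Rep1", "C_Rep2", "C_Rep3")])
varxdegs$stressmean  <- rowMeans(varxdegs[, c("S_Rep1", "S_Rep2", "S_Rep3")])
print(varxdegs[, c("gene", "controlmean", "stressmean")])
```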
Create a data frame called varydegs.upbig containing only genes in Variety Y that are upregulated in Stress Treatment compared to control, have at least a …-fold absolute change in expression, and have a p-value less than … (Hint: remember you are dealing with log fold change)
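A sketch of the combined filter. The fold-change and p-value thresholds are left as arguments to a hypothetical helper, filter_up_big(). It assumes base-2 log fold changes in a column called logFC and p-values in a column called pvalue (both hypothetical names). A fold change of F corresponds to a log2 fold change of log2(F). Because only upregulated genes are wanted, the test is logFC >= log2(F) rather than abs(logFC) >= log2(F).

```r
# Hypothetical helper: keep upregulated genes above a fold-change cut-off
# and below a p-value cut-off (logFC assumed to be log2 stress/control)
filter_up_big <- function(degs, fold_min, p_max) {
  subset(degs, logFC >= log2(fold_min) & pvalue < p_max)
}

# Toy DEG table (hypothetical values)
varydegs <- data.frame(gene   = c("g1", "g2", "g3", "g4"),
                       logFC  = c(3.0, 0.5, -2.5, 1.2),
                       pvalue = c(1e-6, 1e-8, 1e-7, 0.2))

# Illustrative cut-offs only (2-fold, p < 0.05), not the assignment's values
varydegs.upbig <- filter_up_big(varydegs, fold_min = 2, p_max = 0.05)
print(varydegs.upbig$gene)   # "g1"
```

Here g2 fails the fold-change cut-off, g3 is downregulated, and g4 fails the p-value cut-off.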
