Question: R code
* Read in the data file you created using Python called `all_VarX_TwoTimePoints.csv` and assign it to a data frame called `var_x_all`.
* Read in the data file you created using Python called `all_VarY_TwoTimePoints.csv` and assign it to a data frame called `var_y_all`.
* Find out how many genes are in your dataset and assign the result to a variable called `num_genes`.
* Read in the data file you created using Python called `Leaf_DEGs_VarX.csv` and assign it to a data frame called `var_x_degs`.
* Read in the data file you created using Python called `Leaf_DEGs_VarY.csv` and assign it to a data frame called `var_y_degs`.
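The reading and counting steps above could be sketched as follows. This is only an illustration: it writes a small made-up table to a temporary file so that it runs anywhere, and the column names and gene IDs are assumptions rather than the real file layout. It also assumes one row per gene.

```r
# Toy stand-in for all_VarX_TwoTimePoints.csv so the sketch runs anywhere;
# the column names and gene IDs below are assumptions, not the real file layout.
toy <- data.frame(
  gene = c("Soltu.A", "Soltu.B", "Soltu.C"),
  VarXCRep.1 = c(10.2, 0.0, 5.1),
  VarX1Rep.1 = c(20.4, 1.3, 4.8)
)
csv_path <- file.path(tempdir(), "all_VarX_TwoTimePoints.csv")
write.csv(toy, csv_path, row.names = FALSE)

# Read the file into a data frame; with the real data, pass its actual path.
var_x_all <- read.csv(csv_path)

# Assuming one row per gene, the row count is the gene count.
num_genes <- nrow(var_x_all)
print(num_genes)
```

The same `read.csv()` call would apply to the other three files, with only the file name and target variable changed.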
4- Our data is currently in WIDE FORMAT, with a column for each variable (in this case, each sample). We want to have our data in LONG FORMAT, with a column for each variable type and a column for the values. Run the following cell:
```{r}
library(tidyr)  # provides pivot_longer()

# Gather the sample columns into sample/expression pairs (one row per gene and sample).
var_x_all.long <- pivot_longer(var_x_all, cols = VarXCRep.1:VarX1Rep.3,
                               names_to = "sample", values_to = "expression")
View(var_x_all.long)
```
* Create a suitable plot to look at the distribution of expression values for all the genes as a function of the sample, for Variety X.
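One possible choice of plot is a box plot per sample. The sketch below uses a made-up long-format data frame in place of the real `var_x_all.long`, and plots on a log2 scale with a pseudocount of 1. That transformation is an assumption about what suits the data, not something the assignment specifies.

```r
# Toy long-format data standing in for var_x_all.long (values are made up).
set.seed(1)
var_x_all.long <- data.frame(
  gene = rep(paste0("gene", 1:50), times = 2),
  sample = rep(c("VarXCRep.1", "VarX1Rep.1"), each = 50),
  expression = rexp(100, rate = 0.1)
)

# Expression values are often right-skewed, so plot log2(expression + 1);
# the +1 pseudocount keeps zero values finite.
boxplot(log2(expression + 1) ~ sample, data = var_x_all.long,
        xlab = "Sample", ylab = "log2(expression + 1)",
        main = "Variety X: all genes")
```

A violin or density plot would serve the same purpose; the box plot is used here only because it needs nothing beyond base R.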
5- Now you can repeat the above process for Variety Y.
* Use the `tidyr` `pivot_longer()` function to transform your `var_y_all` data frame into a long format and call the data frame `var_y_all.long`.
* Create a suitable plot to look at the distribution of expression values for all the genes as a function of the sample, for Variety Y.
6- INVESTIGATE THE DISTRIBUTION OF EXPRESSION VALUES FOR THE DEGs IN EACH SAMPLE (Variety X).
* Use the `tidyr` `pivot_longer()` function to transform your `var_x_degs` data frame into a long format and call the data frame `var_x_degs.long`.
* Create a suitable plot to look at the distribution of expression values for DEGs as a function of the sample, for Variety X.
7-
* Use the `tidyr` `pivot_longer()` function to transform your `var_y_degs` data frame into a long format and call the data frame `var_y_degs.long`.
* Create a suitable plot to look at the distribution of expression values for DEGs as a function of the sample, for Variety Y.
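For the DEG tables, the wide-to-long step might look like the sketch below. The toy table, the `VarY...` sample column names and the `log2FoldChange` column are all assumptions made for illustration. Selecting columns with `starts_with("VarY")` is one way to pivot only the sample columns when the table also holds non-sample columns.

```r
library(tidyr)

# Toy wide DEG table; the sample and statistic column names are assumptions.
var_y_degs <- data.frame(
  gene = c("g1", "g2"),
  VarYCRep.1 = c(1, 2),
  VarYCRep.2 = c(3, 4),
  VarY1Rep.1 = c(5, 6),
  log2FoldChange = c(1.8, 1.2)
)

# Pivot only the sample columns; gene and log2FoldChange are kept as-is
# and repeated on every row of the long table.
var_y_degs.long <- pivot_longer(var_y_degs, cols = starts_with("VarY"),
                                names_to = "sample", values_to = "expression")
print(var_y_degs.long)
```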
8-
* Find out how many duplicate Soltu gene names there are in the `var_x_degs` data frame and assign the result to a variable called `var_x_dup`
* Find out how many duplicate Soltu gene names there are in the `var_y_degs` data frame and assign the result to a variable called `var_y_dup`
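Counting duplicates could be sketched with base R's `duplicated()`. The `gene` column name and the IDs are made up for illustration. Note that this counts repeat occurrences (every appearance after the first), which is one reasonable reading of "how many duplicates".

```r
# Toy DEG table; the "gene" column name and the IDs are assumptions.
var_x_degs <- data.frame(
  gene = c("Soltu.A", "Soltu.B", "Soltu.A", "Soltu.C", "Soltu.A")
)

# duplicated() flags every occurrence after the first, so the sum
# is the number of repeated rows.
var_x_dup <- sum(duplicated(var_x_degs$gene))

# Alternative reading: how many distinct names occur more than once.
n_dup_names <- length(unique(var_x_degs$gene[duplicated(var_x_degs$gene)]))

print(var_x_dup)
print(n_dup_names)
```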
9-
* Create a suitable plot to look at the overlap in the DEGs between the two Varieties.
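A Venn diagram is the usual choice for showing overlap, but it needs an add-on package. As a dependency-free sketch, the set sizes can be computed with base R and shown as a bar plot. The gene vectors here are made up; with the real data they would come from the gene ID columns of `var_x_degs` and `var_y_degs`.

```r
# Made-up gene ID vectors standing in for the two DEG lists.
x_genes <- c("g1", "g2", "g3", "g4")
y_genes <- c("g3", "g4", "g5")

# Sizes of the three regions a two-set Venn diagram would show.
overlap_counts <- c(
  "X only" = length(setdiff(x_genes, y_genes)),
  "Shared" = length(intersect(x_genes, y_genes)),
  "Y only" = length(setdiff(y_genes, x_genes))
)

barplot(overlap_counts, ylab = "Number of DEGs",
        main = "DEG overlap between Variety X and Variety Y")
```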
By looking at the gene expression data in the `var_x_degs` and `var_y_degs` data frames, you can see that some genes have a positive log2 fold change and others have a negative log2 fold change.
* Create a data frame called `var_x_degs.up` containing only genes that are upregulated in Stress Treatment compared to control in Variety X.
* Create a data frame called `var_x_degs.down` containing only genes that are downregulated in Stress Treatment compared to control in Variety X.
* Create a data frame called `var_y_degs.up` containing only genes that are upregulated in Stress Treatment compared to control in Variety Y.
* Create a data frame called `var_y_degs.down` containing only genes that are downregulated in Stress Treatment compared to control in Variety Y.
* Create a box plot to show the distribution of log2 fold change for all DEGs by variety. Hint: the base R boxplot() command and the abs() function could be helpful here.
* Create a box plot to show the distribution of log2 fold change for upregulated DEGs by variety. Hint: the base R boxplot() command could be helpful here.
* Create a box plot to show the distribution of log2 fold change for downregulated DEGs by variety. Hint: the base R boxplot() command could be helpful here.
* Find out the function of the bottom-most upregulated gene in Variety X (the upregulated gene with the lowest fold change) and assign the result to a variable called `bottom_gene.x`.
* Find out the function of the bottom-most upregulated gene in Variety Y (the upregulated gene with the lowest fold change) and assign the result to a variable called `bottom_gene.y`.
* Create a set of scatterplots to visually inspect how well the different replicates agree/correlate for the DEGs in Variety X in the treatment time point.
* Create a set of scatterplots to visually inspect how well the different replicates agree/correlate for the DEGs in Variety X in the control time point.
* Modify your data frame `var_x_degs` to include two new (additional) columns as follows: The first new column should be named `control_mean` and contain the mean expression value for the three control replicates.
* The second new column should be named `stress_mean` and contain the mean expression value for the three stress treatment replicates.
* Create a data frame called `var_y_degs.up.big` containing only genes in Variety Y that are upregulated in Stress Treatment compared to control, have at least a 2-fold absolute change in expression and have a p-value less than 1e-06.
* Hint: remember you are dealing with log2 fold change.
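The filtering, summary and plotting tasks in this list could be sketched on toy data as follows. Everything about the tables is an assumption made for illustration: the column names `log2FoldChange`, `pvalue` and `functional_annotation`, the reading of `VarXCRep.*` as control and `VarX1Rep.*` as stress replicates, and all values. The sketch also assumes the fold change is stress relative to control, so a positive log2 value means upregulated and a 2-fold change corresponds to a log2 fold change of 1.

```r
# Toy DEG tables; all column names and values are assumptions.
var_x_degs <- data.frame(
  gene = c("g1", "g2", "g3", "g4"),
  VarXCRep.1 = c(1, 8, 3, 9),
  VarXCRep.2 = c(2, 9, 3, 10),
  VarXCRep.3 = c(3, 10, 3, 11),
  VarX1Rep.1 = c(10, 4, 5, 1),
  VarX1Rep.2 = c(12, 5, 6, 1),
  VarX1Rep.3 = c(14, 3, 4, 1),
  log2FoldChange = c(2.5, -1.2, 0.8, -3.0),
  pvalue = c(1e-8, 1e-4, 1e-9, 1e-7),
  functional_annotation = c("kinase", "transporter",
                            "heat shock protein", "unknown"),
  stringsAsFactors = FALSE
)
var_y_degs <- data.frame(
  gene = c("g3", "g4", "g5"),
  log2FoldChange = c(1.5, 0.7, -2.0),
  pvalue = c(1e-8, 1e-9, 1e-10),
  stringsAsFactors = FALSE
)
control_cols <- c("VarXCRep.1", "VarXCRep.2", "VarXCRep.3")
stress_cols <- c("VarX1Rep.1", "VarX1Rep.2", "VarX1Rep.3")

# Split by the sign of the log2 fold change.
var_x_degs.up <- var_x_degs[var_x_degs$log2FoldChange > 0, ]
var_x_degs.down <- var_x_degs[var_x_degs$log2FoldChange < 0, ]

# Box plot of absolute log2 fold change for all DEGs, by variety.
bp_all <- boxplot(list("Variety X" = abs(var_x_degs$log2FoldChange),
                       "Variety Y" = abs(var_y_degs$log2FoldChange)),
                  ylab = "|log2 fold change|", main = "All DEGs")

# Function of the upregulated gene with the lowest fold change.
bottom_gene.x <- var_x_degs.up$functional_annotation[
  which.min(var_x_degs.up$log2FoldChange)]

# Pairwise scatterplots of the stress replicates.
pairs(var_x_degs[, stress_cols], main = "Variety X DEGs: stress replicates")

# Per-gene mean expression across the replicates of each condition.
var_x_degs$control_mean <- rowMeans(var_x_degs[, control_cols])
var_x_degs$stress_mean <- rowMeans(var_x_degs[, stress_cols])

# Upregulated, at least 2-fold (log2 fold change >= 1), p-value < 1e-06.
var_y_degs.up.big <- var_y_degs[var_y_degs$log2FoldChange >= 1 &
                                  var_y_degs$pvalue < 1e-06, ]
print(var_y_degs.up.big)
```

The box plots for upregulated and downregulated DEGs follow the same pattern as the one shown, with each vector first subset by the sign of the log2 fold change, and the control replicates can be passed to `pairs()` in the same way as the stress replicates.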
