

Question: R code

* Read in the data file you created using Python called `all_VarX_TwoTimePoints.csv` and assign it to a data frame called `var_x_all`.
* Read in the data file you created using Python called `all_VarY_TwoTimePoints.csv` and assign it to a data frame called `var_y_all`.
* Find out how many genes are in your dataset and assign the result to a variable called `num_genes`.
* Read in the data file you created using Python called `Leaf_DEGs_VarX.csv` and assign it to a data frame called `var_x_degs`.
* Read in the data file you created using Python called `Leaf_DEGs_VarY.csv` and assign it to a data frame called `var_y_degs`.
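
The read-in and counting steps above could look roughly like the sketch below. So that it runs on its own, it first writes a small made-up table to a temporary file; the column names (`gene`, `VarXCRep.1`, `VarXCRep.2`) and all values are placeholders, not the real data, and it assumes one row per gene.

```r
# Toy stand-in for all_VarX_TwoTimePoints.csv (hypothetical columns and values)
toy <- data.frame(
  gene       = c("Soltu.A", "Soltu.B", "Soltu.C"),
  VarXCRep.1 = c(5.1, 0.0, 12.3),
  VarXCRep.2 = c(4.8, 0.2, 11.9)
)
csv_path <- file.path(tempdir(), "all_VarX_TwoTimePoints.csv")
write.csv(toy, csv_path, row.names = FALSE)

# Read the file into a data frame; with the real data, pass its path instead
var_x_all <- read.csv(csv_path)

# One row per gene is assumed, so the row count is the gene count
num_genes <- nrow(var_x_all)
print(num_genes)
```

If a gene could appear on more than one row, counting unique values of the gene ID column would be the safer choice.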

4 - Our data is currently in WIDE FORMAT, with a column for each variable (in this case, each sample). We want to have our data in LONG FORMAT, with a column for each variable type and a column for the values. Run the following cell:

```{r}
library(tidyr)  # provides pivot_longer()
var_x_all.long <- pivot_longer(var_x_all, cols = VarXCRep.1:VarX1Rep.3, names_to = "sample", values_to = "expression")
View(var_x_all.long)
```

* Create a suitable plot to look at the distribution of expression values for all the genes as a function of the sample, for Variety X.
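
One possible "suitable plot" is a box plot with one box per sample. The sketch below uses a made-up wide table (the gene IDs, the three sample columns, and the values are placeholders) and assumes a `log2(x + 1)` transform to tame the skew that expression values usually show; the transform is a choice, not something the task prescribes.

```r
library(tidyr)

# Toy wide-format table; gene IDs, column names and values are made up
var_x_all <- data.frame(
  gene       = c("g1", "g2", "g3", "g4"),
  VarXCRep.1 = c(0, 3, 15, 120),
  VarXCRep.2 = c(1, 4, 12, 98),
  VarX1Rep.1 = c(0, 9, 14, 300)
)

# Wide to long: one row per gene/sample pair
var_x_all.long <- pivot_longer(var_x_all, cols = -gene,
                               names_to = "sample", values_to = "expression")

# One box per sample; boxplot() also returns the summary statistics it drew
bp <- boxplot(log2(expression + 1) ~ sample, data = var_x_all.long,
              xlab = "Sample", ylab = "log2(expression + 1)",
              main = "Variety X, all genes")
```

A violin or density plot per sample would serve the same purpose; the box plot is used here only because it needs nothing beyond base R graphics.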

5 - Now you can repeat the above process for Variety Y.

* Use the `tidyr` function `pivot_longer()` to transform your `var_y_all` data frame into a long format and call the data frame `var_y_all.long`.
* Create a suitable plot to look at the distribution of expression values for all the genes as a function of the sample, for Variety Y.

6 - INVESTIGATE THE DISTRIBUTION OF EXPRESSION VALUES FOR THE DEGs IN EACH SAMPLE (Variety X).

* Use the `tidyr` function `pivot_longer()` to transform your `var_x_degs` data frame into a long format and call the data frame `var_x_degs.long`.
* Create a suitable plot to look at the distribution of expression values for DEGs as a function of the sample, for Variety X.

7 -

* Use the `tidyr` function `pivot_longer()` to transform your `var_y_degs` data frame into a long format and call the data frame `var_y_degs.long`.
* Create a suitable plot to look at the distribution of expression values for DEGs as a function of the sample, for Variety Y.

8 -

* Find out how many duplicate Soltu gene names there are in the `var_x_degs` data frame and assign the result to a variable called `var_x_dup`.
* Find out how many duplicate Soltu gene names there are in the `var_y_degs` data frame and assign the result to a variable called `var_y_dup`.
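
A minimal sketch of the duplicate count, assuming the gene names live in a column called `gene` (a hypothetical name) and using made-up rows:

```r
# Toy DEG table; the column names and the gene IDs are assumptions
var_x_degs <- data.frame(
  gene           = c("Soltu.1", "Soltu.2", "Soltu.2", "Soltu.3", "Soltu.2"),
  log2FoldChange = c(1.2, -0.8, -0.7, 2.5, -0.9)
)

# duplicated() flags every repeat after the first occurrence of a name
var_x_dup <- sum(duplicated(var_x_degs$gene))
print(var_x_dup)  # 2: the two extra copies of Soltu.2
```

If "how many duplicates" is meant as the number of distinct names that occur more than once, `length(unique(var_x_degs$gene[duplicated(var_x_degs$gene)]))` gives that instead (1 for this toy table).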

9 -

* Create a suitable plot to look at the overlap in the DEGs between the two Varieties.
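
A Venn diagram is the usual picture for this kind of overlap, but it needs an add-on package. As a base-R alternative, the sketch below counts the genes unique to each variety and the genes shared by both, then draws the three counts as a bar plot. The gene IDs are made up.

```r
# Hypothetical gene IDs standing in for the two DEG lists
genes_x <- c("Soltu.1", "Soltu.2", "Soltu.3", "Soltu.4")
genes_y <- c("Soltu.3", "Soltu.4", "Soltu.5")

# Set operations give the three regions a Venn diagram would show
overlap_counts <- c(
  "X only" = length(setdiff(genes_x, genes_y)),
  "Shared" = length(intersect(genes_x, genes_y)),
  "Y only" = length(setdiff(genes_y, genes_x))
)

barplot(overlap_counts, ylab = "Number of DEGs",
        main = "DEG overlap between varieties")
```

With the real data, `genes_x` and `genes_y` would be the gene ID columns of `var_x_degs` and `var_y_degs`, ideally after removing duplicates.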

By looking at the gene expression data in the `var_x_degs` and `var_y_degs` data frames, you can see that some genes have a positive log2 fold change and others have a negative log2 fold change.

* Create a data frame called `var_x_degs.up` containing only genes that are upregulated in Stress Treatment compared to control in Variety X.
* Create a data frame called `var_x_degs.down` containing only genes that are downregulated in Stress Treatment compared to control in Variety X.
* Create a data frame called `var_y_degs.up` containing only genes that are upregulated in Stress Treatment compared to control in Variety Y.
* Create a data frame called `var_y_degs.down` containing only genes that are downregulated in Stress Treatment compared to control in Variety Y.
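
Splitting by direction comes down to the sign of the log2 fold change. The sketch below assumes the fold change is stored in a column named `log2FoldChange` (a hypothetical name) and is computed as stress over control, so positive values mean upregulated under stress. The rows are made up.

```r
# Toy DEG table; column names and values are assumptions
var_x_degs <- data.frame(
  gene           = c("g1", "g2", "g3", "g4"),
  log2FoldChange = c(2.1, -1.4, 0.6, -3.0)
)

# Positive log2 fold change: higher in stress than in control
var_x_degs.up <- var_x_degs[var_x_degs$log2FoldChange > 0, ]

# Negative log2 fold change: lower in stress than in control
var_x_degs.down <- var_x_degs[var_x_degs$log2FoldChange < 0, ]

print(nrow(var_x_degs.up))
print(nrow(var_x_degs.down))
```

The Variety Y data frames follow the same pattern with `var_y_degs` as the input.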

* Create a box plot to show the distribution of log2 fold change for all DEGs by variety. Hint: the base R `boxplot()` command and the `abs()` function could be helpful here.
* Create a box plot to show the distribution of log2 fold change for upregulated DEGs by variety. Hint: the base R `boxplot()` command could be helpful here.
* Create a box plot to show the distribution of log2 fold change for downregulated DEGs by variety. Hint: the base R `boxplot()` command could be helpful here.
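
The three box plots can be sketched with base R by passing `boxplot()` a named list with one vector per variety. The fold-change values below are made up; with the real data they would come from the fold-change column of each DEG data frame.

```r
# Made-up log2 fold changes for the DEGs of each variety
lfc_x <- c(2.1, -1.4, 0.6, -3.0, 1.8)
lfc_y <- c(1.1, -0.4, 3.2, -2.2)

# All DEGs: abs() compares up and down changes by magnitude
bp_all <- boxplot(list(VarX = abs(lfc_x), VarY = abs(lfc_y)),
                  ylab = "|log2 fold change|", main = "All DEGs")

# Upregulated DEGs only (positive values)
bp_up <- boxplot(list(VarX = lfc_x[lfc_x > 0], VarY = lfc_y[lfc_y > 0]),
                 ylab = "log2 fold change", main = "Upregulated DEGs")

# Downregulated DEGs only (negative values)
bp_down <- boxplot(list(VarX = lfc_x[lfc_x < 0], VarY = lfc_y[lfc_y < 0]),
                   ylab = "log2 fold change", main = "Downregulated DEGs")
```

Using a list rather than a formula avoids having to stack the two varieties into one data frame first, which is convenient when the two DEG lists have different lengths.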

* Find out the function of the bottom-most upregulated gene in Variety X (lowest fold change) and assign the result to a variable called `bottom_gene.x`.
* Find out the function of the bottom-most upregulated gene in Variety Y (lowest fold change) and assign the result to a variable called `bottom_gene.y`.
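
A minimal sketch for Variety X, assuming the upregulated data frame carries the gene function in a column called `description` and the fold change in `log2FoldChange`; both names, and the rows, are hypothetical.

```r
# Toy table of upregulated DEGs; column names and contents are assumptions
var_x_degs.up <- data.frame(
  gene           = c("g1", "g2", "g3"),
  log2FoldChange = c(2.1, 0.6, 1.8),
  description    = c("heat shock protein", "unknown function",
                     "transcription factor"),
  stringsAsFactors = FALSE
)

# Row index of the smallest (still positive) fold change
lowest <- which.min(var_x_degs.up$log2FoldChange)

# Function annotation of that gene
bottom_gene.x <- var_x_degs.up$description[lowest]
print(bottom_gene.x)
```

`which.min()` returns only the first minimum, so ties would need separate handling.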

* Create a set of scatterplots to visually inspect how well the different replicates agree/correlate for the DEGs in Variety X in the treatment time point.
* Create a set of scatterplots to visually inspect how well the different replicates agree/correlate for the DEGs in Variety X in the control time point.
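
One way to get "a set of scatterplots" in a single call is base R's `pairs()`, which plots every replicate against every other. The sketch below simulates three treatment replicates and assumes they are stored in columns named `VarX1Rep.1` to `VarX1Rep.3`; whether those names really denote the treatment samples is an assumption, and the values are random.

```r
set.seed(1)

# Simulated expression: a shared signal with small multiplicative noise
base_expr <- rexp(50, rate = 0.1)
var_x_degs <- data.frame(
  VarX1Rep.1 = base_expr * exp(rnorm(50, sd = 0.1)),
  VarX1Rep.2 = base_expr * exp(rnorm(50, sd = 0.1)),
  VarX1Rep.3 = base_expr * exp(rnorm(50, sd = 0.1))
)

# Assumed names of the treatment replicate columns
treat_cols <- c("VarX1Rep.1", "VarX1Rep.2", "VarX1Rep.3")

# Scatterplot matrix on a log scale, one panel per replicate pair
pairs(log2(var_x_degs[, treat_cols] + 1),
      main = "Variety X DEGs, treatment replicates")

# Pairwise Pearson correlations as a numeric companion to the plots
rep_cor <- cor(var_x_degs[, treat_cols])
print(round(rep_cor, 3))
```

Swapping `treat_cols` for the control replicate columns gives the second set of plots.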

* Modify your data frame `var_x_degs` to include two new (additional) columns as follows: The first new column should be named `control_mean` and contain the mean expression value for the three control replicates.
* The second new column should be named `stress_mean` and contain the mean expression value for the three stress treatment replicates.
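
Per-gene means across replicates are row means over the relevant columns. The sketch below assumes the control replicates are `VarXCRep.1` to `VarXCRep.3` and the stress replicates are `VarX1Rep.1` to `VarX1Rep.3`; that mapping of names to conditions is an assumption, and the values are made up.

```r
# Toy DEG table with three control and three stress replicates (made-up values)
var_x_degs <- data.frame(
  gene       = c("g1", "g2"),
  VarXCRep.1 = c(10, 4), VarXCRep.2 = c(12, 6), VarXCRep.3 = c(14, 8),
  VarX1Rep.1 = c(30, 1), VarX1Rep.2 = c(33, 2), VarX1Rep.3 = c(36, 3)
)

# Assumed column groupings
control_cols <- c("VarXCRep.1", "VarXCRep.2", "VarXCRep.3")
stress_cols  <- c("VarX1Rep.1", "VarX1Rep.2", "VarX1Rep.3")

# rowMeans() averages across the selected columns, one value per gene
var_x_degs$control_mean <- rowMeans(var_x_degs[, control_cols])
var_x_degs$stress_mean  <- rowMeans(var_x_degs[, stress_cols])

print(var_x_degs[, c("gene", "control_mean", "stress_mean")])
```

Selecting the columns by name keeps the calculation correct even if the column order in the CSV changes.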

* Create a data frame called `var_y_degs.up.big` containing only genes in Variety Y that are upregulated in Stress Treatment compared to control, have at least a 2-fold absolute change in expression and have a p value less than 1e-06.
* Hint: remember you are dealing with log2 fold change.
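
The hint matters because a 2-fold change on the linear scale corresponds to a log2 fold change of log2(2) = 1, so "upregulated and at least 2-fold" becomes a log2 fold change of 1 or more. The sketch below assumes columns named `log2FoldChange` and `pvalue` (hypothetical names) and uses made-up rows.

```r
# Toy DEG table; column names and values are assumptions
var_y_degs <- data.frame(
  gene           = c("g1", "g2", "g3", "g4"),
  log2FoldChange = c(2.5, 0.4, -3.0, 1.0),
  pvalue         = c(1e-10, 1e-12, 1e-09, 1e-03),
  stringsAsFactors = FALSE
)

# A 2-fold change on the linear scale is 1 on the log2 scale
min_lfc <- log2(2)

# Upregulated by at least 2-fold and p value below 1e-06
keep <- var_y_degs$log2FoldChange >= min_lfc & var_y_degs$pvalue < 1e-06
var_y_degs.up.big <- var_y_degs[keep, ]

print(var_y_degs.up.big)
```

In this toy table only `g1` passes: `g2` changes too little, `g3` is downregulated, and `g4` fails the p value cutoff.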
