Question:

The GFF3 format is commonly used in bioinformatics for representing sequence annotation. You can find the specification here:

http://www.sequenceontology.org/gff3.shtml

Using a GFF file with the standard format, write the following code.

Note that this same file has both the annotation feature table and the FASTA sequence for the molecules referenced. (See the '##FASTA' directive in the specification.) Column 9 of each row stores any key=value pairs relevant to that row's feature, such as ID, Ontology_term or Note.
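To illustrate the key=value layout of column 9, here is a minimal sketch of splitting an attribute string into a dictionary. The function name `parse_attributes` and the sample string are hypothetical, chosen only for this example. The sketch assumes pairs are separated by semicolons and that values may be URL-escaped, and it does not split multi-valued (comma-separated) attributes.

```python
from urllib.parse import unquote

def parse_attributes(column9):
    """Turn 'ID=YAR003W;Note=some%20text' into {'ID': 'YAR003W', 'Note': 'some text'}."""
    attributes = {}
    for pair in column9.strip().split(";"):
        if "=" not in pair:
            continue  # skip empty or malformed entries, including a bare "."
        key, value = pair.split("=", 1)
        attributes[key.strip()] = unquote(value.strip())
    return attributes

# Made-up attribute string for illustration only.
print(parse_attributes("ID=YAR003W;Name=YAR003W;Note=example%20note"))
```

With a dictionary like this, a query such as ID=YAR003W becomes an exact lookup rather than a substring search.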

Your task is to write a GFF3 feature exporter. A user should be able to run your script like this:

$ export_gff3_feature.py --source_gff=/path/to/some.gff3 --type=gene --attribute=ID --value=YAR003W

There are 4 arguments here that correspond to values in the GFF3 columns. In this case, your script should read the path to a GFF3 file, find any gene (column 3) which has an ID=YAR003W (column 9). When it finds this, it should use the coordinates for that feature (columns 4, 5 and 7) and the FASTA sequence at the end of the document to return its FASTA sequence.
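The coordinate step can be sketched as follows. GFF3 start and end positions are 1-based and inclusive, so they need a small adjustment before slicing a Python string. The function name `extract_feature` and the sample molecule are hypothetical. Returning the reverse complement for minus-strand features is the usual convention, but the assignment text does not spell it out, so treat that part as an assumption.

```python
# Complement table for DNA; characters outside it (e.g. N) are left unchanged.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def extract_feature(sequence, start, end, strand):
    """Return the feature's bases; start and end are 1-based and inclusive, as in GFF3."""
    subsequence = sequence[start - 1:end]
    if strand == "-":
        # Assumption: minus-strand features are reported as the reverse complement.
        subsequence = subsequence.translate(COMPLEMENT)[::-1]
    return subsequence

molecule = "AAACCCGGGTTT"  # made-up sequence for illustration
print(extract_feature(molecule, 4, 6, "+"))  # CCC
print(extract_feature(molecule, 1, 6, "-"))  # GGGTTT
```

Columns 4 and 5 are read from the file as strings, so they would need converting with `int()` before being passed in.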

Your script should work regardless of the parameter values passed, warning the user if no features were found that matched their query. (It should also check and warn if more than one feature matches the query.)
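The command-line handling and the two warnings could be sketched like this, using the standard `argparse` module with the four option names from the example command. The function names `parse_args` and `check_match_count` are hypothetical, and sending warnings to STDERR (so they stay out of the FASTA output) is a design assumption rather than a stated requirement.

```python
import argparse
import sys

def parse_args(argv=None):
    """Read the four query options; argv=None means use the real command line."""
    parser = argparse.ArgumentParser(description="Export one GFF3 feature as FASTA.")
    parser.add_argument("--source_gff", required=True, help="path to the GFF3 file")
    parser.add_argument("--type", required=True, help="feature type (column 3)")
    parser.add_argument("--attribute", required=True, help="attribute key (column 9)")
    parser.add_argument("--value", required=True, help="attribute value to match")
    return parser.parse_args(argv)

def check_match_count(matches, args):
    """Warn on STDERR for zero or several matches; return True if there is something to export."""
    query = f"{args.type}:{args.attribute}:{args.value}"
    if not matches:
        print(f"WARNING: no features matched {query}", file=sys.stderr)
        return False
    if len(matches) > 1:
        print(f"WARNING: {len(matches)} features matched {query}", file=sys.stderr)
    return True

# Simulate the example command line instead of reading sys.argv.
args = parse_args(["--source_gff=/path/to/some.gff3", "--type=gene",
                   "--attribute=ID", "--value=YAR003W"])
print(args.source_gff, args.type, args.attribute, args.value)
```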

The output should just be printed to STDOUT (writing to a file is not necessary). It should have a header that matches the user's query, like this:

>gene:ID:YAR003W
.... sequence here ...

Some bonus points will be awarded if you format the sequence portion of the FASTA output as 60 characters per line, which follows the standard.
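The 60-character wrapping amounts to slicing the sequence in fixed-width steps. A minimal sketch, in which the function name `format_fasta` and the repeated `ACGT` test sequence are made up for illustration:

```python
def format_fasta(header, sequence, width=60):
    """Build a FASTA record whose sequence lines are at most `width` characters long."""
    lines = [">" + header]
    for i in range(0, len(sequence), width):
        lines.append(sequence[i:i + width])
    return "\n".join(lines)

# 160 made-up bases wrap into lines of 60, 60 and 40 characters.
print(format_fasta("gene:ID:YAR003W", "ACGT" * 40))
```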

#!/usr/bin/env python3

# Scan the feature table for a gene whose attributes include ID=YAR003W.
found = False
with open("Saccharomyces_cerevisiae_S288C.annotation.gff") as gff:
    for line in gff:
        if line.startswith("#"):
            continue
        column = line.rstrip().split("\t")
        if len(column) != 9:
            continue
        feature_type = column[2]
        attributes = column[8].split(";")
        # Compare exactly: the type must be "gene" and ID=YAR003W must be one of the attributes.
        if feature_type == "gene" and "ID=YAR003W" in attributes:
            found = True
            chromosome = column[0]
            start = column[3]
            stop = column[4]
            strand = column[6]
            print("Start: " + start + " Stop: " + stop + " Strand: " + strand)

# Report a failed search once, after every row has been checked.
if not found:
    print("No features were found matching your query.")

So far I have this, but I don't know how to go back through the file and use the start, stop, and strand values to pull the matching sequence out of the FASTA section.
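One possible approach, sketched under assumptions: read everything after the '##FASTA' directive into a dictionary keyed by molecule ID, then look up the chromosome from column 1 and slice it with the start and stop values. The function name `read_fasta_section` and the tiny example file contents are hypothetical. The sketch assumes each FASTA header begins with the same ID that appears in column 1 of the feature rows.

```python
def read_fasta_section(lines):
    """Collect the sequences that follow the ##FASTA directive, keyed by molecule ID."""
    sequences = {}
    current_id = None
    in_fasta = False
    for line in lines:
        line = line.rstrip()
        if line.startswith("##FASTA"):
            in_fasta = True
        elif in_fasta and line.startswith(">"):
            # Assumption: the first word of the header matches column 1 of the feature rows.
            current_id = line[1:].split()[0]
            sequences[current_id] = []
        elif in_fasta and current_id is not None and line:
            sequences[current_id].append(line)
    # Join the per-line chunks into one string per molecule.
    return {seq_id: "".join(parts) for seq_id, parts in sequences.items()}

# Made-up miniature GFF3 content: one feature row followed by the FASTA section.
example = ["chrI\tSGD\tgene\t2\t5\t.\t+\t.\tID=demo\n",
           "##FASTA\n", ">chrI\n", "ACGTAC\n", "GTAC\n"]
sequences = read_fasta_section(example)
print(sequences["chrI"][2 - 1:5])  # CGTA
```

An open file handle can be passed in place of the list, for example `with open(path) as handle: sequences = read_fasta_section(handle)`. Because the whole sequence for each molecule is held in memory, this suits genomes the size of yeast; a much larger file might call for a different strategy.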
