
The GFF3 format is a commonly-used one in bioinformatics for representing sequence annotation. You can find the specification here:

http://www.sequenceontology.org/gff3.shtml

Using a GFF file with the standard format, write the following code.

Note that this same file has both the annotation feature table and the FASTA sequence for the molecules referenced. (See the '##FASTA' directive in the specification.) Column 9 of the feature table can store any key=value pairs relevant to that row's feature, such as ID, Ontology_term or Note.
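As a rough illustration of the key=value pairs in column 9, the sketch below splits an attribute string into a dictionary. The function name parse_attributes and the example string are made up for illustration; the separators (semicolons between pairs, commas between multiple values, percent-encoding for special characters) follow the GFF3 specification.

```python
from urllib.parse import unquote


def parse_attributes(column9):
    """Turn a GFF3 column 9 string into a dict mapping key -> list of values."""
    attributes = {}
    for pair in column9.strip().split(";"):
        if "=" not in pair:
            continue  # skip empty or malformed fragments
        key, _, value = pair.partition("=")
        # Several values for one key are comma-separated; special
        # characters inside a value are percent-encoded.
        attributes[key.strip()] = [unquote(v) for v in value.split(",")]
    return attributes


# Illustrative attribute string, not taken from a real annotation file.
example = "ID=YAR003W;Name=YAR003W;Note=Example%20note"
print(parse_attributes(example))
```

Keeping every value as a list is one possible design choice; it avoids special-casing keys such as Ontology_term that may carry several values.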

Your task is to write a GFF3 feature exporter. A user should be able to run your script like this:

$ export_gff3_feature.py --source_gff=/path/to/some.gff3 --type=gene --attribute=ID --value=YAR003W
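One way to accept the four options shown in that command line is Python's standard argparse module, which understands the --name=value form. This is only a sketch: the help strings and the parse_args wrapper are assumptions, and the example passes an explicit argument list so that it runs without a real command line.

```python
import argparse


def parse_args(argv=None):
    """Parse the four query options; argv=None means 'use sys.argv'."""
    parser = argparse.ArgumentParser(
        description="Export one GFF3 feature as FASTA.")
    parser.add_argument("--source_gff", required=True,
                        help="path to the GFF3 file")
    parser.add_argument("--type", required=True,
                        help="feature type to match (column 3)")
    parser.add_argument("--attribute", required=True,
                        help="attribute key to match (column 9)")
    parser.add_argument("--value", required=True,
                        help="attribute value to match (column 9)")
    return parser.parse_args(argv)


# Same options as the example invocation, passed as a list.
args = parse_args(["--source_gff=/path/to/some.gff3", "--type=gene",
                   "--attribute=ID", "--value=YAR003W"])
print(args.source_gff, args.type, args.attribute, args.value)
```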

There are 4 arguments here that correspond to values in the GFF3 columns. In this case, your script should read the path to a GFF3 file, find any gene (column 3) which has an ID=YAR003W (column 9). When it finds this, it should use the coordinates for that feature (columns 4, 5 and 7) and the FASTA sequence at the end of the document to return its FASTA sequence.
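The coordinate step can be sketched as follows, assuming the sequence for the molecule named in column 1 is already in memory as a string. GFF3 coordinates are 1-based and inclusive, and a feature on the '-' strand is usually reported as the reverse complement. The helper name extract_feature and the toy sequence are illustrative only.

```python
# Translation table for complementing DNA; characters that are not
# listed (such as N) pass through unchanged.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def extract_feature(sequence, start, end, strand):
    """Return the subsequence for 1-based, inclusive GFF3 coordinates."""
    # Python slices are 0-based and end-exclusive, hence start - 1.
    subseq = sequence[start - 1:end]
    if strand == "-":
        # Reverse complement for features on the minus strand.
        subseq = subseq.translate(COMPLEMENT)[::-1]
    return subseq


toy = "AAACCCGGGTTT"
print(extract_feature(toy, 2, 5, "+"))  # AACC
print(extract_feature(toy, 2, 5, "-"))  # GGTT
```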

Your script should work regardless of the parameter values passed, warning the user if no features were found that matched their query. (It should also check and warn if more than one feature matches the query.)
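A minimal sketch of the two warnings, assuming the matching rows have already been collected into a list. Writing warnings to STDERR keeps STDOUT clean for the FASTA output; falling back to the first match when several are found is an assumption here, since the task only asks for a warning. The name pick_single_match is hypothetical.

```python
import sys


def pick_single_match(matches, query):
    """Return the first match, warning on stderr for zero or several matches."""
    if not matches:
        print(f"Warning: no features matched {query}", file=sys.stderr)
        return None
    if len(matches) > 1:
        print(f"Warning: {len(matches)} features matched {query}; "
              "using the first", file=sys.stderr)
    return matches[0]


print(pick_single_match([], "gene:ID:YAR003W"))
print(pick_single_match(["row1", "row2"], "gene:ID:YAR003W"))
```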

The output should just be printed on STDOUT (no writing to a file is necessary.) It should have a header which matches their query, like this:

>gene:ID:YAR003W
.... sequence here ...

Some bonus points will be awarded if you format the sequence portion of the FASTA output as 60 characters per line, which follows the standard.
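The 60-character wrapping can be done with plain string slicing. The sketch below builds the whole FASTA record as one string; the function name format_fasta and its width parameter are illustrative choices.

```python
def format_fasta(header, sequence, width=60):
    """Build a FASTA record with the sequence wrapped at `width` characters."""
    lines = [">" + header]
    # Step through the sequence in chunks of `width` characters.
    lines.extend(sequence[i:i + width]
                 for i in range(0, len(sequence), width))
    return "\n".join(lines)


# A 130-base toy sequence gives lines of 60, 60 and 10 characters.
print(format_fasta("gene:ID:YAR003W", "A" * 130))
```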

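#!/usr/bin/env python3

found = False

for line in open("Saccharomyces_cerevisiae_S288C.annotation.gff"):
    line1 = line.rstrip()
    if line1.startswith("#"):
        continue
    column = line1.split("\t")

    # Feature rows have exactly 9 tab-separated columns.
    if len(column) != 9:
        continue
    attributes = column[8].split(";")
    feature_type = column[2]
    #typeinput = input("Please enter type for search: ")
    #idinput = input("Please enter '=ID' number: ")

    # str.find() returns -1 (which is truthy) when nothing matches,
    # so compare the values directly instead.
    if feature_type == "gene" and "ID=YAR003W" in attributes:
        print("Start: " + column[3] + " Stop: " + column[4] + " Strand: " + column[6])
        start = int(column[3])
        stop = int(column[4])
        strand = column[6]
        chromosome = column[0]
        found = True

# Report a miss once, after the whole feature table has been scanned.
if not found:
    print("No features were found matching your query.")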

So far I have this, but I don't know how to go back through the file and use the start, stop, and strand values to pull the matching sequence out of the FASTA section.
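One possible approach, sketched under assumptions rather than offered as the expected solution: there is no need to go back through the file, because a single pass can collect the feature rows until the '##FASTA' directive and then collect the sequences after it, keyed by the name on each '>' line. Column 1 of a feature row names the sequence its coordinates refer to. The function name read_gff3 and the toy data are made up for illustration.

```python
import io


def read_gff3(handle):
    """Split a GFF3 stream into feature rows and a dict of sequences."""
    features = []
    sequences = {}
    in_fasta = False
    current_id = None
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            in_fasta = True  # everything below is sequence data
        elif in_fasta:
            if line.startswith(">"):
                fields = line[1:].split()
                # The sequence name is the first word after '>'.
                current_id = fields[0] if fields else None
                if current_id is not None:
                    sequences[current_id] = []
            elif current_id is not None and line:
                sequences[current_id].append(line.strip())
        elif line and not line.startswith("#"):
            columns = line.split("\t")
            if len(columns) == 9:
                features.append(columns)
    joined = {name: "".join(parts) for name, parts in sequences.items()}
    return features, joined


# Toy file contents standing in for a real GFF3 file.
toy = io.StringIO(
    "##gff-version 3\n"
    "chrToy\ttest\tgene\t3\t6\t.\t+\t.\tID=toy1\n"
    "##FASTA\n"
    ">chrToy\n"
    "AACC\n"
    "GGTT\n"
)
features, sequences = read_gff3(toy)
chromosome = features[0][0]
start, end = int(features[0][3]), int(features[0][4])
# 1-based inclusive coordinates map to the slice [start - 1:end].
print(sequences[chromosome][start - 1:end])  # CCGG
```

Holding every sequence in memory is a simplification that should be fine for a yeast-sized genome; for a feature on the '-' strand the slice would still need to be reverse-complemented.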
