The goal of this project is to write a program that will translate DNA sequences into protein sequences using the 'standard' genetic code. Remember that coding DNA is 'read' as triplet 'codons,' each of which is translated into a specific amino acid.
The genetic code is available at: http://en.wikipedia.org/wiki/DNA_codon_table
We are going to translate 3-letter DNA codons into 1-letter amino acid codes. For example, ATG translates to M, TGG translates to W, and so on. By convention, we will translate any "Stop" codon (TAA, TAG, or TGA) to the symbol *.
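A minimal sketch of the dictionary approach follows, assuming the 'standard' code means NCBI translation table 1 (the table shown on the Wikipedia page). Rather than typing all 64 codons by hand, it builds the dictionary from the 64 one-letter codes listed in T, C, A, G order. The names `CODON_TABLE` and `translate` are illustrative, not required by the assignment.

```python
from itertools import product

# Standard genetic code (NCBI table 1). Codons are enumerated with bases in
# T, C, A, G order: TTT, TTC, TTA, TTG, TCT, ..., GGG (64 in total).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# product() varies the last base fastest, matching the order of AMINO_ACIDS.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string into 1-letter amino-acid codes ('*' = stop).

    Raises KeyError if the length is not a multiple of 3 (the last slice is
    then an incomplete codon) or if the string contains non-ACGT characters.
    """
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


print(translate("ATGTGGTAA"))  # MW*
```

Generating the table from a 64-character string keeps the source short and makes typos easy to spot against the published table; writing out a literal 64-entry dictionary works just as well.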
As an added bonus, we will use the standard FASTA format for input and output (http://en.wikipedia.org/wiki/FASTA_format). FASTA is a common format for biological sequence data. Basically, it divides data into "identifiers" (lines starting with ">") and "sequences," which will be our DNA and amino-acid data. Note that sequences can span multiple lines, for example:
>my_id
ATTGA
ACCGG
GGATC
TTA
This record encodes the sequence ATTGAACCGGGGATCTTA, with identifier "my_id".
Although your program must be able to read sequence data spanning multiple lines, please print each amino-acid sequence on a single line after its identifier.
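One way to collect a multi-line sequence is to buffer sequence lines until the next identifier line appears, then join them into one string. The generator below sketches that idea; `parse_fasta` is a hypothetical helper name, and it assumes blank lines can be skipped and that any sequence lines appearing before the first identifier can be ignored.

```python
def parse_fasta(lines):
    """Yield (identifier_line, sequence) pairs from FASTA-formatted lines.

    Sequence lines following an identifier are concatenated, so a sequence
    may span any number of lines. Blank lines are skipped, and sequence lines
    before the first identifier are discarded.
    """
    header = None
    chunks = []
    for line in lines:
        line = line.strip()  # drop the newline and surrounding whitespace
        if not line:
            continue
        if line.startswith(">"):
            # A new identifier ends the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        else:
            chunks.append(line)
    # The last record has no following identifier, so emit it here.
    if header is not None:
        yield header, "".join(chunks)


example = """>my_id
ATTGA
ACCGG
GGATC
TTA
"""
for header, seq in parse_fasta(example.splitlines()):
    print(header, seq)  # >my_id ATTGAACCGGGGATCTTA
```

Because it only needs an iterable of strings, the same function accepts an open file object directly, which keeps it easy to test without touching the filesystem.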
Create an executable program called: ~/assignments/PythonChallenge/translateFasta.py
The program should take 1 command-line argument: the name of a FASTA-formatted file containing at least 1 coding DNA sequence.
The program should print the corresponding FASTA-formatted amino-acid sequences to the screen, including the unaltered identifier lines, with each entire amino-acid sequence on a single line.
You may want to consider using a Python dictionary, and it may be a bit tricky to figure out how to read multiple FASTA lines into a single sequence.
You should be able to 'translate' any random string of A, C, G, and T characters (provided its length is divisible by 3), even if it is split across multiple lines. Design your own small tests that you can work out by hand; you can always check your results against one of the online DNA-to-protein translators.
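Combining a codon dictionary with multi-line FASTA reading, one possible translateFasta.py is sketched below. It is self-contained so it can be saved as a single file, assumes Python 3.6+ (for f-strings), and adds a check that each sequence length is divisible by 3. The helper names are hypothetical, and the handling of bad input (raising `ValueError`, printing a usage line to stderr) is an assumption rather than a requirement of the assignment.

```python
#!/usr/bin/env python3
"""translateFasta.py: translate DNA FASTA records into protein FASTA records.

Usage: translateFasta.py sequences.fasta
"""
import sys
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in T, C, A, G order.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(
        product("TCAG", repeat=3),
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
    )
}


def parse_fasta(lines):
    """Yield (identifier_line, joined_sequence); sequences may span lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def translate(dna):
    """Translate DNA to 1-letter amino-acid codes; stop codons become '*'."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError(f"sequence length {len(dna)} is not divisible by 3")
    try:
        return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))
    except KeyError as err:
        # A codon containing characters other than A, C, G, T.
        raise ValueError(f"invalid codon {err.args[0]!r}") from None


def translate_fasta(lines):
    """Return protein FASTA text: each identifier, then its protein on one line."""
    out = []
    for header, dna in parse_fasta(lines):
        out.append(header)
        out.append(translate(dna))
    return "\n".join(out)


def main(path):
    with open(path) as handle:
        print(translate_fasta(handle))


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print(f"usage: {sys.argv[0]} FASTA_FILE", file=sys.stderr)
```

After saving it as ~/assignments/PythonChallenge/translateFasta.py and making it executable (for example with `chmod +x`), running it on a file containing the my_id record above should print:

>my_id
IEPGIL

That result is easy to check by hand: ATT GAA CCG GGG ATC TTA translates to I E P G I L. A stricter version could exit with a nonzero status when the argument is missing.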