Question: This Python program is supposed to look for open reading frames (ORFs), that is, strings that start with ATG and end with TAA, TGA, or TAG.
It is also supposed to count the length of each match (the number of characters in the string). However, it doesn't work. Why?
import re
import string

with open('dna.txt', 'rb') as f:
    data = f.read(GAGTTTTATCGCTTCCATGACGCAGAAGTTAACACTTTCGGAATGATGAAAAA)

data = [x.split(' ', 1) for x in data.split('>')]
data = [(x[0], ''.join(x[1].split())) for x in data if len(x) == 2]

start, end = [re.compile(x) for x in 'ATG TAG|TGA|TAA'.split()]

revtrans = string.maketrans("ATGC","TACG")

def get_longest(starts, ends):
    ''' Simple brute-force for now.  Optimize later...
        Given a list of start locations and a list
        of end locations, return the longest valid
        string.  Returns tuple (length, start position)

        Assume starts and ends are sorted correctly
        from beginning to end of string.
    '''
    results = {}
    # Use smallest end that is bigger than each start
    ends.reverse()
    for start in starts:
        for end in ends:
            if end > start and (end - start) % 3 == 0:
                results[start] = end + 3
    results = [(end - start, start) for start, end in results.iteritems()]
    return max(results) if results else (0, 0)

def get_orfs(dna):
    ''' Returns length, header, forward/reverse indication,
        and longest match (corrected if reversed)
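
Judging only from the listing above, some likely causes are these. The call f.read(GAGTT...) passes the sequence as a bare, undefined name where read() expects an optional byte count, so that line raises a NameError. A dna.txt that holds only the raw sequence has no '>' header line, so the len(x) == 2 filter leaves data empty. And string.maketrans and dict.iteritems() exist only in Python 2, so the script fails under Python 3. For comparison, here is a minimal Python 3 sketch of the task as described in the question, scanning the forward strand only. The names find_orfs, longest_orf, and STOP_CODONS are hypothetical and do not come from the original program; the sample string is the sequence quoted in the question.

```python
import re

# Stop codons named in the question.
STOP_CODONS = {"TAA", "TGA", "TAG"}


def find_orfs(seq):
    """Return a list of (start, end, orf) tuples for the forward strand.

    An ORF here starts at an ATG and runs to the first in-frame stop
    codon, inclusive. ATGs with no in-frame stop are skipped.
    """
    seq = seq.upper()
    orfs = []
    for match in re.finditer("ATG", seq):
        start = match.start()
        # Step through the sequence codon by codon, in frame with this ATG.
        for i in range(start + 3, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                orfs.append((start, i + 3, seq[start:i + 3]))
                break
    return orfs


def longest_orf(seq):
    """Return the longest (start, end, orf) tuple, or None if there is none."""
    return max(find_orfs(seq), key=lambda orf: orf[1] - orf[0], default=None)


sample = "GAGTTTTATCGCTTCCATGACGCAGAAGTTAACACTTTCGGAATGATGAAAAA"
best = longest_orf(sample)
if best is not None:
    start, end, orf = best
    print(start, len(orf), orf)
```

To read the sequence from a file instead, open it in text mode and call f.read() with no argument, then strip whitespace before passing the string to find_orfs.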
