Question:
I have a FASTA file containing DNA sequences. I've already cleaned the file, so it contains only DNA sequences with no gaps, errors, or whitespace characters. I want to find the open reading frames in the file that are longer than 30 bp.
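Reading the file needs nothing beyond built-in Python. Below is a minimal sketch, assuming standard FASTA layout (a `>` header line followed by one or more sequence lines). The names `parse_fasta`, `read_fasta`, and the default record name `"sequence"` (used when the data has no header line) are hypothetical. All whitespace is stripped from sequence lines and bases are uppercased.

```python
def parse_fasta(lines):
    """Parse FASTA-formatted lines into a {header: sequence} dict."""
    records = {}
    header = "sequence"  # hypothetical name for data that appears before any '>' line
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record (empty records are dropped).
            if chunks:
                records[header] = "".join(chunks).upper()
            header = line[1:].strip()
            chunks = []
        else:
            # Remove any whitespace inside the line, not just at its ends.
            chunks.append("".join(line.split()))
    if chunks:
        records[header] = "".join(chunks).upper()
    return records


def read_fasta(path):
    """Open a FASTA file by path and parse it."""
    with open(path) as handle:
        return parse_fasta(handle)
```

Keeping the parsing in `parse_fasta` (which accepts any iterable of lines) separate from the file handling makes it easy to try out on a short list of strings before pointing it at the real file.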
Here's a short description: an open reading frame is a sequence that
starts with ATG,
ends with TAG, TAA, or TGA, and
has a length that is a multiple of three.
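The three rules can be written as a single check on a candidate string. This is a sketch under one extra assumption that the question does not state: as in the usual definition of an ORF, no stop codon may appear in frame before the final codon (otherwise `ATG TAA AAA TAG` would pass the literal rules). The names `is_orf` and `STOP_CODONS` are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_orf(seq):
    """Return True if seq satisfies the ORF rules (plus no early in-frame stop)."""
    seq = seq.upper()
    # Rule 3: whole codons only; at least a start codon and a stop codon.
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    # Rules 1 and 2: starts with ATG, ends with a stop codon.
    if not seq.startswith("ATG") or seq[-3:] not in STOP_CODONS:
        return False
    # Assumption beyond the stated rules: no stop codon in frame before the last codon.
    internal = (seq[i:i + 3] for i in range(3, len(seq) - 3, 3))
    return all(codon not in STOP_CODONS for codon in internal)
```

For example, `is_orf("ATGAAATAG")` is `True`, while `is_orf("ATGAATAG")` is `False` because 8 bp is not a multiple of three.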
For each open reading frame, I want to print the start position in the sequence, the length of the open reading frame, and the open reading frame sequence to the screen. Then I want to print the total number of open reading frames found. I would appreciate some help.
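One way to do the search and the printing is sketched below. The following are assumptions rather than requirements from the question: positions are reported 1-based, the reported length includes the stop codon, "longer than 30 bp" means strictly more than 30 bp, and only the forward strand is scanned. Every ATG is treated as a possible start and is followed codon by codon to the first in-frame stop, so an ATG nested inside a longer ORF in the same frame is reported as its own ORF. The names `find_orfs` and `report` are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_len=30):
    """Return (start, length, orf) tuples for ORFs longer than min_len bp.

    start is 1-based; length includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    start = seq.find("ATG")
    while start != -1:
        # Walk codon by codon from this ATG until the first in-frame stop codon.
        for i in range(start + 3, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                length = i + 3 - start
                if length > min_len:
                    orfs.append((start + 1, length, seq[start:i + 3]))
                break
        # An ATG with no downstream in-frame stop is simply skipped.
        start = seq.find("ATG", start + 1)
    return orfs


def report(orfs):
    """Print each ORF, then the total count."""
    for start, length, orf in orfs:
        print(f"Start: {start}\tLength: {length}\tORF: {orf}")
    print(f"Total ORFs found: {len(orfs)}")
```

Usage would look like `report(find_orfs(sequence))`, where `sequence` is the cleaned DNA string. Restarting from every ATG is quadratic in the worst case, which is fine for a sequence of a few thousand bases; if nested ORFs should not be reported separately, the search could instead resume after each ORF's stop codon within the same frame.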
Python only, with no libraries.
Also, here's the DNA sequence I am working with:
GCATGCAATACAGTGACATATATATATACCCTAACACTACCCTAACCCTACCCTATTTCA ACCCTTCCAACCTGTCTCTCAACTTACCCTCACATTACCCTACCTCTCCACTTGTTACCC TGTCCCATTCAACCATACCACTCCCAACCACCATCCATCCCTCTACTTACTACCACCAAT CAACCGTCCACCATAACCGTTACCCTCCAATTAGCCATATTCAACTTCACTACCACTTAC CCTGCCATTACTCTACCATCCACCATCTGCTACTCACCATACTGTTGTTCTACCCTCCAT ATTGAAACGTTAACAAATGATCGTAAATAATACACATATACTTACCCTACCACTTTATAC CACCACACATCACATGCCATACTCACCTTCACTTGTATACTGATATGCCATACGCACACG GATGCTACAGTATATACCACTCTCAAACTTACCCTACTCTCACATTCTACTCCACTCCAT GACCCATCTCTCACTAAATCAGTACTAAATGCACCCACATCATTATGCACGGCACTTGCC TCAGCGGTCTATACCCTGAGCCATTTACCCATAACTCCCACGATTATCCACATTTTAATA TCTATATCTCATTCGGCGGGCCCAAATATTGTATAACTGCTCTTAATACATACGTTATAC CACTTTTGCACCATATACTAACCACTCAAATTATATACACTTATGCCAATATAACCAAAA AATTACCACTAAAATCACCTAAACATAAAAATTATTTATCTTTCAACTTTACGAAATAAA CACACTCAATTGCGTATCTATACCACCATGACGTCATTAACGTAAAAGTTCCTTAATATT ACCATTTGCTTGAACGGATACCATTTCAGAATATTTCTAACTTTCACAGACCATACATTA GAATAATATGCCACCTCACTGTCGTAACACTCTATATTCACCGAGAAACAATACGGTAGT GGCTCAAACTCATGCCGGTGCTATGATACAATTGTATCCTATTTCCATTCTCATATGCTA TCCGCAATATCCTAAAAGCATAACTGATGCATCTTTAATCTTGTATGTGACACTACTCAT ACGAAGAGACTATATCTAAAGAAGACGATACAGTGATATGTACGTTGTTTTTGTAGAATT ATAATGAAACGTCAAATAACCCTACTATATTATAACTTATCAGCGCCGTATACTAAAACG GACGTTACGATATTGTCTCACTTCATCTTACCACCCTCTATCTTATTGTTGATAAAACAC TAACCCCTCAGCTTTATTTCTAGTTACAGTTACAACAAACTATCCCAAACCATAAATCTT AATATTTTAGGTGTCAAAAAATGAGGATCTCCAAATGAGAGTTTGGTACCATGACTTGTA ACTCCACTACCCTGATCTGCAATCTTGTTCTTAGAAGTGACGCATACTCTATATGGCCCG ACGCGACGCGCCAAAAAATGAAAAAAGAAGCAGCGACTCATTTTTATGGAAGGACAAAGT GCTGCGAAGTCATACGCTTCCAATTTCATTATTGTTTATTGGACATACTCTGTTAGCTTT ATTACCGTCCACGCTTTTTCTACAATAGTGTAAAAGTTTCTTTCTTATGTTCATCGTATT CATAAAATGCTTCACGAACACCGTCATTGATCAAATAGGTTTATAATATTAATATACATT TATATAATCGGCGGTATTTATATCATCAAAAAAAGTAGTTTTTTATTTTATTTTTTCATT ACTTTTCACTGTCTATGGATTTTCATTCGTAAAGGCATCACTCCCTAGTTTGCGATAGTG TAGATACCGTCCTTGGATAGAGCACTGGAGATGGCTGGCTTTAATCTGCTGGAGTACCAT GGAACACCGGTGATCATTCTGGTCACTTGGTCTGGGGCAATACCAGTCAACATGGTGGTG 
AAGTCACCGTAGTTGAAAACGGCTTCAGCAACTTCGACTGGGTAGGTTTCAGTTGGGTGG GCGGCTTGGAACATGTAGTATTGGGCCAAGTGAGCTCTGATATCAGAGACGTAGACACCC AATTCCACCAAGTTGACTCTTTCGTCAGATTGAGCTAGAGTGGTGGTTGCAGAAAGCAGT AGCAGCGATGGCAGCGACACCAGCAGCGATTGAAGTTAATTTGACCATTGTATTTGTTTT GTTTGTTAGTGCTGATGTAATCTTAACAAGAAATAGTGAAATGAAAGCGCATACCTCAAA GGCATATAGTTGAAGCAGCTCTATTTATACCCGTTCCTCCATCTGTCATCACTACTTAAA CGATTCGTTAACAGACGCTCATTTAGCACCTCACATATTCTCCATATCTCATCTTTCACA CAATCTCATTATCTCTATGGAGATGCTCTTGTTTCTGAACGAATCATACATCTTTCATAG GTTTCGTATGTGGAGTATTGTTTTATGGCACTCATGTGTATTCGTATGCGCAGAATGTGG GAATGCCTATTATAGGGGTGCCGGGGGGTGCCTTGAAAAACCCTTTTGCGCGCCTGTTAA GTTTCCGTTTTCAGTCAAAAAGAATATCCGAATTTTAGATTTGGACCCTCGTTCAGAAGC TTACTGTCTAAGCCATCATTTGGTGTGTCCTAAACGGTTTCCATGCAAAGCAACATCTTT GTTACTCATTCCTGAAGGTTGAATAAAAAATAGCAACCATGTGCAGGGAGTCGTATATTG TTAGATTCTAGAGACTTGTACGCATCATCAAAGCTGTAAATAGAATAAACATACGCAAGG CGTCAAAAGTGCATAGTTAAGAAAATTCCTGACATGTGAAAATATGTGTTTATGAAATGT GTCAAGGCCCGTCTATAGCGTAGTTAACCCCTCTGCAGGAGTAAGTGACTTTTTTTACGC TCAAAAGGCAACGAGGGCACATACTTAAAAGTCATTTTCAAACACATCTGCAGTTTGCAA CGACAGATAACAATATTATGATAGGATGGTATGATGTTATTAGCTGCCACATATTTTTAA TAATAGCATTAGTCACGTCTCTTCAATTGTTGGGATGAAACTCTAAAATATCATTCCTTT AGTAGTATTCCAGTTACCAGTATATTATCACATGCCGAAAAAGAAGATGACATAAAGATC GACAAACAGTCTTCAAATATAATGGAAGCTGGAATGCAAGGATTGATAATGTAACAGGAT ACTGAATGACAAAGTATAAATGAAAAAAAAAAAAAAAAAAAAGTAGTAATACTATTATGT GGAAATACCGATTCCATTTTGAGGATTCCAATTGTTGGAATAAAAATCAACTATCATCTA CTAACTAGTATTTACGTTACTAGTATATTATCATATACGGTGTTAGAAGATGACGCAAAT GATGAGAAATAGTCATCTAAATTAGTGGAAGCTGAAACGCAAGGATTGATAATGTAATAG GATCAATGAATATTAACATATAAAACGATGATAATAATATTTATAGAATTGTGTAGAATT GCAGATTCCCTTTTATGGATTCCTAAATCCTGGAGGAGAACTTCTAGTATATCTACATAC CTAATATTATAGCCTTAATCACAATGGAATCCCAACAATTACATCAAAATCCACATTCTC TACACCAATACCATCGACGAGAGCTTCTAGTAAATTGTATACATAACAGTATAACCCTTA CCAACAATGGAATCTCAAAGATTATTAAATTATTCACAGACTCTGAGGATTCGGGTAAAA TAGGGTATTTAACTGGTTACCGGAAAGGTTTAGAAAATTCGTGGAGGGTTGGCCGAGTGG TCTAAGGCGGCAGACTTAAGATCTGTTGGACGGTTGTCCGCGCGAGTTCGAACCTCGCAT 
TCATTTTCCGGATCCACTTTATAGTGGGTCACTTCATCTCTTGTAGGGTTCACTGAAAGA TAATCAATCAAATTATCATATGACCAATGGGTCACTTTTTTGATGAGACCCATGTTTATT CTTCTATACGTTCGTGATACTGCACTTACGACTACCAGTAACATCAGAATAACCATAGAG CTTATGCTATCTTGAAATATATCATCGCCGTAAAGTATGATGTGTAGCGCATTAGTCTTC TGCATTCATATTATCAGTGGTATACCGTTTGGTATATGCTTTCCAATCATGAAGTGTTGG CTCTACTCTATATAAATTCGACATCCATTGATTTTCTCAATAAATGAGTTACCCAAGTAA GCTTTCCATTGATACAAGTGATCTACATTCTTGCGACGCCAAATTATAAAGCACTAAAAA TCATATCATACCCAATTGCGGGCAGCACATGTATATATTATACACTACTTATAACTAACC TTCTTCATATATAACAATATGCCTGATAATACTAATGCATTTAAATCATATCATGAAAGA AAATTACATGGGTTTTATTGACATAATTGCATTTAGAATACATAAAATTCTAAAGAATTA ATATATCCAAAGTATTAGACATAACCAAGAATAATAGTGAATAATTTTAGATTTTGTTAC ATATAATTCTGCTTGCCTATCTCTTCCACTCTTTTCAAAACGTTGCATGTAAGCGTTACT AATATTCCGCTTTATTTTGTTGCAATTCCTAATTTTTTCATTACATTATCTTGCGAGTAC GGAAGCGATTAACGTTCTCCCAATAGAAGGAACAAACATAGATATTGAAGTTTTACTGCT TTTGCTTACCTGACCTTTTTCAAATTTAATTTTTTCCCGCTAATAAGACCATAAACTACC CCGAACCAAATTCTAAAAGATAGTCAGCTGGATTAGAGTTGTCATCTCCAAACATTAATT TTGCATTATCTTCGGCTTCAATCAAATCGCCTGATAAGAACTCCTTTAATTCTTCATGAA TGTTTGTATGTGGATGACTCTCCATAGTGCCAGCATGATTGTGGTTACCGACCGAATCAT ATAACGGTGGCTCCCAATTGTGCAATTCAGCCTTACTATTTTGTTCATTCATCAAAGCAT TTGGGACAGATCTAATATCTATAATTCTTTCCTCACTATTCTCGCTATTATTTTGCCCCG AACTGGCATGGTGGTTATTGGTAAAAGGAGATAATGCTGCGACAGAACTTTTCTTCTCTT CCAATTCTTTTATCTTCGCCAATAACGCCTTCTTTTTTCGTGCTTGTATTTCTAAAATTT CGGCCAGGTATTGTAAATATTCGACCGTTCTATCCAAAATGATACCCTTATTTGGTTTGA TTTGTTTACCTAGGTCATCGTAATTCAATAAAGATGGTGGAACCAACTGGCCGAGTTCTT TTATCTTTTGCTTTATTAATTCTCTTCTTCTCCTTTCGACGGCATTATGAAACTCTCTTT TGCGCCTCAGTTTCTCATCAGAAGTTAACCCGCCTAAAATCTTTGGAACACTTCCAGGTC CTATATTTTCAGTCATATTGCTACTTATTGAAGTGTGTCTTGTTCTGGGTGTGTTTATGC TACCATGCCTAAAAGAAGATGAAAGGAAACTTCCTGCACGAAATGACGATGATGGTGATC TTACCTTTGGGGAATATGTGGAAGATACGGATGCTGGGCCCAAGCTTTGTGGATTATAAG AAAATGATGATGAATATGTGTTTGGTGTCATCATATCAGAATTGATGCTAGAAGATAAAG AGGAGCTCAAATCATCAGTTAAGTTATACATTGTGTCATCGGTGCCATGCTGAAATAAAA AGTCATGTGGTAACTCAGCTTTTAAGCCTTCTTGTTGCTGTAATGGAGTATTCATTGCTC
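As pasted above, the sequence is split into 60-base chunks separated by spaces and line breaks, so it does contain whitespace. A small cleanup step can join it back into one string before searching. This is a sketch; `clean_sequence` and `VALID_BASES` are hypothetical names, and rejecting anything other than A, C, G, T (such as N) is an assumption based on the statement that the file has no errors.

```python
VALID_BASES = set("ACGT")


def clean_sequence(raw):
    """Remove all whitespace and check that only A, C, G, T remain."""
    seq = "".join(raw.split()).upper()  # split() with no argument splits on any whitespace
    bad = set(seq) - VALID_BASES
    if bad:
        raise ValueError(f"unexpected characters: {''.join(sorted(bad))}")
    return seq
```

The cleaned string can then be scanned for open reading frames without chunk boundaries breaking up codons.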
