Question: Having trouble with DNA java code please help So heres the assignment which I'm having trouble with: So overall its: Make a code that takes
Having trouble with DNA java code please help
So heres the assignment which I'm having trouble with:


So overall its:
Make a code that takes a inputted DNA file and takes the all the info from that text file and stores the dna sequence in arrays which in the end gets printed out using printstream in one method
The dna.txt(for input file):
cure for cancer protein
ATGCCACTATGGTAG
captain picard hair growth protein
ATgCCAACATGgATGCCcGATAtGGATTgA
bogus protein
CCATtAATgATCaCAGTt
michael jordan mad hops protein
ATgAGATCCgtgatGTGggaTCCTaCTCATTaa
paris hilton phony protein
AtgCCaacaTGGATGCCCTAAGATAtgGATTagtgA
george w bush approval rating protein
atgataattagttttaatatcagactgtaa
jimi hendrix guitar talent protein
ATGCAATTGCTCGATTAG
tyler durden's brain protein
ATGATAcctatgagtaaTGTGGACCatatccaaACTATAGGCATtgtcggACCAACGATcgattggtTATACTG
A
mini me growth hormone
AtGgGaCGCTgA
the ecoli.txt(for input file):
thr operon leader peptide
ATGAAACGCATTAGCaCCAcCATtACCACCaCCATCaCcATTACCACAGGTAACGGTGCGGGCTGA
aspartokinase I/homoserine dehydrogenase I
ATGCGAGtGTTGAAGTTcgGCGGTaCATCAgTGGCAAATGCAGAACGTtTTCTGCGGgTTGCCGATAttCTGGA
AAGcAATGCCAGGCAGGGGCAGgTGGcCACCGTCCTCtCTGcCCCCGCCAAAATCACCAACCATCtGGTaGCGA
TGATtGaaAAaACCATtAGCGGTCAGGAtGCtTTaCcCaATATCAGCGATGCCGAACGTATTTTTGCCGAACTt
CTGACgGGACTCGCCGCcGCCCAGcCGGGATTTCCGCTGGCACAAtTgAAAAcTTTCGTCGACCAgGAATTTGC
CCAAATAAAACATGTcCtGCATGGCatCAGTTTGTTGGGGCAGTGCCCGGaTAGCATcAACGCTGCGCTGATTT
GcCGTGgCGAGAAAaTGTcGaTcgCCattaTGGCCGGCGTGTTAGAAGCGCGTGGTCACAACGTTACCGTTATC
GATCCGgTCGAAaAAcTGCTgGCAGTGGGTCATTAcCtCgAaTCTACCGTTGATaTtGCTGAATCCACCCGCCG
TATTGCGGCAAGCCGCATTCCgGCTGACCACATgGtGCTGATGGCTGGTTTCACTGcCggTAATGAAAAAGgCG
aGCTGGtGGTtCTGGGAcGCAACGGTTCCGACTaCTCCGCTGCGGTgCTGGCGGCcTGTTTaCGCGCCGATTGT
TGcGAgaTCTGGACGGATGTTGAcGGTGTTTATACCTGCGATCCGCGTCAGGTGCCCGATGCGAGGTTGTTGAA
GTCGATGTCCTATCAGgAaGCGATGGAGCTTTCTTACTTCGGCGCTAAAgTTCTTCaCCCcCGCACCATTACCC
CCATcGCCCAGtTCCAGATcCCTtgCCtGATTAAAAATAcCGgAAAtCCCCAAGCACCAGgTACGCtCATTGGT
GCCAGCCGTGATGAAGACGAATTACCGGTCAAGGGCATTTCCAATcTGAATaACATGGCAATgTTCAGcGTTTC
CGgCCCGGGGAtGAAAGGgATggTTgGCATGGCGGCGCGcgTCTTTGCAGcGaTGTCACGCGCCCGTaTTtCCG
TGGTgCtGATTACGCAATCATCTTCCGAATACAGTATCAGTTTCTGCGTTCCGCaAAGCGACTGTGTGCGAGCT
gAaCGGGCAaTGcAGGAAGAGtTCTACCTGGAaCTGaAAGAAGGCTTACTGGAGCcGTTGGCgGtGACGGAACG
GCTGGCCATTATCTcGGTGgTAGGTGATGGTATGCGcACCTtaCGTGGGAtCTCGgCGAAATtCTtTGCCGCGC
TgGCcCGCGCCAATATCAACATTGTCgCCATTGCtCaGGGaTCTTcTGAaCGCTCAAtCTCTGTcGTGGTcAaT
AACGATgATGCGACCACTGGCGTGCGCGTTACTCATCAGATGCTGTTCAATACCGATCAGGTTATCGAAGTGTT
TGTGATTGgCGTCGGTGGCGTTGgcGGTGCGCTGCTGgAGCAACTGAAGCGTCAgCAAAGCTGGTTGAAGAATA
AaCATATCGaCTTACGTGTCTGCGGTGTTGCTAACTCGAAGgCACtgCTCACCAATGTACATGGCCTTAATCTG
GAAAACTGGCAGgAAGAACTGGCGCAAGCcAAAGAGCCGTTTAATCTCGgGCGcTtAATTCGCCTCGTGAAAGA
ATATCATCTGCtGAaCCCGGTCATTgTTGACTgTACTTCCAgCCAGGCTGTgGCAGaTCAATATgCCGACTtCC
TgCGCGAAGGTTTCCAcGTTGTtACGCCGAaCAAAaAGGCCaACACCTCGTcgATGGaTTACTaCCATCAGTtG
CGTTATGCGGCGGAAAAATCGCGGCGTAaATTCCTCtATGACACcaACGTtGGGGCTGGATTACCGGTTATTgA
GAACCTGCAAAATCTGCTCAATGCtGGTGATGAATTGATGAAGTTCTCCGGCATTCTTTCAGGTTCGCTTTCTT
AtATCTTCGGCAAGTTAGACGAAGGCaTGAGTtTCTCCGAGgCGACCaCACTGGCGCGGGAAATGgGTTATACC
GAACCGGAcCcGCGAGATGATCTTtCtGGTATGgAtGTGGCGCgTAagCTAtTGATtCTCGCTCGTGAAACGGG
ACGTGAACTGGAGCtGGCGGATATTGAAATTGAACCTgTGCTGCCCGCaGaGTTTAACGCCGAGGGTGATGTCG
CcGCTTTTATGGCGAATCTGTCACAGCTCGACGaTCtCTTTGCCGCGCGTGTgGCGAAGGCCCGTGATGAAGGA
AAAGTTTTGCGCTATGTTGGCAATAttGATGAAGATgGCgTCTGCCGCGTGAAGaTTGCCGAAGTGGATGgTAA
TGaTCCGCTGTTCAAAGTGAaAaATGGCGaAAACGCCCTGGCCTTCTATAGCCACTATtATCAGCCGCTGCCGT
TGGTACTGCGCGGATATGGTGCGGGCaATgACGTTaCAGCTGCCGGTgTCTTTGCTGATCTGCTACGtACCCTc
TCAtGGaAGTTAGGAGTCTGA
homoserine kinase
ATgGTTAAAgTTTAtGCCCCGGCtTCCAGTGCCaATATGaGcGTCGgGTTTGATGTGCTCGGGgCGGCGGTGAC
ACCTGTTGATGGTGCATTGCTCGgAGaTGTagTcaCGGTTGAGGCGGCAGAGACaTTCAgTCTCAACAACCTCG
GACGCTTTGCCGAtAAGCTGCCGTCAGAGCCACGgGaaAATAtCGTTtATcAGTGcTGGGAGCGTtTTTGcCaG
GAGCTTGGCAAGCAAATTCCAGTGGCGATGaCTCTGGAAAAGAATatGCCGAtCgGTTCGGGcTTAGGCTcCAG
CGCCtGTTCAGTGGTCGCGGCgCTgAtGGCGATgAATGAAcACTGCGGCaAGCCGCTTAATGACACTCGTTTGC
TGGCTTtGATGGgCGAgTTGGAAGGGcGTATCTCCGGCAGCAtTCATTACGACAACGtGGCACCGTGtTtTCtT
GGTGGTAtGCAGTtgATGATCGAAGAaAACGACATCATCAGCCAGCAaGTGCCAGGGTTTGATGAGtGGCTGTG
GGTGCTGGCGTATcCGGgGAtTAAAGTCtCGaCGGcAGAAGCCAGGGCTaTTTTACCGGCGCAGTATCGCCGCC
AGGATTGCATTGCGCAcGGGCgACATCTgGCAGGCTTCATTCACGCCTGCTATTCCCGTCAGCTTGAGCTTGCC
GCGAAGCTGATgAAAGaTGTTATCGCTGAACCCTACcGTGaACgGTTaCTGCCAGGCTTCCGGCAGGCGCGGcA
GgCGGTTGCGGAAATCGGCGCGGTAgCGAGCGGTATCTCCGGCTCCGGCCCGAcTtTGTTCGCTCTGTGtGAcA
AGCCGGATACCGCCCAGCGCGTTGCCGACTGgTTGGGTAAGAACtAcCTGCAAAATCAGgAAGGTTTTGTTcAT
ATTTGCCGGCTGGATACGGCGGGcGCACGAgTACTGGAAAACTAA
threonine synthase
ATGAAACTCtacaATCTGAAAGATCACAATGAGCAGgTCaGCTTTGCGCAAGCCGTAACCCAGgGgTTAGGCAA
AAATCAGGGgCtGTtTTTTCcgCACgaCCTGCCGGaaTTCAGCcTgACTGAAaTTGATGAGATgCTGAAGCtGG
ATTTTGTCACcCGCAGTGCGAAGATCCTcTCgGCGTTTATTGGTGATGAAATCCCGCAGGAAaTCCTGGAAGAG
CGCGTACGTGCGGCGTTTGCCTTCCCGGCTCCGGTCGCCAATGTTGAAaGCGATGTCGGTtGTCTGGAaTTGTT
CcACGGGCcAACGCTGGCaTTTAAAGATTTCGGcGGTcGCTTTATGGCACAAATGCTgACCcATATTGCGGGCG
ATAAGCCAGTGAcCATTCTGACCGCGACATCCGGTgATACTGGaGCGGCAGTGGcTCATGcTTTCtACGGTtTA
CCGAATGTGAAAGTGGTTATCCTCTATCCACGAGGCAAAATCAGTCCACTGCAAGAAAAACTgTTCTGTACATT
GgGCggCAATATCGaAACTGTTGCCATCGAcggCGaTTTCGATGCCTGTCAGGCGCTGGTgAAGCAGGCgTTTG
ATGATGAAGAACTGAAAGTGgCgCtGGGGCtGAATTCTGCTAAcTCCATCAACaTCAGTCGCTTGCTGGCGcAG
ATTTGTTaTTAcTTTGaGGCTGTCGCACAGTtGCCGCAAGAAGCACGTAACCAGTTGgTTGTCTCGGTaCCGAG
TGgAAACtTcGGCGATtTGACGGcGGGTCTGCTGGCGAaGTcACTCGGTCtGCCGGTAAAACGTtTTATTGCtg
CGACCAACGTGAACGAtACCGTACCACGTTTCCTGCaCGaCGGTCAGTGGTCAcCCAAaGCGACTCAGgCGAcg
TtaTCCAATGCGATGGATGTTAGCCAGCcAAaCAACTGGCCGCGTGTGGAAGAGTTGtTCcGCCGCAAAATCTG
GCAACTGAAAGAGCTGGgTTATGCAGCCGTGgATGATGAAACCACGCAACAGACAATGcGTGAGtTAAaAGAAC
TGGGCTATACCTCGgAGCCGCACgCTGCCGTAGCTTATCGTGCGCTGCGTGACCAgTTGAAtCCAGGCGAATAT
GGCTTGTtCCTCGGcACcGCGCATCcGGcGAAatTtAAAgAGAGCGTGGAAGCGATTCTCGGTGAAAcGTTGGa
tCTGCCAAAAGAGCTGGCAGAACGTGCTgATTTACCCTTGCTTTCGCATAACCTGCCCGCCGATTTTGCTGCGT
TGCGTAAatTgaTGATGAaTCATCAGTAA
hypothetical protein
AtGCAGCCcGGCTtTTTTTATGAAGAAAATaTGGAGaAaAACGACagGGAAAAAGGAGAAATTCtCAATAAATG
CGGtAACTTAGAgATTaGGATTGCGGAGAATaACAACTGCcGTTCTCaTCGCGTAATCTCCGGATATCGACCCa
TAACGGgCAATGATAAAAGgAGTAACCTGTGA
Non-protein region
aAAAACTgCTGGAAACAATGAAAGAcGTACCGGACGACCAAcGTCAGgCGC
transaldolase B
ATGACGGACAAATTGaCCTCcCTTCGTCAGTACACCACCGTAgTGGCCGACACTGGGGACATCGCGGCAATGAA
GcTGTaTCAACcGCAGGATGCCACAACCAAcCCTtCTCTCATTCTTAACGCAGCGCAGATTCcGGAATACCGTA
AgTTgATTGaTGATGCTGTCGCCTGGGcGAaACaGCAGAGCAAcGATcGCgCgCAGCAgATCGtGGACGCGACC
GAcAAACTGGCAGTAaATATTgGTCTgGAAaTCCTGAAACTGgTTCCGgGCCgTATCTCAActGAAGTtGATGC
GCGTCTTTCCTATGACaCCGAAGCGTCAATTGCGAAAGCAAAACGCCTGATCAAACTCTACAACGATGcAGGTa
TTAGCAACGATCgTaTTCTGATCAAACTGGCTTCTACCTGGCAGGGTATCCGTGCTGcAGAACAGCTGGAAAAA
GAaGGTATTAACTGTAAcCTGACCCTGCTgtTCTCctTCGCtCAGGcTCGTGCTTGTGCGGaAGCGGgCGTgTT
CCTGaTCTCGcCGTTTgTTGGCcGTATTCTTGACTGGTAcAAaGCGAATACCGaTAAGAAAGAGtACGCTCcGG
CAGAAGATcCGGGCGTGGTTTCTGTatCtGAAATCtACCAGtACTACaAAGAGCATGGTTaTgAAACCGTGGTT
ATGGGCGCAAGCTTCCGTAACATCGGCGAAATTCTGGAAcTGGCAGGCTGCGACCGTCTGACCatCGCACCGgc
ACTGCTGAAAGAGCTGgCGGAGAGCGAAGGGGCTATCgAACGTAAACTgTCTTACAcTGgTGAAGTgAAAGCgC
GTCCGGcGCGTATCACtGAGtCCGAGTTCCTgTGgCAgCACAACCAGGATCCAATGGCAGTaGATAAACTgGcG
GaAGgTATCCGTAAGTTTGCTGTTGACCAGGAAAAACTGGAAAAAATGATCGGCGATCTGCtGTAA
molybdopterin biosynthesis mog protein
ATGAATACTTTACGTATTGGCTTaGTtTcCaTCTCTGATCGCGCATCCAGCGGCGTTTAtCAGgaTAAAgGCAT
CCCTGCGCTGGAagAATGGCTGACAtcGGCGCTAACCACGcCGTTTGAaCTGGAAAcCCgCTTaATCCCCGATG
AGCAGGCGATCATCGAGCAaACgTTgTGTGAGCTGGTGGATGAAaTGAGtTGCCaTCTGGTGCTCACCACGGGC
GGAAcTGGCCCTGCGCGTCGTGAcgTAACGCcCGATGcGACGCTGGCAGTAGCGGACCGCGAGATgCcAGGCTT
TGGTGAACAGATGCGCCAGATCAGCCTGCATTTTGTACcaaCTGCGATCCTTTCGCGTCAGGTggGGGTgATTC
GCAAACAGGCGCTGATCCTTAACTTaCcCGGTCAACCGAAGtCTATTAAAGAGACGCtGgAAGGTGtGAAGGAC
GCTGAGgGTAAcGTTGTGGTGCACGgTATTTTTGCCaGCGTaCcGTaCTGCATTCAGTTGCTGGAAGGGCCATA
CGTTGAaACGGCaCCgGaAGTGGTTGCAGCATTCAGaCCGAAGAGTGCAaGACGCGAAGtTAGCGAATAA
chaperone protein DnaK
aTGGGTAAAATAaTTGGTATCGACcTGGGTACtACCAaCTCTTGTGTagCGaTTAtGGATGGCACCACTCCtCG
TGtACTGgAGAACGcCGAAGGCGATCGCACCAcGcCTTcTATCATTgCCTATACCCAGGAtGGTGAAACTCTGG
TTgGTCAGCCGGCTAAACGTCAGGCAgtGACGAACCCgCAaAACAcCCTGTtTGCGATTAAACGCCtGATTGGC
CGCCgCTTCCAGgACgAAGAAGTACAGCGtGATgTTTcCATCATGCCGTTCAAAATTAtTGcTGCtgatAACGG
CGACGcATGGGTCGAAGtTAAAgGCCAGAAAATGGCAcCGCCGcAGAtCTCTGCTGAAGTGCTGAAAAAAAtGA
AGAAAACCGCTGAAGaTTAcCTGGgTGAAcCGGTAACTGaAGCTgtTATTACCGTACCGGCAtACTttaACGAT
GCTCAGCGTCAGGcAACCAAAGaCGCAGGCCGTATCGCTGGTCTGGAAGTAAAaCGTATCATCAACGAaCCGAC
CGCAGCTGCGCTGGCTtACGGtCTGGACAAAGgTACTGGCAACCgtACTATCGCGGTTTATGACCTGGGTGGTG
GTACTTTCGATATTTcCATTATCGAaATCGACGAAGTTGACGGCgAAAAAACCttCGAAGTTCTGGCAACCAAC
GGTGATACCCACCTGgGTGGtgAAGACTTCGACAGTCGTCTGATCAACTAtCTGGTTGAaGAATTCAAgAAAGA
TCAGGGCATTGacCtGCGCAACGaTcCGCTGGCAATGCAGCGCCTGAAaGAAGCGGCAGAAAAAGCgAAAATCG
AACTGTctTCCGCTCAGcAGACCGaCGTTAACcTGCCGTACATCACTGCAGACGCGAcCGGTCCGAAACACAtG
AACATCaAAgTGactCGTGCGAAACTGGAAAGCCTgGtTGAAGAtCTGGTAAACCGtTcCATTGAGCCGCTGAA
AGTTGCACTGCAGGACGCTGGCCTGTCCGTATCTGATAtCGACgaCGTTATTCTCGTTGGTGGTCAGACTCGTA
TGCcAATGGtTCAGAAGAAAGTTGCTGaATTCTTTGGTAAAgAGCcGCGTAAAGATGTTAACCCGGACGAAGCT
GTaGCCATCGgTGCTGCTGTTCAGGGTGGTGTTCTGACTGGtGAcGTAAAAGaCGTacTGCTgCtGGACGTTAC
CCCGCTGTCtCTGGGTATcGaAACCaTGGGCGGTGTGATGACCACGCTGATCGCgAaAAACACCACTATCCCGA
CCAaGcAcaGCCAGGTGTTCTCTACCGCTGAAGACAACCAGTCTGCGGTAACCATcCATgtGCTGcAGGGTGAA
CgTAaACGTGCgGCTGAtAAcaAATCTCTgggTCAGTTcAACCTGGATGGTATCAaCCCGGCACCGcGCGGCAt
gCCGcAGATCGAAGtTACCtTCGAtATCGaTGCTGACGGTATCCTGCaCGTTTCCGCGAAAGACAAAAACAGCG
GTAAAGAGCAGAAGATCAcTATCAaGGCTTCTTCTGGtCTGAaCGAAGAtGAAATCCAGAAAATGGTACGCGaC
GCAGAAGCTAAcGCCGAAGCTGACCGTAaGTTTGAAGAGCTGGTACAGACtcGCaACCAGGGCGACCATCTGCT
GCACAgCACCCGTAAGCAgGTTGAAGAAGCAGGCGACAaACTGCCGGCTGACGACAAAACTGCTATCGAGTCTG
CGCTGActGCACTgGAAACtGCTCTGAAaGGTGAAGaCAAAGcCgCTATcGAAGCGAAAATGCAGGAACTGGCA
CAGGTTTCCCAGAAACTGATGGAAATCGCCCaGCAGCAACATGCcCAGCAGCAGACTGCCGGTGCTgATgCTTC
tGCAAaCAAcGCGAAAGaTGACGATGTTGTCGACGCtGAATTTGAAGAAGTCAAAGACAAAAAATAA
chaperone protein DnaJ
GTGCatTCatCTAGGGGcAATTTAAAAAAGATGGCTAAGCAAGATTaTTACGAGaTTTTAGGCGTTTCCAAAaC
AGCGGAAGAGCGtGAaaTCAAAAaGGCCTACAAACGCCTGGCCATGAAaTACCaCCCGGaCcGTAACCAGGgTG
ACAAAGaGGCCGAGGCGAAATTTAAAGAGATCAAGGaAGCTTATGAAGTTCTGACCGACtCGCAAAAACgTGCg
GCATaCGATCAGTaTGGTCATGCTGCGTTTGAGCAAGGTGGCATGGGCGGCGGcGGtTTTGGCGGCGGCgCAGA
CTTcAGCGATAtTTtTGGTGACGtTTTCGGCgATATTTTTGGcGGCGGACGTGGTCGTCAACGTGCGGCGCGCG
GTGCTGATTTAcGCTATAACATGGAGctCACcCtCGAAgAAGCTGTACGtgGCGtGaCCAAAGaGATccGCATt
CCGACTCtGGAAGAGTGTGACGTTTGCCACgGTAGCgGTGCAAAACCaGGTACACAgCCgCAGACCTGTCCGAC
cTgTcATGGTTCTGGCCAGGtGCAGATGcGCCAGGGTTTCTTTGcCGTGCAGCAGACCTgTCcAcACTGTCAGG
GCCGCGGTACGCTGaTcAAAGATCCGTGCAACAAATGTCATGGTCATGGTCGTGtTGAGCgCaGCAAAACGCTG
TCCGTTAAAATCCCGGCaGGGgTGGACACTGGAGaCCGCATCCGTCTTGCGgGCGAAGGTGAAGCGGGTGAACA
CGgCGCACCGGCAGGCGATCTgTACGTTCAGGTtCAGGTtAAACaGCACCCGATTTTCGAGCGTGAAGGCAACA
ACCTGTATTGcGAAGTcCCGATCAAcTTCGCTATGgCGGCGcTGGGTGGTgaAATCGAAGTACcGACCcTTGAT
GGTcGcGTCaaACTGAAAGTGCCTGGCGAAACCCAGACCGGTAAgCTGtTCCgTaTGCGCGGTAAAGGCGTCAA
GTCtGTcCGCGGTGGcgCACAGGGTGATTtGCTATGCCGCGTTGTTGTCgaAACAcCGGTAGGTTTGAACgAGA
AGCAGAAACAGCTGCTGCAAGaGctGCAAGAAAGCtTTGGTGGcCCAACCGGCGAGCACAACAGCCCGCGTTCA
AAGAGCtTCTTtGATGGCGTGAaGAAGTTTTTTGACGaCCTgACTCGCTAA
hypothetical protein
TTGCTCTTaCTCGGATTCgTAAGCCGTGAAAACAGCAaCCTCCGtCTGGCCAGTTCGGATGTGAACCTCACAGA
GgTCTTTTCTCGTTACCAgCGCCGCCACTACGGCGGTgATACAGATGACGATCAGgGcgACaAtcAtCgCcTTA
TGCTGCTTCATTGCTCtCTtCTCCTTGACCTTTCGGTCaGTAAGAgGCACTCTACATGTGTTCTGCATATAGgG
GGCCTCGgGTtGATGgTAAAATAtCACTCGGGGCTTTTCTCTAtCTGCCGTTCAGCTAATgCcTGA
hypothetical protein
aTGTCTGCCAAaaGACGACTTCTTATTGCGtGTACCTTGAtAaCAGCTATcTATCAtTTTCCTGcaTATTCTTC
ATTAgAATATAAAGGAtCCTTTGGTTCAATaAATGCGGGTTAtGCAGACTGGAATAGTGGaTTTgTAAaCACTC
ACCGTGGTGAaGTATGGAAAGTGACtGCGGATTTTGGGgTaAATTTTAAAGAAGCAGAATTTTACTCAtTTTAT
gAaAGTAATGTACTCAATCATGCTGTAGCAGGGAGAAATCATACgGtTTCAGCAATGaCGCATGTCAGACTCtT
TGaCtCTGATaTGACATTCTTTGGCAAAATTTaTGgCCAATGGGATAACTCATgGggTGAcGATCTgGACATGT
TTTATGGATTCGGTTACCTCGGCTGGAACGGCgAgTGGgGCTTTTtTAAACCGTATATTGGATtGCATAATCAA
TCTGGTGACTACGTATCAGCTAAATaTgGTCAAACGAATgGTTgGAATGGtTATGTTGTTGGCTGGACAGCAgT
ATTAcCATTTAcGTTATTTGACGAAAAATTTGTTTTATCTAACTGGAATGaAATAGAACTGGACAGGaACGATG
CTTACACGgAgCAGcAATTTGGCcGGAACGGgTTaAaTGGCGGtTTAACTATTGcCTGGAAGTTCTATCCTCGC
TGGAAAGCCAGtGTGACGTGGCGTTATTTcGATAAtAaGCTGGGCTACGATGGCTTTgGcgaTCAAATGATTTA
tATGCTTGgTTATGATTTCtAA
putative secreted sulfatase
ATGCAGAAAACGTTAATGGCCAGTTTGATCGGCCTTGCAGTTTGCACAGGGAAtGCTTTTAGtCCTGCCTTAGC
CGCAGAGGCTaAACAACcTAATTTAGTCATtaTTATGGCGGaTGATtTAGGTtaTGGCGAtTTAGcAaCaTATG
GTCATCAGATCGTTAAAACACctAATATCGACAGGCtTGCCCAgGAAGGGGTCaAATTtACTGAcTaCTATGCC
CCCGCTCCTTtAaGTTCAccTtCACGCGCaGGGCTATTAACCGGCcGGATGCCATTtCGTAcTGGAATTCGCTC
ATGGATtCCttCAGGCAAAGATGTTGCCtTAGGGCGTAACGAAcTCACgATTGCTAaTCTACTCAaAgCGCAaG
GGTACGACACggCAATGATGGGTAAGCTGCATCTGAATgCAGGcGGCGaTCGCACCGATCAgCCaCAAGCACaA
gATATGGGcTTTGATTAcTCAcTGGTtAATACgGCGGGCTTTGTTACcGACGCCACGCTGGATAAcGCTAAAGA
ACGCCcGCGTTATGGCATGGTTtAccCGACAGGCtgGCtACGTAACGGGCAACCCACTcCACGaGCTGATAAAA
tGAGCGGTGAGTATGTCaGTTCGGAAGTCGTCAACTGGCTGGATAACAAAaaGGACaGCAAGCCTTTCTTCCTC
TATgTTGCTTTTACCGAAGTGCATAGCCCCCTGGCTTCGCCCAAAaaATACCTCGATaTGTaCTCACaATATAT
GAGCGCGTATCAGAAGCAGcATCCTGATTTAtTTTaTGGCGACTGGGcAgACAAACCCTgGCGTgGTGTGGGgG
AATATTAtGCCAATATCAGCTATCtGGATGCAcAGGTTGGAAAAgTGCTGGaTAAAATCAAAGCTGTGGgtGaA
GaaGaTAACACAATCGTTATTTTTACCAGTGatAACGGTCCgGTAaCGCGTGAAGCGCGCAAAGTGTATgAGCT
GAATTTGGCAGGGGAaACGGaTGGATTACGCGGTCGCAAGGATAACCTTTGGGAAGGCGGAATTCGtGTTCCaG
CCATTATTAAATaTGGTAAACATCTACCACAGGGAATGGTTTCAGATACACCCGTTTATGGtCTgGACTGGATG
CCTACtTTaGCgAaAATGATGAACTTCAAATTACCTACAGAcCGTAcTTTCGATGgTGAATCGCTGGTTCCTGt
TcTTGAGCaAAAAGCATTGAAACGCGAAAAGCCATTAATTTTCGGGATTGATATGCCATTCCAGGATgATCCAA
cCGATGAATGGGCGATCCGTGATGgTGACTGGAAgAtGATTATCGATCGcaATAATAAACcGAAATATCTCTAC
AATCTGAAATCTGATCGTTATGAAaCaCTTaAtCTGATCGGTAAAAAAACAgATATTGAAAAACAGATGTATGG
TaAGtTTtTAAAATATAAAACTGATATTGATaATGATtCTCTAATGAAAgCCAGAGGTGATAAACCAGAAGCGG
TGACCTggGGCTAa
putative cytoplasmic protein
ATGTTTACcAacGTAAATGTTGATTGtTgCAAAACACCAGGAtGTAAaaACCTGGGGTTGCTGAATAGCCAGGA
TTATGTCGCAcAGgGTaAaAATATTTtATGCCGTGAATGTgGTTaCTTGTtTCCAGtGATATCTGAACAGTCGC
TTAAtATTTaTCGTAATATTGTGAAtcACTcCTGGAGAGGTTTGATTTGCCAATGTTCAACTtGCGGAGGcACG
TCCCTCAAAAAATaTGgATATtCtGCAcAagGCCAgAGAAGAATgTATTGCcaTCAtTGTGaGAAAACaTTtAT
CACTCTGGAAcAtGTAATTACcACACCACGAGGAGCcCTGTTAGcATTGATGATTGAGCAAGGGGAGGCACTTG
CGGaTATCAgAAAGTCATTACGTCTTAACAgCACTGGACTTAGCCGTGAACTGTTAAAATTAGCGCGTGAAGcA
AACTATAAAGAAAGTCGACAGTGTTTCCCTGCTTCTGATATTACCCTGAGtACCCGCGCTTtTCGcgTCAAGTA
tAATGGTAGCAATAACTCTCTTTATGCTCTTGTTACCGCAGAAGAACAAAGcGGCAGGGTgGTTGcCaTCTCAA
CCAATTACTCCCCATCtGCCGTAGagCaaCATTATcAATACaCATCGAACtATGAAGAGcGTATGTCTCCAGGG
ACGCTGGCACAtCATGTCCAGCGCAAAGAGttACTTACTATGCGGCgGGATACCTTGTTTGATATTGATTACGG
CcCGgCAGTTTTACATCAAAACGATCCGGGAATGtTGGTAaAaCCGGTTCTTCCGGCATaTCGTCATTTTgAAC
TGGTCAGAATACTGACCGATGAGCATtCCAACAACGTTCAGCATTACCTTGATCACGAATGCTTTATaTTGGGC
GGCTGcCTGATGGCTAATTTGCAGCaTATTCATCAaGGTCGCTGCCATATTTCcTTTGTCAAaGAGCGcGGTGT
GGCACCCGCCACCATTGaTTTTCCACCGCGATtATTCcTTAGTgGtGGgGTACgAAATAATGTCTGGCGTGCaT
TTTCTAACCGCAATTATTCAaTGGCTGTATGCAAtCTCaCTGGCAGTAAGAAAGTCCGCGAGATGCGGCATGCA
ACATtGAACAGTGCGACGCgTTtTATCCACTTTGTGgaGAACCATCCTTTCCTTATaTCATTGAACCGAATgtC
TCCTGCGaaTGTCgtTTCTACaTTAGATaTCCTCAAACaTCTGTGGAATAaAaAACTAGagCATGGAACAATTt
AA
sodium/proton antiporter 1
GTGAAACATCTGcATCGATTCTTTAGCaGTGATGCCTCGGGAGgCATTATTCTCATTATTGCCGCTGTATTAGC
GATGATTATGGCCAACAGCGGTgcAaCCAGTGGATGGTATCACGACTTTCTTGAGACGCcGGTTCAGcTcCGGG
TTGGGACACTTGAGATCAACAAGAACATGCTGCTATGGATCAATGaCGCTCTGaTgGCGGTATTTTTCCTGTtG
GTTGGTcTGGaAGTTAAAcGCGAGcTGaTGCAaGGTTCGCTGGCCAGTCtGCgCCAGGCGGCatTTCCTGTTAT
TGCCGcAATCGGCGGGATGATTGTCCCGGCATTGCTCTATCTGGCTtTtAACTATGCCGATCCGaTTaCCCGCG
AAGGcTGGGCAatCCCGGCGGCGACTGacATTGCCTTTGCACTTggTgTGTTGGCGCTgTTGGGAAGTCGTGTT
CCGTTAGCGCtGAAGATCTTTTtGATGGCTCTGGCtATTATCGACGATCTTgGGGcCATCATtATCATCGCATT
GTTCTACAcTAATGACTTATCGATGGCCTcTCTTGGCGTcGCgGCTGTAGCAATTGCGgtACTCGCGGTATTGA
AtCTGTgTGGTGTAcGCCGCACGGGCGTtTATATTCTGGTTGGCGTGGTGCtGTGGaCAGCGGTGTTGAAATCG
GGGGTTCACGCAACCcTGGCTGGCGtCATtGtCGGCTTCTTTATTCCTTTGAAAGAGAAGCATGGgCGCTCTCc
GgCTAAACGTCTGGAGCATGTTTTGCAtCCATGGGTGGCGTATCTGATtTTGCCGCTGTTTGCATTTGCTAATG
CTGGCGTTTCACTGCAaGGTgTCACGCtggAaGGTTTgACCtCCATTCTGCCATTAGgGATCATCGCTGGTTTG
CTGaTTGGCaAGCCACtGGGTAtTaGTCTgttcTGCTGGtTGGcgCTGCGTTTGAAATTGGCACATCTGCCAGA
GGGAACgACTtACCAGCAAATTATGGCGGtTGGTaTCcTGTGCGgTATCgGTTtTAcTatGTCTATCTTTATTG
CCAGCCTGGcATTTGGTAgCGTAGATcCAGAaCTGaTTAACtGGGCAAAAtTAgGTATCCTTGTCGGTTCAATT
TCtTcGgCGGTAATTGGATATAGcTGGTTACGcGTTCGTTTACGTCcATcAGTTTGA
transcriptional activator protein NhaR
ATGAGCATGTCTCATaTCAATTACAACCACTtGTATTACTTCTGGCaTGTCTAcAAAgAaGGTTCTGtGGTTGG
CgCAGCGGAGGCGCTTTATTTAACAcCAcAAACCATTACCGGGCaGATCCGGGCGCTGGAaGAGCGCCTGCAAG
GGAAAcTATTTAAGCGTAAAGGAcgTGGTCTGGAACCCAgcGAACTGGGGGAACTGGTCTATCGCtATGCCGAT
AAAATGTTCAcCTTAAgCCAGGAAATGCTgGATATCGTCAACTATCGCAAAGAGTCCAACTtATTGtTTGATGT
TgGTGTGGCAGATGCACTTtcCAAAcGtcTGGTCAGCAGTGTTCtgGATgCCGCAGTtgTGGAAGACGAGCAGA
tCCATCTACGCTGTTTCGAaTCGACGCACGAGATgCTTTTaGAGCAgtTGAGTCAGCATAaACTGGATATGATc
aTCTCTGACTGTCCGaTCGATTCCACTCAGCAGGAAGGGCTGTTTTCCATGAAAaTtGGCGAATGTGGTGTCAg
tTTCTGGTgCACTAACCCACTACcAGAAAAGCCGTTTCCTGCCtGTCTTGAAgAGCgTCGtTtACTTATTCCGG
GGCGTCGCTCAaTgTTGGGGCGtAAACTATTAAACTGGTTTAACTCcCAGGGCTTGAACGTCGAAATTTTGgGT
GAGTTTGATGATGCTGCGTTGATGAAAgCCTTTGGGGCGAcGCATAACGcTATTTTCGTTGCACCTTCGCtTTA
CGCTAATgATTTCTATAACgATGACTCGgTtGTGgAGATAGgCCGTGTTGAGaACGTGATGGAAGAGTACCACG
CGATTTtTGCCGaAAGgaTGAtTCAgCACCCTGcAGTAcAGCGTATCTGcAATACAgacTATTCTGCGCtgTTT
ACTCCAGCTTcAAAATAA
riboflavin kinase
ATGAAGCTGATACGCGgCAtACATAATCTCAGCCAGGCCCCGCAAGAAGGGTGTGTGCTGACTATTGGTaATTT
CGACGGCGTGCATCGCggTCATCGCGCGCTGTTACAGGGCtTGCAGGAAGAAGGGCGCAAGCGCAACtTACCGG
TGATGGTGATGCTTTTtGaACCTCAACCAcTGGAACTGTTTGCTACTGAtAAAGCcCCGGCACGGcTcACcCGG
CTGCgGGAAAAACTGCgTtaTcTTgCAGAgTGTGGCGTTGATTACGTGCTGTGCGtGCGTtTTGaCaGGCGTtT
TGCGGCGTTAACCGCGcAAAACTTCATCAgTGATCTtCTGGTGAAGCACTTGCGGGTAAAATTTCTTGCCGTAG
GTGACGAtTTCCGCTTTggCGCTGgTCGTGAAgGCGAtTTCTtGTTATTACAGAaAGcgGGCATGGAATACGGC
TTCGATATcACCAGCaCGCAAAcTTtTTGCGAAGGTGGTGTGCGtATCAGcAGCACCGCCGtgCGTCAGGCGCt
TGCGgATgACAATCTGGCTCTGGCAGAAAGTTTACTGGgGCACCCGTTTGCTATCTCCGGGCGTGTAGTCCACG
GTGATGaATTAGGGCGCAcTATAGGTTTCCCgACGGCGaATGTACCGcTaCgCCGTCAGGTTTCCCCGGTGAAA
gGGGTTTATGCGGTAGaAgTgTTGGgCCtTGgCGAAaAGcCGTTAcCCGGcgTTGCAAACaTCGGAACACgCCC
AACGGTTGCcGGTATTCGCCAGCAACTGgaAGTGCATTTGTTAGATGTTGcAATGGaCCTTTATGGTCGCCAtA
TACAAGTAGTGCTGCGtAAAAaAATAcGCAATGAGCAgCGATTTGcATCGCTGGACGAACTGAAAGCGCAGATT
GCGCGTGATGAATTAACCGcCCGCGaaTTTtTTGGGCTAAcAAAACCGGCTTAa
Isoleucyl-tRNA synthetase
ATGAGTGACTATAAATCaACCCTgAATTTGCCgGAAACAGgGTtCCCGATgCGTGGCGATCTCGcCAAGCGCGA
AcCGGGaATGCTGGCGCGTTGGACTGATGATGATCTgTaCGGCATCATCCGTGCGGCTaAAAAAGGCAaAaAAA
CCTTCAtTCTGCATgATGGCCcTCCTTATGCGAATGGCAGCAtTCaTATTGGTcACTCGGTTAACAAGATTCTG
AAAGACATTaTCATTaAgTCCAAAgGGCTttCTGGATATGACTCGCCGTATGTGCCTGGCTGGGACTGTCaTGG
tCTGCCAATCGAAcTGAAAGTAGAGCAAGAATACGGTAAGCCGGGgGAGaAaTTCACCGCCGcTGAGTtCCGCG
CCAAGTGCCGCGAATACGCTGCgACCCAGGTTGACGGTCAGCGCAAAGACTTTaTCcGTCTGGGCGTGCTGGGC
GActgGTCgcACCCGTACCTGACCATGGACtTCAAAACTGAAGCCAACATCATCCgCGCGCTGGGCAAAATCAT
CGGCAAcGGTCACCTGCACaAAGGcGCGAAGCCGGTgCACTGGTGCgTTGACTGCCGTTCTgCACTGGCAGAAG
CGGAAGtTgAGTATTACGacAAAACTtCTCCGTCCATCGACGTCGCTTtCCAGGCGGTCGATCaGGATGCGCTG
AAAACGAAATTTGGCGTAAGCAATgTTAACGGCCCAATTTCGCtGGTTATCTGGaCcACCACGcCGTGgAcGCT
GCcTGCTAacCGCgCAATCTCcATtGCACCTGATTTTGAttATGCGCTGGTGCaAatCgACGGTCAGgCCGTGA
TCCTCGCGAAAGATCtGGtTGaAAGCGTAAtGCAGCGTATCGGCGTTAGCGaTTACACCATTCTTGGCAcGGtg
AAAGGTGCCGAGCtGGAACTGTTgCGCTTTACCCATCCGTTtATGGACtTCGATGTTCCGGCAaTTCTCGGCGA
CcACGTTACgCTGGATGCCGGTACCGGTGcCGTTCATACCGCGCCAGGCcACGGTCCGGaCGACTATgTGATCG
GTcAAAAATaTGgTCTGGAAaCCGCTAACCCgGTTgGCcCGGACGgCACtTaTCTGCcGgGTACTTACCCGACT
CtGGATgGCGTTaACGTCTTCAAAGCGAACGaTATTGTCATTGCGTTGTTgCAGGAAAAAGGcgCACTGTTGCA
CGTTGAGAAAATGCAACACAGCTATCCGTgCTGCtGGCGTCaTAAaACGCCGATCAtCTTCCGcgCGACGCCGC
AGTGGTTCGTCAgCAtgGATCAGAAAGGTCTGCgTGCGcAGTCACTGAAAGAGATCAAAGGCgTGCAGTGGATC
CCTGACTGGGGCCAGGCGCGTATCGAGTCGATGGTTGCTAACCGTCCTGACTGGTGTATcTCTCGTCaGCGTAC
CTGGGGcGTGCCgATGTCACTGTTCGTgCaCAaaGACACAGAAGAaCTGcATCCGCgTACTCtcGAACTGaTGG
AAGAAGTGGcAAAACGCGTTgAAGTtGACgGCATTCAGGCgTGGTGGGATCTCGATGCGAAaGAgATCcTCGGC
GaCGAAGCTGACCAGTATGTGAAAGTACCGGATACGCtGgATGTATGGTtTGACTCCGGATCTACCCACTCTTC
CGTTGTTGATGTGCGTcCGGAATtTGCCGGTCACGCAGCGGACATGTaTcTGgAaGGTTCTGACCAACACcGTG
gCTGGTtCATGTCtTCCCTGATGATCTCTACCGCGATGAAGGGcAAAGcGCCATATCGTCAGGTACTGACTCAC
GGCTTTAcCGTGGATGGTCAGGGTCGCAAGATGTCTAAATCCATCGGtAACaCcGTTTCGCCGCAGGATGTgAT
GAATAAACtGGGtGCGGATATTCTGCGTCTGTGGGTGGcATCAACCGACTAcACTGGCGAAATGGCcGtTTCTG
ACGAGATCcTGAAACGtGCTGCcGACAGCTATCGTCGTATCcGTAACAcCgCGCGCTTCCTGCTGGCAAACCTG
AACgGTTtTGAtCCGGCaAAAGaTATGGTGAAACCGGAAGAGATGGTGGTaCTGGATCGCTGGGCCGtAGGTTG
TGCGAAAGCGGCACAGGAAGACATCCtCAAGGCgTACGAAGCATACGATTTCcACGAAGTGGTaCAGCGTcTGa
TGCGCtTCTGCTCCGTTGAGATGgGTTccTTCTACCTCGACATCATCAAAGACCGTCAgTATACcGCCAAAGCG
GaCAGCGTGGCGCGTCGTAGCTGCCAGAcTgCGCTGTATCACATCGCaGAAGCGCTGGTTCGCTGGATGGCAcC
AATCCTCTCCTTCaCcGCTGaTGAAGTGTGGGGtTaCCTGCCggGCGAACGTGAAAAATACGTCTTCAcCGGCg
AgTGgTACGAAGGCCTGtTTGGTCTGGCAGACAGTGAAGCAATGAACGaTGCGTTCTGGGACGAGCTGTTGAAA
GTGcGTGGCGAAGTGAAcAAAGTcaTTGAGCAAGCgCGTGCCGATAAGAACGTGGGcGGCTCGCTGGAAGCGGC
AGTAAcCTTGTATGCAGAACCGGAaCTGGCgGCGAaaCTGaCCGcGCTGGGCGAtGAATTACGATTTGTCCTGt
TGACCTCCGgCGCTAcCGTTGcAGACtATAACGACGCACCTGCTGATGCCCAGCAGaGCGAaGTcCTCAAAGGG
CTGAAAgtCGCGTTGAGTAAAGCCgAAGGtGaGAAGTGTCCtcGctGCTGgCACTACACCcAGgATGTcGgCAA
GGTGGCGGaACACGCAGAAATCTGCGGCCGCTGTgTcAgCaACGTCGCCGGTGACGGTGAAAAaCGTAAGTTTG
CCTGA
Non-protein region
GCTTGCGCCAACGcCATTTCATCGCCATCCCGCCgAgcATACAGGCCTCGgAaGAACCAaTGGTGTTGGTGcCA
ACGGCCtGAccATTTTTcGGTGCAGGCGCATGCCACAGATCGGCAACCATGTTTACGCAACGCAGATCGATTGC
TGcAGaTTGCGGATATTctTCTTTGTCGATCCAGTTTTTGTtAATGGAtAAAtCCA
FKBP-type 16 kDa peptidyl-prolyl cis-trans isomerase
ATGTCTGAATCTGTACAGaGCAaTAgCGCCGTCCTGGTGCACTTCACGCTAAAACTCGACGAtGGCaCCAcCGC
TGAGTCTACCCGCAaCAaCGGTAaACCGGCGCTGTTCCGCcTGgGTgATGCTTCTCTTTCTgAaGgGCTGGAGC
AACACCTGCTgGGGCTGAAAGTGGgCGATAAAACCaCCTTCtCGCTGGAGCCAGATGCGGCgTTtgGCGTGCCG
TcACCgGAcCTGATtCAGTAcTTCTCcCGCCGTGAATTTATGgATGCAGGCGAGCcaGAAATTGGCGCAATCAT
gCTTTTTACCGCAATGGaTGGCAGTGAGATGCCTGGCGTGaTCCGCgAAATTAACGGCGACTCCATTACCGTTG
ATTTCAACCaTCCGCTgGCCGGGCAGACCGTTCATTTTGATATTGaagTGCTGGaAATCGATCCGGCAcTGGAG
GcGTaA
Terribly sorry for the long question but thank you if you can help.
In this assignment you read an input file containing named sequences of nucleotides and produce information about them. For each nucleotide sequence, your program counts the occurences of each of the four mucleotides (A, C, G, and T). The program also computes the mass percentage occupied by each nucleotide type, rounded to one digit past the decimal point. Next the program reports the codons (trios of nucleotides) present in each sequence and predicts whether or not the sequence is a protein-coding gene. For us, a protein-coding gene is a string that matches all of the following constraints* * begins with a valid start coapn (ATG) . ends with a valid stop codon one of the following: TAA, TAG, or TGA) contains at least 5 total codons (including its initial start codon and final stop codon) Cytosine (C) and Guanine (G) combined account for at least 30% of its total mass These are approximations for our assignment, not exact constraints used in computational biology to identifv proteins) The DNA input data consists of line pairs. The first line has the name of the nucleotide sequence, and the second is the nucleotide sequence itself. Each character in a sequence of nucleotides will be A, C, G, T, or a dash character, "-". The nucleotides in the input can be either upper or lowercase. Input file dna.txt (partial): ure or canear protern AIGccACTATGGTAG captain picard hair growth protein ATgCCAACATGgATGCC-GATA GGATTgA bogus protein The dash --" characters represent "junk" or "garbage regions in the sequence. For most of the program they should be ignored in yor computations, though they do contribute to the total mass of the sequence as described later Program Behavior. Your program begins with an introduction and prompts for input and output file names. You may assume the user will type the name of an existing input file that is in the proper format. Your program reads the input file to process its nucleotide sequences and outputs the results into the given output file. Notice the nucleotide sequence is output in uppercase, and that the nucleotide counts and mass percentages are shown in A, C, G, T order. A given codon such as GAT might occur more than once in the same sequence Log of execution (user input underlined): program reports secruences that mav encodie prote2n5. Input file name ? dna. txt Output file name? output.txt Output file output.txt after above execution (partial) gion Name: cure Eor cancer protein Nucleotides: ATGCCACTATGGTAG Nuc. Counts: 4, 3.4. 41 Total Mass: [27.3, 16.8, 30.6, 25.31 of 1978.8 Codons List: CATG, CCA, CTA TGG, TAG] Is Protein? YES Region Name: captain picard hair growth protein Nucleotides: ATGCCAACATGCATGCCCGATATGGAITGA Nuc. Counts: [9. 6, 871 Total Mass 30.7, 16.8, 30.5, 22.1] of 3967.5 Codon L5t : [ATG, CCA, ACA, TGG, ATG, CCC, GAT, ATG, GAT, TGA] Is Protein? YES Region Name: bogus protein Nucleotidies: CCATT-2ATGATCA-CAGTT Nuc. Counts: [6, 4, 2. 61 Total Mass 32.3, 17.7. 12.1, 29.91 of 2508.1 Codons List: [CCA, TTA, ATG, ATC, aca, GIT Is Protein?: NO In this assignment you read an input file containing named sequences of nucleotides and produce information about them. For each nucleotide sequence, your program counts the occurences of each of the four mucleotides (A, C, G, and T). The program also computes the mass percentage occupied by each nucleotide type, rounded to one digit past the decimal point. Next the program reports the codons (trios of nucleotides) present in each sequence and predicts whether or not the sequence is a protein-coding gene. For us, a protein-coding gene is a string that matches all of the following constraints* * begins with a valid start coapn (ATG) . ends with a valid stop codon one of the following: TAA, TAG, or TGA) contains at least 5 total codons (including its initial start codon and final stop codon) Cytosine (C) and Guanine (G) combined account for at least 30% of its total mass These are approximations for our assignment, not exact constraints used in computational biology to identifv proteins) The DNA input data consists of line pairs. The first line has the name of the nucleotide sequence, and the second is the nucleotide sequence itself. Each character in a sequence of nucleotides will be A, C, G, T, or a dash character, "-". The nucleotides in the input can be either upper or lowercase. Input file dna.txt (partial): ure or canear protern AIGccACTATGGTAG captain picard hair growth protein ATgCCAACATGgATGCC-GATA GGATTgA bogus protein The dash --" characters represent "junk" or "garbage regions in the sequence. For most of the program they should be ignored in yor computations, though they do contribute to the total mass of the sequence as described later Program Behavior. Your program begins with an introduction and prompts for input and output file names. You may assume the user will type the name of an existing input file that is in the proper format. Your program reads the input file to process its nucleotide sequences and outputs the results into the given output file. Notice the nucleotide sequence is output in uppercase, and that the nucleotide counts and mass percentages are shown in A, C, G, T order. A given codon such as GAT might occur more than once in the same sequence Log of execution (user input underlined): program reports secruences that mav encodie prote2n5. Input file name ? dna. txt Output file name? output.txt Output file output.txt after above execution (partial) gion Name: cure Eor cancer protein Nucleotides: ATGCCACTATGGTAG Nuc. Counts: 4, 3.4. 41 Total Mass: [27.3, 16.8, 30.6, 25.31 of 1978.8 Codons List: CATG, CCA, CTA TGG, TAG] Is Protein? YES Region Name: captain picard hair growth protein Nucleotides: ATGCCAACATGCATGCCCGATATGGAITGA Nuc. Counts: [9. 6, 871 Total Mass 30.7, 16.8, 30.5, 22.1] of 3967.5 Codon L5t : [ATG, CCA, ACA, TGG, ATG, CCC, GAT, ATG, GAT, TGA] Is Protein? YES Region Name: bogus protein Nucleotidies: CCATT-2ATGATCA-CAGTT Nuc. Counts: [6, 4, 2. 61 Total Mass 32.3, 17.7. 12.1, 29.91 of 2508.1 Codons List: [CCA, TTA, ATG, ATC, aca, GIT Is Protein?: NO
Step by Step Solution
There are 3 Steps involved in it
Get step-by-step solutions from verified subject matter experts
