Question: Having trouble with DNA java code please help So heres the assignment which I'm having trouble with: So overall its: Make a code that takes

Having trouble with DNA java code please help

So heres the assignment which I'm having trouble with:

Having trouble with DNA java code please help So heres the assignment

which I'm having trouble with: So overall its: Make a code that

So overall its:

Make a code that takes a inputted DNA file and takes the all the info from that text file and stores the dna sequence in arrays which in the end gets printed out using printstream in one method

The dna.txt(for input file):

cure for cancer protein

ATGCCACTATGGTAG

captain picard hair growth protein

ATgCCAACATGgATGCCcGATAtGGATTgA

bogus protein

CCATtAATgATCaCAGTt

michael jordan mad hops protein

ATgAGATCCgtgatGTGggaTCCTaCTCATTaa

paris hilton phony protein

AtgCCaacaTGGATGCCCTAAGATAtgGATTagtgA

george w bush approval rating protein

atgataattagttttaatatcagactgtaa

jimi hendrix guitar talent protein

ATGCAATTGCTCGATTAG

tyler durden's brain protein

ATGATAcctatgagtaaTGTGGACCatatccaaACTATAGGCATtgtcggACCAACGATcgattggtTATACTG

A

mini me growth hormone

AtGgGaCGCTgA

the ecoli.txt(for input file):

thr operon leader peptide

ATGAAACGCATTAGCaCCAcCATtACCACCaCCATCaCcATTACCACAGGTAACGGTGCGGGCTGA

aspartokinase I/homoserine dehydrogenase I

ATGCGAGtGTTGAAGTTcgGCGGTaCATCAgTGGCAAATGCAGAACGTtTTCTGCGGgTTGCCGATAttCTGGA

AAGcAATGCCAGGCAGGGGCAGgTGGcCACCGTCCTCtCTGcCCCCGCCAAAATCACCAACCATCtGGTaGCGA

TGATtGaaAAaACCATtAGCGGTCAGGAtGCtTTaCcCaATATCAGCGATGCCGAACGTATTTTTGCCGAACTt

CTGACgGGACTCGCCGCcGCCCAGcCGGGATTTCCGCTGGCACAAtTgAAAAcTTTCGTCGACCAgGAATTTGC

CCAAATAAAACATGTcCtGCATGGCatCAGTTTGTTGGGGCAGTGCCCGGaTAGCATcAACGCTGCGCTGATTT

GcCGTGgCGAGAAAaTGTcGaTcgCCattaTGGCCGGCGTGTTAGAAGCGCGTGGTCACAACGTTACCGTTATC

GATCCGgTCGAAaAAcTGCTgGCAGTGGGTCATTAcCtCgAaTCTACCGTTGATaTtGCTGAATCCACCCGCCG

TATTGCGGCAAGCCGCATTCCgGCTGACCACATgGtGCTGATGGCTGGTTTCACTGcCggTAATGAAAAAGgCG

aGCTGGtGGTtCTGGGAcGCAACGGTTCCGACTaCTCCGCTGCGGTgCTGGCGGCcTGTTTaCGCGCCGATTGT

TGcGAgaTCTGGACGGATGTTGAcGGTGTTTATACCTGCGATCCGCGTCAGGTGCCCGATGCGAGGTTGTTGAA

GTCGATGTCCTATCAGgAaGCGATGGAGCTTTCTTACTTCGGCGCTAAAgTTCTTCaCCCcCGCACCATTACCC

CCATcGCCCAGtTCCAGATcCCTtgCCtGATTAAAAATAcCGgAAAtCCCCAAGCACCAGgTACGCtCATTGGT

GCCAGCCGTGATGAAGACGAATTACCGGTCAAGGGCATTTCCAATcTGAATaACATGGCAATgTTCAGcGTTTC

CGgCCCGGGGAtGAAAGGgATggTTgGCATGGCGGCGCGcgTCTTTGCAGcGaTGTCACGCGCCCGTaTTtCCG

TGGTgCtGATTACGCAATCATCTTCCGAATACAGTATCAGTTTCTGCGTTCCGCaAAGCGACTGTGTGCGAGCT

gAaCGGGCAaTGcAGGAAGAGtTCTACCTGGAaCTGaAAGAAGGCTTACTGGAGCcGTTGGCgGtGACGGAACG

GCTGGCCATTATCTcGGTGgTAGGTGATGGTATGCGcACCTtaCGTGGGAtCTCGgCGAAATtCTtTGCCGCGC

TgGCcCGCGCCAATATCAACATTGTCgCCATTGCtCaGGGaTCTTcTGAaCGCTCAAtCTCTGTcGTGGTcAaT

AACGATgATGCGACCACTGGCGTGCGCGTTACTCATCAGATGCTGTTCAATACCGATCAGGTTATCGAAGTGTT

TGTGATTGgCGTCGGTGGCGTTGgcGGTGCGCTGCTGgAGCAACTGAAGCGTCAgCAAAGCTGGTTGAAGAATA

AaCATATCGaCTTACGTGTCTGCGGTGTTGCTAACTCGAAGgCACtgCTCACCAATGTACATGGCCTTAATCTG

GAAAACTGGCAGgAAGAACTGGCGCAAGCcAAAGAGCCGTTTAATCTCGgGCGcTtAATTCGCCTCGTGAAAGA

ATATCATCTGCtGAaCCCGGTCATTgTTGACTgTACTTCCAgCCAGGCTGTgGCAGaTCAATATgCCGACTtCC

TgCGCGAAGGTTTCCAcGTTGTtACGCCGAaCAAAaAGGCCaACACCTCGTcgATGGaTTACTaCCATCAGTtG

CGTTATGCGGCGGAAAAATCGCGGCGTAaATTCCTCtATGACACcaACGTtGGGGCTGGATTACCGGTTATTgA

GAACCTGCAAAATCTGCTCAATGCtGGTGATGAATTGATGAAGTTCTCCGGCATTCTTTCAGGTTCGCTTTCTT

AtATCTTCGGCAAGTTAGACGAAGGCaTGAGTtTCTCCGAGgCGACCaCACTGGCGCGGGAAATGgGTTATACC

GAACCGGAcCcGCGAGATGATCTTtCtGGTATGgAtGTGGCGCgTAagCTAtTGATtCTCGCTCGTGAAACGGG

ACGTGAACTGGAGCtGGCGGATATTGAAATTGAACCTgTGCTGCCCGCaGaGTTTAACGCCGAGGGTGATGTCG

CcGCTTTTATGGCGAATCTGTCACAGCTCGACGaTCtCTTTGCCGCGCGTGTgGCGAAGGCCCGTGATGAAGGA

AAAGTTTTGCGCTATGTTGGCAATAttGATGAAGATgGCgTCTGCCGCGTGAAGaTTGCCGAAGTGGATGgTAA

TGaTCCGCTGTTCAAAGTGAaAaATGGCGaAAACGCCCTGGCCTTCTATAGCCACTATtATCAGCCGCTGCCGT

TGGTACTGCGCGGATATGGTGCGGGCaATgACGTTaCAGCTGCCGGTgTCTTTGCTGATCTGCTACGtACCCTc

TCAtGGaAGTTAGGAGTCTGA

homoserine kinase

ATgGTTAAAgTTTAtGCCCCGGCtTCCAGTGCCaATATGaGcGTCGgGTTTGATGTGCTCGGGgCGGCGGTGAC

ACCTGTTGATGGTGCATTGCTCGgAGaTGTagTcaCGGTTGAGGCGGCAGAGACaTTCAgTCTCAACAACCTCG

GACGCTTTGCCGAtAAGCTGCCGTCAGAGCCACGgGaaAATAtCGTTtATcAGTGcTGGGAGCGTtTTTGcCaG

GAGCTTGGCAAGCAAATTCCAGTGGCGATGaCTCTGGAAAAGAATatGCCGAtCgGTTCGGGcTTAGGCTcCAG

CGCCtGTTCAGTGGTCGCGGCgCTgAtGGCGATgAATGAAcACTGCGGCaAGCCGCTTAATGACACTCGTTTGC

TGGCTTtGATGGgCGAgTTGGAAGGGcGTATCTCCGGCAGCAtTCATTACGACAACGtGGCACCGTGtTtTCtT

GGTGGTAtGCAGTtgATGATCGAAGAaAACGACATCATCAGCCAGCAaGTGCCAGGGTTTGATGAGtGGCTGTG

GGTGCTGGCGTATcCGGgGAtTAAAGTCtCGaCGGcAGAAGCCAGGGCTaTTTTACCGGCGCAGTATCGCCGCC

AGGATTGCATTGCGCAcGGGCgACATCTgGCAGGCTTCATTCACGCCTGCTATTCCCGTCAGCTTGAGCTTGCC

GCGAAGCTGATgAAAGaTGTTATCGCTGAACCCTACcGTGaACgGTTaCTGCCAGGCTTCCGGCAGGCGCGGcA

GgCGGTTGCGGAAATCGGCGCGGTAgCGAGCGGTATCTCCGGCTCCGGCCCGAcTtTGTTCGCTCTGTGtGAcA

AGCCGGATACCGCCCAGCGCGTTGCCGACTGgTTGGGTAAGAACtAcCTGCAAAATCAGgAAGGTTTTGTTcAT

ATTTGCCGGCTGGATACGGCGGGcGCACGAgTACTGGAAAACTAA

threonine synthase

ATGAAACTCtacaATCTGAAAGATCACAATGAGCAGgTCaGCTTTGCGCAAGCCGTAACCCAGgGgTTAGGCAA

AAATCAGGGgCtGTtTTTTCcgCACgaCCTGCCGGaaTTCAGCcTgACTGAAaTTGATGAGATgCTGAAGCtGG

ATTTTGTCACcCGCAGTGCGAAGATCCTcTCgGCGTTTATTGGTGATGAAATCCCGCAGGAAaTCCTGGAAGAG

CGCGTACGTGCGGCGTTTGCCTTCCCGGCTCCGGTCGCCAATGTTGAAaGCGATGTCGGTtGTCTGGAaTTGTT

CcACGGGCcAACGCTGGCaTTTAAAGATTTCGGcGGTcGCTTTATGGCACAAATGCTgACCcATATTGCGGGCG

ATAAGCCAGTGAcCATTCTGACCGCGACATCCGGTgATACTGGaGCGGCAGTGGcTCATGcTTTCtACGGTtTA

CCGAATGTGAAAGTGGTTATCCTCTATCCACGAGGCAAAATCAGTCCACTGCAAGAAAAACTgTTCTGTACATT

GgGCggCAATATCGaAACTGTTGCCATCGAcggCGaTTTCGATGCCTGTCAGGCGCTGGTgAAGCAGGCgTTTG

ATGATGAAGAACTGAAAGTGgCgCtGGGGCtGAATTCTGCTAAcTCCATCAACaTCAGTCGCTTGCTGGCGcAG

ATTTGTTaTTAcTTTGaGGCTGTCGCACAGTtGCCGCAAGAAGCACGTAACCAGTTGgTTGTCTCGGTaCCGAG

TGgAAACtTcGGCGATtTGACGGcGGGTCTGCTGGCGAaGTcACTCGGTCtGCCGGTAAAACGTtTTATTGCtg

CGACCAACGTGAACGAtACCGTACCACGTTTCCTGCaCGaCGGTCAGTGGTCAcCCAAaGCGACTCAGgCGAcg

TtaTCCAATGCGATGGATGTTAGCCAGCcAAaCAACTGGCCGCGTGTGGAAGAGTTGtTCcGCCGCAAAATCTG

GCAACTGAAAGAGCTGGgTTATGCAGCCGTGgATGATGAAACCACGCAACAGACAATGcGTGAGtTAAaAGAAC

TGGGCTATACCTCGgAGCCGCACgCTGCCGTAGCTTATCGTGCGCTGCGTGACCAgTTGAAtCCAGGCGAATAT

GGCTTGTtCCTCGGcACcGCGCATCcGGcGAAatTtAAAgAGAGCGTGGAAGCGATTCTCGGTGAAAcGTTGGa

tCTGCCAAAAGAGCTGGCAGAACGTGCTgATTTACCCTTGCTTTCGCATAACCTGCCCGCCGATTTTGCTGCGT

TGCGTAAatTgaTGATGAaTCATCAGTAA

hypothetical protein

AtGCAGCCcGGCTtTTTTTATGAAGAAAATaTGGAGaAaAACGACagGGAAAAAGGAGAAATTCtCAATAAATG

CGGtAACTTAGAgATTaGGATTGCGGAGAATaACAACTGCcGTTCTCaTCGCGTAATCTCCGGATATCGACCCa

TAACGGgCAATGATAAAAGgAGTAACCTGTGA

Non-protein region

aAAAACTgCTGGAAACAATGAAAGAcGTACCGGACGACCAAcGTCAGgCGC

transaldolase B

ATGACGGACAAATTGaCCTCcCTTCGTCAGTACACCACCGTAgTGGCCGACACTGGGGACATCGCGGCAATGAA

GcTGTaTCAACcGCAGGATGCCACAACCAAcCCTtCTCTCATTCTTAACGCAGCGCAGATTCcGGAATACCGTA

AgTTgATTGaTGATGCTGTCGCCTGGGcGAaACaGCAGAGCAAcGATcGCgCgCAGCAgATCGtGGACGCGACC

GAcAAACTGGCAGTAaATATTgGTCTgGAAaTCCTGAAACTGgTTCCGgGCCgTATCTCAActGAAGTtGATGC

GCGTCTTTCCTATGACaCCGAAGCGTCAATTGCGAAAGCAAAACGCCTGATCAAACTCTACAACGATGcAGGTa

TTAGCAACGATCgTaTTCTGATCAAACTGGCTTCTACCTGGCAGGGTATCCGTGCTGcAGAACAGCTGGAAAAA

GAaGGTATTAACTGTAAcCTGACCCTGCTgtTCTCctTCGCtCAGGcTCGTGCTTGTGCGGaAGCGGgCGTgTT

CCTGaTCTCGcCGTTTgTTGGCcGTATTCTTGACTGGTAcAAaGCGAATACCGaTAAGAAAGAGtACGCTCcGG

CAGAAGATcCGGGCGTGGTTTCTGTatCtGAAATCtACCAGtACTACaAAGAGCATGGTTaTgAAACCGTGGTT

ATGGGCGCAAGCTTCCGTAACATCGGCGAAATTCTGGAAcTGGCAGGCTGCGACCGTCTGACCatCGCACCGgc

ACTGCTGAAAGAGCTGgCGGAGAGCGAAGGGGCTATCgAACGTAAACTgTCTTACAcTGgTGAAGTgAAAGCgC

GTCCGGcGCGTATCACtGAGtCCGAGTTCCTgTGgCAgCACAACCAGGATCCAATGGCAGTaGATAAACTgGcG

GaAGgTATCCGTAAGTTTGCTGTTGACCAGGAAAAACTGGAAAAAATGATCGGCGATCTGCtGTAA

molybdopterin biosynthesis mog protein

ATGAATACTTTACGTATTGGCTTaGTtTcCaTCTCTGATCGCGCATCCAGCGGCGTTTAtCAGgaTAAAgGCAT

CCCTGCGCTGGAagAATGGCTGACAtcGGCGCTAACCACGcCGTTTGAaCTGGAAAcCCgCTTaATCCCCGATG

AGCAGGCGATCATCGAGCAaACgTTgTGTGAGCTGGTGGATGAAaTGAGtTGCCaTCTGGTGCTCACCACGGGC

GGAAcTGGCCCTGCGCGTCGTGAcgTAACGCcCGATGcGACGCTGGCAGTAGCGGACCGCGAGATgCcAGGCTT

TGGTGAACAGATGCGCCAGATCAGCCTGCATTTTGTACcaaCTGCGATCCTTTCGCGTCAGGTggGGGTgATTC

GCAAACAGGCGCTGATCCTTAACTTaCcCGGTCAACCGAAGtCTATTAAAGAGACGCtGgAAGGTGtGAAGGAC

GCTGAGgGTAAcGTTGTGGTGCACGgTATTTTTGCCaGCGTaCcGTaCTGCATTCAGTTGCTGGAAGGGCCATA

CGTTGAaACGGCaCCgGaAGTGGTTGCAGCATTCAGaCCGAAGAGTGCAaGACGCGAAGtTAGCGAATAA

chaperone protein DnaK

aTGGGTAAAATAaTTGGTATCGACcTGGGTACtACCAaCTCTTGTGTagCGaTTAtGGATGGCACCACTCCtCG

TGtACTGgAGAACGcCGAAGGCGATCGCACCAcGcCTTcTATCATTgCCTATACCCAGGAtGGTGAAACTCTGG

TTgGTCAGCCGGCTAAACGTCAGGCAgtGACGAACCCgCAaAACAcCCTGTtTGCGATTAAACGCCtGATTGGC

CGCCgCTTCCAGgACgAAGAAGTACAGCGtGATgTTTcCATCATGCCGTTCAAAATTAtTGcTGCtgatAACGG

CGACGcATGGGTCGAAGtTAAAgGCCAGAAAATGGCAcCGCCGcAGAtCTCTGCTGAAGTGCTGAAAAAAAtGA

AGAAAACCGCTGAAGaTTAcCTGGgTGAAcCGGTAACTGaAGCTgtTATTACCGTACCGGCAtACTttaACGAT

GCTCAGCGTCAGGcAACCAAAGaCGCAGGCCGTATCGCTGGTCTGGAAGTAAAaCGTATCATCAACGAaCCGAC

CGCAGCTGCGCTGGCTtACGGtCTGGACAAAGgTACTGGCAACCgtACTATCGCGGTTTATGACCTGGGTGGTG

GTACTTTCGATATTTcCATTATCGAaATCGACGAAGTTGACGGCgAAAAAACCttCGAAGTTCTGGCAACCAAC

GGTGATACCCACCTGgGTGGtgAAGACTTCGACAGTCGTCTGATCAACTAtCTGGTTGAaGAATTCAAgAAAGA

TCAGGGCATTGacCtGCGCAACGaTcCGCTGGCAATGCAGCGCCTGAAaGAAGCGGCAGAAAAAGCgAAAATCG

AACTGTctTCCGCTCAGcAGACCGaCGTTAACcTGCCGTACATCACTGCAGACGCGAcCGGTCCGAAACACAtG

AACATCaAAgTGactCGTGCGAAACTGGAAAGCCTgGtTGAAGAtCTGGTAAACCGtTcCATTGAGCCGCTGAA

AGTTGCACTGCAGGACGCTGGCCTGTCCGTATCTGATAtCGACgaCGTTATTCTCGTTGGTGGTCAGACTCGTA

TGCcAATGGtTCAGAAGAAAGTTGCTGaATTCTTTGGTAAAgAGCcGCGTAAAGATGTTAACCCGGACGAAGCT

GTaGCCATCGgTGCTGCTGTTCAGGGTGGTGTTCTGACTGGtGAcGTAAAAGaCGTacTGCTgCtGGACGTTAC

CCCGCTGTCtCTGGGTATcGaAACCaTGGGCGGTGTGATGACCACGCTGATCGCgAaAAACACCACTATCCCGA

CCAaGcAcaGCCAGGTGTTCTCTACCGCTGAAGACAACCAGTCTGCGGTAACCATcCATgtGCTGcAGGGTGAA

CgTAaACGTGCgGCTGAtAAcaAATCTCTgggTCAGTTcAACCTGGATGGTATCAaCCCGGCACCGcGCGGCAt

gCCGcAGATCGAAGtTACCtTCGAtATCGaTGCTGACGGTATCCTGCaCGTTTCCGCGAAAGACAAAAACAGCG

GTAAAGAGCAGAAGATCAcTATCAaGGCTTCTTCTGGtCTGAaCGAAGAtGAAATCCAGAAAATGGTACGCGaC

GCAGAAGCTAAcGCCGAAGCTGACCGTAaGTTTGAAGAGCTGGTACAGACtcGCaACCAGGGCGACCATCTGCT

GCACAgCACCCGTAAGCAgGTTGAAGAAGCAGGCGACAaACTGCCGGCTGACGACAAAACTGCTATCGAGTCTG

CGCTGActGCACTgGAAACtGCTCTGAAaGGTGAAGaCAAAGcCgCTATcGAAGCGAAAATGCAGGAACTGGCA

CAGGTTTCCCAGAAACTGATGGAAATCGCCCaGCAGCAACATGCcCAGCAGCAGACTGCCGGTGCTgATgCTTC

tGCAAaCAAcGCGAAAGaTGACGATGTTGTCGACGCtGAATTTGAAGAAGTCAAAGACAAAAAATAA

chaperone protein DnaJ

GTGCatTCatCTAGGGGcAATTTAAAAAAGATGGCTAAGCAAGATTaTTACGAGaTTTTAGGCGTTTCCAAAaC

AGCGGAAGAGCGtGAaaTCAAAAaGGCCTACAAACGCCTGGCCATGAAaTACCaCCCGGaCcGTAACCAGGgTG

ACAAAGaGGCCGAGGCGAAATTTAAAGAGATCAAGGaAGCTTATGAAGTTCTGACCGACtCGCAAAAACgTGCg

GCATaCGATCAGTaTGGTCATGCTGCGTTTGAGCAAGGTGGCATGGGCGGCGGcGGtTTTGGCGGCGGCgCAGA

CTTcAGCGATAtTTtTGGTGACGtTTTCGGCgATATTTTTGGcGGCGGACGTGGTCGTCAACGTGCGGCGCGCG

GTGCTGATTTAcGCTATAACATGGAGctCACcCtCGAAgAAGCTGTACGtgGCGtGaCCAAAGaGATccGCATt

CCGACTCtGGAAGAGTGTGACGTTTGCCACgGTAGCgGTGCAAAACCaGGTACACAgCCgCAGACCTGTCCGAC

cTgTcATGGTTCTGGCCAGGtGCAGATGcGCCAGGGTTTCTTTGcCGTGCAGCAGACCTgTCcAcACTGTCAGG

GCCGCGGTACGCTGaTcAAAGATCCGTGCAACAAATGTCATGGTCATGGTCGTGtTGAGCgCaGCAAAACGCTG

TCCGTTAAAATCCCGGCaGGGgTGGACACTGGAGaCCGCATCCGTCTTGCGgGCGAAGGTGAAGCGGGTGAACA

CGgCGCACCGGCAGGCGATCTgTACGTTCAGGTtCAGGTtAAACaGCACCCGATTTTCGAGCGTGAAGGCAACA

ACCTGTATTGcGAAGTcCCGATCAAcTTCGCTATGgCGGCGcTGGGTGGTgaAATCGAAGTACcGACCcTTGAT

GGTcGcGTCaaACTGAAAGTGCCTGGCGAAACCCAGACCGGTAAgCTGtTCCgTaTGCGCGGTAAAGGCGTCAA

GTCtGTcCGCGGTGGcgCACAGGGTGATTtGCTATGCCGCGTTGTTGTCgaAACAcCGGTAGGTTTGAACgAGA

AGCAGAAACAGCTGCTGCAAGaGctGCAAGAAAGCtTTGGTGGcCCAACCGGCGAGCACAACAGCCCGCGTTCA

AAGAGCtTCTTtGATGGCGTGAaGAAGTTTTTTGACGaCCTgACTCGCTAA

hypothetical protein

TTGCTCTTaCTCGGATTCgTAAGCCGTGAAAACAGCAaCCTCCGtCTGGCCAGTTCGGATGTGAACCTCACAGA

GgTCTTTTCTCGTTACCAgCGCCGCCACTACGGCGGTgATACAGATGACGATCAGgGcgACaAtcAtCgCcTTA

TGCTGCTTCATTGCTCtCTtCTCCTTGACCTTTCGGTCaGTAAGAgGCACTCTACATGTGTTCTGCATATAGgG

GGCCTCGgGTtGATGgTAAAATAtCACTCGGGGCTTTTCTCTAtCTGCCGTTCAGCTAATgCcTGA

hypothetical protein

aTGTCTGCCAAaaGACGACTTCTTATTGCGtGTACCTTGAtAaCAGCTATcTATCAtTTTCCTGcaTATTCTTC

ATTAgAATATAAAGGAtCCTTTGGTTCAATaAATGCGGGTTAtGCAGACTGGAATAGTGGaTTTgTAAaCACTC

ACCGTGGTGAaGTATGGAAAGTGACtGCGGATTTTGGGgTaAATTTTAAAGAAGCAGAATTTTACTCAtTTTAT

gAaAGTAATGTACTCAATCATGCTGTAGCAGGGAGAAATCATACgGtTTCAGCAATGaCGCATGTCAGACTCtT

TGaCtCTGATaTGACATTCTTTGGCAAAATTTaTGgCCAATGGGATAACTCATgGggTGAcGATCTgGACATGT

TTTATGGATTCGGTTACCTCGGCTGGAACGGCgAgTGGgGCTTTTtTAAACCGTATATTGGATtGCATAATCAA

TCTGGTGACTACGTATCAGCTAAATaTgGTCAAACGAATgGTTgGAATGGtTATGTTGTTGGCTGGACAGCAgT

ATTAcCATTTAcGTTATTTGACGAAAAATTTGTTTTATCTAACTGGAATGaAATAGAACTGGACAGGaACGATG

CTTACACGgAgCAGcAATTTGGCcGGAACGGgTTaAaTGGCGGtTTAACTATTGcCTGGAAGTTCTATCCTCGC

TGGAAAGCCAGtGTGACGTGGCGTTATTTcGATAAtAaGCTGGGCTACGATGGCTTTgGcgaTCAAATGATTTA

tATGCTTGgTTATGATTTCtAA

putative secreted sulfatase

ATGCAGAAAACGTTAATGGCCAGTTTGATCGGCCTTGCAGTTTGCACAGGGAAtGCTTTTAGtCCTGCCTTAGC

CGCAGAGGCTaAACAACcTAATTTAGTCATtaTTATGGCGGaTGATtTAGGTtaTGGCGAtTTAGcAaCaTATG

GTCATCAGATCGTTAAAACACctAATATCGACAGGCtTGCCCAgGAAGGGGTCaAATTtACTGAcTaCTATGCC

CCCGCTCCTTtAaGTTCAccTtCACGCGCaGGGCTATTAACCGGCcGGATGCCATTtCGTAcTGGAATTCGCTC

ATGGATtCCttCAGGCAAAGATGTTGCCtTAGGGCGTAACGAAcTCACgATTGCTAaTCTACTCAaAgCGCAaG

GGTACGACACggCAATGATGGGTAAGCTGCATCTGAATgCAGGcGGCGaTCGCACCGATCAgCCaCAAGCACaA

gATATGGGcTTTGATTAcTCAcTGGTtAATACgGCGGGCTTTGTTACcGACGCCACGCTGGATAAcGCTAAAGA

ACGCCcGCGTTATGGCATGGTTtAccCGACAGGCtgGCtACGTAACGGGCAACCCACTcCACGaGCTGATAAAA

tGAGCGGTGAGTATGTCaGTTCGGAAGTCGTCAACTGGCTGGATAACAAAaaGGACaGCAAGCCTTTCTTCCTC

TATgTTGCTTTTACCGAAGTGCATAGCCCCCTGGCTTCGCCCAAAaaATACCTCGATaTGTaCTCACaATATAT

GAGCGCGTATCAGAAGCAGcATCCTGATTTAtTTTaTGGCGACTGGGcAgACAAACCCTgGCGTgGTGTGGGgG

AATATTAtGCCAATATCAGCTATCtGGATGCAcAGGTTGGAAAAgTGCTGGaTAAAATCAAAGCTGTGGgtGaA

GaaGaTAACACAATCGTTATTTTTACCAGTGatAACGGTCCgGTAaCGCGTGAAGCGCGCAAAGTGTATgAGCT

GAATTTGGCAGGGGAaACGGaTGGATTACGCGGTCGCAAGGATAACCTTTGGGAAGGCGGAATTCGtGTTCCaG

CCATTATTAAATaTGGTAAACATCTACCACAGGGAATGGTTTCAGATACACCCGTTTATGGtCTgGACTGGATG

CCTACtTTaGCgAaAATGATGAACTTCAAATTACCTACAGAcCGTAcTTTCGATGgTGAATCGCTGGTTCCTGt

TcTTGAGCaAAAAGCATTGAAACGCGAAAAGCCATTAATTTTCGGGATTGATATGCCATTCCAGGATgATCCAA

cCGATGAATGGGCGATCCGTGATGgTGACTGGAAgAtGATTATCGATCGcaATAATAAACcGAAATATCTCTAC

AATCTGAAATCTGATCGTTATGAAaCaCTTaAtCTGATCGGTAAAAAAACAgATATTGAAAAACAGATGTATGG

TaAGtTTtTAAAATATAAAACTGATATTGATaATGATtCTCTAATGAAAgCCAGAGGTGATAAACCAGAAGCGG

TGACCTggGGCTAa

putative cytoplasmic protein

ATGTTTACcAacGTAAATGTTGATTGtTgCAAAACACCAGGAtGTAAaaACCTGGGGTTGCTGAATAGCCAGGA

TTATGTCGCAcAGgGTaAaAATATTTtATGCCGTGAATGTgGTTaCTTGTtTCCAGtGATATCTGAACAGTCGC

TTAAtATTTaTCGTAATATTGTGAAtcACTcCTGGAGAGGTTTGATTTGCCAATGTTCAACTtGCGGAGGcACG

TCCCTCAAAAAATaTGgATATtCtGCAcAagGCCAgAGAAGAATgTATTGCcaTCAtTGTGaGAAAACaTTtAT

CACTCTGGAAcAtGTAATTACcACACCACGAGGAGCcCTGTTAGcATTGATGATTGAGCAAGGGGAGGCACTTG

CGGaTATCAgAAAGTCATTACGTCTTAACAgCACTGGACTTAGCCGTGAACTGTTAAAATTAGCGCGTGAAGcA

AACTATAAAGAAAGTCGACAGTGTTTCCCTGCTTCTGATATTACCCTGAGtACCCGCGCTTtTCGcgTCAAGTA

tAATGGTAGCAATAACTCTCTTTATGCTCTTGTTACCGCAGAAGAACAAAGcGGCAGGGTgGTTGcCaTCTCAA

CCAATTACTCCCCATCtGCCGTAGagCaaCATTATcAATACaCATCGAACtATGAAGAGcGTATGTCTCCAGGG

ACGCTGGCACAtCATGTCCAGCGCAAAGAGttACTTACTATGCGGCgGGATACCTTGTTTGATATTGATTACGG

CcCGgCAGTTTTACATCAAAACGATCCGGGAATGtTGGTAaAaCCGGTTCTTCCGGCATaTCGTCATTTTgAAC

TGGTCAGAATACTGACCGATGAGCATtCCAACAACGTTCAGCATTACCTTGATCACGAATGCTTTATaTTGGGC

GGCTGcCTGATGGCTAATTTGCAGCaTATTCATCAaGGTCGCTGCCATATTTCcTTTGTCAAaGAGCGcGGTGT

GGCACCCGCCACCATTGaTTTTCCACCGCGATtATTCcTTAGTgGtGGgGTACgAAATAATGTCTGGCGTGCaT

TTTCTAACCGCAATTATTCAaTGGCTGTATGCAAtCTCaCTGGCAGTAAGAAAGTCCGCGAGATGCGGCATGCA

ACATtGAACAGTGCGACGCgTTtTATCCACTTTGTGgaGAACCATCCTTTCCTTATaTCATTGAACCGAATgtC

TCCTGCGaaTGTCgtTTCTACaTTAGATaTCCTCAAACaTCTGTGGAATAaAaAACTAGagCATGGAACAATTt

AA

sodium/proton antiporter 1

GTGAAACATCTGcATCGATTCTTTAGCaGTGATGCCTCGGGAGgCATTATTCTCATTATTGCCGCTGTATTAGC

GATGATTATGGCCAACAGCGGTgcAaCCAGTGGATGGTATCACGACTTTCTTGAGACGCcGGTTCAGcTcCGGG

TTGGGACACTTGAGATCAACAAGAACATGCTGCTATGGATCAATGaCGCTCTGaTgGCGGTATTTTTCCTGTtG

GTTGGTcTGGaAGTTAAAcGCGAGcTGaTGCAaGGTTCGCTGGCCAGTCtGCgCCAGGCGGCatTTCCTGTTAT

TGCCGcAATCGGCGGGATGATTGTCCCGGCATTGCTCTATCTGGCTtTtAACTATGCCGATCCGaTTaCCCGCG

AAGGcTGGGCAatCCCGGCGGCGACTGacATTGCCTTTGCACTTggTgTGTTGGCGCTgTTGGGAAGTCGTGTT

CCGTTAGCGCtGAAGATCTTTTtGATGGCTCTGGCtATTATCGACGATCTTgGGGcCATCATtATCATCGCATT

GTTCTACAcTAATGACTTATCGATGGCCTcTCTTGGCGTcGCgGCTGTAGCAATTGCGgtACTCGCGGTATTGA

AtCTGTgTGGTGTAcGCCGCACGGGCGTtTATATTCTGGTTGGCGTGGTGCtGTGGaCAGCGGTGTTGAAATCG

GGGGTTCACGCAACCcTGGCTGGCGtCATtGtCGGCTTCTTTATTCCTTTGAAAGAGAAGCATGGgCGCTCTCc

GgCTAAACGTCTGGAGCATGTTTTGCAtCCATGGGTGGCGTATCTGATtTTGCCGCTGTTTGCATTTGCTAATG

CTGGCGTTTCACTGCAaGGTgTCACGCtggAaGGTTTgACCtCCATTCTGCCATTAGgGATCATCGCTGGTTTG

CTGaTTGGCaAGCCACtGGGTAtTaGTCTgttcTGCTGGtTGGcgCTGCGTTTGAAATTGGCACATCTGCCAGA

GGGAACgACTtACCAGCAAATTATGGCGGtTGGTaTCcTGTGCGgTATCgGTTtTAcTatGTCTATCTTTATTG

CCAGCCTGGcATTTGGTAgCGTAGATcCAGAaCTGaTTAACtGGGCAAAAtTAgGTATCCTTGTCGGTTCAATT

TCtTcGgCGGTAATTGGATATAGcTGGTTACGcGTTCGTTTACGTCcATcAGTTTGA

transcriptional activator protein NhaR

ATGAGCATGTCTCATaTCAATTACAACCACTtGTATTACTTCTGGCaTGTCTAcAAAgAaGGTTCTGtGGTTGG

CgCAGCGGAGGCGCTTTATTTAACAcCAcAAACCATTACCGGGCaGATCCGGGCGCTGGAaGAGCGCCTGCAAG

GGAAAcTATTTAAGCGTAAAGGAcgTGGTCTGGAACCCAgcGAACTGGGGGAACTGGTCTATCGCtATGCCGAT

AAAATGTTCAcCTTAAgCCAGGAAATGCTgGATATCGTCAACTATCGCAAAGAGTCCAACTtATTGtTTGATGT

TgGTGTGGCAGATGCACTTtcCAAAcGtcTGGTCAGCAGTGTTCtgGATgCCGCAGTtgTGGAAGACGAGCAGA

tCCATCTACGCTGTTTCGAaTCGACGCACGAGATgCTTTTaGAGCAgtTGAGTCAGCATAaACTGGATATGATc

aTCTCTGACTGTCCGaTCGATTCCACTCAGCAGGAAGGGCTGTTTTCCATGAAAaTtGGCGAATGTGGTGTCAg

tTTCTGGTgCACTAACCCACTACcAGAAAAGCCGTTTCCTGCCtGTCTTGAAgAGCgTCGtTtACTTATTCCGG

GGCGTCGCTCAaTgTTGGGGCGtAAACTATTAAACTGGTTTAACTCcCAGGGCTTGAACGTCGAAATTTTGgGT

GAGTTTGATGATGCTGCGTTGATGAAAgCCTTTGGGGCGAcGCATAACGcTATTTTCGTTGCACCTTCGCtTTA

CGCTAATgATTTCTATAACgATGACTCGgTtGTGgAGATAGgCCGTGTTGAGaACGTGATGGAAGAGTACCACG

CGATTTtTGCCGaAAGgaTGAtTCAgCACCCTGcAGTAcAGCGTATCTGcAATACAgacTATTCTGCGCtgTTT

ACTCCAGCTTcAAAATAA

riboflavin kinase

ATGAAGCTGATACGCGgCAtACATAATCTCAGCCAGGCCCCGCAAGAAGGGTGTGTGCTGACTATTGGTaATTT

CGACGGCGTGCATCGCggTCATCGCGCGCTGTTACAGGGCtTGCAGGAAGAAGGGCGCAAGCGCAACtTACCGG

TGATGGTGATGCTTTTtGaACCTCAACCAcTGGAACTGTTTGCTACTGAtAAAGCcCCGGCACGGcTcACcCGG

CTGCgGGAAAAACTGCgTtaTcTTgCAGAgTGTGGCGTTGATTACGTGCTGTGCGtGCGTtTTGaCaGGCGTtT

TGCGGCGTTAACCGCGcAAAACTTCATCAgTGATCTtCTGGTGAAGCACTTGCGGGTAAAATTTCTTGCCGTAG

GTGACGAtTTCCGCTTTggCGCTGgTCGTGAAgGCGAtTTCTtGTTATTACAGAaAGcgGGCATGGAATACGGC

TTCGATATcACCAGCaCGCAAAcTTtTTGCGAAGGTGGTGTGCGtATCAGcAGCACCGCCGtgCGTCAGGCGCt

TGCGgATgACAATCTGGCTCTGGCAGAAAGTTTACTGGgGCACCCGTTTGCTATCTCCGGGCGTGTAGTCCACG

GTGATGaATTAGGGCGCAcTATAGGTTTCCCgACGGCGaATGTACCGcTaCgCCGTCAGGTTTCCCCGGTGAAA

gGGGTTTATGCGGTAGaAgTgTTGGgCCtTGgCGAAaAGcCGTTAcCCGGcgTTGCAAACaTCGGAACACgCCC

AACGGTTGCcGGTATTCGCCAGCAACTGgaAGTGCATTTGTTAGATGTTGcAATGGaCCTTTATGGTCGCCAtA

TACAAGTAGTGCTGCGtAAAAaAATAcGCAATGAGCAgCGATTTGcATCGCTGGACGAACTGAAAGCGCAGATT

GCGCGTGATGAATTAACCGcCCGCGaaTTTtTTGGGCTAAcAAAACCGGCTTAa

Isoleucyl-tRNA synthetase

ATGAGTGACTATAAATCaACCCTgAATTTGCCgGAAACAGgGTtCCCGATgCGTGGCGATCTCGcCAAGCGCGA

AcCGGGaATGCTGGCGCGTTGGACTGATGATGATCTgTaCGGCATCATCCGTGCGGCTaAAAAAGGCAaAaAAA

CCTTCAtTCTGCATgATGGCCcTCCTTATGCGAATGGCAGCAtTCaTATTGGTcACTCGGTTAACAAGATTCTG

AAAGACATTaTCATTaAgTCCAAAgGGCTttCTGGATATGACTCGCCGTATGTGCCTGGCTGGGACTGTCaTGG

tCTGCCAATCGAAcTGAAAGTAGAGCAAGAATACGGTAAGCCGGGgGAGaAaTTCACCGCCGcTGAGTtCCGCG

CCAAGTGCCGCGAATACGCTGCgACCCAGGTTGACGGTCAGCGCAAAGACTTTaTCcGTCTGGGCGTGCTGGGC

GActgGTCgcACCCGTACCTGACCATGGACtTCAAAACTGAAGCCAACATCATCCgCGCGCTGGGCAAAATCAT

CGGCAAcGGTCACCTGCACaAAGGcGCGAAGCCGGTgCACTGGTGCgTTGACTGCCGTTCTgCACTGGCAGAAG

CGGAAGtTgAGTATTACGacAAAACTtCTCCGTCCATCGACGTCGCTTtCCAGGCGGTCGATCaGGATGCGCTG

AAAACGAAATTTGGCGTAAGCAATgTTAACGGCCCAATTTCGCtGGTTATCTGGaCcACCACGcCGTGgAcGCT

GCcTGCTAacCGCgCAATCTCcATtGCACCTGATTTTGAttATGCGCTGGTGCaAatCgACGGTCAGgCCGTGA

TCCTCGCGAAAGATCtGGtTGaAAGCGTAAtGCAGCGTATCGGCGTTAGCGaTTACACCATTCTTGGCAcGGtg

AAAGGTGCCGAGCtGGAACTGTTgCGCTTTACCCATCCGTTtATGGACtTCGATGTTCCGGCAaTTCTCGGCGA

CcACGTTACgCTGGATGCCGGTACCGGTGcCGTTCATACCGCGCCAGGCcACGGTCCGGaCGACTATgTGATCG

GTcAAAAATaTGgTCTGGAAaCCGCTAACCCgGTTgGCcCGGACGgCACtTaTCTGCcGgGTACTTACCCGACT

CtGGATgGCGTTaACGTCTTCAAAGCGAACGaTATTGTCATTGCGTTGTTgCAGGAAAAAGGcgCACTGTTGCA

CGTTGAGAAAATGCAACACAGCTATCCGTgCTGCtGGCGTCaTAAaACGCCGATCAtCTTCCGcgCGACGCCGC

AGTGGTTCGTCAgCAtgGATCAGAAAGGTCTGCgTGCGcAGTCACTGAAAGAGATCAAAGGCgTGCAGTGGATC

CCTGACTGGGGCCAGGCGCGTATCGAGTCGATGGTTGCTAACCGTCCTGACTGGTGTATcTCTCGTCaGCGTAC

CTGGGGcGTGCCgATGTCACTGTTCGTgCaCAaaGACACAGAAGAaCTGcATCCGCgTACTCtcGAACTGaTGG

AAGAAGTGGcAAAACGCGTTgAAGTtGACgGCATTCAGGCgTGGTGGGATCTCGATGCGAAaGAgATCcTCGGC

GaCGAAGCTGACCAGTATGTGAAAGTACCGGATACGCtGgATGTATGGTtTGACTCCGGATCTACCCACTCTTC

CGTTGTTGATGTGCGTcCGGAATtTGCCGGTCACGCAGCGGACATGTaTcTGgAaGGTTCTGACCAACACcGTG

gCTGGTtCATGTCtTCCCTGATGATCTCTACCGCGATGAAGGGcAAAGcGCCATATCGTCAGGTACTGACTCAC

GGCTTTAcCGTGGATGGTCAGGGTCGCAAGATGTCTAAATCCATCGGtAACaCcGTTTCGCCGCAGGATGTgAT

GAATAAACtGGGtGCGGATATTCTGCGTCTGTGGGTGGcATCAACCGACTAcACTGGCGAAATGGCcGtTTCTG

ACGAGATCcTGAAACGtGCTGCcGACAGCTATCGTCGTATCcGTAACAcCgCGCGCTTCCTGCTGGCAAACCTG

AACgGTTtTGAtCCGGCaAAAGaTATGGTGAAACCGGAAGAGATGGTGGTaCTGGATCGCTGGGCCGtAGGTTG

TGCGAAAGCGGCACAGGAAGACATCCtCAAGGCgTACGAAGCATACGATTTCcACGAAGTGGTaCAGCGTcTGa

TGCGCtTCTGCTCCGTTGAGATGgGTTccTTCTACCTCGACATCATCAAAGACCGTCAgTATACcGCCAAAGCG

GaCAGCGTGGCGCGTCGTAGCTGCCAGAcTgCGCTGTATCACATCGCaGAAGCGCTGGTTCGCTGGATGGCAcC

AATCCTCTCCTTCaCcGCTGaTGAAGTGTGGGGtTaCCTGCCggGCGAACGTGAAAAATACGTCTTCAcCGGCg

AgTGgTACGAAGGCCTGtTTGGTCTGGCAGACAGTGAAGCAATGAACGaTGCGTTCTGGGACGAGCTGTTGAAA

GTGcGTGGCGAAGTGAAcAAAGTcaTTGAGCAAGCgCGTGCCGATAAGAACGTGGGcGGCTCGCTGGAAGCGGC

AGTAAcCTTGTATGCAGAACCGGAaCTGGCgGCGAaaCTGaCCGcGCTGGGCGAtGAATTACGATTTGTCCTGt

TGACCTCCGgCGCTAcCGTTGcAGACtATAACGACGCACCTGCTGATGCCCAGCAGaGCGAaGTcCTCAAAGGG

CTGAAAgtCGCGTTGAGTAAAGCCgAAGGtGaGAAGTGTCCtcGctGCTGgCACTACACCcAGgATGTcGgCAA

GGTGGCGGaACACGCAGAAATCTGCGGCCGCTGTgTcAgCaACGTCGCCGGTGACGGTGAAAAaCGTAAGTTTG

CCTGA

Non-protein region

GCTTGCGCCAACGcCATTTCATCGCCATCCCGCCgAgcATACAGGCCTCGgAaGAACCAaTGGTGTTGGTGcCA

ACGGCCtGAccATTTTTcGGTGCAGGCGCATGCCACAGATCGGCAACCATGTTTACGCAACGCAGATCGATTGC

TGcAGaTTGCGGATATTctTCTTTGTCGATCCAGTTTTTGTtAATGGAtAAAtCCA

FKBP-type 16 kDa peptidyl-prolyl cis-trans isomerase

ATGTCTGAATCTGTACAGaGCAaTAgCGCCGTCCTGGTGCACTTCACGCTAAAACTCGACGAtGGCaCCAcCGC

TGAGTCTACCCGCAaCAaCGGTAaACCGGCGCTGTTCCGCcTGgGTgATGCTTCTCTTTCTgAaGgGCTGGAGC

AACACCTGCTgGGGCTGAAAGTGGgCGATAAAACCaCCTTCtCGCTGGAGCCAGATGCGGCgTTtgGCGTGCCG

TcACCgGAcCTGATtCAGTAcTTCTCcCGCCGTGAATTTATGgATGCAGGCGAGCcaGAAATTGGCGCAATCAT

gCTTTTTACCGCAATGGaTGGCAGTGAGATGCCTGGCGTGaTCCGCgAAATTAACGGCGACTCCATTACCGTTG

ATTTCAACCaTCCGCTgGCCGGGCAGACCGTTCATTTTGATATTGaagTGCTGGaAATCGATCCGGCAcTGGAG

GcGTaA

Terribly sorry for the long question but thank you if you can help.

In this assignment you read an input file containing named sequences of nucleotides and produce information about them. For each nucleotide sequence, your program counts the occurences of each of the four mucleotides (A, C, G, and T). The program also computes the mass percentage occupied by each nucleotide type, rounded to one digit past the decimal point. Next the program reports the codons (trios of nucleotides) present in each sequence and predicts whether or not the sequence is a protein-coding gene. For us, a protein-coding gene is a string that matches all of the following constraints* * begins with a valid start coapn (ATG) . ends with a valid stop codon one of the following: TAA, TAG, or TGA) contains at least 5 total codons (including its initial start codon and final stop codon) Cytosine (C) and Guanine (G) combined account for at least 30% of its total mass These are approximations for our assignment, not exact constraints used in computational biology to identifv proteins) The DNA input data consists of line pairs. The first line has the name of the nucleotide sequence, and the second is the nucleotide sequence itself. Each character in a sequence of nucleotides will be A, C, G, T, or a dash character, "-". The nucleotides in the input can be either upper or lowercase. Input file dna.txt (partial): ure or canear protern AIGccACTATGGTAG captain picard hair growth protein ATgCCAACATGgATGCC-GATA GGATTgA bogus protein The dash --" characters represent "junk" or "garbage regions in the sequence. For most of the program they should be ignored in yor computations, though they do contribute to the total mass of the sequence as described later Program Behavior. Your program begins with an introduction and prompts for input and output file names. You may assume the user will type the name of an existing input file that is in the proper format. Your program reads the input file to process its nucleotide sequences and outputs the results into the given output file. Notice the nucleotide sequence is output in uppercase, and that the nucleotide counts and mass percentages are shown in A, C, G, T order. A given codon such as GAT might occur more than once in the same sequence Log of execution (user input underlined): program reports secruences that mav encodie prote2n5. Input file name ? dna. txt Output file name? output.txt Output file output.txt after above execution (partial) gion Name: cure Eor cancer protein Nucleotides: ATGCCACTATGGTAG Nuc. Counts: 4, 3.4. 41 Total Mass: [27.3, 16.8, 30.6, 25.31 of 1978.8 Codons List: CATG, CCA, CTA TGG, TAG] Is Protein? YES Region Name: captain picard hair growth protein Nucleotides: ATGCCAACATGCATGCCCGATATGGAITGA Nuc. Counts: [9. 6, 871 Total Mass 30.7, 16.8, 30.5, 22.1] of 3967.5 Codon L5t : [ATG, CCA, ACA, TGG, ATG, CCC, GAT, ATG, GAT, TGA] Is Protein? YES Region Name: bogus protein Nucleotidies: CCATT-2ATGATCA-CAGTT Nuc. Counts: [6, 4, 2. 61 Total Mass 32.3, 17.7. 12.1, 29.91 of 2508.1 Codons List: [CCA, TTA, ATG, ATC, aca, GIT Is Protein?: NO In this assignment you read an input file containing named sequences of nucleotides and produce information about them. For each nucleotide sequence, your program counts the occurences of each of the four mucleotides (A, C, G, and T). The program also computes the mass percentage occupied by each nucleotide type, rounded to one digit past the decimal point. Next the program reports the codons (trios of nucleotides) present in each sequence and predicts whether or not the sequence is a protein-coding gene. For us, a protein-coding gene is a string that matches all of the following constraints* * begins with a valid start coapn (ATG) . ends with a valid stop codon one of the following: TAA, TAG, or TGA) contains at least 5 total codons (including its initial start codon and final stop codon) Cytosine (C) and Guanine (G) combined account for at least 30% of its total mass These are approximations for our assignment, not exact constraints used in computational biology to identifv proteins) The DNA input data consists of line pairs. The first line has the name of the nucleotide sequence, and the second is the nucleotide sequence itself. Each character in a sequence of nucleotides will be A, C, G, T, or a dash character, "-". The nucleotides in the input can be either upper or lowercase. Input file dna.txt (partial): ure or canear protern AIGccACTATGGTAG captain picard hair growth protein ATgCCAACATGgATGCC-GATA GGATTgA bogus protein The dash --" characters represent "junk" or "garbage regions in the sequence. For most of the program they should be ignored in yor computations, though they do contribute to the total mass of the sequence as described later Program Behavior. Your program begins with an introduction and prompts for input and output file names. You may assume the user will type the name of an existing input file that is in the proper format. Your program reads the input file to process its nucleotide sequences and outputs the results into the given output file. Notice the nucleotide sequence is output in uppercase, and that the nucleotide counts and mass percentages are shown in A, C, G, T order. A given codon such as GAT might occur more than once in the same sequence Log of execution (user input underlined): program reports secruences that mav encodie prote2n5. Input file name ? dna. txt Output file name? output.txt Output file output.txt after above execution (partial) gion Name: cure Eor cancer protein Nucleotides: ATGCCACTATGGTAG Nuc. Counts: 4, 3.4. 41 Total Mass: [27.3, 16.8, 30.6, 25.31 of 1978.8 Codons List: CATG, CCA, CTA TGG, TAG] Is Protein? YES Region Name: captain picard hair growth protein Nucleotides: ATGCCAACATGCATGCCCGATATGGAITGA Nuc. Counts: [9. 6, 871 Total Mass 30.7, 16.8, 30.5, 22.1] of 3967.5 Codon L5t : [ATG, CCA, ACA, TGG, ATG, CCC, GAT, ATG, GAT, TGA] Is Protein? YES Region Name: bogus protein Nucleotidies: CCATT-2ATGATCA-CAGTT Nuc. Counts: [6, 4, 2. 61 Total Mass 32.3, 17.7. 12.1, 29.91 of 2508.1 Codons List: [CCA, TTA, ATG, ATC, aca, GIT Is Protein?: NO

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Databases Questions!