Question:

Write a reference class called DNASequence. This will contain the name of a species (as a string) and a sequence of DNA nucleotides, also called bases. There are four kinds of bases: adenine (A), cytosine (C), guanine (G), and thymine (T). The DNA sequence will be represented by a string containing the four capital letters that stand for these bases (A, C, G, T).

DNA, at least in part, provides a code for protein structure. Proteins are built from amino acids. There are four bases and twenty amino acids, along with a special value called a stop code. Every possible value for three consecutive bases, which is called a codon, corresponds to an amino acid or to the stop code. Here is a table of that correspondence (codon, followed by the one-letter amino acid code):

Codon  Code    Codon  Code    Codon  Code    Codon  Code
AAA    K       CAA    Q       GAA    E       TAA    X
AAC    N       CAC    H       GAC    D       TAC    Y
AAG    K       CAG    Q       GAG    E       TAG    X
AAT    N       CAT    H       GAT    D       TAT    Y
ACA    T       CCA    P       GCA    A       TCA    S
ACC    T       CCC    P       GCC    A       TCC    S
ACG    T       CCG    P       GCG    A       TCG    S
ACT    T       CCT    P       GCT    A       TCT    S
AGA    R       CGA    R       GGA    G       TGA    X
AGC    S       CGC    R       GGC    G       TGC    C
AGG    R       CGG    R       GGG    G       TGG    W
AGT    S       CGT    R       GGT    G       TGT    C
ATA    I       CTA    L       GTA    V       TTA    L
ATC    I       CTC    L       GTC    V       TTC    F
ATG    M       CTG    L       GTG    V       TTG    L
ATT    I       CTT    L       GTT    V       TTT    F

The single letters for the amino acids correspond to the full and short names of the amino acids according to this table:

One-letter code   Full name                        Short name
A                 alanine                          ala
B                 asparagine or aspartic acid      asx
C                 cysteine                         cys
D                 aspartic acid                    asp
E                 glutamic acid                    glu
F                 phenylalanine                    phe
G                 glycine                          gly
H                 histidine                        his
I                 isoleucine                       ile
K                 lysine                           lys
L                 leucine                          leu
M                 methionine                       met
N                 asparagine                       asn
P                 proline                          pro
Q                 glutamine                        gln
R                 arginine                         arg
S                 serine                           ser
T                 threonine                        thr
V                 valine                           val
W                 tryptophan                       trp
X                 stop                             stp
Y                 tyrosine                         tyr
Z                 glutamine or glutamic acid       glx

Please implement the following API for the class:

public DNASequence(String species, String sequence)
    Create a DNASequence object with the given species and sequence.

public int countA()
    Return the number of adenine bases in the sequence.

public int countC()
    Return the number of cytosine bases in the sequence.

public int countG()
    Return the number of guanine bases in the sequence.

public int countT()
    Return the number of thymine bases in the sequence.

public int size()
    Return the number of bases in the sequence.

public Iterable aminoAcidSequenceFullName(int start, int end)
    Return an iterable object (e.g., an array or ArrayList) containing a sequence of amino acids. As explained above, three bases together (called a codon) map to one of the twenty amino acids. This method maps each of the codons starting at position start and ending at end to the full name of its corresponding amino acid.

public Iterable aminoAcidSequenceShortName(int start, int end)
    Does the same as the previous method but returns an iterable object containing the amino acid short names.

public Iterable subsequencePositions(String subsequence)
    Returns an iterable object containing the positions where the string in subsequence occurs in the sequence.

To make the two methods returning amino acid sequences a bit easier, please use the class AminoAcid. This provides a way of looking up an amino acid's long name or short name using the single-letter abbreviation as a key. For example, the method call AminoAcid.fullName("S") will return the value "serine" and the method call AminoAcid.shortName("S") will return "ser".

Of course, this means that your code will have to translate a codon to the single-letter amino acid code first. I suggest defining a symbol table as a static variable in the class and filling it with key/value pairs containing a codon and its corresponding single-letter code for an amino acid.

The application

Write an application program called SequenceReport that reads in a text file where each line contains a species name, a tab character ("\t"), and a DNA sequence.
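
The static symbol table suggested for the DNASequence class could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the expected solution: the class name DNASequenceSketch and the helpers count and aminoAcidCodes are hypothetical, the assignment's AminoAcid class is not shown here so the translation stops at one-letter codes, and treating end as exclusive is an assumption the question leaves open. The 64 one-letter codes are copied from the question's codon table in alphabetical codon order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DNASequenceSketch {
    // Codon -> one-letter amino acid code, filled once for the whole class.
    private static final Map<String, String> CODON_TABLE = new HashMap<>();

    static {
        // One-letter codes for the 64 codons in alphabetical order (AAA, AAC, ..., TTT),
        // one group of 16 per first base A, C, G, T.
        String bases = "ACGT";
        String codes = "KNKNTTTTRSRSIIMI" + "QHQHPPPPRRRRLLLL"
                     + "EDEDAAAAGGGGVVVV" + "XYXYSSSSXCWCLFLF";
        int i = 0;
        for (char b1 : bases.toCharArray())
            for (char b2 : bases.toCharArray())
                for (char b3 : bases.toCharArray())
                    CODON_TABLE.put("" + b1 + b2 + b3, String.valueOf(codes.charAt(i++)));
    }

    private final String species;
    private final String sequence;

    DNASequenceSketch(String species, String sequence) {
        this.species = species;
        this.sequence = sequence;
    }

    String species() { return species; }

    int size() { return sequence.length(); }

    // Hypothetical shared helper behind countA/countC/countG/countT.
    int count(char base) {
        int n = 0;
        for (int i = 0; i < sequence.length(); i++)
            if (sequence.charAt(i) == base) n++;
        return n;
    }

    // One-letter codes of the complete codons in [start, end).
    // Assumption: end is exclusive; a trailing partial codon is ignored.
    List<String> aminoAcidCodes(int start, int end) {
        List<String> result = new ArrayList<>();
        for (int i = start; i + 3 <= end && i + 3 <= sequence.length(); i += 3)
            result.add(CODON_TABLE.get(sequence.substring(i, i + 3)));
        return result;
    }

    // Every index at which subsequence starts, overlapping matches included.
    List<Integer> subsequencePositions(String subsequence) {
        List<Integer> positions = new ArrayList<>();
        if (subsequence.isEmpty()) return positions; // avoid looping forever on ""
        for (int i = sequence.indexOf(subsequence); i >= 0;
                 i = sequence.indexOf(subsequence, i + 1))
            positions.add(i);
        return positions;
    }
}
```

With this shape, countA() would reduce to count('A'), and the two name-returning methods could pass each code from aminoAcidCodes through AminoAcid.fullName or AminoAcid.shortName.
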

For each line of the file, print:

- The name of the species
- The number of bases in the sequence
- The number of codons in the sequence
- The percentage of the occurrence of each base
- The positions in the sequence where the subsequence "CCAAT" occurs
- The first twelve amino acids, by their short names, that are coded beginning at position 0
- The same twelve amino acids, by their full names
- The number of occurrences of the amino acid phenylalanine
- The number of occurrences of the amino acid histidine
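
Several of the report items can be sketched without the DNASequence class at all. The block below is a hedged illustration only: the names SequenceReportSketch, percentOf, countCodons, and report are hypothetical, the input file name is assumed to arrive as the first command-line argument, the codon count is assumed to be the number of complete codons (length divided by 3), and amino acids are assumed to be counted in the reading frame that starts at position 0. The codons used for phenylalanine (TTC, TTT) and histidine (CAC, CAT) are taken from the question's codon table. The "CCAAT" positions and the amino acid name lists are left out because they depend on DNASequence and AminoAcid.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Locale;

class SequenceReportSketch {
    // Percentage of base among all bases in sequence (0.0 for an empty sequence).
    static double percentOf(String sequence, char base) {
        if (sequence.isEmpty()) return 0.0;
        long hits = sequence.chars().filter(c -> c == base).count();
        return 100.0 * hits / sequence.length();
    }

    // Number of codons, read in frame from position 0, that equal one of codons.
    static int countCodons(String sequence, String... codons) {
        int count = 0;
        for (int i = 0; i + 3 <= sequence.length(); i += 3) {
            String codon = sequence.substring(i, i + 3);
            for (String target : codons)
                if (codon.equals(target)) count++;
        }
        return count;
    }

    // Builds part of the report for one "species<TAB>sequence" line.
    static String report(String line) {
        String[] fields = line.split("\t", 2);
        String species = fields[0];
        String sequence = fields[1].trim();
        StringBuilder out = new StringBuilder();
        out.append("Species: ").append(species).append('\n');
        out.append("Bases: ").append(sequence.length()).append('\n');
        out.append("Codons: ").append(sequence.length() / 3).append('\n');
        for (char base : "ACGT".toCharArray())
            out.append(String.format(Locale.ROOT, "%c: %.1f%%\n",
                                     base, percentOf(sequence, base)));
        out.append("Phenylalanine: ")
           .append(countCodons(sequence, "TTC", "TTT")).append('\n');
        out.append("Histidine: ")
           .append(countCodons(sequence, "CAC", "CAT")).append('\n');
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: args[0] names the input file; lines without a tab are skipped.
        for (String line : Files.readAllLines(Paths.get(args[0])))
            if (line.contains("\t")) System.out.print(report(line));
    }
}
```

In a full solution these computations would more naturally call countA(), size(), and the amino acid methods on a DNASequence object; they are written against plain strings here only to keep the sketch self-contained.
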
