
This question needs to be coded in Java.

Basic Statistics

Your program will first prompt the user to enter a single DNA sequence, which it should validate for legality (i.e., the sequence may contain only the four valid bases, A, C, G, and T). You might do this validation by writing a function that takes a String as a parameter and returns a boolean. Re-prompt the user if the input is invalid. Once you have a valid input, compute the following statistics (each should be implemented as a separate function, called from main()).

Count the number of occurrences of C.

Determine the fraction of cytosine (C) and guanine (G) nucleotides. For example, if half of the nucleotides in the sequence are either C or G, the fraction should be 0.5.

A DNA strand is actually made up of pairs of bases: in effect, two strands that are cross-linked together. These two strands are complementary: if you know one, you can always determine the other, or complement, because each nucleotide pairs up with only one other. In particular, A and T are complements, as are C and G. So, for example, the complement of the sequence AAGGTCT would be TTCCAGA. Compute the complement of the input sequence.

Simple Pairwise Alignments

During reproduction, DNA sequences from both parents are replicated and mixed to form the DNA of their offspring. This process is not 100% accurate, and errors, or mutations, creep into the genome. Sometimes these mutations have no effect, sometimes they are immediately lethal and the offspring isn't viable, and sometimes they result in changes in characteristics that may make the offspring more competitive when it comes time for it to breed (or may make it more competitive if there is an environmental change). This mutation process is one element that underlies evolution. A result of evolution is that, after the fact, you can compare two nucleotide sequences and test the hypothesis that they share an evolutionary history.
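The Basic Statistics requirements can be sketched as four separate static methods, one per task. This is a minimal, non-authoritative example: the class and method names (DnaStats, isValidDna, countC, cgFraction, complement) are hypothetical, and it assumes sequences are entered in upper case and that an empty string counts as invalid.

```java
/**
 * Basic statistics for one DNA sequence (a sketch). Assumptions: bases are
 * upper-case A, C, G, T, and an empty string counts as invalid.
 */
public class DnaStats {

    /** Returns true if s is non-empty and contains only A, C, G, or T. */
    public static boolean isValidDna(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        for (char ch : s.toCharArray()) {
            if (ch != 'A' && ch != 'C' && ch != 'G' && ch != 'T') {
                return false;
            }
        }
        return true;
    }

    /** Counts the occurrences of cytosine (C). */
    public static int countC(String s) {
        int count = 0;
        for (char ch : s.toCharArray()) {
            if (ch == 'C') {
                count++;
            }
        }
        return count;
    }

    /** Fraction of bases that are C or G; callers must pass a non-empty sequence. */
    public static double cgFraction(String s) {
        int cg = 0;
        for (char ch : s.toCharArray()) {
            if (ch == 'C' || ch == 'G') {
                cg++;
            }
        }
        return (double) cg / s.length();
    }

    /** Builds the complementary strand by swapping A with T and C with G. */
    public static String complement(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char ch : s.toCharArray()) {
            switch (ch) {
                case 'A': sb.append('T'); break;
                case 'T': sb.append('A'); break;
                case 'C': sb.append('G'); break;
                case 'G': sb.append('C'); break;
                default: throw new IllegalArgumentException("Invalid base: " + ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String seq = "AAGGTCT";                      // example from the assignment text
        System.out.println(isValidDna(seq));         // prints true
        System.out.println(countC(seq));             // prints 1
        System.out.println(cgFraction(seq));         // 3 of 7 bases are C or G
        System.out.println(complement(seq));         // prints TTCCAGA
    }
}
```

Keeping validation separate from the statistics lets main() loop on isValidDna until the input is legal, after which the other three methods can assume a clean sequence.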
Such comparison allows us to learn how modifications to DNA result in modifications of biochemical processes and physical characteristics. This is why sequence alignment techniques are important.

We determine an alignment by comparing two sequences and seeing how well they match. A very simple method for this comparison is to look at corresponding nucleotides and compute a score for that potential alignment. If there are multiple potential alignments, the one with the highest score is considered the most likely.

For example, let's say that the two input sequences are AATCTATA and AAGATA. There are three possible alignments, obtained by shifting the shorter sequence 0, 1, or 2 positions to the right:

    AATCTATA
    AAGATA

    AATCTATA
     AAGATA

    AATCTATA
      AAGATA

In general, mutations can be a substitution of one nucleotide for another (for example, a G being replaced by a T), an insertion that adds one or more nucleotides, or a deletion that removes one or more nucleotides. To keep things simple, we will concern ourselves only with the first of these three: point mutations.

For simple, gap-free alignments, we compute a score using a simple rule: if the two corresponding characters match, they contribute a match score of one (1); if they don't match, the match score is zero (0). The total score for an alignment is the sum of the character scores, and the alignment with the highest score is the best match. So, for example, the scores for the three alignments above are 4, 1, and 3, and the best alignment is the first one. You will use this simple alignment method in your program.

Program Description

Your program will prompt, via the console, for the first sequence and compute its basic statistics, then prompt for, and validate, user input of a second sequence. It will compute that second sequence's basic statistics, too.
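The gap-free scoring rule fits in one short method. The sketch below assumes a hypothetical signature score(longer, shorter, offset), where offset is how many positions the shorter sequence is shifted right along the longer one, and it only considers shifts at which the shorter sequence fits entirely inside the longer one, which is what yields the three alignments in the AATCTATA/AAGATA example.

```java
/**
 * Gap-free alignment scoring (a sketch): a match adds 1, a mismatch adds 0.
 * The method name and parameter order are illustrative assumptions.
 */
public class AlignmentScore {

    /**
     * Scores the alignment in which shorter[0] lines up with longer[offset].
     * Requires 0 <= offset <= longer.length() - shorter.length().
     */
    public static int score(String longer, String shorter, int offset) {
        if (offset < 0 || offset + shorter.length() > longer.length()) {
            throw new IllegalArgumentException("offset out of range: " + offset);
        }
        int total = 0;
        for (int i = 0; i < shorter.length(); i++) {
            if (longer.charAt(offset + i) == shorter.charAt(i)) {
                total++;                             // match score of 1
            }                                        // a mismatch adds 0
        }
        return total;
    }

    public static void main(String[] args) {
        String longer = "AATCTATA";
        String shorter = "AAGATA";
        // Try every offset at which the shorter sequence fits inside the longer one.
        for (int offset = 0; offset <= longer.length() - shorter.length(); offset++) {
            System.out.println("offset " + offset + ": score " + score(longer, shorter, offset));
        }
    }
}
```

Running main prints scores of 4, 1, and 3 for offsets 0, 1, and 2, the same values worked out by hand in the text.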
Then, your program will compute the scores for all possible alignments of those two strings (you will want to have a method that takes two strings, plus an offset for shifting the shorter string relative to the longer one, and returns an int score). Thus, for the two input sequences AATCTATA and AAGATA, your program will output a report similar to the following:

    Sequence 1: AATCTATA
    C-count: 1
    CG-ratio: 0.125
    Complement: TTAGATAT

    Sequence 2: AAGATA
    C-count: 0
    CG-ratio: 0.167
    Complement: TTCTAT

    Best alignment score: 4
    AATCTATA
    AAGATA
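One possible shape for the whole program is sketched below. It is not a reference solution: the class name DnaReport, the helper names, the prompt wording, and the choice to trim and upper-case input before validating are all assumptions. It uses a regular expression and IntStream filters as compact alternatives to explicit loops, formats the CG ratio to three decimals to match the sample report, and needs Java 11 or later for String.repeat.

```java
import java.util.Locale;
import java.util.Scanner;

/**
 * End-to-end sketch of the assignment (requires Java 11+ for String.repeat).
 * Method names, prompts, and the error message are illustrative assumptions.
 */
public class DnaReport {

    /** Non-empty and made only of A, C, G, T. */
    public static boolean isValidDna(String s) {
        return s.matches("[ACGT]+");
    }

    public static long countC(String s) {
        return s.chars().filter(ch -> ch == 'C').count();
    }

    public static double cgFraction(String s) {
        return (double) s.chars().filter(ch -> ch == 'C' || ch == 'G').count() / s.length();
    }

    /** Assumes s is already valid, so anything that is not A, T, or C is G. */
    public static String complement(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char ch : s.toCharArray()) {
            sb.append(ch == 'A' ? 'T' : ch == 'T' ? 'A' : ch == 'C' ? 'G' : 'C');
        }
        return sb.toString();
    }

    /** Gap-free score with shorter[0] aligned to longer[offset]. */
    public static int score(String longer, String shorter, int offset) {
        int total = 0;
        for (int i = 0; i < shorter.length(); i++) {
            if (longer.charAt(offset + i) == shorter.charAt(i)) {
                total++;
            }
        }
        return total;
    }

    /** One statistics block, formatted like the sample report. */
    public static String stats(int n, String s) {
        return String.format(Locale.US, "Sequence %d: %s\nC-count: %d\nCG-ratio: %.3f\nComplement: %s\n",
                n, s, countC(s), cgFraction(s), complement(s));
    }

    /** Best score over all offsets plus the aligned pair; ties keep the smallest offset. */
    public static String bestAlignment(String s1, String s2) {
        boolean firstIsLonger = s1.length() >= s2.length();
        String longer = firstIsLonger ? s1 : s2;
        String shorter = firstIsLonger ? s2 : s1;
        int bestScore = -1;
        int bestOffset = 0;
        for (int offset = 0; offset <= longer.length() - shorter.length(); offset++) {
            int current = score(longer, shorter, offset);
            if (current > bestScore) {
                bestScore = current;
                bestOffset = offset;
            }
        }
        return "Best alignment score: " + bestScore + "\n"
                + longer + "\n"
                + " ".repeat(bestOffset) + shorter + "\n";
    }

    /** Prompts until the user enters a valid sequence. */
    public static String readSequence(Scanner in, String prompt) {
        while (true) {
            System.out.print(prompt);
            if (!in.hasNextLine()) {
                throw new IllegalStateException("input ended before a valid sequence was entered");
            }
            // Assumption: surrounding spaces and lower-case letters are tolerated.
            String line = in.nextLine().trim().toUpperCase(Locale.ROOT);
            if (isValidDna(line)) {
                return line;
            }
            System.out.println("Invalid sequence; use only A, C, G, and T.");
        }
    }

    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        String s1 = readSequence(in, "Enter sequence 1: ");
        System.out.println(stats(1, s1));
        String s2 = readSequence(in, "Enter sequence 2: ");
        System.out.println(stats(2, s2));
        System.out.print(bestAlignment(s1, s2));
    }
}
```

Entering AATCTATA and then AAGATA should reproduce the statistics and alignment of the sample report. When two offsets tie, this sketch keeps the leftmost one, a choice the assignment does not specify.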
