Question: 2, Biology, Bioinformatics, DNA. Please write the program in Perl.
Introduction
Evolutionary Distance between two nucleic acid sequences.
DNA has four nucleotides, and proteins have 20 amino acids. Both DNA and proteins are essentially polymers made from their building blocks attached end to end.
The letters A, C, G, T represent the four nucleotides of DNA.
For every pair of sequences, the distance is a single value estimated from the dissimilarity, i.e. the fraction of positions at which the two sequences differ. One of the first substitution models used to estimate evolutionary distances is the Jukes-Cantor model.
This model starts from the assumptions that all substitutions are independent, that all sequence positions are equally subject to change, that substitutions occur randomly among the four types of nucleotides, and that no insertions or deletions have occurred.
We can derive the equation that yields an estimate of the true number of substitutions that have
occurred between two sequences when only a pairwise counting of differences is available:
dAB = -(3/4) ln[1 - (4/3) fAB]
where fAB is the dissimilarity (fraction of observed differences) between sequences A and B, i.e. the fraction of nucleotides that a simple count reveals to be different between the two sequences,
and dAB is the estimated evolutionary distance (fraction of expected substitutions) between sequences A and B.
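As a minimal sketch of the formula above, the Perl subroutine below computes dAB from a given fAB. The name jukes_cantor_distance is hypothetical, and returning undef when fAB is 0.75 or larger (where the logarithm has no real value) is an assumption about how to handle that case, not something the problem statement specifies. The value fAB = 11/40 is used only as an illustration.

```perl
use strict;
use warnings;

# Jukes-Cantor distance: dAB = -(3/4) * ln(1 - (4/3) * fAB).
# Assumption: return undef when fAB >= 0.75, because the argument of the
# logarithm is then zero or negative and the estimate is undefined.
sub jukes_cantor_distance {
    my ($f) = @_;
    return if $f >= 0.75;
    return -0.75 * log(1 - (4 / 3) * $f);   # Perl's log() is the natural log
}

# Illustrative input: 11 differing positions out of 40.
my $f_example = 11 / 40;
my $d_example = jukes_cantor_distance($f_example);
printf "fAB = %.4f, dAB = %.4f\n", $f_example, $d_example;
```

With fAB = 0.275 this gives dAB of about 0.3426. The estimate is larger than the observed dissimilarity because the correction allows for more than one substitution at the same position.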
What is being asked:
The purpose is to write a Perl program that implements the Jukes-Cantor model to find the distance (dAB) between two biological sequences.
Also create the frequency table: a two-dimensional table with A, G, C, T on both axes that counts the changes from each nucleotide to each of the others.
Implementation
The program will read two sequences from two files:
S0: GCCGTCAGAAATTTAGCACTGATCACAGCCTCGTCTCTGA
S1: GCCCTCAGGGAATTAGCACTAATCATAACTCCGTCTGTGT
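One possible way to read the two sequences is sketched below. It assumes each file contains only the bases (possibly split over several lines), without the "S0:" or "S1:" label. The subroutine name read_sequence is hypothetical, and the demo writes the sample sequences to temporary files so that it runs on its own. In a real run the two paths would come from the command line or be fixed file names.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read one sequence from a file: join all lines, drop whitespace,
# upper-case the result, and reject anything other than A, C, G, T.
sub read_sequence {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    my $seq = '';
    while (my $line = <$fh>) {
        $line =~ s/\s+//g;
        $seq .= uc $line;
    }
    close $fh;
    die "Unexpected character in $path\n" if $seq =~ /[^ACGT]/;
    return $seq;
}

# Demo only: put the two sample sequences into temporary files.
my ($out0, $file0) = tempfile(UNLINK => 1);
print {$out0} "GCCGTCAGAAATTTAGCACTGATCACAGCCTCGTCTCTGA\n";
close $out0;
my ($out1, $file1) = tempfile(UNLINK => 1);
print {$out1} "GCCCTCAGGGAATTAGCACTAATCATAACTCCGTCTGTGT\n";
close $out1;

my $seq0 = read_sequence($file0);
my $seq1 = read_sequence($file1);
printf "S0 has %d bases, S1 has %d bases\n", length($seq0), length($seq1);
```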
The Jukes-Cantor distance will be calculated by a subroutine.
The rate of substitutions (a) over one time step will also be computed, by the formula:
a = number of mutations undergone / length of the sequence.
It is assumed that the two sequences have the same length.
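The rate formula can be sketched in Perl as follows. The subroutine names count_mutations and substitution_rate are hypothetical, and the sketch treats "mutations undergone" as the number of positions at which S0 and S1 differ, which is an assumption about the intended meaning. The check for equal, non-zero length is added for safety.

```perl
use strict;
use warnings;

# Count the positions at which two equal-length sequences differ.
sub count_mutations {
    my ($s0, $s1) = @_;
    die "Sequences must have the same length\n" if length($s0) != length($s1);
    my $diff = 0;
    for my $i (0 .. length($s0) - 1) {
        $diff++ if substr($s0, $i, 1) ne substr($s1, $i, 1);
    }
    return $diff;
}

# a = number of mutations / length of the sequence.
sub substitution_rate {
    my ($s0, $s1) = @_;
    die "Empty sequence\n" unless length($s0);
    return count_mutations($s0, $s1) / length($s0);
}

my $sample0 = 'GCCGTCAGAAATTTAGCACTGATCACAGCCTCGTCTCTGA';
my $sample1 = 'GCCCTCAGGGAATTAGCACTAATCATAACTCCGTCTGTGT';
printf "mutations = %d, a = %.3f\n",
    count_mutations($sample0, $sample1),
    substitution_rate($sample0, $sample1);
```

For the two sample sequences this counts 11 differences over 40 positions, so a = 0.275. Under this reading, a has the same numerical value as the dissimilarity fAB that is fed into the Jukes-Cantor formula.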
For the frequency table, the output will be the number of changes from each S0 nucleotide to the corresponding S1 nucleotide.
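A minimal sketch of the frequency table is given below, using a hash of hashes indexed as $table->{from}{to}. The names substitution_table and print_table are hypothetical. As an assumption, the diagonal entries (A to A and so on) hold the number of unchanged positions, so that only the off-diagonal entries are actual changes. Characters other than A, G, C, T are skipped.

```perl
use strict;
use warnings;

my @BASES = qw(A G C T);

# Build the 4x4 table: $table->{$from}{$to} is the number of positions where
# S0 has $from and S1 has $to. Diagonal entries count unchanged positions.
sub substitution_table {
    my ($s0, $s1) = @_;
    die "Sequences must have the same length\n" if length($s0) != length($s1);
    my %table;
    for my $from (@BASES) {
        $table{$from}{$_} = 0 for @BASES;
    }
    for my $i (0 .. length($s0) - 1) {
        my $from = substr($s0, $i, 1);
        my $to   = substr($s1, $i, 1);
        # Skip anything that is not one of the four bases.
        next unless exists $table{$from} && exists $table{$from}{$to};
        $table{$from}{$to}++;
    }
    return \%table;
}

# Print the table with S0 nucleotides as rows and S1 nucleotides as columns.
sub print_table {
    my ($table) = @_;
    print join("\t", 'S0\S1', @BASES), "\n";
    for my $from (@BASES) {
        print join("\t", $from, map { $table->{$from}{$_} } @BASES), "\n";
    }
    return;
}

my $freq = substitution_table(
    'GCCGTCAGAAATTTAGCACTGATCACAGCCTCGTCTCTGA',
    'GCCCTCAGGGAATTAGCACTAATCATAACTCCGTCTGTGT',
);
print_table($freq);
```

For the two sample sequences, the off-diagonal entries of this table sum to 11, the same number of differences that a direct position-by-position count gives.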
