Question:

Write a function that finds and returns the most common sub-sequence in a larger sequence of DNA. DNA is composed of a string of 'a', 'g', 'c', and 't' characters, e.g.:

    atcaatgatcaacgtaagcttctaagcatgatcaaggtgctcacacagtttatccacaac
    ctgagtggatgacatcaagataggtcgttgtatctccttcctctcgtactctcatgacca
    cggaaagatgatcaagagaggatgatttcttggccatatcgcaatgaatacttgtgactt
    gtgcttccaattgacatcttcagcgccatattgcgctggccaaggtgacggagcgggatt
    acgaaagcatgatcatggctgttgttctgtttatcttgttttgactgagacttgttagga
    tagacggtttttcatcactgactagccaaagccttactctgcctgacatcgaccgtaaat
    tgataatgaatttacatgcttccgcgacgatttacctcttgatcatcgatccgattgaag
    atcttcaattgttaattctcttgcctcgactcatagccatgatgagctcttgatcatgtt
    tccttaaccctctattttttacggaagaatgatcaagctgctgctcttgatcatcgtttc

This kind of tool is used, for example, when analyzing DNA for possible origins of replication. Your function, mostCommonSubstring(dna, mink, maxk), takes a string (the DNA sequence) as an argument, along with the shortest (mink) and longest (maxk) acceptable result lengths. Your task is then to look at all substrings of length mink, mink+1, ..., maxk-1, maxk and return the one that occurs most frequently throughout the entire sequence. If there is a tie, return the longer string. If the tie is between substrings of the same length, the choice is arbitrary, and you can return any of the tied equal-length substrings.

For example, mostCommonSubstring('gactctcagc', 2, 6) returns 'ctc', since it occurs twice and is longer than 'ct' and 'tc', which also occur twice each; all other substrings of length 2, 3, 4, 5, or 6 occur only once. Note that the occurrences can overlap (the two copies of 'ctc' share a 'c'). (Practice on this example before you try the whole huge file.)

We'll implement this by writing another function, mostCommonK(dna, k), which looks for just the most common substring of length k and returns it together with its frequency. mostCommonSubstring then simply calls mostCommonK repeatedly, once for each of mink, mink+1, ..., maxk.
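A minimal Python sketch of the two-function design described above, assuming Python 3 and 1 <= mink <= maxk. Note that it counts every length-k window with collections.Counter (a hash-map tally), which is a swapped-in technique and not necessarily the algorithm the assignment goes on to prescribe for mostCommonK. Because mostCommonSubstring tries lengths in increasing order and replaces the current best on ">=", a longer substring wins any tie in frequency, as the problem requires.

```python
from collections import Counter


def mostCommonK(dna, k):
    """Return (substring, count) for a most frequent length-k substring of dna.

    Every window dna[i:i+k] is counted, so overlapping occurrences are included.
    Ties between equal-length substrings are broken arbitrarily.
    """
    if k <= 0 or k > len(dna):
        return None, 0  # no window of this length exists
    counts = Counter(dna[i:i + k] for i in range(len(dna) - k + 1))
    return counts.most_common(1)[0]


def mostCommonSubstring(dna, mink, maxk):
    """Most frequent substring with length in [mink, maxk]; longer wins ties."""
    best, best_count = None, 0
    for k in range(mink, maxk + 1):
        sub, count = mostCommonK(dna, k)
        # k only grows, so ">=" lets the longer substring win an equal count.
        if sub is not None and count >= best_count:
            best, best_count = sub, count
    return best


print(mostCommonK('gactctcagc', 3))              # ('ctc', 2)
print(mostCommonSubstring('gactctcagc', 2, 6))   # ctc
```

If the sample is read from a file that keeps its line breaks, removing the whitespace first (for example with ''.join(text.split())) prevents windows from spanning a newline character.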
The algorithm within mostCommonK is to take the first k letters of dna and see
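The description of mostCommonK is cut off here. If it is implemented by taking each k-letter window of dna in turn and counting how often that window occurs, as the text appears to begin outlining, one pitfall is that Python's str.count counts only non-overlapping matches ('gactctcagc'.count('ctc') is 1), whereas the problem requires overlapping occurrences to be counted. The sketch below uses two hypothetical names, countOverlapping and mostCommonKScan, and restarts str.find one character past each match so overlaps are found.

```python
def countOverlapping(dna, sub):
    """Count occurrences of sub in dna, including overlapping ones (sub non-empty)."""
    count, start = 0, 0
    while True:
        pos = dna.find(sub, start)
        if pos == -1:
            return count
        count += 1
        start = pos + 1  # advance by one character, not len(sub), to catch overlaps


def mostCommonKScan(dna, k):
    """Hypothetical window-by-window alternative to mostCommonK.

    Each window triggers a full scan of dna, so this is much slower than
    a single counting pass on long sequences.
    """
    best, best_count = None, 0
    for i in range(len(dna) - k + 1):
        window = dna[i:i + k]
        count = countOverlapping(dna, window)
        if count > best_count:
            best, best_count = window, count
    return best, best_count


print('gactctcagc'.count('ctc'))               # 1 (overlap missed)
print(countOverlapping('gactctcagc', 'ctc'))   # 2
print(mostCommonKScan('gactctcagc', 3))        # ('ctc', 2)
```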
