Question: Use MATLAB to code the following.

1. Implement a "hashing" function for 3-mers. That is, given a 3-mer, e.g., AAA, you should return an integer, e.g., 1. We take the natural order of the sorted nucleotides, i.e., AAA -> 1, AAC -> 2, AAG -> 3, ..., TTT -> 64, as there are 4^3 = 64 possible combinations.

Hint:

Consider encoding A, C, G, T as [0, 1, 2, 3]. Then a 3-mer can be treated as a 3-digit base-4 (quaternary) integer. For example, ATC would be the base-4 number "031", which can be converted to a decimal number by:

0 * 4^2 + 3 * 4^1 + 1 * 4^0 = 13

Since we require the numbering to start from 1, you need to add 1 to the final result. Hence, your function should return 14 for ATC.
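
A minimal sketch of the hint above, not a verified reference solution. The function name `hash3mer` and its input validation are assumptions made for illustration; the question does not prescribe them. It maps each base to a digit 0..3 with `ismember`, weights the digits by powers of 4, and adds 1.

```matlab
% Script section: try the hash on a few 3-mers.
fprintf('AAA -> %d\n', hash3mer('AAA'));   % AAA -> 1
fprintf('ATC -> %d\n', hash3mer('ATC'));   % ATC -> 14
fprintf('TTT -> %d\n', hash3mer('TTT'));   % TTT -> 64

function idx = hash3mer(kmer)
% hash3mer  Map a 3-mer to its 1-based rank in sorted order (A < C < G < T).
%   The name and the error checks are illustrative assumptions.
    kmer = upper(char(kmer));
    if numel(kmer) ~= 3
        error('hash3mer:badLength', 'Input must have exactly 3 characters.');
    end
    % loc holds the position of each base in 'ACGT' (1..4), or 0 if absent.
    [found, loc] = ismember(kmer, 'ACGT');
    if ~all(found)
        error('hash3mer:badBase', 'Input must contain only A, C, G, T.');
    end
    digits = loc - 1;              % A, C, G, T -> 0, 1, 2, 3
    weights = 4 .^ (2:-1:0);       % [16 4 1], most significant digit first
    idx = sum(digits .* weights) + 1;   % shift so that AAA -> 1
end
```

In a script file, MATLAB (R2016b and later) requires local functions such as this one to appear after the script code, which is why the calls come first.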

2. Now suppose we want to know the 3-mer frequencies of a DNA sequence. Write a function that uses the "hashing" function from problem 1 and returns the frequency of each 3-mer in a DNA sequence. Use a sliding window with a step size of 1 to iterate through the sequence. For example, given the input "ATTATTGC", you should return:

"ATT: 2, TTA: 1, TAT: 1, TTG: 1, TGC: 1".

Please submit your code in a single script, and show your output for the example inputs in comments.
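
One possible single-script sketch for problem 2, under stated assumptions: the names `threeMerFrequency` and `hash3mer` are hypothetical, and reporting 3-mers in order of first appearance is a choice made to match the example output, not something the question states explicitly. The hash value is used as an index into a 64-element count vector.

```matlab
% Script section: run the example from the question.
disp(threeMerFrequency('ATTATTGC'));
% Output: ATT: 2, TTA: 1, TAT: 1, TTG: 1, TGC: 1

function out = threeMerFrequency(seq)
% threeMerFrequency  Count 3-mers with a sliding window of step size 1.
    seq = upper(char(seq));
    counts = zeros(1, 64);      % counts(h) = occurrences of the 3-mer with hash h
    firstSeen = zeros(1, 64);   % start position of the first occurrence (0 = never)
    for i = 1:numel(seq) - 2
        h = hash3mer(seq(i:i+2));
        counts(h) = counts(h) + 1;
        if firstSeen(h) == 0
            firstSeen(h) = i;
        end
    end
    % Report observed 3-mers in order of first appearance (assumed ordering).
    present = find(counts > 0);
    [~, order] = sort(firstSeen(present));
    present = present(order);
    parts = cell(1, numel(present));
    for j = 1:numel(present)
        h = present(j);
        s = firstSeen(h);
        parts{j} = sprintf('%s: %d', seq(s:s+2), counts(h));
    end
    out = strjoin(parts, ', ');
end

function idx = hash3mer(kmer)
% hash3mer  Base-4 hash of a 3-mer, with A, C, G, T -> 0, 1, 2, 3, plus 1.
    [found, loc] = ismember(upper(char(kmer)), 'ACGT');
    if numel(loc) ~= 3 || ~all(found)
        error('hash3mer:badInput', 'Input must be 3 characters from A, C, G, T.');
    end
    idx = sum((loc - 1) .* (4 .^ (2:-1:0))) + 1;
end
```

Indexing a fixed-size vector by the hash avoids string comparisons inside the loop; a `containers.Map` keyed by the 3-mer text would be an alternative if the hash were not required.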
