Question:
DO ONLY QUESTION 2. QUESTION 1 IS A REFERENCE FOR QUESTION 2.
SHOW MATLAB OR PYTHON CODE.
1. Implement a "hashing" function for 3-mers. That is, given a 3-mer, e.g., AAA, you should return an integer, e.g., 1. We take the natural order of sorted nucleotides, i.e., AAA -> 1, AAC -> 2, AAG -> 3, ..., TTT -> 64, as there are 64 possible combinations. Hint: consider encoding A, C, G, T as [0, 1, 2, 3]. Then a 3-mer can be treated as a 3-digit base-4 (quaternary) integer. For example, ATC would be the base-4 number "031", which converts to decimal as 0 * 4^2 + 3 * 4^1 + 1 * 4^0 = 13. Since we require the numbering to start from 1, you need to add 1 to the final result. Hence, your function should return 14.
2. Now suppose we want to know the 3-mer frequencies of a DNA sequence. Write a function that uses the "hashing function" from problem 1 and returns the 3-mer frequencies in a DNA sequence. You should use a sliding window with a step size of 1 to iterate through the sequence. For example, given the input "ATTATTGC", you should return "ATT: 2, TTA: 1, TAT: 1, TTG: 1, TGC: 1".
Step by Step Solution
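Question 2 is built on the hash from problem 1, so a first step is a minimal Python sketch of that hash. It reads the 3-mer as a base-4 number with Horner's rule (multiply by 4, add the next digit) and then adds 1. The names `DIGIT` and `kmer_hash` are my own choices, and the sketch assumes the input contains only A, C, G, T (upper or lower case); any other symbol raises a `KeyError`.

```python
# Hypothetical digit table: each nucleotide is one base-4 digit (A=0, C=1, G=2, T=3).
DIGIT = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_hash(kmer: str) -> int:
    """Return the 1-based index of a 3-mer in sorted order (AAA -> 1, ..., TTT -> 64)."""
    if len(kmer) != 3:
        raise ValueError(f"expected a 3-mer, got {kmer!r}")
    value = 0
    for base in kmer.upper():
        # Horner's rule: shift the digits seen so far one base-4 place left,
        # then add the current digit. Non-ACGT symbols raise KeyError here.
        value = value * 4 + DIGIT[base]
    return value + 1  # shift from 0..63 to the required 1..64


if __name__ == "__main__":
    print(kmer_hash("AAA"), kmer_hash("ATC"), kmer_hash("TTT"))  # 1 14 64
```

For ATC the loop computes 0, then 0 * 4 + 3 = 3, then 3 * 4 + 1 = 13, and returns 14, matching the worked example in the question.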
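A minimal Python sketch of the question 2 function follows. It carries its own copy of the problem 1 hash so it runs on its own, slides a window of width 3 with step 1 over the sequence, counts each window into a 64-slot list indexed by its hash value, and finally decodes the hash values back to 3-mers for display. `kmer_unhash`, `kmer_frequency` and `format_frequency` are hypothetical helper names, and listing the 3-mers in order of first appearance is an assumption chosen to match the expected output in the question. As before, only A, C, G, T are accepted.

```python
BASES = "ACGT"
DIGIT = {base: i for i, base in enumerate(BASES)}  # A=0, C=1, G=2, T=3


def kmer_hash(kmer):
    """Problem 1 hash: read the 3-mer as a base-4 number, then add 1 (range 1..64)."""
    value = 0
    for base in kmer.upper():
        value = value * 4 + DIGIT[base]  # raises KeyError on non-ACGT symbols
    return value + 1


def kmer_unhash(h):
    """Hypothetical inverse of kmer_hash: 1..64 -> 3-mer string."""
    value = h - 1
    bases = []
    for _ in range(3):
        value, digit = divmod(value, 4)  # peel off the lowest base-4 digit
        bases.append(BASES[digit])
    return "".join(reversed(bases))


def kmer_frequency(seq):
    """Count 3-mers with a sliding window of step 1, keyed in order of first appearance."""
    counts = [0] * 64   # counts[h - 1] holds the count for hash value h
    first_seen = []     # hash values in the order they first occur
    for i in range(len(seq) - 2):  # the last window starts at len(seq) - 3
        h = kmer_hash(seq[i:i + 3])
        if counts[h - 1] == 0:
            first_seen.append(h)
        counts[h - 1] += 1
    return {kmer_unhash(h): counts[h - 1] for h in first_seen}


def format_frequency(freq):
    """Render the counts in the 'ATT: 2, TTA: 1' style used in the question."""
    return ", ".join(f"{kmer}: {n}" for kmer, n in freq.items())


if __name__ == "__main__":
    print(format_frequency(kmer_frequency("ATTATTGC")))
    # ATT: 2, TTA: 1, TAT: 1, TTG: 1, TGC: 1
```

The windows of "ATTATTGC" are ATT, TTA, TAT, ATT, TTG, TGC, so ATT is counted twice. Counting into a fixed 64-element list is where the hash pays off: each window costs one constant-time array update, so the whole scan is linear in the sequence length. A dictionary keyed by the 3-mer string would work just as well in Python; the array version is closer to what the hint intends and carries over directly to MATLAB, where the 1-based hash can index the counts array without the `- 1` offset.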
