Question: Efficient k-mer counting in C++ without a hash table or map

Write efficient C++ code (without using a hash table, map, etc.) that does the following:

Find all the k-mers in a given string and print each distinct k-mer with the number of times it appears. The code should work for a string of any length and any value of k. Assume the string contains only the letters A, C, G and T.

For example, suppose k = 3 and the given string is ACTACT. The k-mers are ACT, CTA, TAC and ACT.

The output should be: ACT 2, CTA 1, TAC 1.

I am able to print out all the k-mers, but I'm not sure how to count them and print only the unique ones with the number of times each is repeated. The hint my teacher gave is the following:

To implement your k-mer counter, consider the following observation. Let A = 0, C = 1, G = 2 and T = 3 (i.e., think of DNA letters as digits). We can represent each k-mer as a number in base 4, which can then be converted to a regular base-10 index. For example, the 3-mer CGA can be represented as 120₄, which is 24 in the decimal system (i.e., 24₁₀), since 1·4² + 2·4¹ + 0·4⁰ = 16 + 8 + 0 = 24. We can use this simple mechanism to assign an index to each k-mer and use an array to store the counts of the different k-mers. What should the size of such a count array be? Notice that for a given k there are 4^k possible k-mers. As long as k is small (and this is the case for this assignment), we can easily store the counts of all k-mers in main memory.
