Question: Counting the occurrences of all words of length 10 in human chromosome 1

A human chromosome is represented as a long string. Write a C program that counts the occurrences of all words of length 10 in human chromosome 1 (http://hgdownload.soe.ucsc.edu/goldenPath/hg38/chromosomes/chr1.fa.gz; you will need to decompress this file). A word in this context is a substring of length 10 that starts at any nucleotide. Each nucleotide marks the beginning of a word; two consecutive words overlap in 9 nucleotides and are NOT separated by spaces.
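One possible way to enumerate the overlapping words is a rolling 2-bit encoding: each nucleotide is mapped to two bits, and the 20-bit code of the current word is updated by shifting in one new nucleotide at a time. Because there are only 4^10 = 1,048,576 possible words over A, C, G, and T, a plain array indexed by that code can hold every count. The following is a minimal sketch under those assumptions, not a reference solution; the names `base_code`, `count_words`, `WORD_LEN`, and `NUM_WORDS` are illustrative and do not come from the question.

```c
#include <stddef.h>
#include <stdint.h>

#define WORD_LEN 10
#define NUM_WORDS (1u << (2 * WORD_LEN))   /* 4^10 = 1,048,576 possible words */

/* Map one nucleotide (either case) to 2 bits; return -1 for any other letter. */
static int base_code(char c)
{
    switch (c) {
    case 'A': case 'a': return 0;
    case 'C': case 'c': return 1;
    case 'G': case 'g': return 2;
    case 'T': case 't': return 3;
    default:            return -1;
    }
}

/* Count every overlapping word of length WORD_LEN in seq[0..len).
 * counts must point to NUM_WORDS entries; they are incremented, not reset.
 * Returns the number of words counted in this call. */
static size_t count_words(const char *seq, size_t len, uint32_t *counts)
{
    uint32_t code = 0;   /* 2-bit codes of the most recent nucleotides */
    size_t valid = 0;    /* length of the current run of A/C/G/T letters */
    size_t total = 0;

    for (size_t i = 0; i < len; i++) {
        int b = base_code(seq[i]);
        if (b < 0) {     /* any other letter breaks the run */
            valid = 0;
            continue;
        }
        /* Shift in the new nucleotide and keep only the last 10 (20 bits). */
        code = ((code << 2) | (uint32_t)b) & (NUM_WORDS - 1);
        if (++valid >= WORD_LEN) {
            counts[code]++;
            total++;
        }
    }
    return total;
}
```

For an uninterrupted run of n valid nucleotides with n >= 10, this counts n - 9 words, one for each possible start position.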

The chromosome file must be provided to your program as a command-line argument. Sequences of human chromosomes may contain additional letters when the identity of a nucleotide cannot be determined precisely. Only words consisting entirely of A, C, G, and T must be counted; all other words must be skipped.
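A hedged sketch of the input side follows. It assumes the decompressed file is in FASTA format, where a line starting with `>` is a header and the sequence is wrapped over many lines, and it assumes the whole sequence fits in memory. The helper names `input_path` and `load_fasta` are hypothetical; a streaming reader would be an equally valid design.

```c
#include <stdio.h>
#include <stdlib.h>

/* Return the file name given on the command line, or NULL if the
 * program was not called with exactly one argument. */
static const char *input_path(int argc, char **argv)
{
    return argc == 2 ? argv[1] : NULL;
}

/* Read a FASTA file into one contiguous buffer, dropping '>' header
 * lines and line breaks. Letters are stored unchanged (no case folding,
 * no filtering). Returns a malloc'ed buffer that the caller must free,
 * with its length in *len_out, or NULL on error. */
static char *load_fasta(const char *path, size_t *len_out)
{
    FILE *fp = fopen(path, "r");
    if (!fp)
        return NULL;

    size_t cap = (size_t)1 << 20, len = 0;
    char *seq = malloc(cap);
    if (!seq) {
        fclose(fp);
        return NULL;
    }

    int c, at_line_start = 1, in_header = 0;
    while ((c = getc(fp)) != EOF) {
        if (c == '\n') {             /* a line break ends any header line */
            at_line_start = 1;
            in_header = 0;
            continue;
        }
        if (at_line_start && c == '>')
            in_header = 1;
        at_line_start = 0;
        if (in_header || c == '\r')  /* skip header text and stray CRs */
            continue;
        if (len == cap) {            /* grow the buffer by doubling */
            char *tmp = realloc(seq, cap * 2);
            if (!tmp) {
                free(seq);
                fclose(fp);
                return NULL;
            }
            seq = tmp;
            cap *= 2;
        }
        seq[len++] = (char)c;
    }
    fclose(fp);
    *len_out = len;
    return seq;
}
```

A `main` function would typically call `input_path(argc, argv)`, print a usage message to `stderr` and exit with a non-zero status if it returns NULL, and otherwise pass the path to `load_fasta`.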

The counting is NOT case-sensitive.
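One simple way to meet this requirement, sketched here as an assumption rather than a prescribed method, is to fold the sequence to uppercase before any words are compared or counted. The function name `fold_to_upper` is hypothetical. The cast to `unsigned char` matters because passing a negative `char` value to `toupper` is undefined behavior in C.

```c
#include <ctype.h>
#include <stddef.h>

/* Fold a sequence to uppercase in place so that "acgt" and "ACGT"
 * are treated as the same word later on. */
static void fold_to_upper(char *seq, size_t len)
{
    for (size_t i = 0; i < len; i++)
        seq[i] = (char)toupper((unsigned char)seq[i]);
}
```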

The output of your program is as follows:

word 1 location 1, location 2, ..., location n

word 2 location 1, location 2, ..., location m

Note: every reported word (word 1, word 2, and so on) is unique, so each word appears on exactly one output line.
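To print each unique word once together with all of its locations, one option is a two-pass counting-sort layout: the first pass counts how often each word occurs, a prefix sum turns the counts into offsets, and the second pass writes every start position into its word's slice of one shared array. The sketch below is self-contained and makes several assumptions that the question does not settle: locations are 0-based offsets into the sequence, words are printed in uppercase, positions fit in 32 bits, and the sequence is already in memory. All names (`base_code`, `decode_word`, `scan`, `report_words`) are illustrative.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define WORD_LEN 10
#define NUM_WORDS (1u << (2 * WORD_LEN))   /* 4^10 possible words */

/* Map one nucleotide (either case) to 2 bits; -1 for any other letter. */
static int base_code(char c)
{
    switch (c) {
    case 'A': case 'a': return 0;
    case 'C': case 'c': return 1;
    case 'G': case 'g': return 2;
    case 'T': case 't': return 3;
    default:            return -1;
    }
}

/* Turn a 20-bit code back into its 10-letter uppercase word. */
static void decode_word(uint32_t code, char out[WORD_LEN + 1])
{
    for (int i = WORD_LEN - 1; i >= 0; i--) {
        out[i] = "ACGT"[code & 3u];
        code >>= 2;
    }
    out[WORD_LEN] = '\0';
}

/* One scan over the sequence. With pos == NULL it builds a histogram
 * shifted by one (slot[w + 1] = count of w); otherwise it stores each
 * 0-based start position at pos[slot[w]++]. */
static void scan(const char *seq, size_t len, uint64_t *slot, uint32_t *pos)
{
    uint32_t code = 0;
    size_t valid = 0;
    for (size_t i = 0; i < len; i++) {
        int b = base_code(seq[i]);
        if (b < 0) { valid = 0; continue; }
        code = ((code << 2) | (uint32_t)b) & (NUM_WORDS - 1);
        if (++valid < WORD_LEN) continue;
        if (pos) pos[slot[code]++] = (uint32_t)(i + 1 - WORD_LEN);
        else     slot[code + 1]++;
    }
}

/* Print "WORD loc1, loc2, ..." for every word that occurs at least once.
 * Returns 0 on success, -1 if memory could not be allocated. */
static int report_words(const char *seq, size_t len, FILE *out)
{
    uint64_t *slot = calloc((size_t)NUM_WORDS + 1, sizeof *slot);
    if (!slot) return -1;

    scan(seq, len, slot, NULL);                /* pass 1: count */
    for (uint32_t w = 0; w < NUM_WORDS; w++)   /* prefix sum: slot[w] = start of w */
        slot[w + 1] += slot[w];

    size_t total = (size_t)slot[NUM_WORDS];
    uint32_t *pos = malloc((total ? total : 1) * sizeof *pos);
    if (!pos) { free(slot); return -1; }
    scan(seq, len, slot, pos);                 /* pass 2: slot[w] is now the end of w */

    char word[WORD_LEN + 1];
    uint64_t begin = 0;
    for (uint32_t w = 0; w < NUM_WORDS; w++) {
        uint64_t end = slot[w];
        if (end > begin) {
            decode_word(w, word);
            fputs(word, out);
            for (uint64_t k = begin; k < end; k++)
                fprintf(out, "%s%lu", k == begin ? " " : ", ",
                        (unsigned long)pos[k]);
            fputc('\n', out);
        }
        begin = end;
    }
    free(pos);
    free(slot);
    return 0;
}
```

With this layout each word is reported exactly once, in lexicographic order of A < C < G < T, and its locations appear in increasing order because the second pass walks the sequence from left to right. The position array needs one 32-bit entry per counted word, which for a full chromosome is a substantial amount of memory.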
