unix

Background context: A DNA string is a sequence of the letters a, c, g, and t in any order, whose length is a multiple of 3. For example, aacgtttgtaaccagaactgt is a DNA string of length 21. Each sequence of three consecutive letters is called a codon. For example, in the preceding string, the codons are aac, gtt, tgt, aac, cag, aac, and tgt.

Write a bash script that expects a file name on the command line. This file is supposed to be a dna file, which means that it contains only a DNA string with no newline characters or white space characters of any kind; it is a sequence of the letters a, c, g, and t of length 3n for some n. The script must count the number of occurrences of every codon in the file, assuming the first codon starts at position 1, and it must output the number of times each codon occurs in the file, sorted in order of decreasing frequency. For example, if dnafile is a file containing the DNA string aacgtttgtaaccagaactgt, then the command codonhistogram dnafile should produce the following output:

3 aac
2 tgt
1 cag
1 gtt

- The script must work out the codons for any DNA sequence, not only the example above.
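
One possible way to meet these requirements is sketched below; it is not the page's own answer. It wraps the logic in a function named codonhistogram (the name comes from the question), cuts the string into 3-letter pieces with fold, counts them with sort and uniq -c, and orders them by decreasing count. Several choices are assumptions, because the question does not specify them: the error messages, the rejection of empty or malformed input, and the alphabetical ordering of codons with equal counts (chosen because it matches the sample output).

```bash
#!/usr/bin/env bash
# codonhistogram (sketch): print how often each codon occurs in a DNA file,
# most frequent first.

codonhistogram() {
    if [ "$#" -ne 1 ]; then
        echo "usage: codonhistogram dnafile" >&2
        return 1
    fi
    if [ ! -r "$1" ]; then
        echo "codonhistogram: cannot read '$1'" >&2
        return 1
    fi

    # Read the whole file; command substitution drops any trailing newline.
    local dna
    dna=$(cat "$1")

    # Assumption: reject empty input and any character other than a, c, g, t.
    case $dna in
        ''|*[!acgt]*)
            echo "codonhistogram: '$1' is not a dna file" >&2
            return 1
            ;;
    esac

    # Assumption: reject lengths that are not a multiple of 3.
    if [ $(( ${#dna} % 3 )) -ne 0 ]; then
        echo "codonhistogram: length is not a multiple of 3" >&2
        return 1
    fi

    # fold -w 3 puts one codon per line, starting at position 1.
    # sort | uniq -c counts identical codons.
    # The second sort orders by count (descending), then by codon name.
    # awk strips the padding that uniq -c adds in front of the count.
    printf '%s\n' "$dna" |
        fold -w 3 |
        LC_ALL=C sort |
        uniq -c |
        LC_ALL=C sort -k1,1nr -k2,2 |
        awk '{ print $1, $2 }'
}

# To use this as a standalone script, add this as the final line:
#   codonhistogram "$@"
```

With that final line added and the file saved as an executable named codonhistogram, running ./codonhistogram dnafile on the example string should print the four lines shown in the question.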
