Question: unix
Background context: A DNA string is a sequence of the letters a, c, g, and t in any order, whose length is a multiple of 3. For example, aacgtttgtaaccagaactgt is a DNA string of length 21. Each sequence of three consecutive letters is called a codon. For example, in the preceding string, the codons are aac, gtt, tgt, aac, cag, aac, and tgt.
Write a bash script that expects a file name on the command line. This file is supposed to be a dna file, which means that it contains only a DNA string with no newline characters or white space characters of any kind; it is a sequence of the letters a, c, g, and t of length 3n for some n. The script must count the number of occurrences of every codon in the file, assuming the first codon starts at position 1, and it must output the number of times each codon occurs in the file, sorted in order of decreasing frequency. For example, if dnafile is a file containing the DNA string aacgtttgtaaccagaactgt, then the command codonhistogram dnafile should produce the following output:
3 aac
2 tgt
1 cag
1 gtt
- The script has to be able to extract the codons from any DNA sequence, not only the example above.
Step by Step Solution
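One possible approach, offered as a minimal sketch rather than the definitive answer: split the string into 3-character lines, group identical codons, count them, and sort by count. The sketch assumes the standard fold, sort, uniq, and awk utilities are available. Wrapping the logic in a function named codonhistogram is a choice made here for illustration, and breaking ties alphabetically is an assumption inferred from the sample output (cag before gtt), not something the question states.

```shell
#!/bin/bash
# codonhistogram: print how often each codon occurs in a DNA file,
# most frequent first. Sketch only; assumes the file holds a single
# DNA string whose length is a multiple of 3.

codonhistogram() {
    # Exactly one argument (the DNA file name) is expected.
    if [ "$#" -ne 1 ]; then
        echo "usage: codonhistogram dnafile" >&2
        return 1
    fi
    if [ ! -r "$1" ]; then
        echo "codonhistogram: cannot read '$1'" >&2
        return 1
    fi

    # fold -w 3  : break the string into lines of 3 characters (one codon each)
    # sort       : bring identical codons together so uniq can count them
    # uniq -c    : prefix each distinct codon with its number of occurrences
    # sort ...   : order by count, descending; ties by codon (assumed rule)
    # awk        : drop the leading padding that uniq -c adds
    fold -w 3 "$1" | sort | uniq -c | sort -k1,1nr -k2,2 | awk '{ print $1, $2 }'
}

# Run the function only when the script is invoked with arguments.
if [ "$#" -gt 0 ]; then
    codonhistogram "$@"
fi
```

Saved as an executable file named codonhistogram, this can be run as ./codonhistogram dnafile. Because fold cuts at fixed positions starting from the first character, the first codon is taken to start at position 1, as the question requires.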
