Question:
All code must be written in C++ (C++ was also used in CS1, CS2, and CS3).
1. In this question, you will investigate the nucleotides at the splice sites (the boundaries between exons and introns) within protein-coding genes in the human genome. You are given a FASTA file called gene_fasta_chr12.fa, which contains the sequences of 2,412 randomly selected protein-coding genes from human chromosome 12. Each sequence includes both the exon and intron portions of the gene. The nucleotides in exons are uppercase and the ones in introns are lowercase. Implement programs to compute the following [100 points]:
Average number of exons in a gene
Average number of introns in a gene
Length of the longest and shortest intron
Length of the longest and shortest exon
Look at the positions immediately after each exon (donor site or the first two bases of each intron) in all the genes and count the frequency of all possible 2-mers at those locations. (GT is expected to have the highest frequency).
Look at the positions immediately before internal exons (splice acceptor sites or the last two bases of each intron) in all the genes and count the frequency of all possible 2-mers at those locations. (AG is expected to have the highest frequency).
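One possible way to approach all six items is to split each gene into maximal runs of same-case letters, treating each uppercase run as an exon and each lowercase run as an intron. The following is only a minimal sketch under several assumptions that the question does not state: each FASTA record starts with a `>` header line and the sequence may wrap over several lines; any character that is not a lowercase letter counts as exon; a lowercase run at the very start of a gene has no donor site and one at the very end has no acceptor site; and 2-mers are uppercased before counting so they can be compared with GT and AG. The names `Segment`, `SpliceStats`, `splitSegments`, `addGene`, `analyzeFasta`, and `averagePerGene` are hypothetical helpers, not part of the assignment.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <istream>
#include <limits>
#include <map>
#include <string>
#include <vector>

// One maximal run of same-case letters: exon (uppercase) or intron (lowercase).
struct Segment {
    bool isExon;
    std::size_t start;
    std::size_t length;
};

// Running totals over all genes seen so far.
struct SpliceStats {
    std::size_t genes = 0, exons = 0, introns = 0;
    std::size_t minExon = std::numeric_limits<std::size_t>::max(), maxExon = 0;
    std::size_t minIntron = std::numeric_limits<std::size_t>::max(), maxIntron = 0;
    std::map<std::string, std::size_t> donor;     // first two bases of an intron
    std::map<std::string, std::size_t> acceptor;  // last two bases of an intron
};

static bool isLower(char c) {
    return std::islower(static_cast<unsigned char>(c)) != 0;
}

// Split a gene sequence into alternating exon/intron runs based on letter case.
std::vector<Segment> splitSegments(const std::string& seq) {
    std::vector<Segment> segs;
    std::size_t i = 0;
    while (i < seq.size()) {
        const bool exon = !isLower(seq[i]);
        std::size_t j = i;
        while (j < seq.size() && (!isLower(seq[j])) == exon) ++j;
        segs.push_back({exon, i, j - i});
        i = j;
    }
    return segs;
}

// Two bases starting at pos, uppercased (assumption: report 2-mers in uppercase).
static std::string upper2(const std::string& seq, std::size_t pos) {
    std::string k = seq.substr(pos, 2);
    for (char& c : k) c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return k;
}

// Fold one gene sequence into the running statistics; empty sequences are skipped.
void addGene(const std::string& seq, SpliceStats& st) {
    const std::vector<Segment> segs = splitSegments(seq);
    if (segs.empty()) return;
    ++st.genes;
    for (std::size_t i = 0; i < segs.size(); ++i) {
        const Segment& s = segs[i];
        if (s.isExon) {
            ++st.exons;
            st.minExon = std::min(st.minExon, s.length);
            st.maxExon = std::max(st.maxExon, s.length);
        } else {
            ++st.introns;
            st.minIntron = std::min(st.minIntron, s.length);
            st.maxIntron = std::max(st.maxIntron, s.length);
            if (s.length < 2) continue;  // too short to hold a 2-mer
            // Donor site: only when an exon precedes this intron.
            if (i > 0) ++st.donor[upper2(seq, s.start)];
            // Acceptor site: only when an exon follows this intron.
            if (i + 1 < segs.size()) ++st.acceptor[upper2(seq, s.start + s.length - 2)];
        }
    }
}

// Read FASTA records from a stream and accumulate statistics for every gene.
SpliceStats analyzeFasta(std::istream& in) {
    SpliceStats st;
    std::string line, seq;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
        if (line.empty()) continue;
        if (line[0] == '>') { addGene(seq, st); seq.clear(); }
        else seq += line;
    }
    addGene(seq, st);  // flush the last record
    return st;
}

// Average of a per-gene count, e.g. averagePerGene(st.exons, st).
double averagePerGene(std::size_t total, const SpliceStats& st) {
    assert(st.genes > 0 && "no genes were read");
    return static_cast<double>(total) / static_cast<double>(st.genes);
}
```

To use the sketch, a `main` function could open `gene_fasta_chr12.fa` with `std::ifstream` (from `<fstream>`), pass the stream to `analyzeFasta`, and then print `averagePerGene(st.exons, st)`, `averagePerGene(st.introns, st)`, the four minimum and maximum lengths, and the contents of the `donor` and `acceptor` maps. Taking a `std::istream&` rather than a file name keeps the parsing logic easy to check on small hand-made inputs.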
