Question: All code must be written in C++ (C++ was also used in CS1, CS2, and CS3). 1. In this question, you will investigate the nucleotides

All code must be written in C++ (C++ was also used in CS1, CS2, and CS3).

1. In this question, you will investigate the nucleotides at the splicing sites (intersection of the exon and intron) within protein coding genes in human genome. You are given a fasta file called gene_fasta_chr12.fa which contain the sequences of randomly selected 2,412 protein coding genes from chromosome 12 in human. The sequence includes both the exon and intron portions of the gene. The nucleotides in exons are uppercased and the ones in the intron are lower case. Implement programs to compute the following [100 points]

Average number of exons in a gene

Average number of introns in a gene

Length of the longest and shortest intron

Length of the longest and shortest exon

Look at the positions immediately after each exon (donor site or the first two bases of each intron) in all the genes and count the frequency of all possible 2-mers at those locations. (GT is expected to have the highest frequency).

Look at the positions immediately before internal exons (splice acceptor sites or the last two bases of each intron) in all the genes and count the frequency of all possible 2-mers at those locations. (AG is expected to have the highest frequency).

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Databases Questions!