Question: Write code in C++ to read, store, and analyze the latest human genome assembly (found at: /common/contrib/classroom/inf503/genomes/human.txt ). At minimum, your code must contain (10pts):
A character array to store the entire human genome in a single data structure
A separate function to read the human genome file
A function to compute the number of A, C, G, or T characters in the human genome
Comments describing major code blocks and control structures
(20pts) Read in and store the human genome. There will be multiple scaffolds (each with a separate header denoted by >). Concatenate the entire genome (discarding headers) into a single character array data structure. Collect the following statistics (see below) as you are reading the file. Hint: you can keep running totals or store scaffold sizes / names in separate sets of arrays.
How many scaffolds were there?
What was the longest and shortest scaffold? Provide names of scaffolds and lengths.
What was the average scaffold length?
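The reading step and its running totals could be sketched as follows. This is only one possible approach, not a reference solution: `GenomeStats`, `recordScaffold`, `readGenome`, `readGenomeFile`, and `readGenomeSelfCheck` are hypothetical names, and `std::vector<char>` is assumed to be an acceptable stand-in for the required character array (it stores the genome in one contiguous `char` buffer). The sketch also assumes the file is FASTA-style text in which every line starting with `>` is a scaffold header.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for the statistics the assignment asks for.
struct GenomeStats {
    std::size_t scaffoldCount = 0;
    std::size_t totalLength = 0;
    std::string longestName;
    std::string shortestName;
    std::size_t longestLength = 0;
    std::size_t shortestLength = 0;

    // Average scaffold length; 0 when no scaffold was read.
    double averageLength() const {
        if (scaffoldCount == 0) return 0.0;
        return static_cast<double>(totalLength) / static_cast<double>(scaffoldCount);
    }
};

// Fold one completed scaffold into the running totals.
void recordScaffold(GenomeStats& stats, const std::string& name, std::size_t length) {
    if (stats.scaffoldCount == 0 || length > stats.longestLength) {
        stats.longestName = name;
        stats.longestLength = length;
    }
    if (stats.scaffoldCount == 0 || length < stats.shortestLength) {
        stats.shortestName = name;
        stats.shortestLength = length;
    }
    ++stats.scaffoldCount;
    stats.totalLength += length;
}

// Read FASTA-style text: '>' lines start a scaffold, other lines are sequence.
// Sequence characters are appended to `genome`; headers are not stored there.
GenomeStats readGenome(std::istream& in, std::vector<char>& genome) {
    GenomeStats stats;
    std::string line;
    std::string name;
    std::size_t length = 0;
    bool inScaffold = false;

    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
        if (line.empty()) continue;
        if (line[0] == '>') {
            if (inScaffold) recordScaffold(stats, name, length);  // close previous
            name = line.substr(1);  // header text without the leading '>'
            length = 0;
            inScaffold = true;
        } else if (inScaffold) {
            genome.insert(genome.end(), line.begin(), line.end());
            length += line.size();
        }
    }
    if (inScaffold) recordScaffold(stats, name, length);  // close the last scaffold
    return stats;
}

// Convenience wrapper that opens the file by path; a file that cannot be
// opened yields empty statistics and leaves `genome` unchanged.
GenomeStats readGenomeFile(const std::string& path, std::vector<char>& genome) {
    std::ifstream file(path);
    return readGenome(file, genome);
}

// Tiny in-memory check on a made-up two-scaffold input.
void readGenomeSelfCheck() {
    std::istringstream in(">s1\nACGT\nAC\n>s2\nGG\n");
    std::vector<char> genome;
    GenomeStats stats = readGenome(in, genome);
    assert(stats.scaffoldCount == 2);
    assert(genome.size() == 8);
    assert(stats.longestName == "s1" && stats.shortestName == "s2");
}
```

Calling `readGenomeFile` with the path from the question would fill the buffer and return the scaffold count, the longest and shortest scaffold names and lengths, and the average length in one pass. Because the human genome is roughly three billion bases, lengths are kept in `std::size_t`, and reserving capacity in the vector before reading may avoid repeated reallocation.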
(20pts) Write a function to assess the content of the human genome: count the total number of a given character (A, C, G, or T) in the whole genome.
What is the big O notation of your search (linear / quadratic / cubic / etc)?
How long does it take (in seconds) to execute this function? Hint: You will need to use system time within your code to get accurate time estimates.
What was the GC content of the human genome (percent of Cs and Gs in the genome)?
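A minimal sketch of the counting, timing, and GC-content steps might look like the code below. The names `countBase`, `timedCountBase`, and `gcContent` are hypothetical, and several choices are assumptions rather than requirements of the assignment: the genome is passed as a raw `char` pointer plus a length, timing uses `std::chrono::steady_clock` as the clock, matching is case-sensitive (lowercase bases, if the file contains any, would not be counted), and GC content is computed over all stored characters, including any `N`s.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Count how often `target` occurs by visiting each character once: O(n).
std::size_t countBase(const char* genome, std::size_t length, char target) {
    assert(genome != nullptr || length == 0);  // precondition on the raw array
    std::size_t count = 0;
    for (std::size_t i = 0; i < length; ++i) {
        if (genome[i] == target) ++count;
    }
    return count;
}

// Hypothetical helper: same count, plus elapsed wall-clock seconds in `seconds`.
std::size_t timedCountBase(const char* genome, std::size_t length, char target,
                           double& seconds) {
    const auto start = std::chrono::steady_clock::now();
    const std::size_t count = countBase(genome, length, target);
    const auto stop = std::chrono::steady_clock::now();
    seconds = std::chrono::duration<double>(stop - start).count();
    return count;
}

// GC content as a percentage of all stored characters (0 for an empty genome).
double gcContent(const char* genome, std::size_t length) {
    if (length == 0) return 0.0;
    const std::size_t gc =
        countBase(genome, length, 'G') + countBase(genome, length, 'C');
    return 100.0 * static_cast<double>(gc) / static_cast<double>(length);
}
```

The loop in `countBase` touches each character exactly once, so the search is linear in the genome length. If the genome is held in a `std::vector<char>`, the functions can be called with its `data()` and `size()`.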
