DNA Profiling
Learner Objectives
At the conclusion of this programming assignment, participants should be able to:
- Write a C++ program that accepts a CSV file representing a DNA database and a text file representing a DNA sequence.
- Use a combination of loops, string manipulation, and file I/O to identify to whom the DNA sequence belongs.
The data in the above file would suggest that Harry has the sequence AGAT repeated 2 times consecutively somewhere in his DNA, the sequence AATG repeated 8 times, and TCTAG repeated 3 times. Ron, meanwhile, has those same three STRs repeated 4 times, 1 time, and 5 times, respectively. And Hermione has those same three STRs repeated 3, 2, and 5 times, respectively.
So, given a sequence of DNA, how might you identify to whom it belongs? Imagine that you searched the DNA sequence for the longest consecutive run of repeated AGAT and found that it was 4 repeats long. If you then found that the longest run of AATG is 1 repeat long and the longest run of TCTAG is 5 repeats long, that would be pretty good evidence that the DNA was Ron's. Of course, it's also possible that once you take the counts for each of the STRs, they don't match anyone in your DNA database, in which case there is no match.
In practice, since analysts know on which chromosome and at which location in the DNA an STR will be found, they can localize their search to just a narrow section of DNA. But we'll ignore that detail for this problem.
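The longest-run idea above can be sketched in C++ as a brute-force scan: for each starting position, count how many copies of the STR follow back to back, and keep the largest count. This is a minimal sketch, assuming the DNA and the STR are already plain `std::string` values; the name `longestRun` is hypothetical and not required by the assignment.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the length (in repeats) of the longest run of
// back-to-back copies of `str` anywhere in `dna`.
int longestRun(const std::string& dna, const std::string& str) {
    if (str.empty() || dna.size() < str.size()) {
        return 0;  // nothing to search for, or the DNA is too short
    }
    int best = 0;
    // Try every starting position in the DNA sequence.
    for (std::size_t start = 0; start + str.size() <= dna.size(); ++start) {
        int count = 0;
        std::size_t pos = start;
        // Count consecutive copies of the STR beginning at `start`.
        while (pos + str.size() <= dna.size() &&
               dna.compare(pos, str.size(), str) == 0) {
            ++count;
            pos += str.size();  // jump to where the next copy would begin
        }
        if (count > best) {
            best = count;
        }
    }
    return best;
}
```

For example, `longestRun("AGATAGATAGATAGATTT", "AGAT")` returns 4. The brute-force scan is easy to check by hand; for assignment-sized sequences it is usually fast enough, though a cleverer scan could skip positions that were already counted.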
Implementation Requirements
- [ ] The program should require as its first command-line argument the name of a CSV file containing the STR counts for a list of individuals.
- [ ] The program should require as its second command-line argument the name of a text file containing the DNA sequence to identify.
- [ ] Your program should open the CSV file and read its contents. For example, below are the contents of database/small.csv:
    name,AGATC,AATG,TATC
    Alice,2,8,3
    Bob,4,1,5
    Charlie,3,2,5
- [ ] The first row of the CSV file will be the column names. The first column will be the word "name" and the remaining columns will be the STR sequences themselves. You will read these STR sequences and store them in a vector:
    vector<string> strSequence;
- [ ] Then read the rest of the contents into a vector of struct Data:
    struct Data {
        string name;              // person's name
        vector<int> strCounters;  // count for each STR
    };
- [ ] Your program should open the DNA sequence file and read its contents into a string.
- [ ] For each of the STRs (from the first line of the CSV file), your program should compute the longest run of consecutive repeats of the STR in the DNA sequence to identify.
- [ ] If the STR counts match exactly with any of the individuals in the CSV file, your program should print out the name of the matching individual.
- [ ] You may assume that the STR counts will not match more than one individual.
- [ ] If the STR counts do not match exactly with any of the individuals in the CSV file, your program should print "No match".
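The CSV and DNA reading steps in the checklist above could be sketched roughly as follows. This is a sketch under stated assumptions: the CSV has no quoted fields or embedded commas, every count is a valid integer, and the functions read from a `std::istream` so they work with an `std::ifstream` opened on a command-line argument or with an `std::istringstream` in a quick test. The names `splitCsvLine`, `stripCarriageReturn`, `readDatabase`, and `readSequence` are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Same layout as struct Data in the requirements.
struct Data {
    std::string name;              // person's name
    std::vector<int> strCounters;  // count for each STR
};

// Splits one CSV line on commas (assumes no quoted fields).
std::vector<std::string> splitCsvLine(const std::string& line) {
    std::vector<std::string> fields;
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, ',')) {
        fields.push_back(field);
    }
    return fields;
}

// Removes a trailing '\r' left by Windows-style line endings.
void stripCarriageReturn(std::string& line) {
    if (!line.empty() && line.back() == '\r') {
        line.pop_back();
    }
}

// Hypothetical helper: reads the header row into strSequence (skipping the
// leading "name" column) and every later row into people.
// Returns false if the header row is missing.
bool readDatabase(std::istream& in, std::vector<std::string>& strSequence,
                  std::vector<Data>& people) {
    std::string line;
    if (!std::getline(in, line)) {
        return false;
    }
    stripCarriageReturn(line);
    std::vector<std::string> header = splitCsvLine(line);
    if (header.empty()) {
        return false;
    }
    strSequence.assign(header.begin() + 1, header.end());

    while (std::getline(in, line)) {
        stripCarriageReturn(line);
        if (line.empty()) {
            continue;  // skip blank lines, e.g. at the end of the file
        }
        std::vector<std::string> fields = splitCsvLine(line);
        Data person;
        person.name = fields[0];
        // std::stoi throws if a count is not a valid integer.
        for (std::size_t i = 1; i < fields.size(); ++i) {
            person.strCounters.push_back(std::stoi(fields[i]));
        }
        people.push_back(person);
    }
    return true;
}

// Hypothetical helper: reads the whole DNA file into one string,
// dropping whitespace such as the trailing newline.
std::string readSequence(std::istream& in) {
    std::string dna;
    char c;
    while (in.get(c)) {
        if (!std::isspace(static_cast<unsigned char>(c))) {
            dna += c;
        }
    }
    return dna;
}
```

With real files, an `std::ifstream` opened on the first or second command-line argument can be passed directly to `readDatabase` or `readSequence`, since `std::ifstream` is an `std::istream`.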
Additional Requirements
- [ ] The executable program should be called profile
- [ ] Practice modular programming by breaking down your program into functions. If you want to use classes, you may do so, but try to analyze how this should be represented as a class.
- [ ] Use the three-file structure (a header file, an implementation file, and a driver file containing main)
- [ ] Add a file header comment for each file
- [ ] Add a function header comment for each function
- [ ] Add in-line comments in your code
- [ ] Commit your code frequently
Usage
Your program should behave per the example below:
$ ./profile database/small.csv sequences/01.txt
Bob
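The exact-match rule from the requirements (print the matching name, otherwise "No match"), which produces output like Bob above, might be sketched as below. It assumes the longest-run count for each STR has already been computed and stored in the same order as the CSV header columns; `findMatch` is a hypothetical name, and `Data` mirrors the struct from the requirements.

```cpp
#include <string>
#include <vector>

// Mirrors the struct Data from the requirements.
struct Data {
    std::string name;              // person's name
    std::vector<int> strCounters;  // count for each STR
};

// Hypothetical helper: returns the name of the person whose STR counts equal
// `counts` exactly (same order as the CSV header), or "No match".
std::string findMatch(const std::vector<Data>& people,
                      const std::vector<int>& counts) {
    for (const Data& person : people) {
        // vector == compares the sizes, then every element in order.
        if (person.strCounters == counts) {
            return person.name;  // the assignment guarantees at most one match
        }
    }
    return "No match";
}
```

In a main following the usage above, one would check that exactly two arguments were given, open argv[1] as the database and argv[2] as the sequence, compute the longest run for each STR in header order, and print the string returned by `findMatch`.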
