DNA Profiling

Learner Objectives

At the conclusion of this programming assignment, participants should be able to:

  • Write a C++ program that accepts a CSV file representing a DNA database and a text file representing a DNA sequence.
  • Use a combination of loops, string manipulation, and file I/O to identify to whom the DNA sequence belongs.

The data in the above file would suggest that Harry has the sequence AGAT repeated 2 times consecutively somewhere in his DNA, the sequence AATG repeated 8 times, and TCTAG repeated 3 times. Ron, meanwhile, has those same three STRs repeated 4 times, 1 time, and 5 times, respectively. And Hermione has those same three STRs repeated 3, 2, and 5 times, respectively.

So given a sequence of DNA, how might you identify to whom it belongs? Well, imagine that you looked through the DNA sequence for the longest consecutive sequence of repeated AGAT and found that the longest sequence was 4 repeats long. If you then found that the longest sequence of AATG is 1 repeat long, and the longest sequence of TCTAG is 5 repeats long, that would provide pretty good evidence that the DNA was Ron's. Of course, it's also possible that once you take the counts for each of the STRs, they don't match anyone in your DNA database, in which case you have no match.

In practice, since analysts know on which chromosome and at which location in the DNA an STR will be found, they can localize their search to just a narrow section of DNA. But we'll ignore that detail for this problem.
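
One way to compute the longest run described above is to try every starting position in the DNA string and count how many back-to-back copies of the STR begin there. The sketch below is only an illustration under that assumption; the function name longestRun is hypothetical and not part of the assignment, and faster approaches are possible.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not named in the assignment): returns the length, in
// repeats, of the longest run of back-to-back copies of `str` inside `dna`.
int longestRun(const std::string& dna, const std::string& str) {
    if (str.empty()) return 0;  // an empty STR would otherwise loop forever
    int best = 0;
    for (std::size_t start = 0; start < dna.size(); ++start) {
        int run = 0;
        std::size_t pos = start;
        // Count consecutive copies of str beginning at `start`.
        // pos never exceeds dna.size(), so compare() cannot throw here.
        while (dna.compare(pos, str.size(), str) == 0) {
            ++run;
            pos += str.size();
        }
        if (run > best) best = run;
    }
    return best;
}
```

For a sequence made of AGAT four times, then AATG once, then TCTAG five times, longestRun reports 4, 1, and 5 for the three STRs, which are the counts given for Ron.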

Implementation Requirements

  • [ ] The program should require
    • [ ] as its first command-line argument the name of a CSV file containing the STR counts for a list of individuals, and
    • [ ] as its second command-line argument the name of a text file containing the DNA sequence to identify.
  • [ ] Your program should open the CSV file and read its contents. For example, below are the contents of database/small.csv:
      name,AGATC,AATG,TATC
      Alice,2,8,3
      Bob,4,1,5
      Charlie,3,2,5
  • [ ] The first row of the CSV file will be the column names. The first column will be the word name and the remaining columns will be the STR sequences themselves. You will read these STR sequences and store them in a vector.
      vector<string> strSequence;
  • [ ] Then read the rest of the contents into a vector of struct Data:
      struct Data {
          string name;                // person's name
          vector<int> strCounters;    // count for each STR
      };
  • [ ] Your program should open the DNA sequence and read its contents into a string.
  • [ ] For each of the STRs (from the first line of the CSV file), your program should compute the longest run of consecutive repeats of the STR in the DNA sequence to identify.
  • [ ] If the STR counts match exactly with any of the individuals in the CSV file, your program should print out the name of the matching individual.
    • [ ] You may assume that the STR counts will not match more than one individual.
    • [ ] If the STR counts do not match exactly with any of the individuals in the CSV file, your program should print "No match"
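
The CSV-reading and matching requirements above could be sketched roughly as follows. This is a minimal illustration, not a reference solution: the helper names splitCsvLine, readDatabase, and findMatch are hypothetical, the code assumes a well-formed file with no quoted fields, and it reads from any std::istream so that the same function works with a std::ifstream opened on the CSV file.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct Data {
    std::string name;              // person's name
    std::vector<int> strCounters;  // count for each STR, in header order
};

// Hypothetical helper: splits one CSV line on commas (no quoted fields).
std::vector<std::string> splitCsvLine(std::string line) {
    if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
    std::vector<std::string> fields;
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, ',')) fields.push_back(field);
    return fields;
}

// Hypothetical helper: fills strSequence from the header row and people from
// the remaining rows. Assumes numeric counts; std::stoi throws otherwise.
bool readDatabase(std::istream& in, std::vector<std::string>& strSequence,
                  std::vector<Data>& people) {
    std::string line;
    if (!std::getline(in, line)) return false;  // no header row at all
    std::vector<std::string> header = splitCsvLine(line);
    if (header.empty()) return false;
    strSequence.assign(header.begin() + 1, header.end());  // skip "name" column
    while (std::getline(in, line)) {
        std::vector<std::string> fields = splitCsvLine(line);
        if (fields.size() != header.size()) continue;  // skip blank/short rows
        Data person;
        person.name = fields[0];
        for (std::size_t i = 1; i < fields.size(); ++i)
            person.strCounters.push_back(std::stoi(fields[i]));
        people.push_back(person);
    }
    return true;
}

// Hypothetical helper: returns the matching name, or "No match".
std::string findMatch(const std::vector<Data>& people,
                      const std::vector<int>& counts) {
    for (const Data& person : people)
        if (person.strCounters == counts) return person.name;
    return "No match";
}
```

With the contents of database/small.csv shown earlier, counts of 4, 1, and 5 for AGATC, AATG, and TATC would make findMatch return Bob, and any combination not present in the file would return "No match".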

Additional Requirements

  • [ ] The executable program should be called profile
  • [ ] Practice modular programming by breaking down your program into functions. If you want to use classes, you may do so, but try to analyze how this should be represented as a class.
  • [ ] Use the three-file structure
  • [ ] Add a file header comment for each file
  • [ ] Add a function header comment for each function
  • [ ] Add in-line comments in your code
  • [ ] Commit your code frequently

Usage

Your program should behave per the example below:

 

$ ./profile database/small.csv sequences/01.txt
Bob
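
The invocation above passes exactly two file names after the program name, so argc should be 3. A small sketch of checking that before opening any files is shown below; parseArgs is a hypothetical name, and what the program should do when the arguments are missing is an assumption, since the assignment does not say.

```cpp
#include <string>

// Hypothetical helper: copies the two file names out of argv.
// Returns false when the argument count is wrong, so that main can print a
// usage message (the wording is up to you) and exit with a non-zero status.
bool parseArgs(int argc, char* argv[], std::string& csvPath,
               std::string& sequencePath) {
    if (argc != 3) return false;  // program name + CSV file + sequence file
    csvPath = argv[1];            // e.g. database/small.csv
    sequencePath = argv[2];       // e.g. sequences/01.txt
    return true;
}
```

Keeping this check in its own function fits the modular-programming requirement and keeps main short.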
