Question: I need help with this project in Java: DNA profiling

I need help with this project in Java.

DNA

Background

DNA, the carrier of genetic information in living things, has been used in criminal justice for decades. But how, exactly, does DNA profiling work? Given a sequence of DNA, how can forensic investigators identify to whom it belongs?

Well, DNA is really just a sequence of molecules called nucleotides, arranged into a particular shape (a double helix). Each nucleotide of DNA contains one of four different bases: adenine (A), cytosine (C), guanine (G), or thymine (T). Every human cell has billions of these nucleotides arranged in sequence. Some portions of this sequence (i.e., genome) are the same, or at least very similar, across almost all humans, but other portions of the sequence have a higher genetic diversity and thus vary more across the population.

One place where DNA tends to have high genetic diversity is in Short Tandem Repeats (STRs). An STR is a short sequence of DNA bases that tends to be repeated back-to-back numerous times at specific locations in DNA. The number of times any particular STR repeats varies a lot among different people. In the DNA samples below, for example, Alice has the STR AGAT repeated four times in her DNA, while Bob has the same STR repeated five times.

Using multiple STRs, rather than just one, can improve the accuracy of DNA profiling. If the probability that two people have the same number of repeats for a single STR is 5%, and the analyst looks at 10 different STRs, then the probability that two DNA samples match purely by chance is about 1 in 10 trillion (0.05^10 ≈ 9.8 × 10^-14, assuming all STRs are independent of each other). So, if two DNA samples match in the number of repeats for each of the STRs, the analyst can be pretty confident they came from the same person. CODIS, the FBI's DNA database, uses 20 different STRs as part of its DNA profiling process.

What might such a DNA database look like? Well, in its simplest form, you could imagine formatting a DNA database as a TSV file [1], wherein each row corresponds to an individual, and each column corresponds to a particular STR.

[1] Tab-separated values (TSV) is a simple, text-based file format for storing tabular data. Records are separated by newlines, and values within a record are separated by tab characters.

Name     AGAT  AATG  TATC
Alice    28    42    14
Bob      17    22    19
Charlie  36    18    25

The data in the above file would suggest that Alice has the sequence AGAT repeated 28 times consecutively somewhere in her DNA, the sequence AATG repeated 42 times, and TATC repeated 14 times. Bob, meanwhile, has those same three STRs repeated 17 times, 22 times, and 19 times, respectively. And Charlie has those same three STRs repeated 36, 18, and 25 times, respectively.

So, given a sequence of DNA, how might you identify to whom it belongs? Well, imagine that you looked through the DNA sequence for the longest consecutive sequence of repeated AGATs and found that the longest sequence was 17 repeats long. If you then found that the longest sequence of AATGs is 22 repeats long, and the longest sequence of TATCs is 19 repeats long, that would provide pretty good evidence that the DNA was Bob's. Of course, it's also possible that once you take the counts for each of the STRs, they don't match anyone in your DNA database, in which case you have no match.

In practice, since analysts know on which chromosome and at which location in the DNA an STR will be found, they can localize their search to just a narrow section of DNA. But we'll ignore that detail for this problem.

Your task is to write a Java program that will take a sequence of DNA and a TSV file containing STR counts for a list of individuals and then output to whom the DNA (most likely) belongs.

Specification

In DNAProfile.java, implement a program that identifies to whom a sequence of DNA belongs.

• Your program should open the TSV file (dnaData.txt) and read its contents into memory.
   o You may assume that the first row of the TSV file will be the column names. The first column will be the word name and the remaining columns will be the STR sequences themselves.
• Your program should open a DNA sequence file (some sample files are provided for your use for testing your program) and read its contents into memory.
• For each of the STRs (from the first line of the TSV file), your program should compute the longest run of consecutive repeats of the STR in the DNA sequence to identify.
• If the STR counts match exactly with any of the individuals in the TSV file, your program should print out the name of the matching individual.
   o You may assume that the STR counts will not match more than one individual.
   o If the STR counts do not match exactly with any of the individuals in the TSV file, your program should print "No match".
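The longest-run step is the core of the task, so here is a minimal sketch of one way it might be done. The class name `StrRuns` and the method name `longestRun` are hypothetical (neither comes from the assignment). The sketch tries every start position and counts back-to-back copies of the STR using only `length`, `substring`, and `equals`; it is a simple quadratic approach, not necessarily the one the instructor expects.

```java
// Hypothetical helper class; the names are illustrative, not from the assignment.
class StrRuns {

    /** Returns the longest run of consecutive copies of str found anywhere in dna. */
    static int longestRun(String dna, String str) {
        int n = str.length();
        if (n == 0) {
            return 0; // an empty STR would otherwise loop forever
        }
        int best = 0;
        for (int start = 0; start + n <= dna.length(); start++) {
            int count = 0;
            int pos = start;
            // Count back-to-back copies of str beginning at position start.
            while (pos + n <= dna.length() && dna.substring(pos, pos + n).equals(str)) {
                count++;
                pos += n;
            }
            if (count > best) {
                best = count;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Two adjacent AGATs, then a C, then one more AGAT: the longest run is 2.
        System.out.println(longestRun("TTAGATAGATCAGAT", "AGAT")); // prints 2
    }
}
```

Checking every start position matters because a run does not have to begin at an index that is a multiple of the STR length.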

Hints

• You may find the following String methods (from the Java API) useful; for more on these and other useful methods, refer to the Java String API:
   o indexOf
   o length
   o substring
   o equals
• Use procedural decomposition to structure your program using methods.
• As always, comment your code and include a title comment.

Testing

For the TSV file:

Name     AGAT  AATG  TATC
Alice    5     2     8
Bob      3     7     4
Charlie  6     1     5

The following sequence should match with Alice:

AGACGGGTTACCATGACTATCTATCTATCTATCTATCTATCTATCTATCACGTACGTACGTATCGAGATAGATAGATAGATAGATCCTCGACTTCGATCGCAATGAATGCCAATAGACAAAA

The following sequence should match with Bob:

AACCCTGCGCGCGCGCGATCTATCTATCTATCTATCCAGCATTAGCTAGCATCAAGATAGATAGATGAATTTCGAAATGAATGAATGAATGAATGAATGAATG

The following sequence should match with Charlie:

CCAGATAGATAGATAGATAGATAGATGTCACAGGGATGCTGAGGGCTGCTTCGTACGTACTCCTGATTTCGGGGATCGCTGACACTAATGCGTGCGAGCGGATCGATCTCTATCTATCTATCTATCTATCCTATAGCATAGACATCCAGATAGATAGATC

And the following sequence should have No match:

GGTACAGATGCAAAGATAGATAGATGTCGTCGAGCAATCGTTTCGATAATGAATGAATGAATGAATGAATGAATGACACACGTCGATGCTAGCGGCGGATCGTATATCTATCTATCTATCTATCAACCCCTAG
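Once the longest-run count for each STR is known, finding the person is a row-by-row comparison against the table. The sketch below assumes the TSV lines are already in memory as a list with the header row first, and that the counts are given in the same column order as the header. `ProfileMatcher` and `findMatch` are hypothetical names, and the rows in `main` are the sample table from the Testing section.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; the names are illustrative, not from the assignment.
class ProfileMatcher {

    /**
     * dbLines: the lines of the TSV file, header row first (assumed present).
     * counts: the longest-run count for each STR, in header column order.
     * Returns the matching name, or "No match" if no row matches exactly.
     */
    static String findMatch(List<String> dbLines, int[] counts) {
        for (String line : dbLines.subList(1, dbLines.size())) {
            if (line.trim().isEmpty()) {
                continue; // tolerate a trailing blank line
            }
            String[] fields = line.trim().split("\t");
            // Convert every column after the name into an integer count.
            int[] row = new int[fields.length - 1];
            for (int i = 1; i < fields.length; i++) {
                row[i - 1] = Integer.parseInt(fields[i].trim());
            }
            if (Arrays.equals(row, counts)) {
                return fields[0];
            }
        }
        return "No match";
    }

    public static void main(String[] args) {
        List<String> db = Arrays.asList(
            "Name\tAGAT\tAATG\tTATC",
            "Alice\t5\t2\t8",
            "Bob\t3\t7\t4",
            "Charlie\t6\t1\t5");
        System.out.println(findMatch(db, new int[] {3, 7, 4})); // prints Bob
        System.out.println(findMatch(db, new int[] {3, 2, 1})); // prints No match
    }
}
```

In a full program the list could come from `Files.readAllLines(Paths.get("dnaData.txt"))`. Any whitespace or line breaks inside the sequence file would need to be removed before counting runs, since a break in the middle of a run would otherwise split it.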

