Question:

You may brainstorm and plan the coding portion together, even share bits of code, but you need to make sure you write up independent solutions. If your code is based strongly on someone else's code, please (1) give them credit and (2) make an effort to modify the code and take ownership (see instructions for Homework 1 for what this means). For the parts marked planning or discussion, you should write your ideas independently, though you may, and we will, discuss this homework together.

In this assignment, you will be concerned with merging paired reads from an Illumina sequencing experiment.

1. (planning) Mathematically derive and define the scoring function you will use in your alignment algorithm. Your scoring function will take in as arguments the two nucleotides and two quality scores that are proposed for alignment. It will return a numeric score.

2. (planning) What gap penalty can you use to reflect that indel rates for the sequencer used in this experiment are about 100-fold lower than substitution rates?

3. (planning) Assuming you have an alignment for the two reads, specify mathematically how you will estimate the true nucleotides in the overlap region of the reads.

4. (planning) Mathematically derive and define the probability of an error for the nucleotides observed in the overlap region. Clarify how this probability will be converted to an ASCII quality score. What should the quality score of an inserted nucleotide (one that is aligned to the gap '-' in the other read) be?

5. (planning) Plan and write pseudocode for an algorithm to solve the problem.

6. (coding) Implement the algorithm in Python using no additional libraries beyond those we have installed (biopython, scipy, in particular, may be useful). Your program should output (to stdout) the merged reads in fastq format, something like

    @name_of_sequence score=my_score overlap_length=my_length
    AAACC...your-merged-read-here...
    +
    3>>3A...your_merged_quality_scores_here...
    @name_of_next_sequence score=my_next_score overlap_length=my_next_length

where my_score is the score of the alignment in this merged sequence and my_length is the length of the alignment (excluding end gaps) in this merged sequence. If the aligned portion of the reads has length 0, do not output anything in the fastq file; these sequences cannot be merged and you will simply discard them.

7. (discussion) Write one paragraph analyzing your algorithm results. Are you suspicious of any of your results? Why? How could you improve the algorithm?
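
Items 4 and 6 both depend on two mechanical steps: turning an error probability into an ASCII quality character, and printing a merged read as a fastq record. The sketch below illustrates only those two steps, not the alignment or the merging itself. It assumes the common Phred+33 encoding (Q = -10 log10(p), character = chr(Q + 33)) and a quality cap of 41; the assignment does not state either. The function names, the cap, and the exact header layout (`@name score=... overlap_length=...`) are hypothetical choices made for illustration.

```python
import math

PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) encoding
MAX_Q = 41         # assumption: cap on the reported quality score


def error_prob_to_ascii(p_error, max_q=MAX_Q):
    """Convert an error probability into a single Phred+33 quality character."""
    if p_error <= 0:
        q = max_q  # a probability of zero would give infinite quality, so cap it
    else:
        q = round(-10 * math.log10(p_error))
        q = min(max_q, max(0, q))  # clamp to the range [0, max_q]
    return chr(q + PHRED_OFFSET)


def ascii_to_error_prob(ch):
    """Convert a Phred+33 quality character back into an error probability."""
    q = ord(ch) - PHRED_OFFSET
    return 10 ** (-q / 10)


def format_fastq_record(name, seq, quals, score, overlap_length):
    """Build one four-line fastq record with the score and overlap in the header."""
    header = f"@{name} score={score} overlap_length={overlap_length}"
    return "\n".join([header, seq, "+", quals])


if __name__ == "__main__":
    # Hypothetical merged read: four bases, each with error probability 0.001.
    quals = "".join(error_prob_to_ascii(0.001) for _ in range(4))
    print(format_fastq_record("read1", "ACGT", quals, 12.5, 4))
```

Rounding to the nearest integer and clamping keep every character inside the printable range that fastq parsers expect. Whether an inserted nucleotide should simply keep its original quality character is a separate modelling decision that this sketch does not settle.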
