Question:

You may brainstorm and plan the coding portion together, even share bits of code, but you need to make sure you write up independent solutions. If your code is based strongly on someone else's code, please (1) give them credit and (2) make an effort to modify the code and take ownership (see instructions for Homework 1 for what this means). For the parts marked planning or discussion, you should write your ideas independently, though you may (and we will) discuss this homework together.

In this assignment, you will be concerned with merging paired reads from an Illumina sequencing experiment.

1. (planning) Mathematically derive and define the scoring function you will use in your alignment algorithm. Your scoring function will take as arguments the two nucleotides and two quality scores that are proposed for alignment. It will return a numeric score.
2. (planning) What gap penalty can you use to reflect that indel rates for the sequencer used in this experiment are about 100-fold lower than substitution rates?
3. (planning) Assuming you have an alignment for the two reads, specify mathematically how you will estimate the true nucleotides in the overlap region of the reads.
4. (planning) Mathematically derive and define the probability of an error for the nucleotides observed in the overlap region. Clarify how this probability will be converted to an ASCII quality score. What should the quality score of an inserted nucleotide (one that is aligned to the gap '-' in the other read) be?
5. (planning) Plan and write pseudocode for an algorithm to solve the problem.
6. (coding) Implement the algorithm in Python using no additional libraries beyond those we have installed (biopython and scipy, in particular, may be useful). Your program should output (to stdout) the merged reads in FASTQ format, something like

@name_of_sequence score=my_score overlap_length=my_length
AAACC...your-merged-read-here...
+
3>>3A...your_merged_quality_scores_here...
@name_of_next_sequence score=my_next_score overlap_length=my_next_length

where my_score is the score of the alignment in this merged sequence and my_length is the length of the alignment (excluding end gaps) in this merged sequence. If the aligned portion of the reads has length 0, do not output anything in the FASTQ file; these sequences cannot be merged and you will simply discard them.
7. (discussion) Write one paragraph analyzing your algorithm results. Are you suspicious of any of your results? Why? How could you improve the algorithm?
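A minimal Python sketch of one possible answer to items 1 through 4, under assumptions the assignment does not fix: quality characters are Phred+33 (e = 10^(-Q/10)), the true base has a uniform prior, and a miscalled base is equally likely to be any of the other three, so P(x | b) = 1 - e if x = b and e/3 otherwise. The column score is the log-odds, in bits, that the two calls share one true base versus being two independent uniform bases. The consensus base maximizes P(b | x, y), which is proportional to P(x | b) P(y | b), and its error probability is the posterior mass on the other three bases, re-encoded as Q = -10 log10(p_err). The gap score (a typical-quality mismatch made log2(100) bits worse) and the Q41 cap are hypothetical choices, and all function names are illustrative.

```python
import math

BASES = "ACGT"

def phred_to_error(qchar: str, offset: int = 33) -> float:
    """Phred+33 quality character -> probability that the base call is wrong."""
    return 10 ** (-(ord(qchar) - offset) / 10)

def error_to_phred(p_err: float, offset: int = 33, max_q: int = 41) -> str:
    """Error probability -> Phred+33 character, capped at max_q (assumed cap)."""
    q = max_q if p_err <= 0 else min(max_q, round(-10 * math.log10(p_err)))
    return chr(q + offset)

def likelihoods(x: str, y: str, e1: float, e2: float) -> dict:
    """P(x | b) * P(y | b) for every candidate true base b.

    Assumed error model: a miscall is any of the other three bases with
    equal probability, so P(obs | b) = 1 - e if obs == b else e / 3.
    """
    def p(obs: str, b: str, e: float) -> float:
        return 1 - e if obs == b else e / 3
    return {b: p(x, b, e1) * p(y, b, e2) for b in BASES}

def pair_score(x: str, y: str, q1: str, q2: str) -> float:
    """Log-odds score in bits: one shared true base vs. two random bases."""
    lik = likelihoods(x, y, phred_to_error(q1), phred_to_error(q2))
    p_same = sum(lik.values()) / 4   # uniform prior over the true base
    p_random = 1 / 16                # two independent uniform bases
    return math.log2(p_same / p_random)

def gap_score(typical_q: int = 30, indel_to_sub_ratio: float = 0.01) -> float:
    """Heuristic linear gap score (assumption): a typical mismatch, 100x rarer."""
    q = chr(typical_q + 33)
    return pair_score("A", "C", q, q) + math.log2(indel_to_sub_ratio)

def consensus(x: str, y: str, q1: str, q2: str) -> tuple:
    """Maximum-posterior true base and its Phred+33 quality character."""
    lik = likelihoods(x, y, phred_to_error(q1), phred_to_error(q2))
    total = sum(lik.values())
    base = max(lik, key=lik.get)  # ties go to the first base in "ACGT" order
    # Summing the other bases avoids cancellation in 1 - posterior.
    p_err = sum(v for b, v in lik.items() if b != base) / total
    return base, error_to_phred(p_err)
```

With these defaults, two agreeing Q40 calls score about +2.0 bits, a disagreement between two Q30 calls about -8.6 bits, and one gap column about -15.2 bits.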

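The alignment itself (items 5 and 6) can be sketched as an overlap, or semi-global, dynamic program with a linear gap score. This sketch assumes read 2 has already been reverse-complemented and its quality string reversed, so both reads are on the same strand. Row 0 and column 0 start at zero so an unaligned prefix is free, and the traceback may start anywhere in the last row or column so an unaligned suffix is free; the number of traced columns is then the overlap length excluding end gaps. The per-column score and consensus rules are passed in as functions, so a quality-aware score can be plugged in. The names overlap_align and merge_record, the score=/overlap_length= header layout, keeping an inserted base's own quality, and dropping any read 2 prefix before read 1 or read 1 suffix past read 2 (e.g. adapter read-through) are all assumptions of this sketch.

```python
from typing import Callable, List, Optional, Tuple

# One alignment column: (index into s1 or None, index into s2 or None)
Column = Tuple[Optional[int], Optional[int]]

def overlap_align(s1: str, s2: str, score: Callable[[int, int], float],
                  gap: float) -> Tuple[float, List[Column]]:
    """Overlap alignment with free end gaps and a linear interior gap score.

    score(i, j) scores s1[i] against s2[j]; gap must be negative.
    Returns the best score and the aligned columns, end gaps excluded.
    """
    n, m = len(s1), len(s2)
    # Row 0 and column 0 stay 0.0, so an unaligned prefix costs nothing.
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + score(i - 1, j - 1),
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)
    # An unaligned suffix is also free: end anywhere in the last row/column.
    ends = [(F[i][m], i, m) for i in range(n + 1)]
    ends += [(F[n][j], n, j) for j in range(m + 1)]
    best, i, j = max(ends)
    cols: List[Column] = []
    while i > 0 and j > 0:  # trace back until one read is exhausted
        if F[i][j] == F[i - 1][j - 1] + score(i - 1, j - 1):
            cols.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + gap:
            cols.append((i - 1, None))  # s1 base aligned to a gap
            i -= 1
        else:
            cols.append((None, j - 1))  # s2 base aligned to a gap
            j -= 1
    cols.reverse()
    return best, cols

def merge_record(name: str, s1: str, q1: str, s2: str, q2: str,
                 score: Callable[[int, int], float], gap: float,
                 consensus: Callable[[int, int], Tuple[str, str]]) -> Optional[str]:
    """Merge two same-strand reads into one FASTQ record (None = discard).

    consensus(i, j) returns (base, quality_char) for s1[i] aligned to s2[j].
    """
    best, cols = overlap_align(s1, s2, score, gap)
    if not cols:
        return None  # aligned portion has length 0: cannot be merged
    first1 = next(i for i, _ in cols if i is not None)
    last2 = max(j for _, j in cols if j is not None)
    seq, qual = [s1[:first1]], [q1[:first1]]
    for i, j in cols:
        if i is not None and j is not None:
            base, qc = consensus(i, j)
        elif i is not None:
            base, qc = s1[i], q1[i]  # inserted base keeps its own quality (assumption)
        else:
            base, qc = s2[j], q2[j]
        seq.append(base)
        qual.append(qc)
    seq.append(s2[last2 + 1:])
    qual.append(q2[last2 + 1:])
    header = f"@{name} score={best:g} overlap_length={len(cols)}"
    return "\n".join([header, "".join(seq), "+", "".join(qual)])
```

For example, with +1/-1 column scores and a gap of -2, merging ACGTACGG with ACGGTTT (all qualities I) finds the four-base overlap ACGG and returns the record "@r1 score=4 overlap_length=4", "ACGTACGGTTT", "+", "IIIIIIIIIII". In a full program, Biopython's Bio.SeqIO.parse(handle, "fastq") could read the input pairs and Seq.reverse_complement() could orient read 2.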