Question: BME 201 HOMEWORK 5 - How to fix the code for DNA Replication, Transcription, and translation. InsulinDNAseq is included. Please help! 0 10f7 BME 201
BME 201 HOMEWORK 5 - How to fix the code for DNA Replication, Transcription, and translation. InsulinDNAseq is included.
Please help!
0 10f7 BME 201 Homework 5 - Molecular Biology: Transcription and Translation Updated by M. Pacey on 02Feb2024 Name: Date: clear clc DNA Replication This section of code models DNA replication. We'll start with a single strand of DNA to use as our template. By the end, we want a sequence for the strand that's synthesized from this template. Refer to the diagram for help visualizing everything. Import the template strand sequence, which comes from the human gene for insulin. Note that this sequence is written in the 5'->3' direction, as is standard. Take a second to look at the sequence after it's imported. seq_file = fopen('insulinDNAseq.txt'); template5_3 = fscanf(seq_file,'%s'); seq_len = length(template5_3); % length of imported sequence [bpl % Now we'll model DNA replication, using this sequence as a template. Question: What cell machinery is responsible for DNA replication? Answer: DNA Polymerase Look at the diagram and notice that when DNA is being replicated, the newly-synthesized strand is synthesized 5'->3', BUT the template strand is actually read 3'->5'. We'll mimic this property in our code. First define a variable template3_5 that gives the sequence of the template strand in the 3'->5' direction. Hint: look into the MATLAB command 'flip' for help with this. Look at template5_3 and template3_5 in the command window and make sure the result makes sense. template3_5 = ; Now we'll write code that mimics the function of the DNA replication machinery. Below is a matrix that gives the base-pairing rules followed for DNA replication. The letter in the first column gives the base of template strand; the letter in the second column gives the corresponding base of the newly-synthesized strand. (Don't worry too much about the curly braces - it just means we're defining a cell array.) Question: Now initialize a variable that will be used to store the sequence of the newly-synthesized strand (synth). Since we're synthesizing this strand in the same order/direction as the DNA replication machinery, should we name this variable synth3_5 or synth5_3? Explain your choice. Answer: [1; % The for loop below reads each base on the template strand and chooses the % appropriate base to be added to the newly-synthesized strand, but it's % incomplete. Fill in the gaps to complete the loop. for i = 1:seq_len template_base = template3_5(i); % pull the ith base of the template strand row_A = strcmp(template_base,A(:,1)); % identify the row in matrix A corresponding to templateBase; note that the variable row_A is a boolean vector % Define the base added to the synthesized strand using matrix 'A' and % vector 'row_ A' synth_base = ; _ base-pairing rules followed for DNA replication. The letter in the first column gives the base of template strand; the letter in the second column gives the corresponding base of the newly-synthesized strand. (Don't worry too much about the curly braces - it just means we're defining a cell array.) 0 20f 7 Question: Now initialize a variable that will be used to store the sequence of the newly-synthesized strand (synth). Since we're synthesizing this strand in the same order/direction as the DNA replication machinery, should we name this variable synth3_5 or synth5_3? Explain your choice. Answer: 1 % The for loop below reads each base on the template strand and chooses the % appropriate base to be added to the newly-synthesized strand, but it's % incomplete. Fill in the gaps to complete the loop. for i = 1:seq_len template_base = template3_5(i); % pull the ith base of the template strand row_A = strcmp(template_base,A(:,1)); % identify the row in matrix A corresponding to templateBase; note that the variable row_A is a boolean vector % Define the base added to the synthesized strand using matrix 'A' and % vector 'row_ A' synth_base = ; % Store synth_base in the ith element of your storage vector. You'll % need to use the command 'char' to store the character properly. 5 end Unrecognized function or variable 'template_base'. The values saved in your storage vector are likely ASCII values. To convert them into a character string, again use the MATLAB command 'char.' Example: x = char(x), where x is the name of your storage variable. 1 P P P P Question: Look at the result of the storage variable in your command window. Is this written in the 3'->5' or 5'->3' direction? Answer: Question: Should we use the 'flip' command to get it in the standard 5'->3' direction? Answer: Transcription This section of code models transcription, the process of synthesizing RNA from a DNA template. We'll use our sequences that we've generated so far. In the context of DNA replication, we generally think in terms of 'template' and 'newly-synthesized' strands (as above); in the context of transcription, however, we think in terms of 'sense' and 'anti-sense' strands. Note that these are two different ideas! A 'template' strand in the case of DNA replication could be either a 'sense' or 'anti-sense' strand in the case of transcription, and vice versa. In this case, we'll say that the strand we've been calling 'template' in the DNA replication section is our 'anti-sense' strand when it comes to transcribing this gene. So that means the 'synthesized' strand is now our 'sense' strand. Let's go ahead and define new variables to reflect this. antisense5_3 = template5_3; antisense3_5 = template3_5; sense5_3 = synth5_3; Question: What cell machinery is responsible for transcription? Answer: Question: We'll assume that the en!lre BN' sequence !Ha! we've Been ookmg at is going to be transcribed into Question: Should we use the 'flip' command to get it in the standard 5'->3' direction? Answer: (0D 30of7 , Transcription This section of code models transcription, the process of synthesizing RNA from a DNA template. We'll use our sequences that we've generated so far. In the context of DNA replication, we generally think in terms of 'template' and 'newly-synthesized' strands (as above); in the context of transcription, however, we think in terms of 'sense' and 'anti-sense' strands. Note that these are two different ideas! A 'template' strand in the case of DNA replication could be either a 'sense' or 'anti-sense' strand in the case of transcription, and vice versa. In this case, we'll say that the strand we've been calling 'template' in the DNA replication section is our 'anti-sense' strand when it comes to transcribing this gene. So that means the 'synthesized' strand is now our 'sense' strand. Let's go ahead and define new variables to reflect this. antisense5_3 = template5_3; antisense3_5 = template3_5; sense5_3 = synth5_3; Question: What cell machinery is responsible for transcription? Answer: Question: We'll assume that the entire DNA sequence that we've been looking at is going to be transcribed into mRNA. If this is true, what region of the gene must be 5' (upstream) of the sequence we've been looking at? Answer: Now we'll write code that mimics the function of this transcription machinery. Below is a matrix that gives the base-pairing rules followed for transcription. The letter in the first column gives the base of the DNA strand; the letter in the second column gives the corresponding base of the RNA strand. Now initialize a variable that will be used to store the sequence of the newly-synthesized RNA strand. Question: Since we're synthesizing this strand in the same order/direction as the transcription machinery, should we name this variable RNA3_5 or RNA5_3? Explain your choice. Answer: [1; % The for loop below reads each base on the DNA strand and chooses the % appropriate base to be added to the RNA strand being synthesized, but % it's incomplete. Fill in the gaps to complete the loop. for i = 1:length(template3_5) % Pull the ith base of the DNA strand. We want to synthesize our % virtual RNA strand 5'->3', just like a cell does. So which is the % appropriate DNA strand to use here? Should we use antisense5_3, % antisense3_5, or sense5_37 Hint: use matrix 'B' to help make your % decision. DNA_base = ; row_B = strcmp(DNA_base,B(:,1)); % identify the row in matrix B corresponding to templateBase; note that the variable row_B is a boolean vector % Define the synthesized base using matrix 'B' and vector 'row_B' RNA_base = ; % Store synth_base in the ith element of your storage vector. Again, % we'll need to use 'char' [1; end % Again, use the MATLAB command 'char' to convert the vector into a % character string. 5 A ; The for loop below reads each base on the DNA strand and chooses the % appropriate base to be added to the RNA strand being synthesized, but it's incomplete. Fill in the gaps to complete the loop. ".'template3_5) 8 4 of 7 3 % Pull the ith base of the DNA strand. We want to synthesize our % virtual RNA strand 5'->3', just like a cell does. So which is the % appropriate DNA strand to use here? Should we use antisense5_3, decision. antisense3_5, or sense5_3? Hint: use matrix 'B' to help make your DNA_base = ; row_B = stromp (DNA_base, B( : , 1) ); % identify the row in matrix B corresponding to templateBase; note that the variable row_B is a boolean vector Define the synthesized base using matrix 'B' and vector 'row_B' RNA_base = ; % Store synth_base in the ith element of your storage vector. Again, " we'll need to use 'char' ; end % Again, use the MATLAB command ' char' to convert the vector into a % character string. ; Translation This section of code models translation, the process of synthesizing a polypeptide from an RNA template. Question: What cell machinery is responsible for translation? Answer: Now we'll write code that mimics the function of this translation machinery. Below is a matrix that gives the amino acid-codon rules followed for translation. The letters in the first column gives the codon of the RNA strand; the letter in the second column gives the corresponding single letter amino acid code. Note that asterisks "* are used to denote stop codons. C = {'uuu' 'F' uuc' 'F' ' uua' 'L' ' uug' 'L' ' cuu' 'L' ' cuc' 'L' ' cua' 'L' ' cug' 'L' ' auu' 'I' 'auc' 'I' A ' aua' 'I ' aug ' 'guu' 'V. 'guc' 'V. ' gua' 'V' 'gug' 'V' ' ucu' ' ucc' 'uca' 'S' 'ucg' 'ccu' 'p 'ccc' 'p. 'cca' 'ccg' 'p' 'acu' acc' 'T' 'aca' 'acg' 'T' 'gcu' 'A' 'gcc' 'A' 'gca' 'A' 'gcg' ' "uau'asterisks "" are used to denote stop codons. C = {'uuu' 'F 'uuc' 'una' 'L' ' uug' 'L' 8 5 of 7 cuy L auu' 'I' 'auc' 'I' 4 ' aua' 'I' ' aug' 'M' 'guu' 'V. 'guc' 'V. 'gua' 'VI 'gug' 'V. ' ucu' 's' 'ucc' 'S' 'uca' 'S' 'ucg' 'S' 'ccu' 'ccc' 'p' 'cca' 'p ' ccg' ' acu' ' acc' ' aca' 'acg' 'gcu' 'A' 'gcc' ' A ' 'gca' 'A' 'gcg' 'A' ' uau' 'y 'uac' ly 'uaa' '* ' ' uag' '*' cau' 'H' 'cac' 'H' caa' '0' ' cag' 'Q' ' aau' 'N' 'aac' 'N' aaa' 'K' ' aag' 'K' ' gau' 'D' 'gac' 'D' ' gaa' 'E' 'gag' 'E' ' ugu ' 'ugc' ' uga' '* 1 ' ugg' 'W' ' cgu' 'R' 'cgc ' 'R' ' cga' 'R' 'cgg' 'R' 'agu' ' agc' 'S' 'aga' 'R' ' agg' 'R' 'ggu' 'G' 'ggc' 'G' 'gga' 'G' ggg' 'G'); The start codon for this transcript is at positions [60 61 62]. Double check that you see a start codon at those positions in the string variable RNA5_3 - if you do, you're probably on the right track so far! Note that the sequence before the start codon is still part of the mRNA transcript, but it's not translated. Question: What name do we give to this part of the gene? Answer: % codon. % Let's cut off the beginning of the sequence so we start at the start ORF = ;' uga' '*' ' ugg' 'W' cgu' 'R' cgc' 'R' 'cga' 'R' ' cgg' 'R' 8 6 of 7 'ggu' 'G' 'ggc' 'G' 'gga' 'G' 'ggg' 'G'); The start codon for this transcript is at positions [60 61 62]. Double check that you see a start codon at those positions in the string variable RNA5_3 - if you do, you're probably on the right track so far! Note that the sequence before the start codon is still part of the mRNA transcript, but it's not translated. Question: What name do we give to this part of the gene? Answer: Let's cut off the beginning of the sequence so we start at the start % codon. ORF = ; The while loop below reads each base on the DNA strand and chooses the appropriate amino acid to be added to the synthesized polypeptide, but it's incomplete. Fill in the gaps to complete the loop. NOTE: if you get to this line of code and it takes more than a few seconds to run, there's probably a bug in the while loop. Use ctri+C to stop the code and try to troubleshoot. AA_seq = NaN; AA_pos = 1; ORF_pos = 1:3; while ~stromp(char (AA_seq(end) ), '*' ) % continue until we've reached a stop codon % Use the indexing variable ORF_pos to grab the codon from ORF codon = ; % Use the codon to identify the corresponding amino acid using 6 reference table row_C = stromp( codon, C( : , 1) ) ; % Define the synthesized base using matrix 'C' and vector 'row_C' AA = ; % Store AA in the AA_posth element of the storage vector AA_seq. You'll % need to use the 'char' command here. ; Update the AA_pos and ORF_pos for the next codon. Think about what these values should be, and check that they match your expectation for a couple iterations of the while loop. AA_pos = ; ORF_pos = ; end % Again, use the MATLAB command 'char' to convert the vector AA_seq into a character string. ; Use the sequence AA to identify the C- and N-terminus amino acids: Answer: C-terminus: N-terminus: You may notice that while loop stopped before it got to the end of the mRNA transcript, meaning this part of the transcript wasn't translated. Question: What name do we give to this region of the gene? Answer:The while loop below reads each base on the DNA strand and chooses the appropriate amino acid to be added to the synthesized polypeptide, but it's incomplete. Fill in the gaps to complete the loop. NOTE: if you get to this line of code and it takes more than a few seconds to run, there's probably a bug in the while loop. Use ctri+C to stop the code and try to troubleshoot. AA_seq = NaN; 8 7 of 7 r(AA_seq(end) ), '*' ) % continue until we've reached a stop coaon % Use the indexing variable ORF_pos to grab the codon from ORF codon = ; % Use the codon to identify the corresponding amino acid using % reference table row_C = stremp ( codon, C( : , 1) ) ; % Define the synthesized base using matrix 'C' and vector 'row_C' AA = ; % Store AA in the AA_posth element of the storage vector AA_seq. You'll " need to use the 'char' command here. ; % Update the AA_pos and ORF_pos for the next codon. Think about what % these values should be, and check that they match your expectation % for a couple iterations of the while loop. AA_pos = ; ORF_pos = ; end % Again, use the MATLAB command 'char' to convert the vector AA_seq into a % character string. ; Use the sequence AA to identify the C- and N-terminus amino acids: Answer: C-terminus: N-terminus: You may notice that while loop stopped before it got to the end of the mRNA transcript, meaning this part of the transcript wasn't translated. Question: What name do we give to this region of the gene? Answer: Congratulations! You have finished HW#4. Please submit your code as a PDF and a .mix file. An easy way to do this is to click on Live Edition --- Export --- Export to PDF. Please name your PDF with the following format: Last Name_First Initial_BME 201_HW5. 7BME 201 Homework 5 Due on Canvas on Wednesday 2/21/24 On Canvas you will find a file named "BME201_HW5_Spring2024.mix". It is broken up into three sections, each one modeling a different aspect of molecular biology: DNA replication, transcription, and translation. Follow the instructions in the code to analyze the DNA sequence from the human insulin gene (found in the file insulinDNAseq.txt). Throughout the code, there are lines that require you to answer questions and lines that require you to modify/add your own code (denoted by square brackets ). A couple of notes: 1.) If the code takes more than 3 seconds to run, you're probably stuck in an infinite loop. Use ctrl+C to stop the code if this happens. I highly recommend you Google how to use breakpoints in MATLAB to make sure you only run the code up until the point you care about. If you let it run beyond the lines you've worked on, you'll almost definitely run into problems. It also might be helpful to use the 'Run Section' button (which just runs the current section of code) rather than 'Run'. 2.) Troubleshooting hint: If you're running into problems, copy and paste each line in order in the command window. Make sure each line is doing what you want it to do. If you're troubleshooting a for loop, first type i=1 in the command window. Use the schematics below to help interpret the input and output for each section. Hint: use the sequences below to confirm that your output looks correct for each step! 1. DNA Replication 3' toggga. ... ...tggtog 5'
Step by Step Solution
There are 3 Steps involved in it
1 Expert Approved Answer
Step: 1 Unlock
Question Has Been Solved by an Expert!
Get step-by-step solutions from verified subject matter experts
Step: 2 Unlock
Step: 3 Unlock
Students Have Also Explored These Related Law Questions!