Question: Problem 2 (25 points): Open reading frame finder.

An Open Reading Frame (ORF) is a continuous stretch of codons (nucleotide triplets) that contains a start codon (i.e., ATG) at the beginning and a stop codon (i.e., TAA, TAG or TGA) at the end only, i.e., with no stop codon in the middle (https://en.wikipedia.org/wiki/Open_reading_frame). Note that there are three different ways (frames) to split a DNA sequence into triplets, each shifted by one nucleotide from the previous one.
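To make the three frames concrete, here is a minimal sketch that splits a sequence into triplets for a chosen frame. The helper name codonsInFrame is hypothetical and not part of the assignment; it assumes the input is a char row vector and simply drops trailing bases that do not fill a whole codon.

```matlab
function codons = codonsInFrame(dna, frame)
% codonsInFrame  Split dna into triplets starting at position frame (1, 2 or 3).
%   Hypothetical helper for illustration only; trailing bases that do not
%   fill a whole codon are dropped.
    n = floor((length(dna) - frame + 1) / 3);               % number of whole codons
    block = reshape(dna(frame : frame + 3*n - 1), 3, n)';   % one codon per row
    codons = cellstr(block)';                               % 1-by-n cell array
end
```

For 'AATGTATGA', frame 1 gives AAT GTA TGA, frame 2 gives ATG TAT, and frame 3 gives TGT ATG. A start codon and its stop codon belong to the same ORF only if they fall in the same frame.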

Problem 2.1 (5 points): Reverse complementary strand.

Write a MATLAB function getReverseComp that takes a string as a DNA sequence and returns its reverse complementary strand. Save as file getReverseComp.m.

Test it on the command line by typing in: getReverseComp('ACGTGCA') or run the corresponding cell in hw1q2script.m.
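One possible way to write getReverseComp.m is sketched below. It is not the official solution: it assumes the input is a char row vector, upper-cases it first, and leaves any symbol other than A, C, G, T (for example N) in place.

```matlab
function rc = getReverseComp(dna)
% getReverseComp  Reverse complement of a DNA string (sketch).
%   Assumption: the input is upper-cased first; symbols other than
%   A, C, G, T are kept as they are.
    dna = upper(dna);
    comp = dna;                 % start from a copy so unknown symbols survive
    comp(dna == 'A') = 'T';
    comp(dna == 'T') = 'A';
    comp(dna == 'C') = 'G';
    comp(dna == 'G') = 'C';
    rc = fliplr(comp);          % reverse so the other strand reads 5' to 3'
end
```

The masks are computed on the original dna rather than on comp, so a base that has already been swapped is never swapped back.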

Problem 2.2 (2 points): Identify possible start codons.

Write a MATLAB function findStartCodon that takes a string as a DNA sequence and returns the indices of all possible start codons. You may need the function strfind (type help strfind on the MATLAB command line for usage). Save as file findStartCodon.m.

Test it on the command line by typing in: findStartCodon('AATGTATGA') or run the corresponding cell in hw1q2script.m.
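A minimal sketch of findStartCodon.m using strfind, as the problem suggests. Upper-casing the input is an added assumption so that lower-case sequences are handled too.

```matlab
function idx = findStartCodon(dna)
% findStartCodon  Indices where a start codon (ATG) begins, in any frame.
%   Sketch only; the input is upper-cased first (an assumption).
    idx = strfind(upper(dna), 'ATG');   % strfind returns ascending indices
end
```

For 'AATGTATGA' this returns [2 6]: ATG occupies positions 2:4 and 6:8.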

Problem 2.3 (3 points): Identify possible stop codons.

Write a MATLAB function findStopCodon that takes a string as a DNA sequence and returns the indices of all possible stop codons. The indices should be sorted. Save as file findStopCodon.m.

Test it on the command line by typing in: findStopCodon('ATAAGTAGGA') or run the corresponding cell in hw1q2script.m.
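A possible findStopCodon.m, sketched under the same assumptions as above (char row vector input, upper-cased first). It searches for each of the three stop codons separately and then sorts the merged result, since the individual strfind results are sorted only within each codon.

```matlab
function idx = findStopCodon(dna)
% findStopCodon  Sorted indices where a stop codon (TAA, TAG, TGA) begins.
%   Sketch only; the input is upper-cased first (an assumption).
    dna = upper(dna);
    idx = sort([strfind(dna, 'TAA'), strfind(dna, 'TAG'), strfind(dna, 'TGA')]);
end
```

For 'ATAAGTAGGA' this returns [2 6]: TAA at 2:4 and TAG at 6:8.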

Problem 2.4 (15 points): Identify the longest open reading frame (ORF).

Write a MATLAB function findLongestORF that takes as input a DNA sequence and returns the longest ORF, described by two numbers: the start index of the start codon and the END index of the stop codon. Save as file findLongestORF.m.

To test your function on the command line, type in: findLongestORF('GGAGGCGTAAAATGCGTACTGGTAATGCAAACTAATGG') or run the corresponding cell in hw1q2script.m.
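One way to approach findLongestORF.m is sketched below; it is an illustration, not the expected solution. Assumptions that the problem statement does not fix: the function returns [] when no ORF exists, ties keep the first ORF found, and for each stop codon the earliest in-frame ATG after the previous in-frame stop is taken as the start. To stay self-contained it calls strfind directly; in the assignment you would probably call findStartCodon and findStopCodon instead.

```matlab
function orf = findLongestORF(dna)
% findLongestORF  Longest ORF as [startIndex, endIndex] (sketch).
%   Returns [] if no start codon is followed by an in-frame stop codon.
    dna = upper(dna);
    starts = strfind(dna, 'ATG');
    stops  = sort([strfind(dna, 'TAA'), strfind(dna, 'TAG'), strfind(dna, 'TGA')]);
    orf = [];
    bestLen = 0;
    for frame = 0:2
        s = starts(mod(starts, 3) == frame);   % start codons in this frame
        t = stops(mod(stops, 3) == frame);     % stop codons in this frame
        p = 1;                                 % pointer into s
        prevStop = 0;                          % previous in-frame stop position
        for k = 1:numel(t)
            % skip starts that lie before the previous in-frame stop
            while p <= numel(s) && s(p) <= prevStop
                p = p + 1;
            end
            % the earliest remaining start gives the longest ORF ending at t(k)
            if p <= numel(s) && s(p) < t(k)
                len = t(k) + 2 - s(p) + 1;
                if len > bestLen
                    bestLen = len;
                    orf = [s(p), t(k) + 2];
                end
            end
            prevStop = t(k);
        end
    end
end
```

On the test sequence above, the only start codon with an in-frame stop is the ATG at 12:14, which runs to the TAA at 33:35, so the sketch returns [12 35]. Grouping positions by mod(index, 3) is what enforces the same-frame requirement.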

Test

To test the functions, use the attached HW1q2script.m script. It reads a sequence from the file sequence.fa, which is in FASTA format (one of the simplest and most popular sequence formats), tests each of the functions above, and outputs some statistics.
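The script relies on fastaread, which is part of the Bioinformatics Toolbox. If that toolbox is not available, a small reader along the following lines may be enough for a single-record file. The function name readFirstFasta is hypothetical, and the sketch assumes a plain-text FASTA file in which a line starting with '>' is the header and the following lines hold the sequence.

```matlab
function [header, seq] = readFirstFasta(filename)
% readFirstFasta  Read the first record of a FASTA file (hypothetical helper).
%   header: text after '>' on the header line
%   seq:    sequence lines of that record joined into one char row vector
    fid = fopen(filename, 'r');
    if fid < 0
        error('Could not open %s', filename);
    end
    header = '';
    parts = {};                 % sequence lines collected here
    seenHeader = false;
    tline = fgetl(fid);
    while ischar(tline)         % fgetl returns -1 at end of file
        tline = strtrim(tline);
        if ~isempty(tline) && tline(1) == '>'
            if seenHeader
                break;          % a second record starts here; stop reading
            end
            seenHeader = true;
            header = tline(2:end);
        elseif ~isempty(tline)
            parts{end + 1} = tline; %#ok<AGROW>
        end
        tline = fgetl(fid);
    end
    fclose(fid);
    seq = [parts{:}];           % concatenate all sequence lines
end
```

With this helper, the two lines that call fastaread could be replaced by [hdr, dna] = readFirstFasta('sequence.fa'); the rest of the script would then use hdr in place of seq.Header.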

HERE IS THE CODE GIVEN, PLEASE COMPLETE USING MATLAB. THANK YOU!

% This script tests all functions required in HW1.q2

%%
getReverseComp('ACGTGCA')

%%
findStartCodon('AATGTATGA')

%%
findStopCodon('ATAAGTAGGA')

%%
findLongestORF('GGAGGCGTAAAATGCGTACTGGTAATGCAAACTAATGG')
% the correct ORF starts at index 12 (ATG at 12:14) and ends at index 35
% (TAA at 33:35).

%%
seq=fastaread('sequence.fa');
dna=seq.Sequence;
disp(['seq header: ', seq.Header])
str=sprintf('Base frequency on + strand: A %d C %d G %d T %d ', baseFreq(dna));
disp(str);
dna2=getReverseComp(dna);
str=sprintf('Base frequency on - strand: A %d C %d G %d T %d ', baseFreq(dna2));
disp(str);

%%
disp(sprintf('Number of possible start codons on + strand: %d ', length(findStartCodon(dna))))
disp(sprintf('Number of possible stop codons on + strand: %d ', length(findStopCodon(dna))))
disp(sprintf('Number of possible start codons on - strand: %d ', length(findStartCodon(dna2))))
disp(sprintf('Number of possible stop codons on - strand: %d ', length(findStopCodon(dna2))))

%%
orf_pos = findLongestORF(dna);
orf_neg = findLongestORF(dna2);
disp('Longest ORF on + strand:')
disp(orf_pos);
disp('Longest ORF on - strand:')
disp(orf_neg);

%% in-script function to calc the frequency of ACGT.
function freq = baseFreq(dna)
bases = 'ACGT';
freq = zeros(1, length(bases));   % preallocate one count per base
for i = 1:length(bases)
    freq(i) = length(strfind(dna, bases(i)));
end
% converts to fraction
%freq = freq / sum(freq);
end
