Question: Write a matlab code named HW7.m which takes a collection of k-mer reads and outputs the assembled genome. Codes referenced provided below reads.mat 'TGTTTA' 'TGGTTT'

Write a matlab code named "HW7.m" which takes a collection of k-mer reads and outputs the assembled genome.

Codes referenced provided below

Write a matlab code named "HW7.m" which takes a collection of k-mer

reads.mat

'TGTTTA'

'TGGTTT'

'TTTAAT'

'TTTTGG'

'TTTTTT'

'CATGCC'

'CAATTG'

'GCCCAA'

'AAAACG'

'TAATGG'

'GCCCCC'

'CCCCCT'

'GCCATG'

'GGTTTT'

'TTTTTG'

'CCATGC'

'TTTTGT'

'GGCCCC'

'TGCCCA'

'TGGGGA'

'CCCCTT'

'CCATGC'

'AATTGG'

'CATGCC'

'CCTTTT'

'CCCTTT'

'CCCAAT'

'TGGCCC'

'GAAAAC'

'CTTTTG'

'CCCCCC'

'GGCCCA'

'ATGCCC'

'TTGTTT'

'ATTGGT'

'TTTAAT'

'TGCCAT'

'TTGGGG'

'TGTTTA'

'GGGGAA'

'AATGGC'

'AATGGC'

'TTGGTT'

'ATGCCA'

'TTTGTT'

'GCCCAT'

'ATGGCC'

'GTTTAA'

'TTAATG'

'TAATGG'

'CCCATG'

'GTTTAA'

'CCAATT'

'TTTGGG'

'TTAATG'

'GTTTTT'

'GGGAAA'

'ATGGCC'

'GGAAAA'

'TGGCCC'

PatternToNumber code:

function number = PatternToNumber(pattern)

number = 0;

CharacterMap = containers.Map({'A', 'C', 'G', 'T'}, [0, 1, 2, 3]);

for i = 1: length(pattern)

number = number * 4 + CharacterMap(pattern(i));

end

number = 1 + number;

end

FindEulerianPath code:

function [path] = FindEulerianPath(adj,n) indegree = zeros(n);%for storing indegree of all vertices outdegree = zeros(n);%for storing outdegree of all vertices %calculating outdegree for i=1:n for j=1:n outdegree(i) = outdegree(i) + adj(i,j); end end %calculating indegree for j=1:n for i=1:n indegree(j) = indegree(j) + adj(i,j); end end

v1 = -1; v2 = -1; %step-1 ,search vertex whose outdegree-indegree = 1 for i=1:n if outdegree(i)-indegree(i) == 1 v1 = i; end end %search vertex whose outdegree-indegree = -1 for i=1:n if outdegree(i)-indegree(i) == -1 v2 = i; end end j = 1; %check if all others have balanced degree for i=1:n if i == v1 || i==v2 continue end if outdegree(i)~=indegree(i) j = 0; break; end end %if not balanced, no eulerian path if j == 0 fprintf('no eulerian path... '); return; end %if no such vertices no eulerian path if(v1 == -1 || v2 == -1) fprintf('no eulerian path... '); return; end %stack for path finding stack = java.util.Stack(); path = []; stack.push(v1);%inserting v1 while 1 if stack.isEmpty() break; end i = stack.pop(); path = [i,path]; for j=1:n if adj(i,j) == 1 stack.push(j); adj(i,j) = 0; end end end path = fliplr(path);%reversing the list to get the real path end

NumberToPattern code:

function pattern= NumberToPattern(number, k)

number = number - 1;

CharacterMap = ['A', 'C', 'G', 'T'];

pattern = ' ';

for i = 1: k

n = mod(number, 4);

number = (number - n) / 4;

pattern = strcat(CharacterMap(n + 1), pattern);

end

end

Section 1 of the code: Make an adjacency matrix (Al of the graph from the reads Hint: The uploaded "reads.mat file consists of sixty 6-mers. Each 6-mer represents an edge where the beginning node is the prefix and the ending node is the suffix of the 6-mer. Prefix and suffix have length of 5. So the graph can have at most 4 nodes, and therefore you can define A as a matrix with 4 rows and 4 columns where all its elements are zero at first. Then all you have to do is Start reading each k-mer, find its prefix and suffix, convert each to mber between 0 and 4-1 (use your PatternToNumber code from HW2). This way you can update the value of matrix A accordingly. For example, the first 6-mer is TGTTTA So preix TGTTT-959 and suffix='GTTA'=764. This means you need to add 1 to the element of the matrix A that sits on the 960" row and 765h column. Section 2 of the code: Find an Eulerian path for your adjacency matrix. Hint: Use you FindEulerianpath code from HW#6 Subtract 1 from each node number on the path to have a number between 0 and 41 again. For example, if Path- 1960 765... The corrected numbers should be 1959 764.. Section 3 of the code: Assemble the genome from the corrected path numbers Hint: Convert each corrected path number to a string of 5 nucleotides (use your NumberTo Pattern code from HW2). Take the whole 5 nucleotides from the first node, but for the rest of the nodes, just add the last nucleotide from their converted 5-mer strings Section 1 of the code: Make an adjacency matrix (Al of the graph from the reads Hint: The uploaded "reads.mat file consists of sixty 6-mers. Each 6-mer represents an edge where the beginning node is the prefix and the ending node is the suffix of the 6-mer. Prefix and suffix have length of 5. So the graph can have at most 4 nodes, and therefore you can define A as a matrix with 4 rows and 4 columns where all its elements are zero at first. Then all you have to do is Start reading each k-mer, find its prefix and suffix, convert each to mber between 0 and 4-1 (use your PatternToNumber code from HW2). This way you can update the value of matrix A accordingly. For example, the first 6-mer is TGTTTA So preix TGTTT-959 and suffix='GTTA'=764. This means you need to add 1 to the element of the matrix A that sits on the 960" row and 765h column. Section 2 of the code: Find an Eulerian path for your adjacency matrix. Hint: Use you FindEulerianpath code from HW#6 Subtract 1 from each node number on the path to have a number between 0 and 41 again. For example, if Path- 1960 765... The corrected numbers should be 1959 764.. Section 3 of the code: Assemble the genome from the corrected path numbers Hint: Convert each corrected path number to a string of 5 nucleotides (use your NumberTo Pattern code from HW2). Take the whole 5 nucleotides from the first node, but for the rest of the nodes, just add the last nucleotide from their converted 5-mer strings

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Databases Questions!