Question: In Python, can we answer all parts of this lab, with comments?
Dot Plot with Sliding Windows
In class we looked at the simple dot plot algorithm. We discussed how it can be difficult to see patterns in the plot when each character is matched individually. The sliding windows version of the dot plot lets us visualize patterns in groups of characters. In this lab you will implement the sliding windows algorithm for computing a dot plot.
I. Compute Match Percentage
Let's start by writing a function that computes the percentage of positions at which a pair of given sequences match. Note that you cannot assume the sequences are of equal length. The function skeleton will look like this:
def matchPct(seq1, seq2):
    # Fill in the code to compute the percentage match
    # between the sequences
    return pct
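One possible way to fill in the skeleton is sketched below. It returns the match as a fraction between 0 and 1 rather than 0 to 100 (an assumption, chosen so the result can be compared directly with a threshold such as the 0.5 used in Part III), and it divides by the longer sequence length so that unequal lengths are handled by counting the overhanging characters as mismatches (dividing by the shorter length would be another reasonable choice).

```python
def matchPct(seq1, seq2):
    """Return the fraction (0.0 to 1.0) of positions where seq1 and seq2 agree.

    Characters are compared position by position. Because the sequences may
    differ in length, the count is divided by the LONGER length, so any
    overhanging characters count as mismatches (an assumption of this sketch).
    """
    longest = max(len(seq1), len(seq2))
    if longest == 0:
        # Two empty strings: define the match as 0.0 to avoid dividing by zero
        return 0.0
    # zip() stops at the end of the shorter sequence
    matches = sum(1 for a, b in zip(seq1, seq2) if a == b)
    pct = matches / longest
    return pct


print(matchPct("AGG", "AAG"))   # 2 of 3 positions match
print(matchPct("AG", "AAG"))    # 1 of 3; the missing third position is a mismatch
```

Multiply the result by 100 if you want to display it as a percentage.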
II. Working with Substrings
1) To implement the sliding windows, we need to access substrings of a Python string, which Python supports through slicing. To take the substring from position 2 up to, but not including, position 5, you can use:
myString[2:5]
Try this out. In the Python shell, enter the following:
myString = "UNIVERSITY"
Now enter:
myString[2:5]
What does it return?
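Counting from index 0 (U=0, N=1, I=2, V=3, E=4), the slice returns the characters at indices 2, 3 and 4. The short snippet below shows this, along with the shorthand for "the first five characters" that is useful for the next question:

```python
myString = "UNIVERSITY"

# [2:5] starts at index 2 and stops BEFORE index 5,
# so it returns the characters at indices 2, 3 and 4
print(myString[2:5])    # IVE

# Omitting the start index means "from the beginning"
print(myString[:5])     # UNIVE
```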
What command would you use to return the match percentage between the first five characters of two given strings, str1 and str2? (Hint: use the function you created in Part I.)
Try this on these two strings:
str1 = "AACTCGTGAGTCT"
str2 = "ACTTGCGGGCTA"
What is the result of this command for these strings?
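Slicing the first five characters of each string and passing them to matchPct answers the question. The snippet repeats a compact version of a matchPct sketch (which assumes the function returns a fraction and divides by the longer length) so that it runs on its own. Only positions 0 (A/A) and 3 (T/T) match, giving 2 of 5, i.e. 0.4 or 40%.

```python
def matchPct(seq1, seq2):
    # Fraction of matching positions, divided by the longer length
    # (compact sketch, repeated here so this snippet runs on its own)
    longest = max(len(seq1), len(seq2))
    if longest == 0:
        return 0.0
    return sum(1 for a, b in zip(seq1, seq2) if a == b) / longest


str1 = "AACTCGTGAGTCT"
str2 = "ACTTGCGGGCTA"

# [0:5] (or simply [:5]) takes the first five characters of each string
print(str1[0:5], str2[0:5])              # AACTC ACTTG
print(matchPct(str1[0:5], str2[0:5]))    # 0.4
```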
III. Compute the Dot Plot
Let's now practice the technique for computing a dot plot with sliding windows. We will use a small example to demonstrate how it works:
    0 1 2 3 4 5 6
    A G G T A A G
0 A
1 A
2 G
3 T
4 A
If we assume a window size of 3 and a threshold of 0.5, then we put a dot at position (0,0) of the dot plot if we have better than a 50% match between the first three characters of the top sequence and the first three characters of the side sequence. In this example, we would be comparing:
AGG
AAG
Since two of the three characters match (2/3, about 0.67, which is above the 0.5 threshold), we place a dot in cell (0,0).
    0 1 2 3 4 5 6
    A G G T A A G
0 A *
1 A
2 G
3 T
4 A
Next we consider cell (0,1) in the dot plot. We slide the window on the top sequence over by one, so we are comparing:
GGT
AAG
In this comparison, we have no matches, so there is no dot placed in cell (0,1). Fill in the rest of the first row of the dot plot above. For each of the cells in the first row, indicate here what two strings you are comparing:
Cell (0,2):
Cell (0,3):
Cell (0,4):
Cell (0,5):
Cell (0,6):
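The comparisons for the first row can be listed with a short loop. This sketch assumes, as in the grid above, that the top sequence AGGTAAG indexes the columns and the side sequence AAGTA indexes the rows. With a threshold of 0.5, cell (0,2) compares GTA with AAG (no matches, no dot), cell (0,3) compares TAA with AAG (1 of 3, no dot), and cell (0,4) compares AAG with AAG (3 of 3, dot). For cells (0,5) and (0,6) the slice returns only AG and G, because fewer than three characters remain.

```python
top = "AGGTAAG"    # sequence across the top (columns)
side = "AAGTA"     # sequence down the side (rows)
window = 3

row = 0
sideWindow = side[row:row + window]       # "AAG" for row 0
for col in range(len(top)):
    # Slicing past the end does not raise an error; it just returns
    # a shorter string, which is what happens at columns 5 and 6
    topWindow = top[col:col + window]
    print(f"Cell ({row},{col}): {topWindow} vs {sideWindow}")
```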
When filling in cells (0,5) and (0,6), what problem do you encounter?
How can you address this problem?
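The problem is that the window runs past the end of the top sequence: starting at column 5 or 6 there are fewer than three characters left, so there is no complete window to compare. One common fix, sketched here, is to compute only the cells where a full window fits, which for a sequence of length n and window size w leaves n - w + 1 positions. Other options, such as padding the sequences or comparing the shorter window anyway, are also possible; the lab text does not say which one it expects.

```python
top = "AGGTAAG"
side = "AAGTA"
window = 3

# Only start a window where all `window` characters exist.
# For a sequence of length n this gives n - window + 1 positions.
numCols = len(top) - window + 1     # 7 - 3 + 1 = 5 columns (0..4)
numRows = len(side) - window + 1    # 5 - 3 + 1 = 3 rows (0..2)

for col in range(numCols):
    print(col, top[col:col + window])   # every window now has 3 characters
```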
IV. Complete the Program
You can find skeleton code for the dot plot with sliding windows program here:
http://www.cs.uri.edu/~cingiser/csc110/labs/dot_plot_sliding_skeleton.py
Add your code for the matchPct function to the skeleton code, and then fill in the computeDotPlot function to implement the technique described above.
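The skeleton file at the link above is not reproduced here, so the signature below, computeDotPlot(seq1, seq2, windowSize, threshold) returning a list of lists of booleans, and the text-based printDotPlot helper are hypothetical; adapt them to whatever the skeleton actually expects (for example, if it draws the plot graphically). The sketch uses the full-window approach, a strict "better than" comparison with the threshold, and a compact matchPct that returns a fraction. For the small example from Part III, row 0 gets dots in columns 0 and 4.

```python
def matchPct(seq1, seq2):
    # Fraction of matching positions, divided by the longer length
    longest = max(len(seq1), len(seq2))
    if longest == 0:
        return 0.0
    return sum(1 for a, b in zip(seq1, seq2) if a == b) / longest


def computeDotPlot(seq1, seq2, windowSize, threshold):
    """Return a grid of booleans: grid[i][j] is True when the window of seq2
    starting at i and the window of seq1 starting at j match better than
    `threshold`. seq1 runs across the top (columns), seq2 down the side (rows).
    Only positions where a full window fits are used (hypothetical signature)."""
    numRows = max(len(seq2) - windowSize + 1, 0)
    numCols = max(len(seq1) - windowSize + 1, 0)
    grid = []
    for i in range(numRows):
        sideWindow = seq2[i:i + windowSize]
        row = []
        for j in range(numCols):
            topWindow = seq1[j:j + windowSize]
            # "Better than" the threshold, so use a strict comparison
            row.append(matchPct(topWindow, sideWindow) > threshold)
        grid.append(row)
    return grid


def printDotPlot(seq1, seq2, grid):
    # Each row/column is labelled with the first character of its window
    numCols = len(grid[0]) if grid else 0
    print("  " + " ".join(seq1[:numCols]))
    for i, row in enumerate(grid):
        print(seq2[i] + " " + " ".join("*" if dot else "." for dot in row))


top = "AGGTAAG"
side = "AAGTA"
plot = computeDotPlot(top, side, 3, 0.5)
printDotPlot(top, side, plot)
```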
Test the program using the following sequences with several different window sizes and different threshold values:
AAGGTAGCCTAACGTCCACTTTACCC
AGTAAGGTACCTACCTCAACTTCA
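To compare settings systematically, the sketch below prints how many cells receive a dot for several window sizes and thresholds, using a hypothetical countDots helper instead of drawing the plot. The particular window sizes and thresholds in the loop are arbitrary choices; the counts are meant to make trends easier to spot alongside the plots themselves.

```python
def matchPct(seq1, seq2):
    # Fraction of matching positions, divided by the longer length
    longest = max(len(seq1), len(seq2))
    if longest == 0:
        return 0.0
    return sum(1 for a, b in zip(seq1, seq2) if a == b) / longest


def countDots(seq1, seq2, windowSize, threshold):
    # Hypothetical helper: number of cells that would receive a dot
    # (full windows only, strict "better than" comparison)
    return sum(
        matchPct(seq1[j:j + windowSize], seq2[i:i + windowSize]) > threshold
        for i in range(len(seq2) - windowSize + 1)
        for j in range(len(seq1) - windowSize + 1)
    )


seq1 = "AAGGTAGCCTAACGTCCACTTTACCC"
seq2 = "AGTAAGGTACCTACCTCAACTTCA"

for windowSize in (1, 3, 5, 7):
    for threshold in (0.5, 0.7, 0.9):
        cells = (len(seq1) - windowSize + 1) * (len(seq2) - windowSize + 1)
        dots = countDots(seq1, seq2, windowSize, threshold)
        print(f"window={windowSize} threshold={threshold}: {dots} of {cells} cells")
```

Two things hold regardless of the data. For a fixed window, raising the threshold can only remove dots, since any cell above the higher threshold was already above the lower one. A larger window shrinks the grid to n - w + 1 columns and means a single matching character can no longer produce a dot on its own (with w = 5 and threshold 0.5, at least three of the five characters must match), so scattered chance matches tend to drop out while longer runs of similarity remain as diagonal lines.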
What do you observe about how the plot changes when you make the window larger?
What do you observe about how the plot changes when you make the threshold higher?