Question: Inputs are the oriC_nl.txt, and outputs are the command needed for each question. There are no more information to give, the assignment looks exactly like

Inputs are the oriC_nl.txt, and outputs are the command needed for each question.

There are no more information to give, the assignment looks exactly like the wording as shown.

write Python command for each question a,b,c,d

A typical kind of sequence question might be: In prokaryotes DnaA is a protein that activates initiation of DNA replication. There are multiple DnaA binding sites and they are typically 9-basepair repeats upstream of the oriC site. There is also a DNA Unwinding Element (DUE), which is a tandem array of three 13-basepair AT-rich sequences. Given the oriC sequence of a prokaryote, find potential DnaA boxes and the DUE.

The file oriC_nl.txt contains the 540 bases in the oriC region of Vibrio cholera broken up in 20bp chunks, with newline characters at the end of each line. Write programs to do the following:

a. Reverse Complement:

Input: The oriC_nl.txt file

Output: The complementary strand, written both 5' to 3' and 3' to 5'

Bonus: Use dictionaries and the get function.

b. Sequence Frequency: Find the most frequent k-mers in a string.

Input: The file oriC_nl.txt and an integer k

Output: The most frequent k-mers in the input file

c. Pattern Matching: Find all occurrences of a pattern in a string.

Input: The file oriC_nl.txt and a Pattern string

Output: All starting positions where Pattern appears as a substring in the file.

d. Sequence Frequency with Gaps: Find the most frequent k-mers in a string with one allowed mismatch.

Input: The file oriC_nl.txt and an integer k

Output: The most frequent k-mer consensus sequences in the input file

The oriC region of Vibrio cholera: -> oriC_nl.txt

atcaatgatc aacgtaagct tctaagcatg atcaaggtgc tcacacagtt tatccacaac ctgagtggat gacatcaaga taggtcgttg tatctccttc ctctcgtact ctcatgacca cggaaagatg atcaagagag gatgatttct tggccatatc gcaatgaata cttgtgactt gtgcttccaa ttgacatctt cagcgccata ttgcgctggc caaggtgacg gagcgggatt acgaaagcat gatcatggct gttgttctgt ttatcttgtt ttgactgaga cttgttagga tagacggttt ttcatcactg actagccaaa gccttactct gcctgacatc gaccgtaaat tgataatgaa tttacatgct tccgcgacga tttacctctt gatcatcgat ccgattgaag atcttcaatt gttaattctc ttgcctcgac tcatagccat gatgagctct tgatcatgtt tccttaaccc tctatttttt acggaagaat gatcaagctg ctgctcttga tcatcgtttcInputs are the oriC_nl.txt, and outputs are the command needed for each

main.py oriC_nl.tx sequence-file open('oric.nl.txt','r') 2 int(raw_input ("Size k-mer to search for? ")) kmer num-mismatch of = int (raw_input("Number of mismatches tolerated in the kmer? ") = "" 6 sequence = 7-for line in sequence_file: sequence +-...join(line.split()).lower() 10 main.py oriC_nl.tx sequence-file open('oric.nl.txt','r') 2 int(raw_input ("Size k-mer to search for? ")) kmer num-mismatch of = int (raw_input("Number of mismatches tolerated in the kmer? ") = "" 6 sequence = 7-for line in sequence_file: sequence +-...join(line.split()).lower() 10

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Databases Questions!