Question: The input is the file oriC_nl.txt, and the output is the Python command (program) needed for each question. There is no more information to give; the assignment reads exactly as worded below.

Write a Python command for each of questions a, b, c, and d.

A typical kind of sequence question might be: In prokaryotes DnaA is a protein that activates initiation of DNA replication. There are multiple DnaA binding sites and they are typically 9-basepair repeats upstream of the oriC site. There is also a DNA Unwinding Element (DUE), which is a tandem array of three 13-basepair AT-rich sequences. Given the oriC sequence of a prokaryote, find potential DnaA boxes and the DUE.

The file oriC_nl.txt contains the 540 bases in the oriC region of Vibrio cholerae, broken up into 20 bp chunks, with a newline character at the end of each line. Write programs to do the following:

a. Reverse Complement:

Input: The oriC_nl.txt file

Output: The complementary strand, written both 5' to 3' and 3' to 5'

Bonus: Use dictionaries and the get function.
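A minimal sketch of one way to approach part a, not the assignment's official solution. It assumes the file holds lowercase a/c/g/t written 5' to 3'; the helper names `read_sequence`, `complement`, and `reverse_complement` are hypothetical. The dictionary lookup uses `get` with a fallback of "n" for any unexpected character, which covers the bonus.

```python
# Base-pairing table for the dictionary lookup (assumes lowercase DNA letters).
COMPLEMENT = {"a": "t", "t": "a", "c": "g", "g": "c"}


def read_sequence(path):
    """Return the file contents as one lowercase string with all whitespace removed."""
    with open(path) as handle:
        return "".join(handle.read().split()).lower()


def complement(seq):
    """Complement each base in place; unknown characters become 'n' via dict.get."""
    return "".join(COMPLEMENT.get(base, "n") for base in seq)


def reverse_complement(seq):
    """Complement the strand, then reverse it so it reads 5' to 3'."""
    return complement(seq)[::-1]


# Demonstration on the first ten bases of the oriC sequence.
example = "atcaatgatc"
print("3'-" + complement(example) + "-5'")          # same order as the input strand
print("5'-" + reverse_complement(example) + "-3'")  # reversed, conventional orientation
```

For the full assignment, `read_sequence("oriC_nl.txt")` would supply the input string in place of `example`.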

b. Sequence Frequency: Find the most frequent k-mers in a string.

Input: The file oriC_nl.txt and an integer k

Output: The most frequent k-mers in the input file
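Part b could be sketched as follows, assuming the sequence has already been read into a single string with whitespace removed. The function name `most_frequent_kmers` is hypothetical. It counts every overlapping window of length k with `collections.Counter` and returns all k-mers tied for the highest count.

```python
from collections import Counter


def most_frequent_kmers(seq, k):
    """Return (sorted list of the most frequent k-mers, their count)."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    # Slide a window of length k across the sequence, one base at a time.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    if not counts:
        # The sequence is shorter than k, so there are no k-mers to report.
        return [], 0
    best = max(counts.values())
    return sorted(kmer for kmer, n in counts.items() if n == best), best


# Small demonstration on a made-up string.
kmers, count = most_frequent_kmers("acgtacgtac", 2)
print(kmers, count)
```

Returning every tied k-mer, rather than a single winner, matches the plural "k-mers" in the stated output.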

c. Pattern Matching: Find all occurrences of a pattern in a string.

Input: The file oriC_nl.txt and a Pattern string

Output: All starting positions where Pattern appears as a substring in the file.
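One possible sketch for part c, again assuming the sequence is already a single whitespace-free string. The function name `find_occurrences` is hypothetical, and positions are reported 0-based, which is an assumption since the assignment does not say which indexing to use. Restarting the search one base past each hit keeps overlapping matches.

```python
def find_occurrences(seq, pattern):
    """Return every 0-based start index where pattern occurs, overlaps included."""
    if not pattern:
        return []
    positions = []
    start = seq.find(pattern)
    while start != -1:
        positions.append(start)
        # Resume one base to the right so overlapping matches are not skipped.
        start = seq.find(pattern, start + 1)
    return positions


# Small demonstration: "ata" overlaps itself in "atatata".
print(find_occurrences("atatata", "ata"))
```

If 1-based positions are wanted instead, adding 1 to each returned index is enough.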

d. Sequence Frequency with Gaps: Find the most frequent k-mers in a string with one allowed mismatch.

Input: The file oriC_nl.txt and an integer k

Output: The most frequent k-mer consensus sequences in the input file
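Part d admits more than one reading. The sketch below assumes that "consensus sequence" means a candidate k-mer, not necessarily present in the file, that matches the largest number of windows with at most one mismatch. The names `neighbors_one_mismatch` and `most_frequent_kmers_one_mismatch` are hypothetical. Each window votes for itself and for every k-mer one substitution away, and the candidates with the most votes are returned.

```python
from collections import Counter

ALPHABET = "acgt"  # assumes lowercase DNA letters only


def neighbors_one_mismatch(kmer):
    """Return the k-mer itself plus every k-mer differing at exactly one position."""
    result = {kmer}
    for i, base in enumerate(kmer):
        for other in ALPHABET:
            if other != base:
                result.add(kmer[:i] + other + kmer[i + 1:])
    return result


def most_frequent_kmers_one_mismatch(seq, k):
    """Return (sorted candidate k-mers with the most approximate matches, that count)."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        # Each window adds one vote to every candidate within one mismatch of it.
        counts.update(neighbors_one_mismatch(seq[i:i + k]))
    if not counts:
        return [], 0
    best = max(counts.values())
    return sorted(c for c, n in counts.items() if n == best), best


# Small demonstration: "aaa" never occurs exactly, yet every window is one mismatch away.
print(most_frequent_kmers_one_mismatch("aacaagaat", 3))
```

Each window produces 1 + 3k candidates, so the work grows linearly with both the sequence length and k.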

The oriC region of Vibrio cholerae (contents of oriC_nl.txt):

atcaatgatc aacgtaagct tctaagcatg atcaaggtgc tcacacagtt tatccacaac ctgagtggat gacatcaaga taggtcgttg tatctccttc ctctcgtact ctcatgacca cggaaagatg atcaagagag gatgatttct tggccatatc gcaatgaata cttgtgactt gtgcttccaa ttgacatctt cagcgccata ttgcgctggc caaggtgacg gagcgggatt acgaaagcat gatcatggct gttgttctgt ttatcttgtt ttgactgaga cttgttagga tagacggttt ttcatcactg actagccaaa gccttactct gcctgacatc gaccgtaaat tgataatgaa tttacatgct tccgcgacga tttacctctt gatcatcgat ccgattgaag atcttcaatt gttaattctc ttgcctcgac tcatagccat gatgagctct tgatcatgtt tccttaaccc tctatttttt acggaagaat gatcaagctg ctgctcttga tcatcgtttc

Starter code from main.py (the visible portion only; the original screenshot is cut off after these lines):

    # Open the sequence file supplied with the assignment.
    sequence_file = open('oriC_nl.txt', 'r')

    # Ask for the k-mer size and mismatch tolerance (input() replaces Python 2's raw_input()).
    kmer = int(input("Size k-mer to search for? "))
    num_mismatch = int(input("Number of mismatches tolerated in the kmer? "))

    # Join the chunks on every line into one lowercase string without whitespace.
    sequence = ""
    for line in sequence_file:
        sequence += "".join(line.split()).lower()
    sequence_file.close()
