Question: Create an ORF generating HMM (50 points)
Create a THREE-STATE HMM that can simulate a sequence which contains Open Reading Frames (ORFs). It should run for a fixed number of emissions, x, during every run of the model. This model should be able to generate ANY possible reading frame with non-zero probability*. If an ORF ends before reaching x emissions, your algorithm should continue the sequence by starting a new ORF. Your model should include the following parameters, for which you need to make up reasonable values:
k_0: a 3x1 matrix defining the probability of starting in each of your 3 states.
A: a 3x3 matrix containing the transition probabilities a_kl, where a_kl is the probability of transitioning from state k to state l.
E_k: one matrix per state representing the probabilities of all possible emissions from that state.
x: the total number of emissions that you would like your model to emit per run.
Hint: What are 3 things that every ORF has?
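One way to read the parameter list above is sketched below. It is only an illustration under stated assumptions, not the expected answer: the three states are taken to be "start", "coding" and "stop", each emission is assumed to be a whole codon rather than a single base, and every probability value is a made-up placeholder. The names `draw` and `simulate` are hypothetical helpers introduced for this sketch.

```python
import itertools
import random

BASES = "ACGT"
STOP_CODONS = ("TAA", "TAG", "TGA")
# Assumption: the coding state may emit any of the 61 non-stop codons.
SENSE_CODONS = tuple(
    "".join(c) for c in itertools.product(BASES, repeat=3)
    if "".join(c) not in STOP_CODONS
)

# k_0: initial state probabilities (assumed: every run begins with a start codon).
k_0 = {"start": 1.0, "coding": 0.0, "stop": 0.0}

# A: transition probabilities a_kl from state k (outer key) to state l (inner key).
# All values are placeholders chosen for illustration only.
A = {
    "start":  {"start": 0.0, "coding": 0.99, "stop": 0.01},
    "coding": {"start": 0.0, "coding": 0.97, "stop": 0.03},
    "stop":   {"start": 1.0, "coding": 0.0,  "stop": 0.0},
}

# E_k: one emission distribution per state.
E = {
    "start":  {"ATG": 1.0},
    "coding": {c: 1.0 / len(SENSE_CODONS) for c in SENSE_CODONS},
    "stop":   {c: 1.0 / len(STOP_CODONS) for c in STOP_CODONS},
}


def draw(dist, rng):
    """Sample one key from a {outcome: probability} dictionary."""
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def simulate(x, rng):
    """Run the HMM for exactly x emissions; return (state path, emitted codons)."""
    state = draw(k_0, rng)
    path, codons = [], []
    for _ in range(x):
        path.append(state)
        codons.append(draw(E[state], rng))
        state = draw(A[state], rng)
    return path, codons


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so a run can be repeated
    path, codons = simulate(20, rng)
    print("".join(codons))
```

Because the "stop" state always returns to "start" in this sketch, a new ORF begins whenever one ends before x emissions, and a run may finish partway through an ORF.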
1) Draw the HMM (by hand is fine) and justify your parameter values (25 points)
a. The model is drawn with the E_k and A values shown on it. Within the bounds of your parameter x, all possible ORFs can be generated by your model. (15 points)
b. Reasonable probability values are chosen and an explanation accompanies the choice. (10 points)
2) Code the model in Python**. Use it to generate 250 random sequences containing ORFs***. Measure the lengths of the 250 ORFs you generated and draw a distribution representing your result. Submit your code as yourName_ORFHMM.py, submit the output of your code in your_nameORFoutput.txt, and include the distribution in your pdf.
i. Code correctly represents your HMM from question 1 (10 points)
ii. ORF distribution is submitted (5 points)
iii. Is your distribution what you expected based on the model parameters you've chosen? Explain. (10 points)
* Use ATG as the only possible start. The probability may be small for some ORFs, but all should be non-zero for any ORF length of (x-1) where x is the total emissions of your model.
** It's OK if your model generates a sequence that concludes with an incomplete ORF.
*** For measuring the ORF lengths, use only the *FIRST* complete ORF in each sequence even if each sequence has more than one complete ORF.
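The measurement rule in the last footnote could be sketched as follows. This is a minimal example under assumptions that are not in the assignment: the sequence is read in frame 0 (codon-aligned from its first base), and the reported length counts nucleotides from ATG through the stop codon inclusive. The function name `first_complete_orf_length` is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_complete_orf_length(sequence):
    """Return the length in nucleotides of the first complete ORF, or None.

    Assumptions: reading frame 0 only; length runs from ATG through the
    stop codon inclusive; an ORF with no in-frame stop is incomplete.
    """
    # Split into whole codons, dropping any trailing partial codon.
    codons = [sequence[i:i + 3] for i in range(0, len(sequence) - 2, 3)]
    try:
        start = codons.index("ATG")
    except ValueError:
        return None  # no start codon in frame
    for offset, codon in enumerate(codons[start:]):
        if codon in STOP_CODONS:
            return 3 * (offset + 1)
    return None  # sequence ended inside an incomplete ORF


if __name__ == "__main__":
    print(first_complete_orf_length("ATGAAATAA"))
    print(first_complete_orf_length("ATGAAA"))
```

Sequences that return None would need to be regenerated or skipped to collect 250 lengths. If the coding state repeats with a fixed probability, the collected lengths would be expected to look roughly geometric, cut off by the limit x.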
