Question: Create an ORF-generating HMM (50 points)
Create a THREE-STATE HMM that can simulate a sequence which contains Open Reading Frames (ORFs). It should run for a fixed number of emissions, x, during every run of the model. This model should be able to generate ANY possible reading frame with nonzero probability. If an ORF ends before reaching x emissions, your algorithm should continue the sequence by starting a new ORF. Your model should include the following parameters, for which you need to make up reasonable values:
k: a 1 by 3 matrix defining the probability of starting in each of your states,
A: a 3 x 3 matrix containing the transition probabilities akl, where akl is the probability of transitioning from state k to state l,
Ek: one matrix per state representing the probabilities of all possible emissions from this state,
x: the total number of emissions that you would like your model to emit per run.
Hint: What are things that every ORF has?
Draw the HMM (by hand is fine) and justify your parameter values. ( points)
a. Model is drawn with Ek and A values shown on the model. Within the bounds of your parameter x, all possible ORFs can be generated by your model. ( points)
b. Reasonable probability values are chosen and an explanation accompanies the choice. ( points)
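One way to read the requirements above is sketched here in Python. Everything in it is an assumption made for illustration, not the expected answer: the three states are taken to be Start, Codon and Stop; each emission is taken to be a whole codon rather than a single nucleotide; the names PI, A, E and SENSE_CODONS are hypothetical; and the probability values are made up. The Start state is allowed to move directly to Stop so that the shortest possible ORF (ATG followed by a stop codon) keeps a nonzero probability.

```python
from itertools import product

STATES = ["Start", "Codon", "Stop"]
STOP_CODONS = ["TAA", "TAG", "TGA"]

# The 61 codons that are not stop codons (ATG included, as an internal Met).
SENSE_CODONS = [c for c in map("".join, product("ACGT", repeat=3))
                if c not in STOP_CODONS]

# Initial distribution (1 by 3): assumed to always begin at the start codon.
PI = [1.0, 0.0, 0.0]

# Transition matrix (3 x 3). Rows are the current state, columns the next
# state, both in the order Start, Codon, Stop. The 0.01 stop probability is
# a made-up value giving ORFs of roughly 100 codons on average.
A = [
    [0.0, 0.99, 0.01],  # Start -> Codon (usual) or Stop (shortest ORF)
    [0.0, 0.99, 0.01],  # Codon -> Codon (extend) or Stop (terminate)
    [1.0, 0.00, 0.00],  # Stop  -> Start (begin a new ORF)
]

# Emission distributions, one per state. Uniform values are an assumption;
# genome-specific codon usage frequencies could be substituted.
E = {
    "Start": {"ATG": 1.0},
    "Codon": {c: 1 / len(SENSE_CODONS) for c in SENSE_CODONS},
    "Stop": {c: 1 / len(STOP_CODONS) for c in STOP_CODONS},
}
```

Under this reading, every row of A and every emission table sums to 1, and any ORF that fits within x emissions has a nonzero probability because the Codon state can emit every non-stop codon.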
Code the model in Python. Use it to generate random sequences containing ORFs. Measure the lengths of the ORFs you generated and draw a distribution representing your result. Submit your code as yourNameORFHMM.py, submit the output of your code in yournameORFoutput.txt, and include the distribution in your pdf.
i. Code correctly represents your HMM from the previous question. ( points)
ii. ORF distribution is submitted. ( points)
iii. Is your distribution what you expected based on the model parameters you've chosen? Explain. ( points)
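A minimal simulation sketch for the coding part follows. It assumes a codon-emitting model with states indexed 0 = Start, 1 = Codon, 2 = Stop, and uses made-up probabilities; the names generate, PI, A and EMIT are hypothetical. Sampling relies only on random.choices from the standard library.

```python
import random
from itertools import product

STOPS = ["TAA", "TAG", "TGA"]
SENSE = [c for c in map("".join, product("ACGT", repeat=3)) if c not in STOPS]

STATE_IDS = [0, 1, 2]            # 0 = Start, 1 = Codon, 2 = Stop (assumed)
PI = [1.0, 0.0, 0.0]             # always begin in Start (assumed)
A = [
    [0.0, 0.99, 0.01],           # Start -> Codon or Stop
    [0.0, 0.99, 0.01],           # Codon -> Codon or Stop
    [1.0, 0.00, 0.00],           # Stop  -> Start
]
# Per-state emissions as (symbols, probabilities); uniform values are made up.
EMIT = [
    (["ATG"], [1.0]),
    (SENSE, [1 / len(SENSE)] * len(SENSE)),
    (STOPS, [1 / len(STOPS)] * len(STOPS)),
]


def generate(x, rng):
    """Run the HMM for exactly x emissions; return (state path, codons)."""
    state = rng.choices(STATE_IDS, weights=PI)[0]
    path, codons = [], []
    for _ in range(x):
        symbols, probs = EMIT[state]
        path.append(state)
        codons.append(rng.choices(symbols, weights=probs)[0])
        state = rng.choices(STATE_IDS, weights=A[state])[0]
    return path, codons


if __name__ == "__main__":
    rng = random.Random(0)       # fixed seed so runs are repeatable
    _, codons = generate(300, rng)
    print("".join(codons))
```

Because the loop always runs x times, a run may finish partway through an ORF, and a Stop emission is always followed by a new Start if any emissions remain.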
Use ATG as the only possible start. The probability may be small for some ORFs, but it should be nonzero for any ORF length up to x, where x is the total emissions of your model.
It's OK if your model generates a sequence that concludes with an incomplete ORF.
For measuring the ORF lengths, use only the FIRST complete ORF in each sequence, even if a sequence has more than one complete ORF.
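The measurement rule above can be sketched as follows, assuming each generated sequence is available as a list of in-frame codons. The helper names first_orf_length and length_histogram are hypothetical, and length is counted in codons with the start and stop codons included. Sequences with no complete ORF return None and would be skipped when tallying.

```python
from collections import Counter

STOPS = {"TAA", "TAG", "TGA"}


def first_orf_length(codons):
    """Length in codons of the first complete ORF, or None if there is none."""
    try:
        start = codons.index("ATG")          # first start codon in frame
    except ValueError:
        return None
    for i in range(start + 1, len(codons)):
        if codons[i] in STOPS:
            return i - start + 1             # count start and stop codons
    return None                              # sequence ended mid-ORF


def length_histogram(lengths, bin_width=25):
    """Count lengths per bin; keys are the lower edge of each bin."""
    counts = Counter((n // bin_width) * bin_width for n in lengths)
    return dict(sorted(counts.items()))
```

For example, first_orf_length(["ATG", "GCT", "TAA", "ATG"]) gives 3. If the model leaves an ORF with a constant stop probability p at every step, the length n of a complete ORF would be expected to follow a geometric distribution, P(n) = (1 - p)^(n - 2) p for n >= 2, cut off at x, which is one basis for answering whether the observed distribution matches expectations.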
