Question: Homework 5: HMMs And Models (100 points total)
Assignment guidelines
Submit your assignment files on Canvas under module 11: COVID and Ancient Genomes.
Please submit your code in files called name.py. Your code should be easy to open in
a text editor so that someone can download and use the functions you write.
Please submit a PDF with the answers to the questions at the bottom of the assignment
and your visualizations.
Please submit a text file with the output of your code.
Complete the class assignment: Nucleotide Composition HMM points
Train the model on the files labeled as "training". Test the model on the files labeled as pathogen
and Spacii. In your output, include a classification and a score for each sequence in both files.
Also, include a plot containing the distributions of scores for pathogen and Spacii
sequences.
Complete the functions getLogLike and trainModel points
a. Code meets specifications: each function exists and takes the correct input and returns the correct
output
i. Code includes getLogLike, a function which computes the correct log
likelihood. points
ii. Code includes trainModel, a function which takes in the sequences of
pathogen and Spacii and estimates the model parameters points
Draw (by hand is fine) the HMM that represents our pathogen model. Label all the
state transitions and indicate the emissions for each state points
Imagine that, due to incidental overlaps, when you assembled the Spacii genome you
combined a bit of the parasite genome and your Spacii genome into a single contig.
Describe how you would combine your models into a single HMM and use dynamic
programming to identify a likely merge point between the Spacii and parasite sequences.
Dynamic programming can be used to identify the most likely merge point.
This involves finding the best path through the combined HMM (for example with the Viterbi algorithm);
the position where that path switches from one submodel to the other
separates the parasite and Spacii sequences.
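The best-path idea described above can be sketched as follows. This is a minimal illustration, not the assignment's reference solution. It assumes each trained model is a 4 x 4 row-stochastic transition matrix indexed in A, C, G, T order, that no entry is zero (for example because pseudocounts were used in training), and that the two models are joined by a small, fixed switching probability. The names `viterbi_merge_point`, `p_switch`, and `BASE_IDX` are hypothetical and do not come from the assignment.

```python
import math

BASE_IDX = {"A": 0, "C": 1, "G": 2, "T": 3}


def viterbi_merge_point(seq, models, p_switch=1e-3):
    """Most likely regime path through two first-order nucleotide models.

    models: list of two 4x4 matrices, models[k][prev][cur] = P(cur | prev).
    p_switch: assumed probability of jumping between the two models at each step.
    Returns (path, merges): path[i] is the model index assigned to position i,
    merges lists every position where the path changes model.
    """
    n = len(models)
    log_stay = math.log(1.0 - p_switch)
    log_switch = math.log(p_switch)
    # Uniform start over the two regimes; the first base is treated as given.
    score = [math.log(1.0 / n)] * n
    backpointers = []
    for i in range(1, len(seq)):
        prev, cur = BASE_IDX[seq[i - 1]], BASE_IDX[seq[i]]
        new_score, ptr = [], []
        for k in range(n):
            # Best regime to have been in at position i-1 if we are in k now.
            cands = [score[j] + (log_stay if j == k else log_switch) for j in range(n)]
            best = max(range(n), key=lambda j: cands[j])
            new_score.append(cands[best] + math.log(models[k][prev][cur]))
            ptr.append(best)
        score = new_score
        backpointers.append(ptr)
    # Trace back from the best final regime.
    state = max(range(n), key=lambda k: score[k])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    path.reverse()
    merges = [i for i in range(1, len(path)) if path[i] != path[i - 1]]
    return path, merges


# Toy models (made up for illustration): model 0 favors A, model 1 favors G.
toy_a = [[0.7, 0.1, 0.1, 0.1] for _ in range(4)]
toy_g = [[0.1, 0.1, 0.7, 0.1] for _ in range(4)]
toy_path, toy_merges = viterbi_merge_point("A" * 30 + "G" * 30, [toy_a, toy_g])
print(toy_merges)
```

The toy contig switches from A-rich to G-rich halfway through, so the reported merge point should fall at that boundary. A smaller `p_switch` makes the path more reluctant to change models, which suppresses spurious switches on short stretches of atypical sequence.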
#Basecount.py:
import math
import matplotlib.pyplot as plt  # if you don't have matplotlib installed this line gives you an error
# you can comment out the lines that start with plt and plot a histogram with the data output

# Row/column index of each base in the 4 x 4 transition matrices
baseIDx = {"A": 0, "C": 1, "G": 2, "T": 3}


def main():
    spaciiFA = "MSpacii.fa"
    pathogenFA = "pathogen.fa"
    spaciiFAT = "MSpaciitraining.fa"
    pathogenFAT = "pathogentraining.fa"
    spaciiIDseq = getSeq(spaciiFA)
    pathogenIDseq = getSeq(pathogenFA)
    # Start from empty 4 x 4 matrices and let trainModel fill them in
    spaciiTrainModel = [[0] * 4 for _ in range(4)]
    pathTrainModel = [[0] * 4 for _ in range(4)]
    spaciiTrainModel = trainModel(spaciiTrainModel, spaciiFAT)
    pathTrainModel = trainModel(pathTrainModel, pathogenFAT)
    markovScoresSpacii = []
    markovScoresPath = []
    for ID in spaciiIDseq.keys():
        markovScoresSpacii.append(getLogLike(spaciiTrainModel, pathTrainModel, spaciiIDseq[ID]))
    for ID in pathogenIDseq.keys():
        markovScoresPath.append(getLogLike(spaciiTrainModel, pathTrainModel, pathogenIDseq[ID]))
    #### output
    plt.hist([markovScoresPath, markovScoresSpacii], label=['pathogen', 'spacii'], density=True)
    plt.legend()  # show which histogram belongs to which organism
    plt.show()
    scoresOutputText(markovScoresSpacii, markovScoresPath)
    #### output


def scoresOutputText(markovScoresSpacii, markovScoresPath):
    # Write the two score lists side by side as a tab-separated table
    with open("results.tab", "w") as f:
        f.write("SpaciiScores\tpathogenScores\n")
        for i in range(len(markovScoresSpacii)):
            f.write(str(markovScoresSpacii[i]) + "\t" + str(markovScoresPath[i]) + "\n")


def getLogLike(model1, model2, seq):  # takes in the two trained models and the sequence that needs to be scored
    Pmod1 = 0
    Pmod2 = 0
    # Please complete this function. This should return the log-likelihood score of the two models,
    # with Pmod1 and Pmod2 as the probabilities of the sequence under the two models.
    score = Pmod1 - Pmod2  # placeholder; assumes Pmod1 and Pmod2 are accumulated as log probabilities
    return score


def trainModel(model, data):
    # Please complete this function. This should look at all the training data and count how often
    # each base follows each preceding base (dinucleotide counts), similar to what was outlined on the slides.
    # The output of the function should be a 4 x 4 matrix (model) where each row represents the
    # probability of seeing base x given the previous base was y (each row should sum to 1).
    print(model)
    return model


def getSeq(filename):
    # Read a FASTA file into a dict mapping sequence ID to sequence
    idseq = {}
    currkey = ""
    with open(filename) as f:
        for line in f:
            if line.startswith(">"):
                currkey = line.rstrip()[1:]
                idseq[currkey] = ""
            else:
                idseq[currkey] = idseq[currkey] + line.rstrip()
    return idseq


if __name__ == "__main__":
    main()
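One way the two stub functions could be filled in is sketched below, under stated assumptions rather than as the expected solution. It assumes a first-order model in which each base depends only on the previous base, adds a pseudocount of 1 to every dinucleotide count so that no probability is zero, skips characters other than A, C, G, T, and scores a sequence as log P(seq | model 1) minus log P(seq | model 2). Unlike the skeleton's trainModel, which receives a file name, the hypothetical `train_transition_model` takes a list of sequences so that the example runs without the FASTA files. The names `log_likelihood_ratio` and `classify`, and the decision threshold of 0, are also assumptions.

```python
import math

BASE_IDX = {"A": 0, "C": 1, "G": 2, "T": 3}


def train_transition_model(sequences, pseudocount=1.0):
    """Estimate a 4x4 matrix with model[prev][cur] = P(cur | prev)."""
    counts = [[pseudocount] * 4 for _ in range(4)]
    for seq in sequences:
        seq = seq.upper()
        for prev, cur in zip(seq, seq[1:]):
            if prev in BASE_IDX and cur in BASE_IDX:
                counts[BASE_IDX[prev]][BASE_IDX[cur]] += 1
    # Normalize each row so it sums to 1.
    return [[c / sum(row) for c in row] for row in counts]


def log_likelihood(model, seq):
    """Sum of log transition probabilities along the sequence."""
    seq = seq.upper()
    total = 0.0
    for prev, cur in zip(seq, seq[1:]):
        if prev in BASE_IDX and cur in BASE_IDX:
            total += math.log(model[BASE_IDX[prev]][BASE_IDX[cur]])
    return total


def log_likelihood_ratio(model1, model2, seq):
    """Positive when model1 explains seq better than model2."""
    return log_likelihood(model1, seq) - log_likelihood(model2, seq)


def classify(score, label1="spacii", label2="pathogen"):
    """Assumed decision rule: threshold the log-likelihood ratio at 0."""
    return label1 if score > 0 else label2


# Toy training data, made up for illustration only.
spacii_model = train_transition_model(["ATATATATAT", "TATATATA"])
pathogen_model = train_transition_model(["GCGCGCGC", "CGCGCGCG"])
for query in ["ATATAT", "GCGCGC"]:
    s = log_likelihood_ratio(spacii_model, pathogen_model, query)
    print(query, classify(s), s)
```

Working in log space avoids numerical underflow on long sequences, since the product of many small probabilities becomes a sum. The pseudocount matters mostly for small training sets, where an unseen dinucleotide would otherwise make the likelihood zero.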