Question: Homework 5: HMMs And Models (100 points total)

Assignment guidelines

Submit your assignment files on Canvas under module 11: COVID and Ancient Genomes

Please submit your code in file(s) called [name]. Your code should be easy to open in a text editor so that someone can download and use the function you write.

Please submit a PDF with the answers to the questions at the bottom of the assignment (and your visualization(s)).

Please submit a text file with the output of your code.

Complete the class assignment: Nucleotide Composition HMM (50 points)

Train the model on the files labeled "training". Test the model on the files labeled pathogen and Spacii. In your output, include a classification and a score for each sequence in both files. Also include a plot of the score distributions for the pathogen and Spacii sequences.

Complete the functions getLogLike and trainModel (30 points).

Code meets specifications - the function exists and takes the correct input and returns the correct output.

Code includes a getLogLike() function which computes the correct log-likelihood (15 points).

Code includes a trainModel function which takes in the sequences of pathogen and Spacii and measures model parameters (15 points).

Draw (by hand is fine) the HMM that represents our pathogen model. Label all the transition states and indicate the emissions for each state (10 points).

Imagine that, due to incidental overlaps, when you assembled the Spacii genome you combined a bit of the parasite genome and your Spacii genome into a single contig. Describe how you would combine your models into a single HMM and use dynamic programming to identify a likely merge point between the Spacii and parasite sequences.

Dynamic programming can be used to identify the most likely merge point. This involves finding the best path through the combined HMM that separates parasite and Spacii sequences.
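One way to make the "best path" idea concrete is a Viterbi pass over a two-regime HMM, where each regime scores a base with its own 4x4 transition matrix and a small probability allows switching between regimes. The sketch below is only an illustration under stated assumptions: the function name `viterbi_merge_points`, the `p_switch` value, the A, C, G, T base order, and the uniform start are all hypothetical choices, and both matrices are assumed to contain no zero entries (for example, because pseudocounts were used in training).

```python
import math

BASE_IDX = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed base order


def viterbi_merge_points(model_a, model_b, seq, p_switch=1e-3):
    """Label each base as regime 0 (model_a) or 1 (model_b) and report switches.

    model_a and model_b are 4x4 matrices of P(current base | previous base).
    p_switch is an assumed, hand-picked probability of changing regime.
    """
    models = (model_a, model_b)
    log_stay = math.log(1.0 - p_switch)
    log_switch = math.log(p_switch)
    idx = [BASE_IDX[b] for b in seq.upper() if b in BASE_IDX]
    n = len(idx)
    if n == 0:
        return [], []

    # First base: uniform over the two regimes and the four bases (assumption).
    score = [math.log(0.5 * 0.25), math.log(0.5 * 0.25)]
    back = []  # back[t-1][r] = best previous regime for regime r at position t
    for t in range(1, n):
        prev, cur = idx[t - 1], idx[t]
        new_score = [0.0, 0.0]
        ptr = [0, 0]
        for r in (0, 1):
            emit = math.log(models[r][prev][cur])
            stay = score[r] + log_stay
            switch = score[1 - r] + log_switch
            if stay >= switch:
                new_score[r], ptr[r] = stay + emit, r
            else:
                new_score[r], ptr[r] = switch + emit, 1 - r
        score = new_score
        back.append(ptr)

    # Backtrace from the better-scoring final regime.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()

    # A merge point is any position whose regime differs from the one before it.
    merge_points = [t for t in range(1, n) if path[t] != path[t - 1]]
    return path, merge_points
```

Calling `viterbi_merge_points(spaciiTrainModel, pathTrainModel, contig)` on a chimeric contig would return a per-base regime labeling and the indices where the labeling changes. A smaller `p_switch` makes the path more reluctant to switch, so short noisy stretches are less likely to be reported as merge points.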

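#Basecount.py:

import math
import matplotlib.pyplot as plt  # if you don't have matplotlib installed (this line gives you an error),
# you can comment out the line that starts with plt. and plot a histogram with the data output

# Maps each base to its row/column index in the 4x4 model matrix.
# The base letters were missing from the listing; A, C, G, T order is assumed.
baseIDx = {"A": 0, "C": 1, "G": 2, "T": 3}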

def main():
    # Test files
    spaciiFA = "MSpacii.fa"
    pathogenFA = "pathogen.fa"
    # Training files
    spaciiFA_ = "MSpacii_training.fa"
    pathogenFA_ = "pathogen_training.fa"

    spaciiID2seq = getSeq(spaciiFA)
    pathogenID2seq = getSeq(pathogenFA)

    spaciiTrainModel = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
    pathTrainModel = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]

    spaciiTrainModel = trainModel(spaciiTrainModel, spaciiFA_)
    pathTrainModel = trainModel(pathTrainModel, pathogenFA_)

    # Score every test sequence against both trained models
    markovScoresSpacii = []
    markovScoresPath = []
    for ID in spaciiID2seq.keys():
        markovScoresSpacii.append(getLogLike(spaciiTrainModel, pathTrainModel, spaciiID2seq[ID]))
    for ID in pathogenID2seq.keys():
        markovScoresPath.append(getLogLike(spaciiTrainModel, pathTrainModel, pathogenID2seq[ID]))

    #### ---------------------- output -------------------------
    plt.hist([markovScoresPath, markovScoresSpacii], bins=20, label=['pathogen', 'spacii'], rwidth=1, density=True)

    scoresOutputText(markovScoresSpacii, markovScoresPath)
    #### ---------------------- output -------------------------

def scoresOutputText(markovScoresSpacii, markovScoresPath):
    # Write the two score lists side by side as a tab-separated file
    f = open("results.tab", "w")
    f.write("SpaciiScores\tpathogenScores\n")
    for i in range(len(markovScoresSpacii)):
        f.write(str(markovScoresSpacii[i]) + "\t" + str(markovScoresPath[i]) + "\n")
    f.close()

def getLogLike(model1, model2, seq):  # takes in the two trained models and the sequence that needs to be scored
    Pmod1 = 1
    Pmod2 = 1
    # Please complete this function. This should return the log-likelihood of the two models
    # with Pmod1 and Pmod2 as the probabilities of the two models.

    return score

def trainModel(model, data):
    # Please complete this function. This should look at all the training data and calculate how many
    # dinucleotides precede each base, similar to what was outlined on the slides.
    # The output of the function should be a 4x4 matrix model where each row represents the probability
    # of seeing base x given the previous base was y (each row should sum to 1).
    print(model)
    return model

def getSeq(filename):
    # Read a FASTA file into a dict mapping sequence ID -> sequence string
    f = open(filename)
    id2seq = {}
    currkey = ""
    for line in f:
        if line.find(">") == 0:
            # Header line: the ID is everything after the leading ">"
            currkey = line.rstrip()[1:]
            id2seq[currkey] = ""
        else:
            id2seq[currkey] = id2seq[currkey] + line.rstrip()
    f.close()
    return id2seq


main()
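The two stubs above could be filled in along the following lines. This is a minimal sketch, not the course's reference solution: the function names (`train_model_sketch`, `log_like_ratio_sketch`, `classify_sketch`) are hypothetical, the training function takes a list of sequences rather than a file name, a pseudocount of 1 is an added assumption to avoid zero probabilities, and the sign convention (positive score favors the first model) is also an assumption.

```python
import math

BASE_IDX = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed base order


def train_model_sketch(seqs, pseudocount=1.0):
    """Estimate a 4x4 matrix of P(current base | previous base) from sequences.

    Row index = previous base, column index = current base; each row sums to 1.
    """
    counts = [[pseudocount] * 4 for _ in range(4)]
    for seq in seqs:
        seq = seq.upper()
        for prev, cur in zip(seq, seq[1:]):
            # Skip pairs containing ambiguous characters such as N.
            if prev in BASE_IDX and cur in BASE_IDX:
                counts[BASE_IDX[prev]][BASE_IDX[cur]] += 1
    return [[c / sum(row) for c in row] for row in counts]


def log_like_ratio_sketch(model1, model2, seq):
    """Return log P(seq | model1) - log P(seq | model2) over base transitions."""
    seq = seq.upper()
    score = 0.0
    for prev, cur in zip(seq, seq[1:]):
        if prev in BASE_IDX and cur in BASE_IDX:
            i, j = BASE_IDX[prev], BASE_IDX[cur]
            score += math.log(model1[i][j]) - math.log(model2[i][j])
    return score


def classify_sketch(score, label1="spacii", label2="pathogen"):
    """Assign the label of whichever model explains the sequence better."""
    return label1 if score > 0 else label2
```

Summing log ratios instead of multiplying probabilities avoids numerical underflow on long sequences. With the argument order used in `main()` (Spacii model first, pathogen model second), a positive score would suggest Spacii and a negative score would suggest pathogen.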
