Question:

Problem 5 of Chapter 2 considers haplotype frequency estimation for two linked, biallelic loci. The EM algorithm discussed there relies on the allele-counting estimates $p_A$, $p_a$, $p_B$, and $p_b$.

(a) Construct the Dirichlet prior from these estimates mentioned in Section 3.8 and devise an EM algorithm that maximizes the product of the prior and the likelihood of the observed data. In particular, show that the EM update for $p_{AB}$ is

$$p_{m+1,AB} = \frac{2n_{AABB} + n_{AABb} + n_{AaBB} + n_{mAB/ab} + \beta_{AB}}{2n + \beta}$$

$$n_{mAB/ab} = n_{AaBb}\,\frac{2p_{mAB}\,p_{mab}}{2p_{mAB}\,p_{mab} + 2p_{mAb}\,p_{maB}},$$

where $\beta_{AB} = \alpha_{AB} - 1$ and $\beta = \alpha - 4$.

There are similar updates for $p_{Ab}$, $p_{aB}$, and $p_{ab}$. (Hint: The log prior passes untouched through the conditional expectation of the E step of the EM algorithm.)
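As a rough sketch of how the hint leads to the update in (a): assume the prior is Dirichlet with parameters $\alpha_{AB}, \alpha_{Ab}, \alpha_{aB}, \alpha_{ab}$ summing to $\alpha$. One plausible reading of Section 3.8, assumed here rather than quoted, is $\alpha_{AB} = \alpha\, p_A p_B$ and similarly for the other haplotypes. Write $N_h$ for the unobserved number of copies of haplotype $h$ among the $2n$ haplotypes (the symbol $N_h$ is introduced only for this sketch). Because the log prior does not involve the missing data, the function maximized in the M step is

$$Q(p \mid p_m) + \ln \pi(p) = \sum_{h} \Big( \mathrm{E}[N_h \mid \text{obs}, p_m] + \alpha_h - 1 \Big) \ln p_h + \text{const}.$$

Maximizing subject to $\sum_h p_h = 1$ with a Lagrange multiplier gives

$$p_{m+1,h} = \frac{\mathrm{E}[N_h \mid \text{obs}, p_m] + \beta_h}{2n + \sum_{h} \beta_h}, \qquad \beta_h = \alpha_h - 1, \qquad \sum_{h} \beta_h = \alpha - 4 = \beta.$$

For $h = AB$, only the double heterozygotes $AaBb$ have ambiguous phase, so

$$\mathrm{E}[N_{AB} \mid \text{obs}, p_m] = 2n_{AABB} + n_{AABb} + n_{AaBB} + n_{mAB/ab},$$

which reproduces the stated update.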

(b) Implement this EM algorithm on the mosquito data given in Table 2.5 of Chapter 2 for the value $\alpha - 4 = 10$ and starting from the estimated linkage equilibrium frequencies. You should find that $\hat{p}_{AB} = .717$, $\hat{p}_{Ab} = .083$, $\hat{p}_{aB} = .121$, and $\hat{p}_{ab} = .079$.
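A minimal Python sketch of the iteration in (b) might look like the following. The function name `map_em_haplotypes` and the genotype-count dictionary layout are hypothetical, and the prior weights are assumed to be $\alpha_h = \alpha \times$ (linkage equilibrium frequency of $h$), which is one reading of Section 3.8. Table 2.5 is not reproduced in this question, so the counts in the usage lines are made up purely for illustration; substitute the mosquito counts to check the quoted estimates.

```python
GENOTYPES = ("AABB", "AABb", "AAbb", "AaBB", "AaBb", "Aabb", "aaBB", "aaBb", "aabb")


def map_em_haplotypes(counts, beta=10.0, tol=1e-10, max_iter=10_000):
    """Posterior-mode EM for two biallelic loci (sketch, not a reference implementation).

    counts: dict mapping two-locus genotype labels to observed counts.
    beta:   alpha - 4, the total pseudo-count added to the denominator.
    """
    c = {g: float(counts.get(g, 0)) for g in GENOTYPES}
    n = sum(c.values())  # number of individuals

    # Allele-counting estimates of p_A and p_B.
    p_A = (2 * (c["AABB"] + c["AABb"] + c["AAbb"])
           + c["AaBB"] + c["AaBb"] + c["Aabb"]) / (2 * n)
    p_B = (2 * (c["AABB"] + c["AaBB"] + c["aaBB"])
           + c["AABb"] + c["AaBb"] + c["aaBb"]) / (2 * n)

    # Linkage equilibrium frequencies: used as the start and to build the prior.
    eq = {"AB": p_A * p_B, "Ab": p_A * (1 - p_B),
          "aB": (1 - p_A) * p_B, "ab": (1 - p_A) * (1 - p_B)}

    # ASSUMPTION: alpha_h = alpha * eq[h], so beta_h = alpha_h - 1 sums to beta.
    # beta_h can be negative when alpha * eq[h] < 1.
    alpha = beta + 4.0
    beta_h = {h: alpha * eq[h] - 1.0 for h in eq}

    # Haplotype counts that are unambiguous given the genotype.
    fixed = {
        "AB": 2 * c["AABB"] + c["AABb"] + c["AaBB"],
        "Ab": 2 * c["AAbb"] + c["AABb"] + c["Aabb"],
        "aB": 2 * c["aaBB"] + c["AaBB"] + c["aaBb"],
        "ab": 2 * c["aabb"] + c["Aabb"] + c["aaBb"],
    }

    p = dict(eq)
    for _ in range(max_iter):
        # E step: split the double heterozygotes between phases AB/ab and Ab/aB.
        cis = p["AB"] * p["ab"]
        trans = p["Ab"] * p["aB"]
        n_cis = c["AaBb"] * cis / (cis + trans) if cis + trans > 0 else 0.0
        n_trans = c["AaBb"] - n_cis
        ambiguous = {"AB": n_cis, "ab": n_cis, "Ab": n_trans, "aB": n_trans}

        # M step: expected counts plus pseudo-counts, normalized by 2n + beta.
        new_p = {h: (fixed[h] + ambiguous[h] + beta_h[h]) / (2 * n + beta) for h in p}
        converged = max(abs(new_p[h] - p[h]) for h in p) < tol
        p = new_p
        if converged:
            break
    return p


# Hypothetical counts, NOT the mosquito data of Table 2.5.
example_counts = {"AABB": 20, "AABb": 5, "AAbb": 1, "AaBB": 8, "AaBb": 6,
                  "Aabb": 2, "aaBB": 1, "aaBb": 2, "aabb": 1}
print(map_em_haplotypes(example_counts, beta=10.0))
```

Under the assumed prior the four $\beta_h$ sum to $\beta$, so every iterate sums to one without any explicit renormalization.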

(c) Describe how you would generalize the algorithm to more than two loci and more than two alleles per locus.
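One possible way to approach (c), sketched here under assumptions: represent each multilocus genotype as a tuple of unordered allele pairs, enumerate every phase resolution into two haplotypes, weight each resolution by the product of the current haplotype frequencies, and accumulate expected haplotype counts. The M step then keeps the same form, with $\beta = \alpha - 4$ replaced by the sum of $\beta_h$ over all haplotypes. The function names `expected_haplotype_counts` and `map_update` and the data layout are hypothetical, and the enumeration is exponential in the number of loci, so this is only practical for a handful of loci.

```python
from collections import defaultdict
from itertools import product


def expected_haplotype_counts(genotypes, p):
    """E step: expected haplotype counts given current frequencies p.

    genotypes: list of genotypes, each a tuple of (allele, allele) pairs, one per locus.
    p:         dict mapping haplotype tuples to current frequency estimates.
    """
    expected = defaultdict(float)
    for g in genotypes:
        resolutions = []
        # Each locus can hand its first allele to haplotype 1 or to haplotype 2.
        # Homozygous loci and mirror-image orderings create duplicates, but every
        # unordered haplotype pair is duplicated equally often, so the
        # normalization below cancels the redundancy.
        for orient in product((0, 1), repeat=len(g)):
            h1 = tuple(pair[o] for pair, o in zip(g, orient))
            h2 = tuple(pair[1 - o] for pair, o in zip(g, orient))
            resolutions.append((h1, h2, p.get(h1, 0.0) * p.get(h2, 0.0)))
        total = sum(w for _, _, w in resolutions)
        if total == 0.0:
            raise ValueError("genotype has zero probability under current frequencies")
        for h1, h2, w in resolutions:
            expected[h1] += w / total
            expected[h2] += w / total
    return expected


def map_update(expected, beta_h, n):
    """M step: add pseudo-counts beta_h = alpha_h - 1 and normalize by 2n + sum(beta_h)."""
    denom = 2 * n + sum(beta_h.values())
    return {h: (expected.get(h, 0.0) + beta_h[h]) / denom for h in beta_h}


# Two-locus check with one double heterozygote and made-up frequencies.
p0 = {("A", "B"): 0.4, ("A", "b"): 0.2, ("a", "B"): 0.1, ("a", "b"): 0.3}
e = expected_haplotype_counts([(("A", "a"), ("B", "b"))], p0)
print(dict(e))
```

For two biallelic loci this reduces to the split of the double heterozygotes used in part (a), since the weight on the $AB/ab$ phase is $p_{AB}p_{ab} / (p_{AB}p_{ab} + p_{Ab}p_{aB})$.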
