Generate a digraph frequency matrix for English text, where is the count of the number of times

No answer yet for this question. Ask a Tutor

Question:

Generate a digraph frequency matrix for English text, where is the count of the number of times that letter is followed by letter . Here, we assume that a is letter 0, b is letter 1, c is letter 2, and so on. This matrix must be based on 1,000,000 characters where, as above, only the 26 letters of the alphabet are used. Next, add five to each element in your 26 26 matrix . Finally, normalize your matrix by dividing each element by its row sum. The resulting matrix will be row stochastic, and it will not contain any 0 probabilities. d) Train an HMM with = = 26, using the first 1000 characters of ciphertext you generated in part a), where the matrix is initialized with your matrix from part c). Also, in your HMM, do not reestimate . Use the final matrix to determine a putative key and give the fraction of putative key elements that match the actual key (as a decimal, to four places). For example, if 22 of the 26 key positions are correct, then your answer would be 22/26 = 0.8462. 12. Write an HMM program to solve the problem discussed in Section 9.2, replacing English text with the following. a) French text. b) Russian text. c) Chinese text. 2.10 PROBLEMS 33 13. Perform an HMM analysis similar to that discussed in Section 9.2, replacing English with "Hamptonese," the mysterious writing system developed by James Hampton. For information on Hamptonese, see http://www.cs.sjsu.edu/faculty/stamp/Hampton/hampton.html 14. Since HMM training is a hill climb, we are only assured of reaching a local maximum. And, as with any hill climb, the specific local maximum that we find will depend on our choice of initial values. Therefore, by training a hidden Markov model multiple times with different initial values, we would expect to obtain better results than when training only once. In the paper [16], the authors use an expectation maximization (EM) approach with multiple random restarts as a means of attacking homophonic substitution ciphers. An analogous HMM-based technique is analyzed in the report [158], where the effectiveness of multiple random restarts on simple substitution cryptanalysis is explored in detail. Multiple random restarts are especially helpful in the most challenging cases, that is, when little data (i.e., ciphertext) is available. However, the tradeoff is that the work factor can be high, since the number of restarts required may be very large (millions of random restarts are required in some cases). a) Obtain an English plaintext message consisting of 1000 plaintext characters, consisting only of lower case a through z (i.e., remove all punctuation, special characters, and spaces, and convert all upper case letters to lower case). Encrypt this plaintext using a randomly selected shift of the alphabet. Remember the key. Also generate a digraph frequency matrix , as discussed in part c) of Problem 11. b) Train HMMs, for each of = 1, = 10, = 100, and = 1000, following the same process as in Problem 11, part d), but using the = 1000 observations generated in part a) of this problem. For a given select the best result based on the model scores and give the fraction of the putative key that is correct, calculated as in Problem 11, part d). c) Repeat part b), but only use the first = 400 observations. d) Repeat part c), but only use the first = 300 observations.

Describe how to modify LeyzorekAPSP to return the correct shortestpath distances, even if the graph has negative cycles. (c) Describe how to modify Floyd-Warshall to return the correct shortest-path distances, even if the graph has negative cycles. 3. The algorithms described in this chapter can also be modified to return an explicit description of some negative cycle in the input graph G, if one exists, instead of only reporting whether or not G contains a negative cycle. (a) Describe how to modify Johnson's algorithm to return either the array of all shortest-path distances or a negative cycle. (b) Describe how to modify LeyzorekAPSP to return either the array of all shortest-path distances or a negative cycle. (c) Describe how to modify Floyd-Warshall to return either the array Our fast dynamic programming algorithm is still a factor of O(log V) slower in the worst case than the standard implementation of Johnson's algorithm. A different formulation of shortest paths that removes this logarithmic factor was proposed twice in 1962, first by Robert Floyd and later independently by Peter Ingerman, both slightly generalizing an algorithm of Stephen Warshall published earlier in the same year. In fact, Warshall's algorithm was previously discovered by Bernard Roy in 1959, and the underlying recursion pattern was used by Stephen Kleene8 in 1951. Warshall's (and Roy's and Kleene's) insight was to use a different third parameter in the dynamic programming recurrence. Instead of considering paths with a limited number of edges, they considered paths that can pass through only certain vertices. Here, "pass through" means "both enter and leave"; for example, the path wxyz starts at w, passes through x and y, and ends at z. Number the vertices arbitrarily from 1 to V. For every pair of vertices u and v and every integer r, we define a path ?(u, v,r) as follows: ?(u, v,r) is the shortest path (if any) from u to v that passes through only vertices numbered at most r.