Question: Fill in the TODO blanks in ModelMatcher.java and MatcherController.java //MarkovModel,java import java.util.Set; /** * Construct a Markov model of order /k/ based on an input
Fill in the TODO blanks in ModelMatcher.java and MatcherController.java


//MarkovModel,java
import java.util.Set; /** * Construct a Markov model of order /k/ based on an input string. * * @author * @version */ public class MarkovModel {
/** Markov model order parameter */ int k; /** ngram model of order k */ NgramAnalyser ngram; /** ngram model of order k+1 */ NgramAnalyser n1gram;
/** * Construct an order-k Markov model from string s * @param k int order of the Markov model * @param s String input to be modelled */ public MarkovModel(int k, String s) { ngram = new NgramAnalyser(k, s); n1gram = new NgramAnalyser((k+1), s); }
/** * @return order of this Markov model */ public int getK() { return k; }
/** Estimate the probability of a sequence appearing in the text * using simple estimate of freq seq / frequency front(seq). * @param sequence String of length k+1 * @return double probability of the last letter occuring in the * context of the first ones or 0 if front(seq) does not occur. */ public double simpleEstimate(String sequence) { double prob; String seqNotLast = sequence.substring(0, sequence.length()-1);
if (ngram.getDistinctNgrams().contains(seqNotLast)) { double n1g = n1gram.getNgramFrequency(sequence); double ng = ngram.getNgramFrequency(seqNotLast); try{ prob = (n1gg); } catch(ArithmeticException e){ return 0.0; } return prob; } else { return 0.0; }
} /** * Calculate the Laplacian probability of string obs given this Markov model * @input sequence String of length k+1 */ public double laplaceEstimate(String sequence) { String context = sequence.substring(0, sequence.length()-1); double npc = n1gram.getNgramFrequency(sequence); double np = ngram.getNgramFrequency(context); double laplace; laplace = (npc + 1)/(np + ngram.getAlphabetSize()); return laplace; }
/** * @return String representing this Markov model */ public String toString() { String toRet = ""; String k = Integer.toString(getK()); toRet += (k + " "); toRet += (Integer.toString(ngram.getAlphabetSize()) + " "); toRet += ngram.toString() + n1gram.toString(); return toRet; }
}
--------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------
//ModelMatcher.java
import java.util.HashMap; import java.util.Collection; import java.util.ArrayList; import java.util.Arrays;
/** * Report the average log likelihood of a test String occuring in a * given Markov model and detail the calculated values behind this statistic. * * @author * @version */ public class ModelMatcher {
/** log likelihoods for a teststring under a given model */ private HashMap
/** Helper method that calculates the average log likelihood statistic * given a HashMap of strings and their Laplace probabilities * and the total number of ngrams in the model. * * @param logs map of ngram strings and their log likelihood * @param ngramCount int number of ngrams in the original test string * @return average log likelihood: the total of loglikelihoods * divided by the ngramCount */ private double averageLogLikelihood(HashMap
/** * @return the average log likelihood statistic */ public double getAverageLogLikelihood() { return averageLogLikelihood; } /** * @return the log likelihood value for a given ngram from the input string */ public double getLogLikelihood(String ngram) { return (logLikelihoodMap.get(ngram)); } /** * Make a String summarising the log likelihood map and its statistics * @return String of ngrams and their loglikeihood differences between the models * The likelihood table should be ordered from highest to lowest likelihood */ public String toString() { //TODO return null; }
}
--------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------

//MatcherController.java
import java.io.File; import java.util.ArrayList; import java.util.HashMap; import java.util.Set; import java.io.*;
/** Create and manipulate Markov models and model matchers for lists of training data * a test data String and generate output from it for convenient display. * * @author * @version * */ public class MatcherController {
/** list of training data string used to generate markov models */ ArrayList
/** Generate models for analysis * @param k order of the markov models to be used * @param testData String to check against different models * @throw unchecked exceptions if the input order or data inputs are invalid */ public MatcherController(int k, ArrayList
/** @return a string containing all lines from a file * ff file contents can be got, otherwise null * This method should process any exceptions that arise. */ private static String getFileContents(String filename) { //TODO return null; }
/** * @return the ModelMatcher object that has the highest average loglikelihood * (where all candidates are trained for the same test string */ public ModelMatcher getBestMatch(ArrayList
/** @return String an *explanation* of * why the test string is the match from the candidate models */ public String explainBestMatch(ModelMatcher best) { //TODO return null; }
/** Display an error to the user in a manner appropriate * for the interface being used. * * @param message */ public void displayError(String message) { // LEAVE THIS METHOD EMPTY }
}
Task 3- Matching test strings to a model (ModelMatcher) models better matches a test string. Given two For this task, you will need to complete the Mod class, which determines which of two Markov Markov models built from text samples taken from two different sources, we can use them to estimate which source it is more likely a test s was drawn from. Even using only zero-th order models constructed using English and Russian text, we should be able to tell, for instance, that the string bopLLI oe3EKyceH is more likely to be Russian than English. We will computer a measure of fit of test strings against models, called the likelihood of a sequence under the model. The likelihood of a sequence under a k-th order Markov model is calculated as follows For each symbol c in the sequence, compute the probability of observing c under the model, given its k-letter context p (assuming the sequence to wrap around", as described above) using the Laplace-smoothed estimate of probability we used for the MarkovModel class Compute the likelihood of the entire sequence as the product of the likelihoods of each character. mathematical notation: In Let s be an input sequence of length n, and let M be a k-th order Markov model. In order to calculate the likelihood of the sequence s under the model M, for each symbol ci in s (where 1 S is n), let pi be the k-length context of the symbol ci uming wraparound). The likelihood of the sequence s under the model is II laplace(ci) i 1 where laplace is the Laplace-smoothed probability of ci occurring ven its context) as described in the previous task. The probability we obtain from this calculation may be very smal in fact, potentially so small as to be indistinguishable from zero when using Java's built-in floating-point arithmetic. Therefore we will calculate and express the likelihood using log probabilities. which do not suffer from this problem. (A weakness of log probabilities is that they cannot straightforwardly represent probabilities of zero, but our use of Laplace smoothing allows us to avoid this problem. The product of two probabilities p and g can be calculated by adding their log probabilities, logpand log g By way of example, suppose we have constructed a 2nd-order Markov model using the input string "aabcabaacaac", as described in the example for Task 2. If we are then given a test string "aabbcaac we can compute its log likelihood as follows For each character in the test string, obtain its length-2 context (assuming wraparound). Note that the tri-gram "caa" occurs twice in the (wrapped) test string Context Character Frequency bb bc Ca
Step by Step Solution
There are 3 Steps involved in it
Get step-by-step solutions from verified subject matter experts
