Question: Predictive models of text: performing text analysis > import java.util.Set; /** * Construct a Markov model of order /k/ based on an input string. *
Predictive models of text: performing text analysis



>
import java.util.Set; /** * Construct a Markov model of order /k/ based on an input string. * * @author * @version */ public class MarkovModel {
/** Markov model order parameter */ int k; /** ngram model of order k */ NgramAnalyser ngram; /** ngram model of order k+1 */ NgramAnalyser n1gram;
/** * Construct an order-k Markov model from string s * @param k int order of the Markov model * @param s String input to be modelled */ public MarkovModel(int k, String s) { //TODO replace this line with your code }
/** * @return order of this Markov model */ public int getK() { return k; }
/** Estimate the probability of a sequence appearing in the text * using simple estimate of freq seq / frequency front(seq). * @param sequence String of length k+1 * @return double probability of the last letter occuring in the * context of the first ones or 0 if front(seq) does not occur. */ public double simpleEstimate(String sequence) { //TODO replace this line with your code return -1.0;
} /** * Calculate the Laplacian probability of string obs given this Markov model * @input sequence String of length k+1 */ public double laplaceEstimate(String sequence) { //TODO replace this line with your code return -1.0; }
/** * @return String representing this Markov model */ public String toString() { //TODO replace this line with your code return null; }
}
--------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------


>
import java.util.HashMap; import java.util.Collection; import java.util.ArrayList; import java.util.Arrays;
/** * Report the average log likelihood of a test String occuring in a * given Markov model and detail the calculated values behind this statistic. * * @author * @version */ public class ModelMatcher {
/** log likelihoods for a teststring under a given model */ private HashMap
/** Helper method that calculates the average log likelihood statistic * given a HashMap of strings and their Laplace probabilities * and the total number of ngrams in the model. * * @param logs map of ngram strings and their log likelihood * @param ngramCount int number of ngrams in the original test string * @return average log likelihood: the total of loglikelihoods * divided by the ngramCount */ private double averageLogLikelihood(HashMap
/** * @return the average log likelihood statistic */ public double getAverageLogLikelihood() { return averageLogLikelihood; } /** * @return the log likelihood value for a given ngram from the input string */ public double getLogLikelihood(String ngram) { return (logLikelihoodMap.get(ngram)); } /** * Make a String summarising the log likelihood map and its statistics * @return String of ngrams and their loglikeihood differences between the models * The likelihood table should be ordered from highest to lowest likelihood */ public String toString() { //TODO return null; }
}
--------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------

>
import java.io.File; import java.util.ArrayList; import java.util.HashMap; import java.util.Set; import java.io.*;
/** Create and manipulate Markov models and model matchers for lists of training data * a test data String and generate output from it for convenient display. * * @author * @version * */ public class MatcherController {
/** list of training data string used to generate markov models */ ArrayList
/** Generate models for analysis * @param k order of the markov models to be used * @param testData String to check against different models * @throw unchecked exceptions if the input order or data inputs are invalid */ public MatcherController(int k, ArrayList
/** @return a string containing all lines from a file * ff file contents can be got, otherwise null * This method should process any exceptions that arise. */ private static String getFileContents(String filename) { //TODO return null; }
/** * @return the ModelMatcher object that has the highest average loglikelihood * (where all candidates are trained for the same test string */ public ModelMatcher getBestMatch(ArrayList
/** @return String an *explanation* of * why the test string is the match from the candidate models */ public String explainBestMatch(ModelMatcher best) { //TODO return null; }
/** Display an error to the user in a manner appropriate * for the interface being used. * * @param message */ public void displayError(String message) { // LEAVE THIS METHOD EMPTY }
}
--------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------
>
import static org.junit.Assert.*; import org.junit.After; import org.junit.Before; import org.junit.Test;
/** * The test class ProjectTest for student test cases. * Add all new test cases to this task. * * @author * @version */ public class ProjectTest { /** * Default constructor for test class ProjectTest */ public ProjectTest() { }
/** * Sets up the test fixture. * * Called before every test case method. */ @Before public void setUp() { }
/** * Tears down the test fixture. * * Called after every test case method. */ @After public void tearDown() { } //TODO add new test cases from here include brief documentation @Test(timeout=1000) public void testLaplaceExample() { assertEquals(0,1); //TODO replace with test code } @Test(timeout=1000) public void testSimpleExample() { assertEquals(0,1); //TODO replace with test code }
@Test public void testTask3example() { MarkovModel model = new MarkovModel(2,"aabcabaacaac"); ModelMatcher match = new ModelMatcher(model,"aabbcaac"); assertEquals(0,1); //TODO replace with test code } }
Task 2 -Building a Markov model of a text sample MarkovModel) For this task, you will need to complete the MarkovModel class, which generates a Markov model from an input string, and also write a JUnit test for your model. Markov models are probabilistic models (i.e., they model the chances of particular events occurring) and are used for a broad range of natural language processing tasks (including computer speech recognition). They are widely used to model all sorts of dynamical processes in engineering, mathematics, finance and many other areas. They can be used to estimate the probability that a symbol will appear in a source of text, given the symbols that have preceded it A zero-th order Markov model of a text-source is one that estimates the probability that the next character in a sequence is, say, an "a", based simply on how frequently it occurs in a sample. Higher-order Markov models generalize on this idea. Based on a sample text, they estimate the likelihood that a particular symbol will occur in a sequence of symbols drawn from a source of text, where the probability of each symbol occurring can depend upon preceding symbols. In a first order Markov model, the probability of a symbol occurring depends only on the previous symbol. Thus, for English text, the probability of encountering a "u" can depend on whether the previous letter was a "q If it was indeed a "q then the probability of encountering a "u" should be quite high. For a second order Markov model, the probability of encountering a particular symbol can depend on the previous two symbols. and generally, the probabilities used by a t-th order Markov model can depend on the preceding k symbols. A Markov model can be used to estimate the probability of a symbol appearing, given its k predecessors, in a simple way, as follows For each context of characters of length k we estimate the probability ofthat context being followed by each letter c in our alphabet as the number of times the context appears followed by c, divided by the number of times the context appear in total. As with our NgramAnalyser class, we consider our input string to "wrap round" when analysing contexts near its end. Call this way of estimating probabilities simple estimation For instance, consider the string "aabcabaacaac". The 2 and 3-grams in it (assuming wrap-around are as follows 2-gram frequency gram frequency aab aa ab aaC aba. aC ba abc aca baa Ca boca. 2-gram frequencies Caa Cab 3-gram frequencies Given the context "aa", we can simply estimate the probability that the next character is "b" as (number of occurrences of "aab") P(next character is a "b" if last 2 were kaa" number of occurrences of "aa 3
Step by Step Solution
There are 3 Steps involved in it
Get step-by-step solutions from verified subject matter experts
