Question: Design and implement functions that build an order K Markov model from a piece of input text. Markov models are popular tools in speech recognition,

Design and implement functions that build an order K Markov model from a piece of input text. Markov models are popular tools in speech recognition, handwriting recognition, information retrieval, and data compression. For this assignment, you will consider a different application: generating stylized pseudo-random text.

Markov models. An order 0 Markov model predicts that each character in the alphabet occurs with a fixed probability, independent of previous characters. Given an input text, you compute the Markov model of order 0 by counting up the number of occurrences of each letter in the input, and use these as the frequencies. For example, if the input text is "agggcagcgggcg", then the order 0 Markov model predicts that each character is a with probability 2/13, c with probability 3/13, and g with probability 8/13.

Characters in English text are not independent. If the previous 5 characters in a string are orith, then the next letter is almost surely m. An order K Markov model predicts that each character occurs with a fixed probability, but that probability can depend on the previous k characters. Given an input text, you compute a Markov model of order K by counting up the number of occurrences of each letter that follows each sequence of K letters. For example, if the text has 100 occurrences of th, with 50 occurrences of the, 25 occurrences of thi, 20 occurrences of tha, and 5 occurrences of tho, the order 2 Markov model predicts that the next character following th is e with probability 1/2, i with probability 1/4, a with probability 1/5, and o with probability 1/20.

Part 0: Markov model. Create a data type Markov to represent a K-character substring. Ultimately, it will have a method random that returns a random character according to the Markov model. For now, just make it store the substring and an integer that counts the number of times the substring appears. You will need a constructor, a method to increment the frequency count, and the usual toString method for output.

public Markov(String substring) public void add() public String toString()

Part 1: frequency counts. Implement a client program FrequencyCounter.java that reads the order parameter K of the Markov model from the command-line, a text string from standard input, and uses a symbol table to insert each K-character substring (key) from the text. For example, if K is 2 and the input string is "agggcagcgggcg", then your client program should create Markov objects for each of the 5 distinct keys, and call the add method 12 times in total: ag gg gg gc ca ag gc cg gg gg gc cg. Maintain an integer count of the number of occurrences of each key. Use your symbol table's methods to print out the number of distinct keys and the number of times each key appears in the text. For the example above, your program should output:

5 distinct keys 2 ag 1 ca 2 cg 3 gc 4 gg

Part 2: language model. To generate random text, given a K-character key, you data type Markov must know all of the letters that follow the K-character key. This operation is at the crux of the matter, as you will need it to generate random characters in accordance with the Markov model. Modify your Markov data type so that in addition to the frequency counts, it records the breakdown depending on the next letter. Create your own linked list to keep track of the list of suffix characters along with their frequencies. Modify the toString method so that it it prints out the list of suffixes, along with the substring and the frequency count. Include the following method to insert a suffix character.

public void add(char c)

Modify FrequencyCounter.java to create a program LanguageModeler.java that inserts keys into the symbol table (if necessary), and calls add(char c) to add the appropriate suffix characters to the Markov model. It should produce the following output on the example input.

5 distinct keys 2 ag: 1 c 1 g 1 ca: 1 g 1 cg: 1 g 3 gc: 1 a 2 g 4 gg: 2 c 2 g

To make things look pretty, convert all whitespace characters into spaces or underscores. Note that since the last cg substring doesn't have a "next" character, we don't include it in the model.

Part 3: pseudo-random text generation. Add a method random to Markov that and returns a pseudo-random character according to the language model. Be sure to get the probabilities right, as we will be checking this.

Now, write a program TextGenerator.java that takes a command line input K, a command line input M, reads in a text string from standard input, and prints out M characters according to the order K Markov model. Start by printing the first K characters of the original text. Then, repeatedly generate successive pseudo-random characters. Using the example above, if the Markov object m represents the substring "gg", then m.random() should return c or g, each with probability 1/2. After you generate a character, move over one character position, always using the last K characters generated to determine the probabilities for the next. For example, if your program chooses c in the example above, then the next Markov object would represent the substring "gc", and according to the Markov model, the next character should be a with probability 1/3 and g with probability 2/3. Continue the process until you have output M characters. If the language model contains less than 100 K-tuples (prefixes), then print the language model (as in the program LanguageModel.java) before you output the M randomly generated characters.

Deliverables. Submit your code as usual, along with one or two of the most amusing language-modeling examples that you come up with. Describe your data structure and algorithms in detail, and analyze the most important performance characteristics of your program in the readme.txt file.

Amusement. Once you get the program working, you will find that the random text with low-order models starts to sound more and more like the original text as you increase the order, as illustrated in the examples below. There are many more apparent pearls of wisdom in the full texts on the web. As you can see, there are limitless opportunities for amusement here. Try your model on some of your own text, or find something interesting on the net. You may wish to write small filters to clean up white space for pure text, but be careful, as the special characters sometimes drive the model, as illustrated in the Shakespeare example.

Here is the Markov and LanguageModel classes implementation of the task with an explanation I got from another post that had the same question https://www.chegg.com/homework-help/questions-and-answers/please-provide-full-java-source-code-task-along-step-step-explanation-q107814705, but the TextGenerator.java is missing and I would like to know how would TextGenerator class look like:

The incomplete answer:

Markov Data Type: The Markov data type will store the K-character key, frequency counts, and the list of suffix characters along with their frequencies. We will use a linked list to store the suffix characters as it allows us to store dynamic data structures.

Java code:

import java.util.HashMap;

import java.util.Random;

public class Markov {

private String prefix;

private HashMap suffixes;

private Random random;

public Markov(String prefix) {

this.prefix = prefix;

this.suffixes = new HashMap<>();

this.random = new Random();

}

public void addSuffix(char suffix) {

int count = suffixes.getOrDefault(suffix, 0);

suffixes.put(suffix, count + 1);

}

public char random() {

int total = suffixes.values().stream().mapToInt(Integer::intValue).sum();

int r = random.nextInt(total);

int sum = 0;

for (char c : suffixes.keySet()) {

sum += suffixes.get(c);

if (sum > r) {

return c;

}

return suffixes.keySet().iterator().next();

}

The Markov class is used to store the prefix and its corresponding suffixes in a HashMap. The addSuffix method adds the suffix to the HashMap. The random method returns a pseudo-random character according to the language model.

It calculates the total number of suffixes and generates a random integer in the range [0, total). Then it iterates through the suffixes and adds their count to sum. When sum becomes greater than the randomly generated integer, the corresponding suffix character is returned.

2. LanguageModel.java

import java.util.HashMap;

public class LanguageModel {

private int order;

private HashMap model;

public LanguageModel(int order) {

this.order = order;

this.model = new HashMap<>();

}

public void addText(String text) {

for (int i = 0; i < text.length() - order; i++) {

String prefix = text.substring(i, i + order);

char suffix = text.charAt(i + order);

Markov markov = model.getOrDefault(prefix, new Markov(prefix));

markov.addSuffix(suffix);

model.put(prefix, markov);

}

public void printModel() {

if (model.size() < 100 * order) {

for (String prefix : model.keySet()) {

System.out.println(prefix + ": " + model.get(prefix).suffixes);

}

The LanguageModel class is used to store the Markov objects in a HashMap. The addText method takes a text string as input and adds the prefix-suffix pairs to the model. The printModel method prints the language model if the number of prefixes is less than 100 * order.

3. TextGenerator.java:

import java.util.Scanner;

public class TextGenerator {

public static void main(String[] args) {

Scanner sc = new

the rest of the class is missing...

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer

Step: 1 Unlock blur-text-image

Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock

Step: 3 Unlock

Students Have Also Explored These Related Databases Questions!

Please provide full Java source code for the task along with step-by-step explanation. Using Java Design and implement functions that build an order K Markov model from a piece of input text. Markov...

io (a) Give the general formula for estimating transition probabilities from training data. Provide the full transition matrix A for this HMM based on the training data shown. [6 marks] (b) Give the...

Tasks The goal of the project is to complete the code for the NgramAnalyser, MarkovModel, ModelMatcher and MatcherController classes, as detailed below, and to add test code to a new JUnit test...

Who is chief knowledge officer? What the primary role? A senior executive in an organization responsible for ensuring that firm fully utilizes the value it gets through knowledge- which is the most...

subject: Differential Equations pls read instructions do not use ai. drop all references and link Instructions ODE application. - find an article related to ODE application - provide a short...

Assistive technology enables dreams. Mathew Lee (personal communication) Assistive technology (AT) provides powerful tools used to diminish disability, enable activities of daily living (ADLs), and...

Python and most Python libraries are free to download or use, though many users use Python through a paid service. Paid services help IT organizations manage the risks associated with the use of...

Portray in words what transforms you would have to make to your execution to some degree (a) to accomplish this and remark on the benefits and detriments of this thought.You are approached to compose...

This paper should include 3-5 pages of content with an additional cover and reference page. This is a total of 5-7 pages. Please be aware that a properly formatted page will include approximately 350...

tudy of an innovative method based on complementarity between ARIZ, lean management and discrete event simulation for solving warehousing problems Fatima Zahra Ben Moussa a, , Roland De Guiob ,...

Oriole Engineering Corporation purchased conveyor equipment with a list price of $54,600. Three independent cases that are related to the equipment follow. Assume that the equipment purchases are...

How mutations contribute to adaptation and evolution.

change my answer like a senior accountant who wants to experience working in a big firm or big international company: Reflecting on my future in the accounting profession, the insights gained from...

CT Corp Comprehensive Question Canadian Tire Corporation, Limited ( Canadian Tire ) is a family of companies that includes a retail segment and a financial services division, among others. The retail...

Confusion surrounding compensation and benefits, with a lack of information and benchmarking. Comparative data for benchmarking needs to be collected and disseminated.

1 What are the main problem areas in the GermanCzech joint venture of VolkswagenSkoda described above, and how can they be accounted for by reference to differences in culture? What part is played by...

A disregard for health and safety management. Managers can be encouraged to take a proactive approach to these issues, and a holistic approach that ties in these issues with the companys...