Question: Design and implement functions that build an order K Markov model from a piece of input text. Markov models are popular tools in speech recognition,
Design and implement functions that build an order K Markov model from a piece of input text. Markov models are popular tools in speech recognition, handwriting recognition, information retrieval, and data compression. For this assignment, you will consider a different application: generating stylized pseudo-random text.
Markov models. An order 0 Markov model predicts that each character in the alphabet occurs with a fixed probability, independent of previous characters. Given an input text, you compute the Markov model of order 0 by counting up the number of occurrences of each letter in the input, and use these as the frequencies. For example, if the input text is "agggcagcgggcg", then the order 0 Markov model predicts that each character is a with probability 2/13, c with probability 3/13, and g with probability 8/13.
Characters in English text are not independent. If the previous 5 characters in a string are orith, then the next letter is almost surely m. An order K Markov model predicts that each character occurs with a fixed probability, but that probability can depend on the previous k characters. Given an input text, you compute a Markov model of order K by counting up the number of occurrences of each letter that follows each sequence of K letters. For example, if the text has 100 occurrences of th, with 50 occurrences of the, 25 occurrences of thi, 20 occurrences of tha, and 5 occurrences of tho, the order 2 Markov model predicts that the next character following th is e with probability 1/2, i with probability 1/4, a with probability 1/5, and o with probability 1/20.
Part 0: Markov model. Create a data type Markov to represent a K-character substring. Ultimately, it will have a method random that returns a random character according to the Markov model. For now, just make it store the substring and an integer that counts the number of times the substring appears. You will need a constructor, a method to increment the frequency count, and the usual toString method for output.
public Markov(String substring) public void add() public String toString()
Part 1: frequency counts. Implement a client program FrequencyCounter.java that reads the order parameter K of the Markov model from the command-line, a text string from standard input, and uses a symbol table to insert each K-character substring (key) from the text. For example, if K is 2 and the input string is "agggcagcgggcg", then your client program should create Markov objects for each of the 5 distinct keys, and call the add method 12 times in total: ag gg gg gc ca ag gc cg gg gg gc cg. Maintain an integer count of the number of occurrences of each key. Use your symbol table's methods to print out the number of distinct keys and the number of times each key appears in the text. For the example above, your program should output:
5 distinct keys 2 ag 1 ca 2 cg 3 gc 4 gg
Part 2: language model. To generate random text, given a K-character key, you data type Markov must know all of the letters that follow the K-character key. This operation is at the crux of the matter, as you will need it to generate random characters in accordance with the Markov model. Modify your Markov data type so that in addition to the frequency counts, it records the breakdown depending on the next letter. Create your own linked list to keep track of the list of suffix characters along with their frequencies. Modify the toString method so that it it prints out the list of suffixes, along with the substring and the frequency count. Include the following method to insert a suffix character.
public void add(char c)
Modify FrequencyCounter.java to create a program LanguageModeler.java that inserts keys into the symbol table (if necessary), and calls add(char c) to add the appropriate suffix characters to the Markov model. It should produce the following output on the example input.
5 distinct keys 2 ag: 1 c 1 g 1 ca: 1 g 1 cg: 1 g 3 gc: 1 a 2 g 4 gg: 2 c 2 g
To make things look pretty, convert all whitespace characters into spaces or underscores. Note that since the last cg substring doesn't have a "next" character, we don't include it in the model.
Part 3: pseudo-random text generation. Add a method random to Markov that and returns a pseudo-random character according to the language model. Be sure to get the probabilities right, as we will be checking this.
Now, write a program TextGenerator.java that takes a command line input K, a command line input M, reads in a text string from standard input, and prints out M characters according to the order K Markov model. Start by printing the first K characters of the original text. Then, repeatedly generate successive pseudo-random characters. Using the example above, if the Markov object m represents the substring "gg", then m.random() should return c or g, each with probability 1/2. After you generate a character, move over one character position, always using the last K characters generated to determine the probabilities for the next. For example, if your program chooses c in the example above, then the next Markov object would represent the substring "gc", and according to the Markov model, the next character should be a with probability 1/3 and g with probability 2/3. Continue the process until you have output M characters. If the language model contains less than 100 K-tuples (prefixes), then print the language model (as in the program LanguageModel.java) before you output the M randomly generated characters.
Deliverables. Submit your code as usual, along with one or two of the most amusing language-modeling examples that you come up with. Describe your data structure and algorithms in detail, and analyze the most important performance characteristics of your program in the readme.txt file.
Amusement. Once you get the program working, you will find that the random text with low-order models starts to sound more and more like the original text as you increase the order, as illustrated in the examples below. There are many more apparent pearls of wisdom in the full texts on the web. As you can see, there are limitless opportunities for amusement here. Try your model on some of your own text, or find something interesting on the net. You may wish to write small filters to clean up white space for pure text, but be careful, as the special characters sometimes drive the model, as illustrated in the Shakespeare example.
Here is the Markov and LanguageModel classes implementation of the task with an explanation I got from another post that had the same question https://www.chegg.com/homework-help/questions-and-answers/please-provide-full-java-source-code-task-along-step-step-explanation-q107814705, but the TextGenerator.java is missing and I would like to know how would TextGenerator class look like:
The incomplete answer:
Markov Data Type: The Markov data type will store the K-character key, frequency counts, and the list of suffix characters along with their frequencies. We will use a linked list to store the suffix characters as it allows us to store dynamic data structures.
Java code:
import java.util.HashMap;
import java.util.Random;
public class Markov {
private String prefix;
private HashMap
private Random random;
public Markov(String prefix) {
this.prefix = prefix;
this.suffixes = new HashMap<>();
this.random = new Random();
}
public void addSuffix(char suffix) {
int count = suffixes.getOrDefault(suffix, 0);
suffixes.put(suffix, count + 1);
}
public char random() {
int total = suffixes.values().stream().mapToInt(Integer::intValue).sum();
int r = random.nextInt(total);
int sum = 0;
for (char c : suffixes.keySet()) {
sum += suffixes.get(c);
if (sum > r) {
return c;
}
}
return suffixes.keySet().iterator().next();
}
}
The Markov class is used to store the prefix and its corresponding suffixes in a HashMap. The addSuffix method adds the suffix to the HashMap. The random method returns a pseudo-random character according to the language model.
It calculates the total number of suffixes and generates a random integer in the range [0, total). Then it iterates through the suffixes and adds their count to sum. When sum becomes greater than the randomly generated integer, the corresponding suffix character is returned.
2. LanguageModel.java
import java.util.HashMap;
public class LanguageModel {
private int order;
private HashMap
public LanguageModel(int order) {
this.order = order;
this.model = new HashMap<>();
}
public void addText(String text) {
for (int i = 0; i < text.length() - order; i++) {
String prefix = text.substring(i, i + order);
char suffix = text.charAt(i + order);
Markov markov = model.getOrDefault(prefix, new Markov(prefix));
markov.addSuffix(suffix);
model.put(prefix, markov);
}
}
public void printModel() {
if (model.size() < 100 * order) {
for (String prefix : model.keySet()) {
System.out.println(prefix + ": " + model.get(prefix).suffixes);
}
}
}
}
The LanguageModel class is used to store the Markov objects in a HashMap. The addText method takes a text string as input and adds the prefix-suffix pairs to the model. The printModel method prints the language model if the number of prefixes is less than 100 * order.
3. TextGenerator.java:
import java.util.Scanner;
public class TextGenerator {
public static void main(String[] args) {
Scanner sc = new
the rest of the class is missing...
Step by Step Solution
There are 3 Steps involved in it
Get step-by-step solutions from verified subject matter experts
