
I would really appreciate it if I could get some help on this.

Write a class called SourceModel with the following constructors and methods:

A single constructor with two String parameters, where the first parameter is the name of the source model and the second is the file name of the corpus file for the model. The constructor should create a letter-letter transition matrix using this recommended algorithm sketch:

Initialize a 26x26 matrix for character counts

Print "Training {name} model ... " (see the sample runs below).

Read the corpus file one character at a time, converting all characters to lower case and ignoring any non-alphabetic character.

For each character, increment the corresponding (row, col) in your counts matrix. The row is for the previous character, the col is for the current character. (You could also think of this in terms of bigrams.)

After you read the entire corpus file, you'll have a matrix of counts.

From the matrix of counts, create a matrix of probabilities: each row of the transition matrix is a probability distribution.

The probabilities in a distribution must sum to 1. To turn counts into probabilities, divide each count by the sum of all the counts in its row.

Print "done." followed by a newline character.
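The counting and normalizing steps above could be sketched roughly as follows. This is only an illustration, not the assignment's reference solution: the class name TransitionMatrixSketch and both method names are hypothetical, the input comes from any Reader (a FileReader on the corpus file would be the real source), only the letters a to z are counted, and a row with no observations is simply left at 0.0.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class TransitionMatrixSketch {

    // Count letter-to-letter transitions, one character at a time.
    static int[][] countTransitions(Reader in) {
        int[][] counts = new int[26][26];
        int prev = -1; // index of the previous letter, -1 while there is none
        try {
            int c;
            while ((c = in.read()) != -1) {       // read() returns -1 at end of input
                char ch = Character.toLowerCase((char) c);
                if (ch < 'a' || ch > 'z') {
                    continue;                     // assumption: only a-z are counted
                }
                int cur = ch - 'a';
                if (prev >= 0) {
                    counts[prev][cur]++;          // row = previous, col = current
                }
                prev = cur;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    // Divide each count by its row sum so every observed row sums to 1.
    static double[][] toProbabilities(int[][] counts) {
        double[][] probs = new double[26][26];
        for (int r = 0; r < 26; r++) {
            int rowSum = 0;
            for (int v : counts[r]) {
                rowSum += v;
            }
            if (rowSum == 0) {
                continue;                         // assumption: unseen rows stay at 0.0
            }
            for (int col = 0; col < 26; col++) {
                probs[r][col] = (double) counts[r][col] / rowSum;
            }
        }
        return probs;
    }

    public static void main(String[] args) {
        int[][] counts = countTransitions(new StringReader("Abab, ac!"));
        double[][] probs = toProbabilities(counts);
        System.out.println("P(b | a) = " + probs[0][1]);
    }
}
```

Note that the sample matrix shown later in this question never prints a row of all zeros, so the expected solution may handle unseen transitions differently from this sketch.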

A getName method with no parameters which returns the name of the SourceModel.

A toString method which returns a String representation of the model like the one shown below under Running Your Program in jshell.

A probability method which takes a String and returns a double which indicates the probability that the test string was generated by the source model, using the transition probability matrix created in the constructor. Here's a recommended algorithm:

Initialize the probability to 1.0

For each two-character sequence of characters in the test string test, $c_i$ and $c_{i+1}$ for $i = 0$ up to (but not including) test.length() - 1, multiply the probability by the entry in the transition probability matrix for the $c_i$ to $c_{i+1}$ transition, which should be found in row $c_i$ and column $c_{i+1}$ in the matrix. (You could also think of the indices as $c_{i-1}, c_i$ for $i = 1$ to test.length() - 1.)
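A minimal sketch of this product, under some assumptions the text does not spell out: the matrix is passed in as a parameter (in SourceModel it would be a field), the class name ProbabilitySketch is hypothetical, and non-letters in the test string are skipped the same way as during training, so the letters on either side of a space count as adjacent.

```java
class ProbabilitySketch {

    // Multiply the transition probabilities of every adjacent letter pair.
    static double probability(double[][] matrix, String test) {
        double p = 1.0;
        int prev = -1; // index of the previous letter, -1 while there is none
        for (int i = 0; i < test.length(); i++) {
            char ch = Character.toLowerCase(test.charAt(i));
            if (ch < 'a' || ch > 'z') {
                continue;                 // assumption: ignore non-letters
            }
            int cur = ch - 'a';
            if (prev >= 0) {
                p *= matrix[prev][cur];   // row = previous, col = current
            }
            prev = cur;
        }
        return p;
    }

    public static void main(String[] args) {
        double[][] m = new double[26][26];
        m[0][1] = 0.5;   // a -> b
        m[1][0] = 0.25;  // b -> a
        System.out.println(probability(m, "A-ba"));
    }
}
```

Because every factor is at most 1, the product shrinks quickly with the length of the test string, which is why the results are normalized across models later on.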

A main method that makes SourceModel runnable from the command line. Your program should take one or more corpus file names as command-line arguments, followed by a quoted string as the last argument. The program should create models for all the corpora and test the string against every model. Here's an algorithm sketch:

The first n-1 arguments to the program are corpus file names to use to train models. Corpus files are of the form .corpus

The last argument to the program is a quoted string to test.

Create a SourceModel object for each corpus

Use the models to compute the probability that the test text was produced by the model

Probabilities will be very small. Normalize the probabilities of all the model predictions to a probability distribution, so they sum to 1. (This is a closed-world assumption: we only state probabilities relative to the models we have.)

Print results of analysis
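The normalization step might look like the sketch below. The class and method names are hypothetical, the raw probabilities are passed in as a plain array in the same order as the models, and the all-zero case is left as all zeros, which the assignment text does not address.

```java
class NormalizeSketch {

    // Scale the raw model probabilities so that they sum to 1.
    static double[] normalize(double[] raw) {
        double sum = 0.0;
        for (double v : raw) {
            sum += v;
        }
        double[] out = new double[raw.length];
        if (sum == 0.0) {
            return out;          // assumption: nothing to scale, keep zeros
        }
        for (int i = 0; i < raw.length; i++) {
            out[i] = raw[i] / sum;
        }
        return out;
    }

    // Index of the largest entry, i.e. the most likely model.
    static int mostLikely(double[] probs) {
        int best = 0;
        for (int i = 1; i < probs.length; i++) {
            if (probs[i] > probs[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] normalized = normalize(new double[] {1e-40, 3e-40});
        for (double v : normalized) {
            System.out.printf("%.2f%n", v);
        }
    }
}
```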

Running Your Program

Sample runs from the command line:

$ java SourceModel *.corpus "If you got a gun up in your waist please don't shoot up the place (why?)"
Training english model ... done.
Training french model ... done.
Training hiphop model ... done.
Training lisp model ... done.
Training spanish model ... done.
Analyzing: If you got a gun up in your waist please don't shoot up the place (why?)
Probability that test string is english: 0.00
Probability that test string is french: 0.00
Probability that test string is hiphop: 1.00
Probability that test string is lisp: 0.00
Probability that test string is spanish: 0.00
Test string is most likely hiphop.

$ java SourceModel *.corpus "Ou va le monde?"
Training english model ... done.
Training french model ... done.
Training hiphop model ... done.
Training lisp model ... done.
Training spanish model ... done.
Analyzing: Ou va le monde?
Probability that test string is english: 0.02
Probability that test string is french: 0.85
Probability that test string is hiphop: 0.01
Probability that test string is lisp: 0.10
Probability that test string is spanish: 0.01
Test string is most likely french.

$ java SourceModel *.corpus "My other car is a cdr."
Training english model ... done.
Training french model ... done.
Training hiphop model ... done.
Training lisp model ... done.
Training spanish model ... done.
Analyzing: My other car is a cdr.
Probability that test string is english: 0.39
Probability that test string is french: 0.00
Probability that test string is hiphop: 0.61
Probability that test string is lisp: 0.00
Probability that test string is spanish: 0.00
Test string is most likely hiphop.

$ java SourceModel *.corpus "defun Let there be rock"
Training english model ... done.
Training french model ... done.
Training hiphop model ... done.
Training lisp model ... done.
Training spanish model ... done.
Analyzing: defun Let there be rock
Probability that test string is english: 0.01
Probability that test string is french: 0.00
Probability that test string is hiphop: 0.42
Probability that test string is lisp: 0.57
Probability that test string is spanish: 0.00
Test string is most likely lisp.

Sample runs from jshell:

$ jshell
|  Welcome to JShell -- Version 10.0.2
|  For an introduction type: /help intro

jshell> /open SourceModel.java

jshell> var french = new SourceModel("french", "french.corpus")
Training french model ... done.
french ==> Model: french
  a    b    c    d    e    f ... 1.00 0.01 0.01 0.01 0.01

jshell> System.out.println(french) // implicitly calls french.toString()
Model: french
  a    b    c    d    e    f    g    h    i    j    k    l    m    n    o    p    q    r    s    t    u    v    w    x    y    z
a 0.01 0.03 0.03 0.02 0.01 0.01 0.03 0.01 0.26 0.01 0.01 0.07 0.07 0.13 0.01 0.06 0.01 0.09 0.06 0.04 0.06 0.05 0.01 0.01 0.01 0.01
b 0.07 0.01 0.01 0.03 0.14 0.01 0.01 0.01 0.07 0.01 0.01 0.21 0.01 0.01 0.14 0.01 0.01 0.24 0.01 0.03 0.07 0.01 0.01 0.01 0.01 0.01
c 0.04 0.02 0.02 0.01 0.26 0.01 0.01 0.19 0.06 0.01 0.01 0.08 0.02 0.01 0.15 0.01 0.01 0.11 0.01 0.01 0.06 0.01 0.01 0.01 0.01 0.01
d 0.14 0.01 0.01 0.01 0.39 0.01 0.01 0.01 0.13 0.01 0.01 0.03 0.01 0.01 0.11 0.01 0.01 0.07 0.03 0.01 0.07 0.01 0.01 0.01 0.01 0.01
e 0.04 0.01 0.04 0.05 0.07 0.01 0.01 0.01 0.01 0.04 0.00 0.07 0.05 0.13 0.01 0.04 0.01 0.07 0.15 0.14 0.06 0.00 0.00 0.01 0.01 0.00
f 0.15 0.01 0.01 0.01 0.23 0.01 0.01 0.01 0.08 0.01 0.01 0.08 0.01 0.01 0.23 0.01 0.01 0.15 0.08 0.01 0.01 0.01 0.01 0.01 0.01 0.01
g 0.01 0.01 0.01 0.01 0.27 0.01 0.01 0.01 0.09 0.01 0.01 0.18 0.05 0.09 0.05 0.01 0.01 0.23 0.01 0.01 0.05 0.01 0.01 0.01 0.01 0.01
h 0.43 0.01 0.01 0.07 0.14 0.01 0.01 0.01 0.07 0.01 0.01 0.07 0.01 0.01 0.21 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01
i 0.03 0.02 0.04 0.04 0.16 0.01 0.04 0.01 0.01 0.01 0.01 0.11 0.06 0.09 0.03 0.02 0.01 0.03 0.15 0.14 0.01 0.01 0.01 0.01 0.01 0.01
j 0.24 0.01 0.01 0.01 0.53 0.01 0.01 0.01 0.03 0.01 0.01 0.01 0.01 0.01 0.06 0.01 0.01 0.01 0.01 0.01 0.15 0.01 0.01 0.01 0.01 0.01
k 0.50 0.01 0.01 0.01 0.50 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01
l 0.20 0.01 0.01 0.01 0.46 0.01 0.01 0.01 0.07 0.01 0.01 0.11 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.02 0.06 0.01 0.01 0.01 0.01 0.01
m 0.22 0.16 0.01 0.01 0.26 0.01 0.01 0.01 0.10 0.01 0.01 0.01 0.06 0.01 0.12 0.04 0.01 0.01 0.01 0.01 0.03 0.01 0.01 0.01 0.01 0.01
n 0.06 0.01 0.03 0.13 0.16 0.04 0.01 0.01 0.05 0.03 0.01 0.02 0.01 0.04 0.03 0.01 0.04 0.01 0.08 0.22 0.02 0.01 0.01 0.01 0.01 0.01
o 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.09 0.01 0.01 0.03 0.06 0.24 0.01 0.02 0.01 0.18 0.04 0.01 0.28 0.01 0.02 0.01 0.01 0.01
p 0.25 0.01 0.01 0.02 0.11 0.01 0.01 0.02 0.02 0.01 0.01 0.13 0.01 0.01 0.20 0.05 0.01 0.13 0.05 0.01 0.04 0.01 0.01 0.01 0.01 0.01
q 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 1.00 0.01 0.01 0.01 0.01 0.01
r 0.20 0.01 0.03 0.02 0.30 0.01 0.01 0.01 0.08 0.01 0.01 0.06 0.01 0.01 0.05 0.01 0.01 0.03 0.05 0.12 0.02 0.01 0.01 0.01 0.01 0.01
s 0.07 0.02 0.05 0.04 0.15 0.01 0.01 0.01 0.10 0.03 0.01 0.06 0.01 0.01 0.09 0.06 0.03 0.01 0.05 0.09 0.10 0.03 0.01 0.01 0.01 0.01
t 0.13 0.01 0.01 0.04 0.19 0.01 0.01 0.01 0.05 0.04 0.01 0.08 0.03 0.01 0.13 0.01 0.02 0.08 0.01 0.03 0.12 0.01 0.01 0.01 0.01 0.01
u 0.04 0.01 0.02 0.01 0.10 0.01 0.01 0.01 0.07 0.01 0.01 0.05 0.02 0.20 0.01 0.02 0.01 0.24 0.12 0.05 0.02 0.01 0.01 0.01 0.01 0.01
v 0.26 0.01 0.01 0.01 0.37 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.26 0.01 0.01 0.11 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01
w 0.01 0.01 0.01 0.67 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.33 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01
x 0.01 0.01 0.14 0.01 0.14 0.01 0.01 0.01 0.29 0.01 0.01 0.01 0.01 0.14 0.01 0.14 0.01 0.01 0.01 0.01 0.14 0.01 0.01 0.01 0.01 0.01
y 0.50 0.01 0.25 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.25 0.01 0.01 0.01 0.01 0.01 0.01 0.01
z 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 1.00 0.01 0.01 0.01 0.01

jshell> french.probability("Il y a tout ce que vous voulez aux Champs-Elysees")
$8 ==> 3.966845096265183E-43
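A toString in the spirit of the session above could be assembled with a StringBuilder and two-decimal formatting. The sketch below is a guess at the layout, not the required one: the exact column spacing cannot be recovered from the sample, the class name ToStringSketch and the render method are hypothetical, and the matrix is passed in rather than read from a field.

```java
import java.util.Locale;

class ToStringSketch {

    // Build "Model: name", a header row of letters, then one row per letter.
    static String render(String name, double[][] matrix) {
        StringBuilder sb = new StringBuilder();
        sb.append("Model: ").append(name).append('\n');
        sb.append(' ');
        for (char c = 'a'; c <= 'z'; c++) {
            sb.append("    ").append(c);  // 5 columns per entry, like " 0.01"
        }
        sb.append('\n');
        for (int r = 0; r < 26; r++) {
            sb.append((char) ('a' + r));  // row label
            for (int col = 0; col < 26; col++) {
                // Locale.ROOT keeps the decimal point a '.' on every machine
                sb.append(String.format(Locale.ROOT, " %.2f", matrix[r][col]));
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        double[][] m = new double[26][26];
        m[0][1] = 0.5;
        System.out.print(render("demo", m));
    }
}
```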

Refer to Oracle's tutorial on reading a file one character at a time: https://docs.oracle.com/javase/tutorial/essential/io/charstreams.html

FileReader's read method returns an int. You'll probably want to cast these to chars. That's fine. As the documentation says, the lower 16 bits are the Unicode code point for a character.

If you use String.split to get corpus names from file names, remember that . is a special regex character. Use a character class to match a literal . character. For example "foo.fighters".split("[.]") is ["foo", "fighters"].
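As a small illustration of that tip, the helper below derives a model name from a corpus file name. The class and method names are hypothetical, and the sketch assumes the argument is a bare file name with no directory part.

```java
class CorpusNameSketch {

    // "french.corpus" -> "french"; "[.]" matches a literal dot.
    static String modelName(String fileName) {
        return fileName.split("[.]")[0];
    }

    public static void main(String[] args) {
        System.out.println(modelName("french.corpus"));
    }
}
```

Splitting on the bare pattern "." would treat every character as a separator and return an empty array, which is why the character class matters.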

char is an integral type, so you can easily find a char's offset from 'a' with an expression like ch - 'a', where ch is a char variable.

The Character class has many static utility methods you will find useful, like isAlphabetic, toLowerCase.
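One way to combine these hints is a helper that maps a character to a matrix index. The name letterIndex and the -1 convention for "not usable" are hypothetical. The extra range check is there because Character.isAlphabetic is also true for letters outside a to z (for example an accented e), whose offset from 'a' would fall outside a 26x26 matrix.

```java
class LetterIndexSketch {

    // Returns 0..25 for a-z (case-insensitive), or -1 for anything else.
    static int letterIndex(char ch) {
        if (!Character.isAlphabetic(ch)) {
            return -1;
        }
        char lower = Character.toLowerCase(ch);
        if (lower < 'a' || lower > 'z') {
            return -1;           // alphabetic, but not one of the 26 letters
        }
        return lower - 'a';      // char arithmetic yields an int offset
    }

    public static void main(String[] args) {
        System.out.println(letterIndex('C'));
        System.out.println(letterIndex('?'));
    }
}
```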

Corpus files:

https://drive.google.com/drive/folders/18Xa784tmnQFqz_yGeRlfjgA-u0BHCz-z?usp=sharing
