You will build your language model from a given set of example texts. Because the model is based on trigram counts, you must count how many times each triple of consecutive words appears in each example text. Treat words case-sensitively: "she" and "She" are two different words. Although the example texts may contain punctuation, you should not treat it specially. That is, if a file contains the phrase "he, she, I", then the first word is "he,", the second is "she," and the third is "I". Said another way, split your example files on whitespace only and treat punctuation characters like any other character in a word, so that "she" and "she," are two different words.
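The counting rule above can be sketched in C++ as follows. This is a minimal sketch, not the assignment's reference solution: the names Trigram, TrigramCounts, read_words, and count_trigrams are hypothetical, and it assumes that "words" means whitespace-separated tokens as read by operator>>.

```cpp
#include <array>
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// A trigram is three consecutive words. std::array compares element by
// element, so it can be used directly as a std::map key.
using Trigram = std::array<std::string, 3>;
using TrigramCounts = std::map<Trigram, int>;

// Split a stream into words on whitespace only. operator>> leaves case and
// punctuation untouched, so "she", "She" and "she," stay distinct words.
std::vector<std::string> read_words(std::istream& in) {
    std::vector<std::string> words;
    std::string w;
    while (in >> w) {
        words.push_back(w);
    }
    return words;
}

// Count every window of three consecutive words into counts.
void count_trigrams(const std::vector<std::string>& words, TrigramCounts& counts) {
    for (std::size_t i = 0; i + 2 < words.size(); ++i) {
        Trigram key{words[i], words[i + 1], words[i + 2]};
        ++counts[key];
    }
}
```

Passing the same TrigramCounts object to count_trigrams for every example file accumulates a single model across all of the texts.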

You must write a C++ program that, when built, produces an executable file named hw7a that takes two command-line arguments. The first argument is the name of a text file containing a list of input filenames.
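One possible way to handle the two arguments is sketched below. The Options struct, the parse_args helper, and the error messages are assumptions made for illustration; the assignment text does not say how invalid input should be reported. The sketch also assumes the list file holds whitespace-separated filenames with no spaces inside a name, and that the second argument is one of the letters a, r, or c.

```cpp
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical container for the two command-line arguments after parsing.
struct Options {
    std::vector<std::string> files;  // names read from the list file
    char order = 'a';                // one of 'a', 'r', 'c'
};

// Fill opts from argv. Returns false, after printing a message to cerr,
// when the arguments are unusable. The messages are assumptions.
bool parse_args(int argc, char* argv[], Options& opts) {
    if (argc != 3) {
        std::cerr << "usage: ./hw7a <filename-list> <a|r|c>\n";
        return false;
    }
    const std::string order = argv[2];
    if (order != "a" && order != "r" && order != "c") {
        std::cerr << "second argument must be a, r, or c\n";
        return false;
    }
    std::ifstream list(argv[1]);
    if (!list) {
        std::cerr << "cannot open " << argv[1] << '\n';
        return false;
    }
    opts.order = order[0];
    opts.files.clear();
    std::string name;
    while (list >> name) {  // assumes filenames contain no spaces
        opts.files.push_back(name);
    }
    return true;
}
```

A main function would call parse_args first and return a nonzero exit status when it reports failure.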

In order to treat the beginning and end of your example files meaningfully during Part B, you will include in the model you create in Part A the special words "", "" (to indicate the start of each document), and "", "" (to indicate the end of each document). In particular, suppose your example text begins with words a b and ends with words c d. Then you must add into your model the four trigrams

"", "", a

"", a, b

c, d, ""

d, "", ""

And you will need to add four similar trigrams for each example text that you process.
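A simple way to obtain these four extra trigrams is to pad each document's word list with two start words and two end words before counting, as in the sketch below. The spellings of the special words do not appear in this copy of the assignment, so the four constants here are placeholder names chosen only for illustration, and pad_document is a hypothetical helper.

```cpp
#include <string>
#include <vector>

// ASSUMPTION: the real special-word spellings are missing from this copy of
// the assignment; these four names are placeholders for illustration only.
const std::string kStart1 = "<START_1>";
const std::string kStart2 = "<START_2>";
const std::string kEnd1 = "<END_1>";
const std::string kEnd2 = "<END_2>";

// Return the document's words with two start words in front and two end
// words behind. Sliding a three-word window over the result yields the
// ordinary trigrams plus the four boundary trigrams described above.
std::vector<std::string> pad_document(const std::vector<std::string>& words) {
    std::vector<std::string> padded;
    padded.reserve(words.size() + 4);
    padded.push_back(kStart1);
    padded.push_back(kStart2);
    padded.insert(padded.end(), words.begin(), words.end());
    padded.push_back(kEnd1);
    padded.push_back(kEnd2);
    return padded;
}
```

For a document a b c d, the padded sequence has eight words and therefore six three-word windows: the two ordinary trigrams (a b c and b c d) plus the four boundary trigrams.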

Each time your program is run, it should build your trigram-based language model by processing each text file specified in the input filename list. What happens after that will depend on the second argument specified at the command line. The second argument is a single letter, and should be one of "a", "r", or "c". Your program should output to the C++ standard output stream (cout) the language model you created, ordering entries as specified by the argument letter as follows:

a - forward alphabetical order. This means that trigrams are output in alphabetical order by the first word in each trigram, using the alphabetical order of the second and then third word in each trigram to break ties.

r - reverse alphabetical order. This means that trigrams are output in descending alphabetical order by the first word in each trigram, using the descending alphabetical order of the second and then third word in each trigram to break ties.

c - count order. This means that trigrams are output in ascending order by frequency, using forward alphabetical ordering of first words and then second and then third words to break ties.
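The three orderings can be sketched as below. This assumes that "alphabetical" means the byte-wise comparison of std::string, in which capital letters sort before lowercase ones; that reading agrees with the example output, where "Santa" precedes "a". The names Entry and ordered_entries are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Trigram = std::array<std::string, 3>;
using Entry = std::pair<Trigram, int>;

// Return the model's entries in the order selected by the letter
// 'a', 'r', or 'c'.
std::vector<Entry> ordered_entries(const std::map<Trigram, int>& counts, char order) {
    // Iterating a std::map visits keys in ascending order, and std::array
    // compares word by word, so this copy is already in forward order.
    std::vector<Entry> entries(counts.begin(), counts.end());
    if (order == 'r') {
        // Descending on all three words is the exact reverse of forward order.
        std::reverse(entries.begin(), entries.end());
    } else if (order == 'c') {
        // A stable sort keeps the forward alphabetical order among equal counts.
        std::stable_sort(entries.begin(), entries.end(),
                         [](const Entry& x, const Entry& y) { return x.second < y.second; });
    }
    return entries;
}
```

Using std::stable_sort for the count order avoids writing a second tie-breaking comparator, because ties keep the forward order the entries already had.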

Your output will consist of one trigram with associated count per line. On a given line, the 4 outputs (trigramWord1, trigramWord2, trigramWord3, and count) should be separated by single spaces.
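The line format described above might be produced as in this short sketch. The helper names format_entry and print_entry are hypothetical, and ending each line with a newline character is an assumption.

```cpp
#include <array>
#include <ostream>
#include <string>

using Trigram = std::array<std::string, 3>;

// Build one output line: the three words and the count, separated by
// single spaces, without a trailing newline.
std::string format_entry(const Trigram& t, int count) {
    return t[0] + " " + t[1] + " " + t[2] + " " + std::to_string(count);
}

// Write one entry per line to the given stream (std::cout in the program).
void print_entry(std::ostream& out, const Trigram& t, int count) {
    out << format_entry(t, count) << '\n';
}
```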

Example

Suppose the list of training texts input for your program resides in a file named tiny_ex.txt, and the contents of the file are the names of two text files containing excerpts from Dr. Seuss books, as follows:

sl.txt

ge.txt

For the command ./hw7a tiny_ex.txt a, the expected output is:

I 1

theyve 1

I do 1

theyve talked 1

Clause. 1

I do not 2

Santa Clause. 1

a lot about 1

about flaws. theyve 1

about gauze. theyve 1

about laws and 1

about old Santa 1

about paws and 1

and theyve talked 2

anywhere 1

do not like 2

flaws. theyve talked 1

gauze. theyve talked 1

here or there 1

laws and theyve 1

like them anywhere 1

like them here 1

lot about old 1

not like them 2

old Santa Clause. 1

or there I 1

paws and theyve 1

quite a lot 1

talked about flaws. 1

talked about gauze. 1

talked about laws 1

talked about paws 1

talked quite a 1

them anywhere 1

them here or 1

there I do 1

theyve talked about 4

theyve talked quite 1

For the command ./hw7a tiny_ex.txt c, the expected output is:

I 1

theyve 1

I do 1

theyve talked 1

Clause. 1

Santa Clause. 1

a lot about 1

about flaws. theyve 1

about gauze. theyve 1

about laws and 1

about old Santa 1

about paws and 1

anywhere 1

flaws. theyve talked 1

gauze. theyve talked 1

here or there 1

laws and theyve 1

like them anywhere 1

like them here 1

lot about old 1

old Santa Clause. 1

or there I 1

paws and theyve 1

quite a lot 1

talked about flaws. 1

talked about gauze. 1

talked about laws 1

talked about paws 1

talked quite a 1

them anywhere 1

them here or 1

there I do 1

theyve talked quite 1

I do not 2

and theyve talked 2

do not like 2

not like them 2

theyve talked about 4

