C++

Machine learning has become increasingly useful in computer science. To state things perhaps too simply, in machine learning a training set is used to form a model of how to respond to different kinds of input. In this lab, you will write a very simple machine learning algorithm using a map. You'll use a map to predict which word should follow a word or set of words. This model can then be used to have a computer automatically write text, such as a fake scripture, a political speech, or a poem.

Details

This lab is more of a tutorial than a you-figure-out-the-code lab. We'll walk you through each step. To ensure that you benefit from this style of lab, make sure you understand the code you write. The rest of the document walks you through each of the steps.

Part 1

Write a program that takes as input a command-line argument that specifies the name of a text file. If your program is named Lab3 and the text file you are reading in is 1Nephi.txt (you should download this file), your program should be able to run from the command line as follows:

    ./Lab3 1Nephi.txt

Your lab code should open and read the input file and store each word of the file in a set (remove all punctuation; you may find the isalpha function helpful for this). You may find this repository to be helpful.

Output: Print the contents of the set to a new file, with each string in the set on its own line. The file 1Nephi.txt provides a good input file for testing your code. Unless I've done this incorrectly (which is possible), your output file should have close to 1791 words in it (your word count may vary somewhat depending on how you deal with hyphens). The name of the file should be [filename]_set.txt, where [filename] is the command-line input supplied by the user.

Part 2

Add to your program a function that reads the file (specified on the command line) and stores each word of the file in a vector.
(Remove all punctuation; you may find the isalpha function helpful for this.)

Output: Print the contents of the vector to a new file, with each string in the vector on its own line. The file 1Nephi.txt provides a good input file for testing your code. Unless I've done this incorrectly (which is possible), your output file will have close to 25,106 words in it (your word count may vary somewhat depending on how you deal with hyphens). The name of the file should be [filename]_vector.txt, where [filename] is the command-line input supplied by the user.

Insight: You should know why there are more words in the vector than in the set. If you don't know, find out why.

Part 3

Next, let's do something with your vector of strings. You'll create a map from strings to strings. To do this, for each string in your vector of strings (except the last string), create an entry in your map that has the string as the key and the next word in the vector as the value. For example, for the phrase "having been born of goodly parents", you would add the entries (key="having", value="been"), (key="been", value="born"), (key="born", value="of"), etc. Here is a snippet of code that does this (make sure you understand it):

    // assumes #include <list>, <map>, <string> and using namespace std;
    map<string, string> wordmap;
    string last = "";
    for (list<string>::iterator it = lst.begin(); it != lst.end(); it++) {
        wordmap[last] = *it;
        last = *it;
    }

Note that lst is the list of strings created in Part 2 (I used a list instead of a vector to give you more experience with lists!). Notice that the first entry we put into wordmap has the empty string as its key and the first word as its value.

Output: Print the map to a file named [filename]_map.txt. So, if the input file named on the command line is 1Nephi.txt, your output file should be called 1Nephi.txt_map.txt. On each line of the file, output the key, a comma and a space (", "), and then the value.

Part 4

Next, let's do something with your map. Your map has learned the context for each word in the document.
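Part 4 walks the map built in Part 3, so it may help to have that step in one place first. The sketch below is a minimal version of the Part 3 snippet plus the [filename]_map.txt format, assuming the words are already in a std::list<std::string>. The names buildNextWordMap and writeMap are hypothetical, and range-based for loops stand in for the explicit iterators used above.

```cpp
#include <list>
#include <map>
#include <ostream>
#include <string>

// Hypothetical helper: map each word to the word that follows it.
// operator[] overwrites, so a word that occurs several times keeps only
// the successor from its LAST occurrence.
std::map<std::string, std::string> buildNextWordMap(const std::list<std::string>& lst) {
    std::map<std::string, std::string> wordmap;
    std::string last = "";  // "" is the key whose value is the first word
    for (const std::string& word : lst) {
        wordmap[last] = word;
        last = word;
    }
    return wordmap;
}

// Hypothetical helper: write "key, value" on each line, in sorted key order.
// Pass a std::ofstream opened on filename + "_map.txt" to produce the Part 3 output.
void writeMap(std::ostream& out, const std::map<std::string, std::string>& wordmap) {
    for (const auto& entry : wordmap) {
        out << entry.first << ", " << entry.second << '\n';
    }
}
```

Because std::map keeps its keys sorted, the empty-string key comes first, so the first line of the output file starts with ", " followed by the first word of the text.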
We want to generate new text using this context. You can generate 100 words of text as follows (make sure you understand how this works): string state = ""; for(int i = 0; i < 100; i++){ cout << wordmap[state] << " "; state = wordmap[state]; } cout << endl; If you run this code using 1Nephi.txt as input, you should get something like: I and endure to the last day And thus it is Amen And thus it is Amen And thus it is Amen And thus it is Amen And thus it is Amen And thus it is Amen And thus it is Amen And thus it is Amen And thus it is Amen And thus it is Amen And thus it is Amen And thus it is Amen And thus it is Amen And thus it is Amen And thus it is Amen And thus it is Amen And thus it is Amen And thus it is Amen And thus it Output: Print the sermon/poem/story/speech you generated to the terminal. Part 5 That isnt very cool, because we get stuck in a loop since we are only keeping track of the last next word weve seen for every word. We can do better. Lets keep track of all of the words that are seen after a word. To do this, lets change the map to be a map with strings as keys and vectors of strings as values. Thus, well put each consecutive pair of words in the document into our map as we did before. But this time, well keep track of each word that comes after a particular word. We can to this like this (make sure you understand how this works): map> wordmap; string state = ""; for(list::iterator it=lst.begin(); it !=lst.end(); it++) { wordmap[state].push_back(*it); state = *it; } Note that lst is the list of strings created in part 2 (I used a list instead of a vector just to give you experience with different data structures). 
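Here is a minimal, self-contained sketch of the Part 5 map, mirroring the snippet above with a range-based for loop. It also includes a helper for the "6th entry" check described in the Output step that follows, which advances a map iterator with std::next. The names SuccessorMap, buildSuccessorMap, and entryAt are hypothetical.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

using SuccessorMap = std::map<std::string, std::vector<std::string>>;

// Hypothetical helper: record EVERY word that follows each word (Part 5).
SuccessorMap buildSuccessorMap(const std::list<std::string>& lst) {
    SuccessorMap wordmap;
    std::string state = "";  // "" collects the first word of the text
    for (const std::string& word : lst) {
        wordmap[state].push_back(word);
        state = word;
    }
    return wordmap;
}

// Hypothetical helper: return the entry at 0-based position `index` in sorted
// key order by advancing an iterator; index 5 is the 6th entry.
const SuccessorMap::value_type& entryAt(const SuccessorMap& wordmap, std::size_t index) {
    if (index >= wordmap.size()) {
        throw std::out_of_range("map has fewer entries than requested");
    }
    return *std::next(wordmap.begin(), static_cast<std::ptrdiff_t>(index));
}
```

Note that the final word of a text gets no key of its own unless it also appears earlier. If the random generator below reaches such a word, wordmap[state] inserts an empty vector, and taking rand() % 0 divides by zero.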
When I run this code, I get the following vector of strings associated with the key "Nephi":

having, do, will, being, because, returned, said, and, crept, had, being, do, and, did, and, being, had, and, and, do, wherefore, after, Nevertheless, proceed, having, was, because, what, saw, also, beheld, beheld, beheld, beheld, saying, beheld, heard, am, had, was, spake, did, had, did, took, had, went, having, did, did, beheld, did, who, had, did, did, was, spake, said, said, did, did, began, began, did, received, did, have, did, had, said, declare, say, make

You can check whether you get the same list. To do that, iterate over the keys in your map until you find "Nephi" (or look the key up directly with wordmap.find("Nephi")), and then print all the words in the vector (the value) associated with the key "Nephi".

Output: To verify that you have created this correctly, print the vector of words that corresponds to the 6th entry in the map (accessed using an iterator).

Now that you have created this map, you can generate a better sermon than before by generating words from it. We do this by randomly picking a string from the vector of strings associated with the key (which is the last word spoken). I generated a 100-word sermon with this code:

    // srand and rand need <cstdlib>; time needs <ctime>
    srand(time(NULL)); // initializes the random number generator
                       // so you don't get the same thing every time
    state = "";
    for (int i = 0; i < 100; i++) {
        int ind = rand() % wordmap[state].size();
        cout << wordmap[state][ind] << " ";
        state = wordmap[state][ind];
    }
    cout << endl;

The generated text now reads a lot more naturally and doesn't get stuck repeating the same phrase. Here's sample output:
I was desirous to their hearts insomuch that I might not occupy these things he thinketh that they smite two churches for me saying Hosanna to stir them saying in an account of singing O man like unto him or of his life of my father saw and kingdoms And it proceeded forth my people who need not hunger nor touch me a man descending out from the Gentiles who were driven him out of Laban And now my mother of the land unto their abominations and Joseph And it came to pass that I saw in the nations kindreds

Output: Print the sermon/poem/story/speech you generated to the terminal.

Part 6

But we are still only using one word of context, and the generated text still doesn't sound very good. We would really like to use a phrase as a key so that we can learn multiple-word context. In other words, for the phrase "I Nephi having", we would like "I Nephi" (two words) to be the key for "having". To do this, we just change the key of the map to be a list of strings. As we move through the text, we push words onto the back of the list and pop words off the front of the list to continually keep a context of M words. Here's the code:

    // M is the number of context words, e.g. const int M = 2;
    map<list<string>, vector<string>> wordmap;
    list<string> state;
    for (int i = 0; i < M; i++) {
        state.push_back("");
    }
    for (list<string>::iterator it = lst.begin(); it != lst.end(); it++) {
        wordmap[state].push_back(*it);
        state.push_back(*it);
        state.pop_front();
    }

Note that lst is the list of strings created in Part 2 (I used a list instead of a vector to give you more experience with lists!). We can then generate a new sermon from the resulting map with this code:

    state.clear();
    for (int i = 0; i < M; i++) {
        state.push_back("");
    }
    for (int i = 0; i < 100; i++) {
        int ind = rand() % wordmap[state].size();
        cout << wordmap[state][ind] << " ";
        state.push_back(wordmap[state][ind]);
        state.pop_front();
    }
    cout << endl;

The text that is generated now sounds much more like English (though admittedly, it ain't perfect).
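The sketch below puts the Part 6 construction and generation into two self-contained functions, under a few stated assumptions. It swaps rand() for std::mt19937 with std::uniform_int_distribution (which avoids the slight bias of rand() % n). It looks up contexts with find() instead of operator[], so a missing context never inserts an empty vector. At a dead end it resets to the starting context, which is the fix suggested in the note on floating point exceptions. The names Context, ContextMap, buildContextMap, and generate are hypothetical.

```cpp
#include <cstddef>
#include <list>
#include <map>
#include <random>
#include <string>
#include <vector>

using Context = std::list<std::string>;
using ContextMap = std::map<Context, std::vector<std::string>>;

// Hypothetical helper: map each M-word context to every word that followed it.
ContextMap buildContextMap(const std::list<std::string>& lst, std::size_t M) {
    ContextMap wordmap;
    Context state(M, std::string());  // M empty strings mark the start of the text
    for (const std::string& word : lst) {
        wordmap[state].push_back(word);
        state.push_back(word);  // slide the window: add the newest word...
        state.pop_front();      // ...and drop the oldest
    }
    return wordmap;
}

// Hypothetical helper: generate up to `count` words, restarting from the
// beginning context whenever the current context has no recorded successors.
std::vector<std::string> generate(const ContextMap& wordmap, std::size_t M,
                                  std::size_t count, std::mt19937& rng) {
    const Context start(M, std::string());
    Context state = start;
    std::vector<std::string> out;
    while (out.size() < count) {
        auto it = wordmap.find(state);  // find() never inserts, unlike operator[]
        if (it == wordmap.end() || it->second.empty()) {
            if (state == start) {
                break;  // empty map: nothing can be generated
            }
            state = start;  // dead end: reset to the initial state
            continue;
        }
        const std::vector<std::string>& options = it->second;
        std::uniform_int_distribution<std::size_t> pick(0, options.size() - 1);
        const std::string& word = options[pick(rng)];
        out.push_back(word);
        state.push_back(word);
        state.pop_front();
    }
    return out;
}
```

To get different output on each run, as srand(time(NULL)) does above, seed the engine with something like std::mt19937 rng(std::random_device{}()); a fixed seed instead makes a run reproducible while you debug.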
Here's a sample 100-word sermon that I generated for M=2:

I Nephi did make a full account of my brethren who were scattered upon all the words of Isaiah who spake concerning the restoration of the kings and the beauty thereof was exceedingly glad for she truly had mourned because of my father and also against God nevertheless ye know that I did make tools of the Lord therefore let us slay our father Abraham saying In they seed shall be led away by the power of God was the word of the olivetree or the remnants of the devil which shall war among themselves and the wars and rumors

Note that, for some inputs, the above code might crash with a "floating point exception" because wordmap[state].size() is zero. This happens when generation reaches the last M words of the text and that sequence of words never occurs anywhere else, so it was never used as a key: wordmap[state] then inserts an empty vector, and rand() % 0 divides by zero, which many systems report as a floating point exception. You could fix this by detecting this case and then resetting the state to its initial value (the state.clear() call and the push_back("") loop at the start of the previous snippet).

Output: Print the sermon/poem/story/speech you generated to the terminal.

Part 7

Do something creative to try to improve your algorithm, or experiment with different texts. Trump.txt and Nephi_Trump.txt are provided (note: these text files may have issues if you use Windows), but you could find cooler ones. For example, you could collect a bunch of poems by Robert Frost into a .txt file and then run your algorithm to see what kinds of poems you can generate. Try to find something that will impress the other students and the TAs who will be grading your lab.

This screencast tries to explain some of the new syntax for loops and iterators, and gets you to the point where you create text that sounds kind of like Nephi.

Grading

For this lab, you will submit a description of what you did in your code (one paragraph) along with the best output you can generate for an author of your choice. Pick a text from a famous author and see if you can generate text that sounds like that author. Use your best algorithm and best data set to wow the other students.
If some of you are struggling to get things working, this video should help; then get creative and try some new approaches to generating text that sounds like an author.
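For anyone still struggling with the first steps, here is a minimal sketch of Parts 1 and 2. It assumes words are separated by whitespace and that stripping every non-letter character (including hyphens, which turns "olive-tree" into "olivetree") is acceptable. The names keepLetters, readWords, and writeSetAndVector are hypothetical; a main function would check argc and pass argv[1] to writeSetAndVector.

```cpp
#include <cctype>
#include <fstream>
#include <iostream>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: keep only alphabetic characters, so "born," becomes "born".
std::string keepLetters(const std::string& token) {
    std::string word;
    for (char ch : token) {
        // isalpha requires a value representable as unsigned char
        if (std::isalpha(static_cast<unsigned char>(ch))) {
            word += ch;
        }
    }
    return word;
}

// Hypothetical helper: read whitespace-separated tokens, strip punctuation,
// and skip tokens that contain no letters at all (e.g. verse numbers).
std::vector<std::string> readWords(std::istream& in) {
    std::vector<std::string> words;
    std::string token;
    while (in >> token) {
        std::string word = keepLetters(token);
        if (!word.empty()) {
            words.push_back(word);
        }
    }
    return words;
}

// Hypothetical driver for Parts 1 and 2: writes [filename]_set.txt and
// [filename]_vector.txt, one word per line. Returns false if the input can't be opened.
bool writeSetAndVector(const std::string& filename) {
    std::ifstream in(filename);
    if (!in) {
        std::cerr << "Could not open " << filename << '\n';
        return false;
    }
    std::vector<std::string> words = readWords(in);
    std::set<std::string> unique(words.begin(), words.end());  // duplicates collapse here

    std::ofstream setOut(filename + "_set.txt");
    for (const std::string& w : unique) {
        setOut << w << '\n';
    }
    std::ofstream vecOut(filename + "_vector.txt");
    for (const std::string& w : words) {
        vecOut << w << '\n';
    }
    return true;
}
```

The set here is case-sensitive, so "And" and "and" are separate entries; choices like this, along with hyphen handling, are why your counts may differ somewhat from the ones quoted in Parts 1 and 2.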