Question: In C++ Please fix my code or create new code where it would get the expected output (shown below) Test Expected Got ./program 25 mobydick.txt

In C++ Please fix my code or create new code where it would get the expected output (shown below)

Test Expected Got
./program 25 mobydick.txt ignoreWords.txt
Array doubled: 8 Distinct non-common words: 13744 Total non-common words: 67327 Probability of next 10 words from rank 25 --------------------------------------- 0.00302 - other 0.00300 - over 0.00297 - been 0.00296 - these 0.00290 - sea 0.00285 - said 0.00282 - down 0.00276 - yet 0.00275 - any 0.00270 - whales
Array doubled: 8 Distinct non-common words: 13745 Total non-common words: 71939 Probability of next 10 words from rank 25 --------------------------------------- 0.00285 mobydick 0.00282 other 0.00281 over 0.00278 been 0.00277 these 0.00271 sea 0.00267 said 0.00264 down 0.00259 yet 0.00257 any
./program 5 mobydick.txt ignoreWords.txt
Array doubled: 8 Distinct non-common words: 13744 Total non-common words: 67327 Probability of next 10 words from rank 5 --------------------------------------- 0.00515 - were 0.00480 - some 0.00460 - now 0.00452 - no 0.00449 - then 0.00443 - upon 0.00437 - like 0.00437 - when 0.00428 - ye 0.00419 - are
Array doubled: 8 Distinct non-common words: 13745 Total non-common words: 71939 Probability of next 10 words from rank 5 --------------------------------------- 0.00566 had 0.00482 were 0.00449 some 0.00431 now 0.00423 no 0.00420 then 0.00414 upon 0.00409 like 0.00409 when 0.00400 ye
./program 10 mobydick.txt ignoreWords.txt
Array doubled: 8 Distinct non-common words: 13744 Total non-common words: 67327 Probability of next 10 words from rank 10 --------------------------------------- 0.00443 - upon 0.00437 - like 0.00437 - when 0.00428 - ye 0.00419 - are 0.00407 - more 0.00364 - them 0.00359 - man 0.00345 - into 0.00325 - though
Array doubled: 8 Distinct non-common words: 13745 Total non-common words: 71939 Probability of next 10 words from rank 10 --------------------------------------- 0.00420 then 0.00414 upon 0.00409 like 0.00409 when 0.00400 ye 0.00392 are 0.00381 more 0.00341 them 0.00336 man 0.00322 into
./program 15 HarryPotter.txt ignoreWords.txt
Array doubled: 6 Distinct non-common words: 5985 Total non-common words: 50331 Probability of next 10 words from rank 15 --------------------------------------- 0.00393 - got 0.00389 - no 0.00387 - could 0.00387 - didnt 0.00385 - like 0.00374 - know 0.00358 - down 0.00358 - just 0.00358 - professor 0.00358 - see
Array doubled: 6 Distinct non-common words: 5985 Total non-common words: 50331 Probability of next 10 words from rank 15 --------------------------------------- 0.00393 got 0.00389 no 0.00387 didnt 0.00387 could 0.00385 like 0.00374 know 0.00358 professor 0.00358 just 0.00358 see 0.00358 down

Code

 In C++ Please fix my code or create new code whereit would get the expected output (shown below) Test Expected Got ./program25 mobydick.txt ignoreWords.txt Array doubled: 8 Distinct non-common words: 13744 Total non-commonwords: 67327 Probability of next 10 words from rank 25 --------------------------------------- 0.00302- other 0.00300 - over 0.00297 - been 0.00296 - these 0.00290#include #include #include #include #include using namespace std;

struct wordRecord { string word; int count; };

//Function prototypes void getIgnoreWords(const char* ignoreWordFileName, string ignoreWords[]); bool isIgnoreWord(string word, string ignoreWords[]); int getTotalNumberNonIgnoreWords(wordRecord distinctWords[], int length); void sortArray(wordRecord distinctWords[], int length); void printTenFromN(wordRecord distinctWords[], int N, int totalNumWords);

int main(int argc,char* argv[]) { //Command line argument error check if (argc != 4) { cout " > ignoreWords[i]) { i++; } in.close(); } //Function return true if the word found in array else false bool isIgnoreWord(string word, string ignoreWords[]) { for (int i = 0; i

Instructions In this assignment, we will write a program to analyze the word frequency of a document. As the number of words in the document may not be known a priori, we will implement a dynamically doubling array to store the information. Please read all the directions before writing code, as this write-up contains specific requirements for how the code must be written. Problem Overview: There are two files on Canvas. mobydick.txt - contains text to be read and analyzed by your program. The file contains the full text from Moby Dick. For your convenience, all the punctuation has been removed, words have been converted to lowercase, and the entire document can be read as if it were written on a single line. ignoreWords.txt contains the 50 of the most common words in the English language, which your program will ignore during analysis. Your program must take three command line arguments in the following order - a number N, the file name of the text to be read, and the file name containing the words to be ignored. It will read the text from the first file while ignoring the words from the second file and store all the unique words encountered in a dynamically doubling array. After necessary calculation, the program must print the following information: O . . The number of times array doubling was required to store all the unique words The number of unique non-ignore" words in the file The total word count of the file (excluding the ignore words) After calculating the probability of occurrence of each word and storing it in an array in the decreasing order of probability, starting from index N of the array, print the 10 most frequent words along with their probability (up to 5 decimal places) . For example, running your program with the command: ./Assignment2 25 mobydick.txt ignorewords.txt would print the next 10 words starting from index 25, i.e. your program must print the 25th-34th most frequent words, along with their respective probabilities. Keep in mind that these words must not be any of the words from ignoreWords.txt. The full results would be: Array doubled: 8 Distinct non-common words: 13744 Total non-common words: 67327 Probability of next 10 words from rank 25 0.00302 - other 0.00300 - over 0.00297 - been 0.00296 - these 0.00290 - sea 0.00285 - said 0.00282 - down 0.00276 - yet 0.00275 - any 0.00270 - whales Specifications: 1. Use an array of structs to store the words and their counts You will store each unique word and its count (the number of times it occurs in the document) in an array of structs. As the number of unique words is not known ahead of time, the array of structs must be dynamically sized. The struct must be defined as follows: struct wordRecord { string word; int count; }; 2. Use the array-doubling algorithm to increase the size of your array Your array will need to grow to fit the number of words in the file. Start with an array size of 100, and double the size whenever the array runs out of free space. You will need to allocate your array dynamically and copy values from the old array to the new array. (Array-doubling algorithm must be implemented in the main() function). Note: Don't use the built-in std::vector class. This will result in a loss of points. You're actually writing the code that the built-in vector uses behind-the-scenes! 3. Ignore the top 50 most common words that are read from the ignoreWords.txt file To get useful information about word frequency, we will be ignoring the 50 most common words in the English language as noted in ignoreWords.txt 4. Take three command line arguments Your program must take three command line arguments 1. a number N which tells your program the starting index to print the next 10 most frequent words 2. the name of the text file to be read and analyzed 3. The name of the text file with the words to be ignored. 5. Output the Next 10 most frequent words starting from index N Your program must print out the next 10 most frequent words - not including the common words - starting index N in the text where N is passed as a command line argument. If two words have the same frequency, list them alphabetically. 6. Format your final output this way: Array doubled: Distinct non-common words: Total non-common words: Probability of next 10 words from rank - - - For example, using the command: ./Assignment2 25 mobydick.txt ignorewords.txt This function must read the stop words from the file with the name stored in ignoreWordFileName and store them in the ignoreWords array. You can assume there will be exactly 50 stop words. There is no return value. In case the file fails to open, print an error message using the below cout statement: std::cout header. However, you may write your own implementation of a sorting algorithm, such as Bubble Sort. Feel free to refer to the pseudocode for bubble sort that was given in Assignment 1. f. printTenFromN function void printTenFromN (wordRecord distinctWords [], int n, int totalNumWords); This function must print the next 10 words after the starting index N from the sorted array of distinctWords. These 10 words must be printed with their probability of occurrence up to 5 decimal places. The exact format of this printing is given below . The function does not return anything. Probability of occurrence of a word at position ind in the array is computed using the formula: (Don't forget to cast to float!) probability-of-occurrence = (float) unique Words[ind].count/ totalNumWords Array doubled: 8 Distinct non-common words: 13744 Total non-common words: 67327 Probability of next 10 words from rank 25 - - 0.00302 - other 0.00300 - over 0.00297 been 0.00296 these 0.00290 sea 0.00285 said 0.00282 down 0.00276 yet 0.00275 any 0.00270 - whales - Instructions In this assignment, we will write a program to analyze the word frequency of a document. As the number of words in the document may not be known a priori, we will implement a dynamically doubling array to store the information. Please read all the directions before writing code, as this write-up contains specific requirements for how the code must be written. Problem Overview: There are two files on Canvas. mobydick.txt - contains text to be read and analyzed by your program. The file contains the full text from Moby Dick. For your convenience, all the punctuation has been removed, words have been converted to lowercase, and the entire document can be read as if it were written on a single line. ignoreWords.txt contains the 50 of the most common words in the English language, which your program will ignore during analysis. Your program must take three command line arguments in the following order - a number N, the file name of the text to be read, and the file name containing the words to be ignored. It will read the text from the first file while ignoring the words from the second file and store all the unique words encountered in a dynamically doubling array. After necessary calculation, the program must print the following information: O . . The number of times array doubling was required to store all the unique words The number of unique non-ignore" words in the file The total word count of the file (excluding the ignore words) After calculating the probability of occurrence of each word and storing it in an array in the decreasing order of probability, starting from index N of the array, print the 10 most frequent words along with their probability (up to 5 decimal places) . For example, running your program with the command: ./Assignment2 25 mobydick.txt ignorewords.txt would print the next 10 words starting from index 25, i.e. your program must print the 25th-34th most frequent words, along with their respective probabilities. Keep in mind that these words must not be any of the words from ignoreWords.txt. The full results would be: Array doubled: 8 Distinct non-common words: 13744 Total non-common words: 67327 Probability of next 10 words from rank 25 0.00302 - other 0.00300 - over 0.00297 - been 0.00296 - these 0.00290 - sea 0.00285 - said 0.00282 - down 0.00276 - yet 0.00275 - any 0.00270 - whales Specifications: 1. Use an array of structs to store the words and their counts You will store each unique word and its count (the number of times it occurs in the document) in an array of structs. As the number of unique words is not known ahead of time, the array of structs must be dynamically sized. The struct must be defined as follows: struct wordRecord { string word; int count; }; 2. Use the array-doubling algorithm to increase the size of your array Your array will need to grow to fit the number of words in the file. Start with an array size of 100, and double the size whenever the array runs out of free space. You will need to allocate your array dynamically and copy values from the old array to the new array. (Array-doubling algorithm must be implemented in the main() function). Note: Don't use the built-in std::vector class. This will result in a loss of points. You're actually writing the code that the built-in vector uses behind-the-scenes! 3. Ignore the top 50 most common words that are read from the ignoreWords.txt file To get useful information about word frequency, we will be ignoring the 50 most common words in the English language as noted in ignoreWords.txt 4. Take three command line arguments Your program must take three command line arguments 1. a number N which tells your program the starting index to print the next 10 most frequent words 2. the name of the text file to be read and analyzed 3. The name of the text file with the words to be ignored. 5. Output the Next 10 most frequent words starting from index N Your program must print out the next 10 most frequent words - not including the common words - starting index N in the text where N is passed as a command line argument. If two words have the same frequency, list them alphabetically. 6. Format your final output this way: Array doubled: Distinct non-common words: Total non-common words: Probability of next 10 words from rank - - - For example, using the command: ./Assignment2 25 mobydick.txt ignorewords.txt This function must read the stop words from the file with the name stored in ignoreWordFileName and store them in the ignoreWords array. You can assume there will be exactly 50 stop words. There is no return value. In case the file fails to open, print an error message using the below cout statement: std::cout header. However, you may write your own implementation of a sorting algorithm, such as Bubble Sort. Feel free to refer to the pseudocode for bubble sort that was given in Assignment 1. f. printTenFromN function void printTenFromN (wordRecord distinctWords [], int n, int totalNumWords); This function must print the next 10 words after the starting index N from the sorted array of distinctWords. These 10 words must be printed with their probability of occurrence up to 5 decimal places. The exact format of this printing is given below . The function does not return anything. Probability of occurrence of a word at position ind in the array is computed using the formula: (Don't forget to cast to float!) probability-of-occurrence = (float) unique Words[ind].count/ totalNumWords Array doubled: 8 Distinct non-common words: 13744 Total non-common words: 67327 Probability of next 10 words from rank 25 - - 0.00302 - other 0.00300 - over 0.00297 been 0.00296 these 0.00290 sea 0.00285 said 0.00282 down 0.00276 yet 0.00275 any 0.00270 - whales

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Databases Questions!