Question: Use the following starter code (type it in so you get more practice setting up programs):

// ==========================================
// Created: August 23, 2018
// @author
//
// Description: Counts unique words in a file
// outputs the top N most common words
// ==========================================

#include <iostream>
#include <fstream>
#include <string>
#include <vector>

using namespace std;

// struct to store word + count combinations
struct wordItem
{
    string word;
    int count;
};

/**
 * Function name: getStopWords
 * Purpose: read stop word list from file and store into vector
 * @param ignoreWordFileName - filename (the file storing ignore/stop words)
 * @param _vecIgnoreWords - store ignore words from the file (pass by reference)
 * @return - none
 * Note: The number of words is fixed to 50
 */
void getStopWords(const char *ignoreWordFileName, vector<string> &_vecIgnoreWords);

/**
 * Function name: isStopWord
 * Purpose: to see if a word is a stop word
 * @param word - a word (which you want to check if it is a stop word)
 * @param _vecIgnoreWords - the vector type of string storing ignore/stop words
 * @return - true (if word is a stop word), or false (otherwise)
 */
bool isStopWord(string word, vector<string> &_vecIgnoreWords);

/**
 * Function name: getTotalNumberNonStopWords
 * Purpose: compute the total number of words saved in the words array (including repeated words)
 * @param list - an array of wordItems containing non-stopwords
 * @param length - the length of the words array
 * @return - the total number of words in the words array (including repeated words multiple times)
 */
int getTotalNumberNonStopWords(wordItem list[], int length);

/**
 * Function name: arraySort
 * Purpose: sort an array of wordItems, decreasing, by their count fields
 * @param list - an array of wordItems to be sorted
 * @param length - the length of the words array
 */
void arraySort(wordItem list[], int length);

/**
 * Function name: printTopN
 * Purpose: to print the top N high frequency words
 * @param wordItemList - a pointer that points to a *sorted* array of wordItems
 * @param topN - the number of top frequency words to print
 * @return none
 */
void printTopN(wordItem wordItemList[], int topN);

const int STOPWORD_LIST_SIZE = 50;
const int INITIAL_ARRAY_SIZE = 100;

// ./a.out 10 HW2-HungerGames_edit.txt HW2-ignoreWords.txt
int main(int argc, char *argv[])
{
    vector<string> vecIgnoreWords(STOPWORD_LIST_SIZE);

    // verify we have the correct # of parameters, else throw error msg & return
    if (argc != 4)
    {
        cout << "Usage: ";
        cout << argv[0] << " ";
        cout << "<number of words> <filename.txt> <ignorefilename.txt>" << endl;
        return 0;
    }

    /* **********************************
    1. Implement your code here.
       Read words from the file passed in on the command line, store and
       count the words.
    2. Implement the six functions after the main() function separately.
    ********************************** */

    return 0;
}

void getStopWords(const char *ignoreWordFileName, vector<string> &_vecIgnoreWords)
{
}

bool isStopWord(string word, vector<string> &_vecIgnoreWords)
{
    return true;
}

int getTotalNumberNonStopWords(wordItem list[], int length)
{
    return 0;
}

void arraySort(wordItem list[], int length)
{
}

void printTopN(wordItem wordItemList[], int topN)
{
}
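
The stubs above could be filled in along the following lines. This is only a sketch under stated assumptions, not the course's reference solution: it assumes the stop word file holds whitespace-separated words, that the vector has already been sized by the caller (as main() does with STOPWORD_LIST_SIZE), and it uses std::sort for the ordering even though the assignment may expect a hand-written sort. printTopN is left out because the required output format is not given here.

```cpp
#include <algorithm>
#include <fstream>
#include <string>
#include <vector>

using namespace std;

struct wordItem
{
    string word;
    int count;
};

// Fill the pre-sized vector with stop words read from the file.
// Assumption: words are separated by whitespace; extra words beyond
// the vector's size are ignored, and a missing file leaves it unchanged.
void getStopWords(const char *ignoreWordFileName, vector<string> &_vecIgnoreWords)
{
    ifstream inFile(ignoreWordFileName);
    if (!inFile.is_open())
        return;

    string word;
    size_t i = 0;
    while (i < _vecIgnoreWords.size() && inFile >> word)
        _vecIgnoreWords[i++] = word;
}

// Linear search through the stop word list.
bool isStopWord(string word, vector<string> &_vecIgnoreWords)
{
    for (size_t i = 0; i < _vecIgnoreWords.size(); i++)
    {
        if (_vecIgnoreWords[i] == word)
            return true;
    }
    return false;
}

// Sum the count fields, so repeated words are counted every time.
int getTotalNumberNonStopWords(wordItem list[], int length)
{
    int total = 0;
    for (int i = 0; i < length; i++)
        total += list[i].count;
    return total;
}

// Sort by count, largest first. The order of words with equal counts
// is unspecified because std::sort is not a stable sort.
void arraySort(wordItem list[], int length)
{
    sort(list, list + length,
         [](const wordItem &a, const wordItem &b) { return a.count > b.count; });
}
```

A linear search is enough for isStopWord because the list is fixed at 50 entries; a larger list would probably justify a sorted vector or a hash set.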

Background

There are several fields in computer science that aim to understand how people use language. This can include analyzing the most frequently used words by certain authors and then going one step further to ask a question such as: "Given what we know about Hemingway's language patterns, do we believe Hemingway wrote this lost manuscript?" In this assignment, we're going to do a basic introduction to document analysis by determining the number of unique words and the most frequently used words in two documents. If you enjoy this, take elective courses on Natural-Language Processing.

What your program needs to do

There is one test file on the website, HW2-HungerGames_edit.txt, that contains the full text from Hunger Games Book 1. We have pre-processed the file to remove all punctuation and down-cased all words. We will test on a different file! There is also the ignore words file, HW2-ignoreWords.txt, that contains the top 50 common words usually ignored during natural-language processing (aka stop words).

Your program will calculate the following information on any text file:

- The top n words (excluding stop words; n is also a command-line argument) and the number of times each word was found
- The total number of unique words (excluding stop words) in the file
- The total number of words (excluding stop words) in the file
- The number of array doublings needed to store all unique words in the file

Example: Your program takes three command-line arguments: the number of most common words to print out, the name of the file to process, and the stop word list file. Running your program using ./a.out 10 HW2-HungerGames_edit.txt HW2-ignoreWords.txt
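
The "number of array doublings" requirement suggests a dynamically allocated array of wordItems that starts at some initial capacity (INITIAL_ARRAY_SIZE in the starter code) and is replaced by one twice as large whenever it fills up. A minimal sketch of that idea follows. The helper names doubleArray and addWord and their parameters are hypothetical and do not appear in the starter code, and the sketch assumes stop words have already been filtered out before addWord is called.

```cpp
#include <string>

using namespace std;

struct wordItem
{
    string word;
    int count;
};

// Hypothetical helper: replace the array with one of twice the capacity,
// copying the existing items across and freeing the old storage.
void doubleArray(wordItem *&list, int &capacity)
{
    wordItem *bigger = new wordItem[capacity * 2](); // () zero-initializes counts
    for (int i = 0; i < capacity; i++)
        bigger[i] = list[i];
    delete[] list;
    list = bigger;
    capacity *= 2;
}

// Hypothetical helper: record one occurrence of a non-stop word.
// Increments the count if the word is already stored; otherwise appends
// it, doubling the array first when it is full.
void addWord(const string &word, wordItem *&list, int &numUnique,
             int &capacity, int &numDoublings)
{
    for (int i = 0; i < numUnique; i++)
    {
        if (list[i].word == word)
        {
            list[i].count++;
            return;
        }
    }

    if (numUnique == capacity)
    {
        doubleArray(list, capacity);
        numDoublings++;
    }

    list[numUnique].word = word;
    list[numUnique].count = 1;
    numUnique++;
}
```

With this layout, numUnique gives the total number of unique words and numDoublings gives the doubling count once every word in the file has been passed to addWord. The caller owns the array and should release it with delete[] when done.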
