Create it
Create a function named find_ngrams_bow:
- it has 4 parameters (words, n, bow=False, stopwords=[])
- words is a list of tokens/words
- each word should be converted to lowercase
- if bow is True, create ngrams such that order of the ngram words is no longer considered. Hence, each ngram is simply a bag-of-words (BOW). You can implement this by always using the alphabetical order for the words. For example the two ngrams, 'he said fine' and 'fine he said' would be the same ngram in the BOW model.
- if stopwords contains words, those words should not be considered part of the text
- the auto-grader is set for a 10 second CPU limit, but you should be able to solve this in sub-second time.
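import LessonUtil as Util
import collections

def find_ngrams_bow(words, n, bow=False, stopwords=[]):
    # Starter stub: replace with your implementation.
    return []

def simple_test():
    text = Util.read_data_file('hp1.txt')
    ngrams = find_ngrams_bow(text.split(), 3)
    top5 = collections.Counter(ngrams).most_common(5)
    print(top5)

# Auto-grader hook; `ide` is presumably provided by the course environment.
print(ide.tester.test_function('find_ngrams_bow'))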
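The requirements listed for find_ngrams_bow can be sketched as follows. This is one possible reading, not the graded reference solution. It assumes that each ngram is returned as a tuple of words, that stopwords are removed before the ngrams are formed, and that stopwords are matched case-insensitively. The auto-grader may expect a different return format (for example, space-joined strings).

```python
def find_ngrams_bow(words, n, bow=False, stopwords=[]):
    """Return a list of n-grams (as tuples) built from `words`.

    Assumption: tuples are used so the result can be fed to
    collections.Counter; the grader's expected format is not stated.
    """
    if n < 1:
        return []
    # The stopwords=[] default mirrors the required signature; it is never mutated.
    stop = {s.lower() for s in stopwords}
    # Lowercase every token, then drop stopwords so they are not part of the text.
    tokens = [w.lower() for w in words]
    tokens = [t for t in tokens if t not in stop]
    ngrams = []
    for i in range(len(tokens) - n + 1):
        gram = tokens[i:i + n]
        if bow:
            # Alphabetical order makes word order irrelevant (bag of words).
            gram = sorted(gram)
        ngrams.append(tuple(gram))
    return ngrams


# 'he said fine' and 'fine he said' collapse to the same BOW ngram.
print(find_ngrams_bow("He said fine".split(), 3, bow=True))   # [('fine', 'he', 'said')]
print(find_ngrams_bow("fine he said".split(), 3, bow=True))   # [('fine', 'he', 'said')]
```

The work is a single pass over the tokens plus a small sort per ngram, so it should stay well inside the stated CPU limit for a book-length text.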
Use it
With everything working, you will now use find_ngrams_bow to help support your research:
Question 1: write a function named q1 that takes no parameters.
The function will use find_ngrams_bow to answer the following question:
As the n in ngrams increases, would you expect the BOW ngram counts to be higher or lower than those of the non-BOW version?
- make sure you understand the question
- answer it BEFORE writing any code
- now write the code inside q1 that will help you confirm/deny your answer. You can use any method you want (print statements, analytical calculations, etc).
- q1 should provide evidence that supports the correct answer
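One way q1 could gather evidence is to tabulate, for each n, the number of distinct ngrams and the highest single count under both models. The sketch below is hedged: compare_bow and SAMPLE are hypothetical names introduced here, the tiny sample sentence stands in for hp1.txt, and a compact copy of find_ngrams_bow (tuple output assumed) is included only so the block runs on its own.

```python
import collections


def find_ngrams_bow(words, n, bow=False, stopwords=[]):
    # Compact stand-in for the exercise function (assumed tuple output).
    stop = {s.lower() for s in stopwords}
    tokens = [w.lower() for w in words if w.lower() not in stop]
    grams = (tokens[i:i + n] for i in range(len(tokens) - n + 1))
    return [tuple(sorted(g)) if bow else tuple(g) for g in grams]


def compare_bow(words, max_n=4):
    """Hypothetical helper. One row per n:
    (n, distinct ordered, distinct BOW, top ordered count, top BOW count)."""
    rows = []
    for n in range(1, max_n + 1):
        ordered = collections.Counter(find_ngrams_bow(words, n))
        bagged = collections.Counter(find_ngrams_bow(words, n, bow=True))
        if not ordered:
            break  # the text is shorter than n words
        rows.append((n, len(ordered), len(bagged),
                     ordered.most_common(1)[0][1],
                     bagged.most_common(1)[0][1]))
    return rows


# Illustrative text only; in the course environment the words would
# come from Util.read_data_file('hp1.txt').split().
SAMPLE = "he said fine and fine he said then she said fine"


def q1():
    for row in compare_bow(SAMPLE.split()):
        print(row)


q1()
```

Because sorting maps several orderings onto one ngram, the BOW model can never have more distinct ngrams than the ordered model, and its top count can never be smaller. The table shows how that gap behaves as n grows.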
Question 2: write a function named q2 that takes no parameters.
The function will use find_ngrams_bow to answer the following question:
If you add stopwords, should you see higher or lower counts in your ngrams?
- make sure you understand the question
- answer it BEFORE writing any code
- now write the code inside q2 that will help you confirm/deny your answer. You can use any method you want (print statements, analytical calculations, etc).
- q2 should provide evidence that supports the correct answer
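For q2, a similar measurement can be made by running the same text with and without a stopword list and comparing totals and top counts. As before, this is a sketch under assumptions: stopword_effect, SAMPLE, and STOPWORDS are hypothetical names, the sample sentence and three-word stopword list are illustrative, and the compact find_ngrams_bow is repeated only to keep the block self-contained.

```python
import collections


def find_ngrams_bow(words, n, bow=False, stopwords=[]):
    # Compact stand-in for the exercise function (assumed tuple output).
    stop = {s.lower() for s in stopwords}
    tokens = [w.lower() for w in words if w.lower() not in stop]
    grams = (tokens[i:i + n] for i in range(len(tokens) - n + 1))
    return [tuple(sorted(g)) if bow else tuple(g) for g in grams]


def stopword_effect(words, n, stopwords):
    """Hypothetical helper: totals and top-3 ngrams without and with stopword removal."""
    plain = find_ngrams_bow(words, n)
    filtered = find_ngrams_bow(words, n, stopwords=stopwords)
    return {
        "total_plain": len(plain),
        "total_filtered": len(filtered),
        "top_plain": collections.Counter(plain).most_common(3),
        "top_filtered": collections.Counter(filtered).most_common(3),
    }


# Illustrative data only; the real experiment would use hp1.txt and a
# fuller stopword list.
SAMPLE = "the cat sat on the mat and the dog sat on the rug"
STOPWORDS = ["the", "on", "and"]


def q2():
    for key, value in stopword_effect(SAMPLE.split(), 2, STOPWORDS).items():
        print(key, value)


q2()
```

Removing stopwords shortens the token stream, so the total number of ngrams cannot increase. Whether the top individual counts rise or fall on the real text is what the printed comparison is meant to reveal.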
