
Question 1: Inheritance

Here is a class BagofWords that implements the bag-of-words model of text, or at any rate a simplification of it. You create a bag of words by passing it a text string, like:

    bag = BagofWords("Hello I would like to travel to Naples, is Vesuvius erupting?")

and then you can ask how many times a word occurred in the text string:

    bag.occurrences("to")

Note that I am using a defaultdict, which is a dictionary in which keys that are not found are associated with a default value (in this case 0, since it is initialized as defaultdict(int)).

    from collections import defaultdict

    class BagofWords(object):

        def __init__(self, text):
            words = self._text_split(text)
            self.counts = defaultdict(int)
            for w in words:
                self.counts[w] += 1

        def occurrences(self, word):
            return self.counts[word]

        def _text_split(self, text):
            return text.split()

    bag = BagofWords("Hello I would like to travel to Naples, is Vesuvius erupting?")
    bag.occurrences("to")
    # returns 2

This works, but it's really a bit rudimentary; for instance:

    bag.occurrences("Naples")
    # returns 0

The problem here is that the .split() method splits on whitespace, so the bag of words does not contain "Naples" but "Naples,", including the comma. Here is a function that splits text in a better way: it eliminates punctuation and also turns words into lowercase. It uses regular expressions.

    import re

    def split_into_words(text):
        return [w.lower() for w in re.findall(r"[\w']+", text)]

    split_into_words("Hello I would like to travel to Naples, is Vesuvius erupting?")
    # ['hello', 'i', 'would', 'like', 'to', 'travel', 'to', 'naples', 'is', 'vesuvius', 'erupting']

OK, this works better. Now here is the challenge: write a subclass BetterBag of BagofWords that uses this function instead of split() to split text into words. There are two ways of doing this. One is to do it... brute force.
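A minimal sketch of the brute-force route, assuming the BagofWords class and the split_into_words function exactly as given in the question: BetterBag writes its own __init__ and repeats the counting loop, this time feeding it the output of split_into_words. It is an illustration of the verbose approach, not necessarily the intended answer.

```python
import re
from collections import defaultdict


class BagofWords(object):
    """Base class as given in the question: splits text on whitespace."""

    def __init__(self, text):
        words = self._text_split(text)
        self.counts = defaultdict(int)
        for w in words:
            self.counts[w] += 1

    def occurrences(self, word):
        return self.counts[word]

    def _text_split(self, text):
        return text.split()


def split_into_words(text):
    # Provided helper, used unchanged: runs of word characters or
    # apostrophes, converted to lowercase.
    return [w.lower() for w in re.findall(r"[\w']+", text)]


class BetterBag(BagofWords):
    # Brute force: replace __init__ entirely and duplicate the parent's
    # counting loop, tokenizing with split_into_words instead of str.split.
    def __init__(self, text):
        self.counts = defaultdict(int)
        for w in split_into_words(text):
            self.counts[w] += 1


bag = BetterBag("Hello I would like to travel to Naples, is Vesuvius erupting?")
print(bag.occurrences("to"))      # 2
print(bag.occurrences("naples"))  # 1
```

Because split_into_words lowercases the text but occurrences does not lowercase its argument, queries must be lowercase: bag.occurrences("Naples") still returns 0, while bag.occurrences("naples") returns 1.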
But the real challenge is: can you do it without writing the __init__ method for BetterBag? Can you do it so that all you need is 2 lines of code? Think about it. You don't lose points for using a more verbose or less elegant solution, but try to think about how you could do it. And by the way, do use my function split_into_words unchanged; otherwise some tests might fail.

    class BetterBag(BagofWords):
        ### YOUR CODE HERE
        pass
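One possible way to meet the two-line challenge, sketched under the assumption that the base class is exactly as shown in the question: BagofWords.__init__ obtains its words by calling self._text_split(text), and Python resolves that call on the actual object, so a subclass only needs to override _text_split. The inherited __init__ then picks up the new tokenizer automatically. This is a sketch, not an official solution.

```python
import re
from collections import defaultdict


class BagofWords(object):
    """Base class as given in the question."""

    def __init__(self, text):
        # This call is dispatched to the subclass override, if any.
        words = self._text_split(text)
        self.counts = defaultdict(int)
        for w in words:
            self.counts[w] += 1

    def occurrences(self, word):
        return self.counts[word]

    def _text_split(self, text):
        return text.split()


def split_into_words(text):
    # Provided helper, used unchanged.
    return [w.lower() for w in re.findall(r"[\w']+", text)]


class BetterBag(BagofWords):
    # No __init__ here: only the tokenization step is overridden.
    def _text_split(self, text):
        return split_into_words(text)


bag = BetterBag("Hello I would like to travel to Naples, is Vesuvius erupting?")
print(bag.occurrences("to"))      # 2
print(bag.occurrences("naples"))  # 1
```

Defining the override with def matters. Writing `_text_split = split_into_words` as a plain class attribute would turn the function into a method, so Python would pass self as its first argument and the call would fail with a TypeError; `_text_split = staticmethod(split_into_words)` is a one-line variant that avoids this.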
