Question: We begin with a feature extraction function. The features we are going to use are called trigrams. A trigram is simply a string of three contiguous characters. For example, the string "I love computing" contains many trigrams (n - 2 of them, to be precise, where n is the length of the string). The first four, in sequence, are ["I l", " lo", "lov", "ove"].
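The sliding-window idea behind trigrams can be sketched in a few lines of Python. This is an illustration only: the helper name `trigrams` is hypothetical and not part of the question. It shows why a string of length n has n - 2 trigrams.

```python
def trigrams(text):
    """Hypothetical helper: return every trigram of text, in order."""
    # A window of width 3 can start at indices 0 .. len(text) - 3,
    # so a string of length n (n >= 3) yields n - 2 trigrams.
    # For strings shorter than 3, range() is empty and [] is returned.
    return [text[i:i + 3] for i in range(len(text) - 2)]

s = "I love computing"
grams = trigrams(s)
print(grams[:4])           # ['I l', ' lo', 'lov', 'ove']
print(len(s), len(grams))  # 16 14
```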
Write a function count_trigrams(document) that takes a string and returns a default dictionary with the frequency counts of the trigrams within the string (if the same trigram is repeated in the string, its frequency will be greater than 1). Note that the output must be a default dictionary and not a standard dictionary, as it will be useful later. Note also that you should not modify the string in any way (e.g. by removing punctuation, removing whitespace or converting to lower case) when calculating the frequencies.
Your code should behave as follows:
```python
>>> count_trigrams("hel")
defaultdict(<class 'float'>, {'hel': 1.0})
>>> count_trigrams("aaaaa")
defaultdict(<class 'float'>, {'aaa': 3.0})
>>> count_trigrams("Boaty mcBoatFace.")
defaultdict(<class 'float'>, {'ty ': 1.0, 'Fac': 1.0, 'atF': 1.0, 'tFa': 1.0, 'mcB': 1.0, 'ce.': 1.0, 'cBo': 1.0, 'ace': 1.0, 'oat': 2.0, ' mc': 1.0, 'Boa': 2.0, 'y m': 1.0, 'aty': 1.0})
```
My thinking:
```python
from collections import defaultdict as dd

def count_trigrams(document):
    """
    count_trigrams takes a string and returns a dictionary of the counts
    of trigrams within the document.
    """
```
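One way to complete the stub is sketched below. It is a minimal sketch, not a verified reference answer: it assumes the default dictionary's value type is `float`, which is inferred from the `1.0`/`2.0` counts in the expected output. It slides a 3-character window over the unmodified string and counts overlapping windows.

```python
from collections import defaultdict as dd

def count_trigrams(document):
    """
    count_trigrams takes a string and returns a default dictionary of the
    counts of trigrams within the document.
    """
    # Assumption: float as the default factory, so counts print as 1.0, 2.0, ...
    counts = dd(float)
    # Overlapping windows are all counted, so "aaaaa" gives 'aaa' three times.
    # The string is used as-is: no case folding, no stripping of punctuation
    # or whitespace.
    for i in range(len(document) - 2):
        counts[document[i:i + 3]] += 1
    return counts

print(count_trigrams("hel"))    # defaultdict(<class 'float'>, {'hel': 1.0})
print(count_trigrams("aaaaa"))  # defaultdict(<class 'float'>, {'aaa': 3.0})
```

On Python 3.7+ the keys of the third example will print in first-seen order rather than the order listed in the question. The counts themselves, and dictionary equality comparisons, are unaffected.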
