Question:

We begin with a feature extraction function. The features we are going to use are called trigrams. A trigram is simply a string of three contiguous characters. For example, in the string "I love computing" there are lots of trigrams (n - 2 to be precise, where n is the length of the string): ["I l", " lo", "lov", "ove"] are the first four of them, in sequence.
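To make the definition concrete, here is a minimal sketch that lists the trigrams of the example string by sliding a three-character window across it. The variable names text and trigrams are illustrative only and are not part of the assignment.

```python
text = "I love computing"

# Slide a window of width 3 across the string.
# The last valid start index is len(text) - 3, so there are len(text) - 2 windows.
trigrams = [text[i:i + 3] for i in range(len(text) - 2)]

print(trigrams[:4])   # ['I l', ' lo', 'lov', 'ove']
print(len(trigrams))  # 14, i.e. 16 characters minus 2
```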

Write a function count_trigrams(document) that takes a string and returns a default dictionary with the frequency counts of the trigrams within the string (noting that if the same trigram is repeated in the string, its frequency will be greater than 1). Note that the output must be a default dictionary and not a standard dictionary, as this will be useful later. Note also that you should not modify the string in any way (e.g. remove punctuation, remove whitespace or convert to lower case) when calculating the frequencies.

Your code should behave as follows:

>>> count_trigrams("hel")
defaultdict(<class 'float'>, {'hel': 1.0})
>>> count_trigrams("aaaaa")
defaultdict(<class 'float'>, {'aaa': 3.0})
>>> count_trigrams("Boaty mcBoatFace.")
defaultdict(<class 'float'>, {'ty ': 1.0, 'Fac': 1.0, 'atF': 1.0, 'tFa': 1.0, 'mcB': 1.0, 'ce.': 1.0, 'cBo': 1.0, 'ace': 1.0, 'oat': 2.0, ' mc': 1.0, 'Boa': 2.0, 'y m': 1.0, 'aty': 1.0})

My thinking:

from collections import defaultdict as dd

def count_trigrams(document):
    """
    count_trigrams takes a string and returns a dictionary of the counts
    of trigrams within the document.
    """
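One possible way to complete the stub is sketched below. It assumes the default factory is float, which is inferred from the 1.0-style values in the expected output rather than stated in the question, and it leaves the input string untouched as required. Treat it as a sketch under those assumptions, not the official solution.

```python
from collections import defaultdict as dd


def count_trigrams(document):
    """
    count_trigrams takes a string and returns a default dictionary of the
    counts of trigrams within the document.
    """
    # Assumption: float factory, so missing keys start at 0.0 and counts print as 1.0, 2.0, ...
    counts = dd(float)
    # range() is empty when the string has fewer than 3 characters,
    # so short inputs give back an empty default dictionary.
    for i in range(len(document) - 2):
        counts[document[i:i + 3]] += 1
    return counts


print(count_trigrams("aaaaa")["aaa"])  # 3.0
```

Because the result is a default dictionary, looking up a trigram that never occurred returns 0.0 instead of raising a KeyError, which is presumably why the question insists on that type. Note that such a lookup also adds the missing key to the dictionary.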

 
