Question:

Here, you will explore n-grams, a key concept in machine processing of text (e.g., automated translation). An n-gram of words is a group of n consecutive words. So the line "Candy is very yummy!" has three 2-grams ("candy is", "is very", and "very yummy"), two 3-grams ("candy is very" and "is very yummy"), and one 4-gram ("candy is very yummy").

In this problem, you will need to create a function called topngram which takes two parameters: the first a string which is the name of a file, the second a value of n giving the size of the n-gram. Your code should return the string that is the most common n-gram of the specified size.

Your code should do the following:
- Ignore the case of any letters ("After" and "after" should count as the same 1-gram)
- Ignore punctuation and numbers ("STOP!" and "stop" should count as the same 1-gram)
- Ignore line breaks ("Hello\nThere?" and "hello, There!" are the same 2-gram)
- Ignore the following common words: of, the, i, he, she, a, it, the, is, was, be, not, my

For the text of Hawthorne's The Scarlet Letter, the most common 2-gram is
>>> topngram("scarlet.txt", 2)
'hester prynne'

For Robert E. Howard's Conan the Barbarian, the most common 2-gram is
>>> topngram("conan.txt", 2)
'his sword'

For the text of William Shakespeare's Macbeth, the most common 3-gram is
>>> topngram("macbeth.txt", 3)
'enter lady macbeth'

For the text of Edgar Allan Poe's The Raven, the most common 3-gram is
>>> topngram("raven.txt", 3)
'and nothing more'

For the text of Charles Dickens's A Christmas Carol, the most common 4-gram is
>>> topngram("christmas.txt", 4)
'good afternoon said scrooge'

Hints and tips:
- You will probably find Python's dictionaries useful here
- First get the program working with 1-grams (single words)
- Even though you will likely read the file in line by line, you still want to treat breaks between lines just like a space between words
- Try to break the problem down into sub-problems (e.g., remove punctuation, merge lines, build a dictionary, find most common). Code and test each sub-problem separately
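
The n-gram definition in the question can be illustrated with a small sliding-window sketch. The helper name `ngrams` is hypothetical (it is not part of the assignment), and the input is assumed to be a list of already cleaned, lowercase words.

```python
def ngrams(words, n):
    """Return every run of n consecutive words as a space-joined string."""
    # A list of k words has k - n + 1 windows of size n (none if n > k).
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


# The example sentence from the question, already lowercased and
# stripped of punctuation.
words = "candy is very yummy".split()
for n in (2, 3, 4):
    print(n, ngrams(words, n))
```

For the four-word example this yields three 2-grams, two 3-grams, and one 4-gram, matching the counts given in the question.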

Step by Step Solution
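
One possible way to put the sub-problems from the hints together is sketched below. It is not the site's official answer, and it rests on several assumptions the question leaves open: non-letter characters are deleted rather than replaced by spaces (so "don't" becomes "dont"), the ignored words are removed before n-grams are formed, the file is read as UTF-8, ties go to the n-gram seen first, and an empty string is returned when the text has fewer than n usable words. The helper `clean_words` and the constant `STOP_WORDS` are hypothetical names introduced for illustration.

```python
from collections import Counter

# Common words the question says to ignore.
STOP_WORDS = {"of", "the", "i", "he", "she", "a", "it",
              "is", "was", "be", "not", "my"}


def clean_words(text):
    """Lowercase the text, drop non-letters, and remove the ignored words."""
    lowered = text.lower()
    # Keep letters and whitespace only; punctuation and digits are deleted
    # (assumption: deleted, not turned into spaces).
    kept = "".join(ch for ch in lowered if ch.isalpha() or ch.isspace())
    # split() with no argument treats line breaks like any other whitespace,
    # so words on adjacent lines end up next to each other.
    return [word for word in kept.split() if word not in STOP_WORDS]


def topngram(filename, n):
    """Return the most common n-gram of words in the named file."""
    with open(filename, encoding="utf-8") as f:  # assumption: UTF-8 text
        words = clean_words(f.read())
    # Counter is a dict subclass mapping each n-gram to its count.
    counts = Counter(" ".join(words[i:i + n])
                     for i in range(len(words) - n + 1))
    if not counts:
        return ""  # assumption: too few words means no n-gram to report
    # most_common(1) gives [(ngram, count)]; ties keep first-seen order.
    return counts.most_common(1)[0][0]
```

Reading the whole file with `f.read()` sidesteps the line-merging step, since the line breaks are absorbed when the text is split on whitespace. Whether this sketch reproduces the sample outputs in the question depends on the exact text files used and on how the assumptions above match the intended rules.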
