Question:

#do not change the code in this cell

#make sure you run this cell once if you are working on colab or on a fresh installation of anaconda

import nltk

nltk.download('twitter_samples')

nltk.download('punkt')

# do not change the code in this cell

# make sure you run this cell

from nltk.corpus import twitter_samples

from nltk.tokenize import word_tokenize

import random

import math

def sample_sentences(corpus, sample_size):

    size = len(corpus)

    ids = random.sample(range(size), sample_size)

    sample = [corpus[i] for i in ids]

    return sample

random.seed(37)

tsample = sample_sentences(twitter_samples.strings(), 1000)

twittertokens = [word_tokenize(tweet.lower()) for tweet in tsample]

twittertokens[:5]

iii) Find the token with the highest part-of-speech tag ambiguity in the sample. Explain how you arrived at your answer.
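
One possible way to approach part iii), sketched under stated assumptions: treat "ambiguity" as the number of distinct part-of-speech tags a token receives across the sample, collect the tags seen for each token, and pick the token with the largest set. The helper names `tag_ambiguity` and `most_ambiguous` and the toy data are hypothetical and not part of the original notebook; the toy list stands in for the tagged tweets.

```python
from collections import defaultdict


def tag_ambiguity(tagged_sentences):
    """Map each token to the set of distinct POS tags it received."""
    tags_by_token = defaultdict(set)
    for sentence in tagged_sentences:
        for token, tag in sentence:
            tags_by_token[token].add(tag)
    return tags_by_token


def most_ambiguous(tagged_sentences):
    """Return (token, sorted tags) for the token with the most distinct tags."""
    tags_by_token = tag_ambiguity(tagged_sentences)
    # Assumption: ties are broken alphabetically so the result is deterministic.
    token = min(tags_by_token, key=lambda t: (-len(tags_by_token[t]), t))
    return token, sorted(tags_by_token[token])


# Hypothetical toy input: each sentence is a list of (token, tag) pairs,
# the same shape a POS tagger would return for the tokenized tweets.
toy = [
    [("i", "PRP"), ("like", "VBP"), ("it", "PRP")],
    [("like", "IN"), ("this", "DT")],
    [("a", "DT"), ("like", "NN")],
]
print(most_ambiguous(toy))
```

To apply this to the sample, the tweets in `twittertokens` would first need tags, for example via `nltk.pos_tag_sents(twittertokens)`, which requires an extra tagger model download (likely `nltk.download('averaged_perceptron_tagger')`, though the resource name varies by NLTK version). The result then reflects the automatic tagger's output, including its errors on informal Twitter text, so the explanation should state that ambiguity was measured as the count of distinct tags assigned to each token.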
