
Question: Why does the lemmatization step in the following function fail?

def wrangling_doc(doc):
    tokens = doc.split()
    # remove punctuation from each word
    re_punc = re.compile('[%s]' % re.escape(string.punctuation))
    tokens = [re_punc.sub('', w) for w in tokens]
    # remove remaining tokens that are not alphabetic
    tokens = [word for word in tokens if word.isalpha()]
    # filter out short tokens
    tokens = [word for word in tokens if len(word) > 4]
    # lowercase all words
    tokens = [word.lower() for word in tokens]
    # Lemmatize the tokens
    tokens = [word.lemmatizer() for word in tokens]
    return tokens
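The call that fails is word.lemmatizer(): a plain Python str has no lemmatizer() method, so this line raises AttributeError. Lemmatization needs an actual lemmatizer object from an NLP library. Below is a minimal sketch of one possible fix, assuming NLTK and its WordNet corpus (nltk.download('wordnet')) are available; the function name and overall pipeline are kept from the question.

import re
import string

from nltk.stem import WordNetLemmatizer

def wrangling_doc(doc):
    tokens = doc.split()
    # remove punctuation from each word
    re_punc = re.compile('[%s]' % re.escape(string.punctuation))
    tokens = [re_punc.sub('', w) for w in tokens]
    # remove remaining tokens that are not alphabetic
    tokens = [word for word in tokens if word.isalpha()]
    # filter out short tokens
    tokens = [word for word in tokens if len(word) > 4]
    # lowercase all words
    tokens = [word.lower() for word in tokens]
    # lemmatize with a WordNetLemmatizer instance instead of the
    # non-existent str.lemmatizer() method
    lemmatizer = WordNetLemmatizer()
    tokens = [lemmatizer.lemmatize(word) for word in tokens]
    return tokens

Example usage: wrangling_doc("The children were running across several bridges") returns ['child', 'running', 'across', 'several', 'bridge'] (note that "running" stays as-is because WordNetLemmatizer defaults to treating words as nouns unless a part-of-speech tag is passed).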
