Question: Write a function that preprocesses the natural language data and returns the stems of the tokenized tweet. (15%)

a. Remove the Twitter handles (i.e. @user_id) in the column tweet
b. Remove punctuation, including ",.;:?!#"
c. Remove numbers, i.e. "0-9"
d. Remove special characters and non-English characters
e. Remove short words (a word of length 3 is regarded as short)
f. Tokenize the comments
g. Apply stemming on the tokens and return the stems of the tokenized tweet.

Code:
```
# Problem 3
import re
import nltk
from nltk.stem.porter import *
import string

def problem_3(df):
    # write your logic here
    # tweet data is stored in df['tweet']
    tokenized_tweet = []
    # remove Twitter handles
    # remove punctuations ,.;:?!#
    # remove numbers
    # remove special characters
    # remove short words (length = 3 is regarded as a short word)
    # tokenization and stemming
    return tokenized_tweet
```
Execution:
```
> df = pd.read_csv("tweet_data.csv")
> tokenized_tweet = problem_3(df)
> print(tokenized_tweet)
0        [when, father, dysfunct, selfish, drag, kid, i...
1        [thank, lyft, credit, can't, caus, they, don't...
2                                  [bihday, your, majesti]
3                          [model, love, take, with, time]
4                              [factsguid, societi, motiv]
                               ...
31957                                        [that, youuu]
31958    [nina, turner, airwav, tri, wrap, herself, man...
31959                   [listen, song, monday, morn, work]
31960            [sikh, templ, vandalis, calgari, condemn]
31961                                      [thank, follow]
Name: tidy_tweet, Length: 31962, dtype: object
> print(tokenized_tweet.shape)
(31962,)
```
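One possible way to fill in the skeleton is sketched below. It is a minimal sketch under stated assumptions, not the expected reference answer. The assumptions are: plain whitespace splitting stands in for a tokenizer (so no extra NLTK data has to be downloaded); apostrophes are kept because the sample output contains "can't" and "don't"; and words of three characters or fewer are dropped, following the skeleton's comment that length 3 counts as short. The helper `clean_tweet`, its `min_len` parameter, and the optional `stemmer` argument are illustrative names that do not appear in the assignment.

```python
import re

# Patterns for steps a-d. Each match is replaced by a space so that
# neighbouring words do not get glued together.
HANDLE_RE = re.compile(r"@\w+")         # a. Twitter handles such as @user_id
PUNCT_RE = re.compile(r"[,.;:?!#]")     # b. the listed punctuation marks
DIGIT_RE = re.compile(r"[0-9]")         # c. numbers
OTHER_RE = re.compile(r"[^A-Za-z'\s]")  # d. anything but ASCII letters,
                                        #    apostrophes and whitespace (assumption)


def clean_tweet(text, min_len=4):
    """Apply steps a-f to one tweet and return its list of tokens."""
    for pattern in (HANDLE_RE, PUNCT_RE, DIGIT_RE, OTHER_RE):
        text = pattern.sub(" ", text)
    # e./f. Split on whitespace and keep only words longer than 3 characters.
    return [word for word in text.split() if len(word) >= min_len]


def problem_3(df, stemmer=None):
    """Return a Series holding the stemmed tokens of each tweet in df['tweet']."""
    if stemmer is None:
        # Imported here so clean_tweet stays usable without NLTK installed.
        from nltk.stem.porter import PorterStemmer
        stemmer = PorterStemmer()
    tokens = df["tweet"].astype(str).apply(clean_tweet)
    # g. Stem every token of every tweet.
    tokenized_tweet = tokens.apply(lambda words: [stemmer.stem(w) for w in words])
    # The name 'tidy_tweet' is taken from the sample output above.
    return tokenized_tweet.rename("tidy_tweet")
```

Called as `problem_3(df)` on a DataFrame with a `tweet` column, this returns a pandas Series with one list of stems per row, so `.shape` has the same form as in the sample output. Exact stems may differ from the sample if the intended solution tokenizes or filters differently.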