Hello. I'm getting a couple of errors (shown below) when I try to create the baseline RNN model and the second RNN model. Any suggestions on how to fix them?
You are a data scientist at an AI company. You are given the Sentiment140 dataset of tweets. It contains 1,600,000 tweets with six columns:
1. target: the polarity of the tweet (0 = negative, 2 = neutral, 4 = positive)
2. ids: the id of the tweet (for example, 2087)
3. date: the date of the tweet (for example, Sat May 16 23:58:44 UTC 2009)
4. flag: the query (for example, lyx). If there is no query, this value is NO_QUERY.
5. user: the user that tweeted (for example, robotickilldozr)
6. text: the text of the tweet (for example, "Lyx is cool")
Data Source: Sentiment140 Dataset
The target is the polarity of the tweet, and the feature is the tweet text. You are asked to perform sentiment analysis with deep learning by writing Python code in the attached Jupyter Notebook and running all the cells. You only need to submit the Jupyter Notebook.
1. Download the dataset (about 81 MB) from Kaggle to your local disk and unzip it.
2. Clean and preprocess the text data, and split it into training and test datasets.
3. Build a baseline RNN model using an embedding layer and a GRU layer, train it on the training dataset, and evaluate it on the test dataset.
4. Build a second RNN model using an embedding layer and an LSTM layer, and evaluate it on the test dataset.
5. Build a third RNN model using an embedding layer with both GRU and LSTM layers, and evaluate it on the test dataset.
6. Which of the models from Q3, Q4, and Q5 do you recommend? Justify your answer.
import zipfile
# Unzip the downloaded file
with zipfile.ZipFile('path_to_downloaded_file.zip', 'r') as zip_ref:
    zip_ref.extractall('path_to_extract_dataset')
import pandas as pd
from sklearn.model_selection import train_test_split
import re
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
# Load the dataset into a pandas DataFrame
data_path = 'path_to_extract_dataset/training.1600000.processed.noemoticon.csv'
columns = ['target', 'ids', 'date', 'flag', 'user', 'text']
df = pd.read_csv(data_path, encoding='latin-1', header=None, names=columns)
# Clean the text: lowercase it and remove punctuation
df['text'] = df['text'].apply(lambda x: re.sub(r'[^\w\s]', '', str(x).lower()))
# binary_crossentropy expects 0/1 labels; this file only contains 0 (negative) and 4 (positive)
df['target'] = (df['target'] == 4).astype(int)
# Split the data into training and test datasets
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, GRU, Dense
# Tokenize the text data
tokenizer = Tokenizer()
tokenizer.fit_on_texts(train_df['text'])
vocab_size = len(tokenizer.word_index) + 1
# Convert text data to sequences
train_sequences = tokenizer.texts_to_sequences(train_df['text'])
test_sequences = tokenizer.texts_to_sequences(test_df['text'])
# Pad sequences to a fixed length
max_len = 100
train_data = pad_sequences(train_sequences, maxlen=max_len, padding='post')
test_data = pad_sequences(test_sequences, maxlen=max_len, padding='post')
# Build the baseline RNN model
model1 = Sequential()
model1.add(Embedding(vocab_size, 100, input_length=max_len))
model1.add(GRU(64))
model1.add(Dense(1, activation='sigmoid'))
model1.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# Train the model
model1.fit(train_data, train_df['target'], validation_data=(test_data, test_df['target']), epochs=5)
Error from the cell above: ValueError: Unrecognized keyword arguments passed to Embedding: {'input_lenght': 100}
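This error most likely comes from Keras 3, which is what tf.keras refers to from TensorFlow 2.16 on. Keras 3 removed the input_length argument of Embedding, so the installed version rejects it. A minimal sketch of the fix, assuming a Keras 3 / TensorFlow 2.16+ environment: drop input_length and declare the sequence length with an Input layer instead. The vocab_size and max_len values are placeholders so the snippet runs on its own; in the notebook, reuse the values computed from the tokenizer.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Embedding, GRU, Dense

# Placeholder values so this snippet runs standalone; in the notebook, reuse
# vocab_size = len(tokenizer.word_index) + 1 and max_len = 100 from earlier cells.
vocab_size = 1000
max_len = 100

# Baseline RNN model (Embedding + GRU) without input_length:
# the Input layer now carries the fixed sequence length.
model1 = Sequential()
model1.add(Input(shape=(max_len,)))
model1.add(Embedding(vocab_size, 100))
model1.add(GRU(64))
model1.add(Dense(1, activation='sigmoid'))
model1.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model1.summary()
```

The original model1.fit(...) call can stay as it is. Note that binary_crossentropy expects 0/1 labels, while the raw Sentiment140 target column uses 0 and 4, so map 4 to 1 before training if that has not been done already. Adding an Input layer to a Sequential model is also supported by Keras 2, so this version should not break older setups.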
from tensorflow.keras.layers import LSTM
# Build the second RNN model
model2 = Sequential()
model2.add(Embedding(vocab_size, 100, input_length=max_len))
model2.add(LSTM(64))
model2.add(Dense(1, activation='sigmoid'))
model2.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# Train the model
model2.fit(train_data, train_df['target'], validation_data=(test_data, test_df['target']), epochs=5)
Error from the cell above: ValueError: Unrecognized keyword arguments passed to Embedding: {'input_length': 100}
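The second model fails for the same reason, and the fix is the same: replace input_length with an Input layer. The evaluation cell also refers to a model3 that is never defined in the code shown, and Q5 asks for a model that combines GRU and LSTM. A sketch of both, assuming the same Keras 3 setup. Stacking a GRU in front of an LSTM is one reasonable way to combine them, not the only one. The first recurrent layer needs return_sequences=True so that it passes a full sequence (not just its last state) to the second. vocab_size and max_len are again placeholders.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Embedding, GRU, LSTM, Dense

# Placeholder values; in the notebook, reuse vocab_size and max_len from earlier cells.
vocab_size = 1000
max_len = 100

# Second model (Q4): Embedding + LSTM, with the sequence length set by Input
model2 = Sequential()
model2.add(Input(shape=(max_len,)))
model2.add(Embedding(vocab_size, 100))
model2.add(LSTM(64))
model2.add(Dense(1, activation='sigmoid'))
model2.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Third model (Q5): Embedding + GRU + LSTM. The GRU returns its output at every
# time step (return_sequences=True) so the LSTM receives a 3-D sequence input.
model3 = Sequential()
model3.add(Input(shape=(max_len,)))
model3.add(Embedding(vocab_size, 100))
model3.add(GRU(64, return_sequences=True))
model3.add(LSTM(64))
model3.add(Dense(1, activation='sigmoid'))
model3.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```

Train model3 with the same call used for the other two models, for example model3.fit(train_data, train_df['target'], validation_data=(test_data, test_df['target']), epochs=5), before evaluating it.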
# Evaluate the models on the test dataset
_, acc1 = model1.evaluate(test_data, test_df['target'])
_, acc2 = model2.evaluate(test_data, test_df['target'])
_, acc3= model3.evaluate(test_da
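For Q6 it helps to see the three test accuracies side by side. A minimal sketch, assuming each model was compiled with metrics=['accuracy']; evaluate_models is a hypothetical helper name, not part of Keras. It uses return_dict=True so the accuracy is looked up by name rather than by position. The demo runs on random data with a tiny untrained model only to show that the helper works, so its accuracy is meaningless.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Embedding, GRU, Dense


def evaluate_models(models, x_test, y_test):
    """Return {model name: test accuracy} for compiled Keras models.

    Hypothetical helper; assumes each model was compiled with
    metrics=['accuracy'], so the evaluate() results contain an 'accuracy' key.
    """
    results = {}
    for name, model in models.items():
        logs = model.evaluate(x_test, y_test, verbose=0, return_dict=True)
        results[name] = logs['accuracy']
    return results


# Demo on random data with a tiny untrained model (illustration only).
rng = np.random.default_rng(0)
x_demo = rng.integers(1, 50, size=(8, 10))
y_demo = rng.integers(0, 2, size=(8,))
demo = Sequential([Input(shape=(10,)), Embedding(50, 8), GRU(4), Dense(1, activation='sigmoid')])
demo.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

scores = evaluate_models({'demo GRU': demo}, x_demo, y_demo)
for name, acc in scores.items():
    print(f'{name}: test accuracy = {acc:.4f}')
best = max(scores, key=scores.get)  # name of the highest-accuracy model
```

In the notebook the call would look like evaluate_models({'GRU': model1, 'LSTM': model2, 'GRU + LSTM': model3}, test_data, test_df['target']). Accuracy alone need not decide Q6; training time per epoch and parameter count are also reasonable factors to weigh.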
