Code This section needs to be completed using Python 3.6+. You will also require following packages:

Fantastic news! We've Found the answer you've been seeking!

Question:

Transcribed Image Text:

Q2. Logistic Regression: Code [25] In this task, you will learn to build a Logistic Regression Classifier for the same "Financial Phrasebank" dataset. Bag of Words model will be used for this task. 1. Use 60% of the data selected randomly for training, 20% selected randomly for testing and the remaining 20% for validation set. Use classes 'positive' and 'negative' only. Perform the same cleaning tasks on the text data and build a vocabulary of the words. Link to the dataset. Link to the scikit-learn documentation for metrics. 2 2. Using CountVectorizer3, fit the cleaned train data. This will create the bag-of-words model for the train data. Transform test and validation sets using same CountVectorizer. 3. To implement the logistic regression using following equations, z = W.x = o(z) we need the weight vector W. Create an array of dimension equal to those of each x from the CountVectorizer. 4. Apply above equations over whole training dataset and calculate y and cross-entropy loss LCE which can be calculated as LCE = -y log y + (1 - y) log(1-) 5. Now, update the weights as follows: W+1 = W - (y - yi).xi Here, (y - y).x is the gradient of sigmoid function and a = 0.01 is the learning rate. 6. Repeat step 4 and step 5 for 500 iterations or epochs. For each iteration, calculate the cross-entropy loss on validation set. 7. Calculate the accuracy and macro-average precision, recall, and F1 score and provide the confusion matrix on the test set. Q2. Logistic Regression: Code [25] In this task, you will learn to build a Logistic Regression Classifier for the same "Financial Phrasebank" dataset. Bag of Words model will be used for this task. 1. Use 60% of the data selected randomly for training, 20% selected randomly for testing and the remaining 20% for validation set. Use classes 'positive' and 'negative' only. Perform the same cleaning tasks on the text data and build a vocabulary of the words. Link to the dataset. Link to the scikit-learn documentation for metrics. 2 2. Using CountVectorizer3, fit the cleaned train data. This will create the bag-of-words model for the train data. Transform test and validation sets using same CountVectorizer. 3. To implement the logistic regression using following equations, z = W.x = o(z) we need the weight vector W. Create an array of dimension equal to those of each x from the CountVectorizer. 4. Apply above equations over whole training dataset and calculate y and cross-entropy loss LCE which can be calculated as LCE = -y log y + (1 - y) log(1-) 5. Now, update the weights as follows: W+1 = W - (y - yi).xi Here, (y - y).x is the gradient of sigmoid function and a = 0.01 is the learning rate. 6. Repeat step 4 and step 5 for 500 iterations or epochs. For each iteration, calculate the cross-entropy loss on validation set. 7. Calculate the accuracy and macro-average precision, recall, and F1 score and provide the confusion matrix on the test set.

Related Book For answer-question

answer-question

Income Tax Fundamentals 2013

Income Tax Fundamentals 2013

ISBN: 9781285586618

31st Edition

Authors: Gerald E. Whittenburg, Martha Altus Buller, Steven L Gill

See More Books

Posted Date: Feb 13, 2024 01:25 PM

See More Questions