Code This section needs to be completed using Python 3.6+. You will also require following packages:
Fantastic news! We've Found the answer you've been seeking!
Question:
Transcribed Image Text:
Q2. Logistic Regression: Code [25] In this task, you will learn to build a Logistic Regression Classifier for the same "Financial Phrasebank" dataset. Bag of Words model will be used for this task. 1. Use 60% of the data selected randomly for training, 20% selected randomly for testing and the remaining 20% for validation set. Use classes 'positive' and 'negative' only. Perform the same cleaning tasks on the text data and build a vocabulary of the words. Link to the dataset. Link to the scikit-learn documentation for metrics. 2 2. Using CountVectorizer3, fit the cleaned train data. This will create the bag-of-words model for the train data. Transform test and validation sets using same CountVectorizer. 3. To implement the logistic regression using following equations, z = W.x = o(z) we need the weight vector W. Create an array of dimension equal to those of each x from the CountVectorizer. 4. Apply above equations over whole training dataset and calculate y and cross-entropy loss LCE which can be calculated as LCE = -y log y + (1 - y) log(1-) 5. Now, update the weights as follows: W+1 = W - (y - yi).xi Here, (y - y).x is the gradient of sigmoid function and a = 0.01 is the learning rate. 6. Repeat step 4 and step 5 for 500 iterations or epochs. For each iteration, calculate the cross-entropy loss on validation set. 7. Calculate the accuracy and macro-average precision, recall, and F1 score and provide the confusion matrix on the test set. Q2. Logistic Regression: Code [25] In this task, you will learn to build a Logistic Regression Classifier for the same "Financial Phrasebank" dataset. Bag of Words model will be used for this task. 1. Use 60% of the data selected randomly for training, 20% selected randomly for testing and the remaining 20% for validation set. Use classes 'positive' and 'negative' only. Perform the same cleaning tasks on the text data and build a vocabulary of the words. Link to the dataset. Link to the scikit-learn documentation for metrics. 2 2. Using CountVectorizer3, fit the cleaned train data. This will create the bag-of-words model for the train data. Transform test and validation sets using same CountVectorizer. 3. To implement the logistic regression using following equations, z = W.x = o(z) we need the weight vector W. Create an array of dimension equal to those of each x from the CountVectorizer. 4. Apply above equations over whole training dataset and calculate y and cross-entropy loss LCE which can be calculated as LCE = -y log y + (1 - y) log(1-) 5. Now, update the weights as follows: W+1 = W - (y - yi).xi Here, (y - y).x is the gradient of sigmoid function and a = 0.01 is the learning rate. 6. Repeat step 4 and step 5 for 500 iterations or epochs. For each iteration, calculate the cross-entropy loss on validation set. 7. Calculate the accuracy and macro-average precision, recall, and F1 score and provide the confusion matrix on the test set.
Expert Answer:
Related Book For
Income Tax Fundamentals 2013
ISBN: 9781285586618
31st Edition
Authors: Gerald E. Whittenburg, Martha Altus Buller, Steven L Gill
Posted Date:
Students also viewed these algorithms questions
-
In this assignment, you will construct a constituency tree and implement the task of POS tagging using constituency parsing and dependency parsing methods. Constituency parsing is the process of...
-
QUIZ... Let D be a poset and let f : D D be a monotone function. (i) Give the definition of the least pre-fixed point, fix (f), of f. Show that fix (f) is a fixed point of f. [5 marks] (ii) Show that...
-
It can be seen that in rolling a strip, the rolls will begin to slip if the back tension, b is too high. Derive an analytical expression for the magnitude of the back tension in order to make the...
-
Find (a) | A | (b) | B | (c) AB (d) | AB| 1. 2. 3. -1 3 B = -1 B = 6 -2 4 -2 2] 0 -1
-
A few years back, Dave and Jana bought a new home. They borrowed $230,415 at an annual fixed rate of 5.49% (15-year term) with monthly payments of $1,881.46. They just made their 50th payment, and...
-
The magnitude of the magnetic field in Figure P29.40 changes with time, and the magnitude of the accompanying electric field inside the magnetic field is given by \(E(r, t)=3 C r t^{2}\), where \(C\)...
-
Information for Canberra Corporation's intangible assets follows: 1. On January 1, 2014, Canberra signed an agreement to operate as a franchisee of Hsian Copy Service, Inc. for an initial franchise...
-
estion 45: Which key lets you select multiple worksheets in a workbook? swer: (Shift) 13 (Tab) O (Alt) O (Enter)
-
Write a paper describing the following recommended practices for improving industrial control systems cyber security with Defense-In-Depth Strategies for a fictitious sector-based company: Security...
-
Many Texas leaders often praise the small-government and low-tax values of the state and criticize the federal government for its tax policies and extensive aid programs. Examine the information in...
-
Nast Inc. is considering Projects S and L, whose cash flows are shown below. These projects are mutually exclusive, equally risky, and not repeatable. If the decision is made by choosing the project...
-
A Question of Motivation Alex and Stephanie have a few things in common. Both are students at their state's university, and both work full-time at a local supermarket to make ends meet and help pay...
-
Tesar Chemicals is considering Projects S and L, whose cash flows are shown below. These projects are mutually exclusive, equally risky, and not repeatable. The CEO believes the IRR is the best...
-
This problem involves reading in a text file of words in the English language that is too large to screenshot but would be willing to send to a tutor if possible and if that is needed named words.txt...
-
The 3 year USD bond issued by China Evergrande Company has coupon of 11.5%, which the company thinks is too high. In fact the bond price has since risen substantially and current yield is 9.5%. The...
-
On October 1, 2014, the Dow Jones Industrial Average (DJIA) opened at 17,042 points. During that day it lost 237 points. On October 2 it lost 4 points. On October 3 it gained 209 points. Deter-mine...
-
How much of each of the following prizes or awards is taxable? a. Cheline received a $50,000 gift bag at the Oscars in 2012. b. Jon received a gold watch worth $350 for 25 years of service to his...
-
During the 2012 tax year, Irma incurred the following expenses: Union dues..............................................................$275 Tax return preparation...
-
Jason and Mary Wells, friends of yours, were married on December 30, 2012. They know you are studying taxes and have come to you with a question concerning their filing status. Jason and Mary would...
-
A uniform circular wire of radius \(R\) is forced to rotate about a fixed vertical diameter at constant angular velocity \(\omega\). A bead of mass \(m\) experiences gravity, is smoothly threaded on...
-
Two point masses \(m\) and \(M\) are connected by a massless rod of length \(\ell\) and placed on a horizontal table. The mass \(m\) is also connected to a fixed point \(P\) on the table by a spring...
-
(a) Show that the relativistic Lagrangian \(L=-(1 / \gamma) m c^{2} e^{U / c^{2}}\) reduces to the Newtonian Lagrangian for a particle of mass \(m\) in a gravitational potential \(U\) in the slow...
Study smarter with the SolutionInn App