Q2. Logistic Regression: Code [25] In this task, you will learn to build a Logistic Regression...
Question:
Q2. Logistic Regression: Code [25]

In this task, you will learn to build a Logistic Regression classifier for the same "Financial Phrasebank" dataset. A Bag of Words model will be used for this task.

1. Use 60% of the data, selected randomly, for training, 20% selected randomly for testing, and the remaining 20% for the validation set. Use the classes 'positive' and 'negative' only. Perform the same cleaning tasks on the text data and build a vocabulary of the words.
2. Using CountVectorizer, fit the cleaned train data. This will create the bag-of-words model for the train data. Transform the test and validation sets using the same CountVectorizer.
3. To implement logistic regression using the following equations,
   $$z_i = W \cdot x_i, \qquad \hat{y}_i = \sigma(z_i),$$
   we need the weight vector $W$. Create an array of dimension equal to that of each $x_i$ from the CountVectorizer.
4. Apply the above equations over the whole training dataset and calculate the cross-entropy loss $L_{CE}$, which can be calculated as
   $$L_{CE} = -\left[ y \log \hat{y} + (1 - y) \log(1 - \hat{y}) \right]$$
5. Now, update the weights as follows:
   $$W_{t+1} = W_t - \eta \, (\hat{y}_i - y_i) \, x_i$$
   Here, $(\hat{y}_i - y_i) \, x_i$ is the gradient of the cross-entropy loss with respect to $W$ (obtained through the sigmoid function) and $\eta = 0.01$ is the learning rate.
6. Repeat step 4 and step 5 for 500 iterations or epochs. For each iteration, calculate the cross-entropy loss on the validation set.
7. Calculate the accuracy and the macro-average precision, recall, and F1 score, and provide the confusion matrix on the test set.
8. Experiment with varying values of $\eta$ = (0.0001, 0.001, 0.01, 0.1). Report your observations with respect to the performance of the model. You can vary the number of iterations to enhance the performance if necessary.
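Steps 3 to 6 can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not a reference solution: the data is a small synthetic stand-in for the bag-of-words counts (the real matrices would come from CountVectorizer, converted with `.toarray()`), the gradient is averaged over the whole training set each epoch, there is no bias term because the question's equations do not include one, and the names `train`, `cross_entropy`, and `eta` (the learning rate) are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Logistic function; z is clipped so exp() cannot overflow."""
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

def cross_entropy(y, y_hat, eps=1e-12):
    """Mean binary cross-entropy; eps keeps log() away from zero."""
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)))

def train(X_train, y_train, X_val, y_val, eta=0.01, epochs=500):
    """Batch gradient descent; returns weights and per-epoch losses."""
    w = np.zeros(X_train.shape[1])  # one weight per vocabulary feature
    train_losses, val_losses = [], []
    for _ in range(epochs):
        y_hat = sigmoid(X_train @ w)
        # Losses are recorded before the weights are updated.
        train_losses.append(cross_entropy(y_train, y_hat))
        val_losses.append(cross_entropy(y_val, sigmoid(X_val @ w)))
        # Assumption: average (y_hat - y) * x over all training examples.
        grad = X_train.T @ (y_hat - y_train) / len(y_train)
        w -= eta * grad
    return w, train_losses, val_losses

# Synthetic stand-in for bag-of-words counts (NOT the Financial Phrasebank data).
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 5)).astype(float)
true_w = np.array([1.5, -2.0, 0.5, 0.0, -1.0])
y = (X @ true_w > 0).astype(float)

# 60% train, 20% validation, 20% held out for testing.
X_train, y_train = X[:120], y[:120]
X_val, y_val = X[120:160], y[120:160]

w, train_losses, val_losses = train(X_train, y_train, X_val, y_val)
print(f"first/last train loss: {train_losses[0]:.4f} / {train_losses[-1]:.4f}")
print(f"last validation loss:  {val_losses[-1]:.4f}")
```

For step 8, the same `train` function can be called once per learning rate and the validation-loss curves compared; very small rates may need more than 500 epochs to make visible progress.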
Expert Answer:
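A minimal sketch of the evaluation in step 7, written in plain Python so the arithmetic is visible. It assumes predictions have already been thresholded to 0/1 labels (for example, probability at or above 0.5 means class 1), and it takes macro-averaging to mean the unweighted mean of the per-class scores. The function names and the toy labels are hypothetical, chosen only for illustration.

```python
def confusion_matrix(y_true, y_pred, labels=(0, 1)):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def macro_scores(matrix):
    """Accuracy plus the unweighted per-class mean of precision, recall, F1."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    accuracy = sum(matrix[k][k] for k in range(n)) / total
    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = matrix[k][k]
        predicted = sum(matrix[r][k] for r in range(n))  # column sum
        actual = sum(matrix[k])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }

# Toy labels for illustration only (1 = positive, 0 = negative).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

matrix = confusion_matrix(y_true, y_pred)
scores = macro_scores(matrix)
print(matrix)
print(scores)
```

On the real task, `y_pred` would come from applying the trained weights to the test-set counts and thresholding the sigmoid output.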
Related Book For
Artificial Intelligence A Modern Approach
ISBN: 9780134610993
4th Edition
Authors: Stuart Russell, Peter Norvig