Q2. Perceptrons Instead of using Naive Bayes, you decide to try applying Perceptron to the interrogation...
Question:
Q2. Perceptrons

Instead of using Naive Bayes, you decide to try applying Perceptron to the interrogation data. You generate features from the training data as follows:

• \(\phi_1(x) = n\), where "not" appears n times.
• \(\phi_2(x) = m\), where "swear" appears m times.
• \(\phi_3(x) = 1\), a bias term.

We use the labels +1 for G and -1 for I. Given a weight vector \(w = (w_1, w_2, w_3)\), our classifier returns +1 if \(w_1\phi_1(x) + w_2\phi_2(x) + w_3 \ge 0\) and -1 otherwise.

Our training set from part 1 yields the following features and labels:

| Training Statement | \(\phi_1\) | \(\phi_2\) | \(\phi_3\) | Label |
|---|---|---|---|---|
| I am definitely innocent, officer | 0 | 0 | 1 | +1 |
| Officer, I swear I am not lying | 1 | 1 | 1 | +1 |
| I am not lying, I swear | 1 | 1 | 1 | +1 |
| I am innocent, officer, I swear | 0 | 1 | 1 | -1 |
| Officer, I am definitely not lying | 1 | 0 | 1 | -1 |

(a) [2 pts] Compute the first two updates of the Perceptron algorithm and fill in the following table, using the given initial Perceptron weights \(w = (w_1, w_2, w_3)\) and data points \((\phi_1, \phi_2, \phi_3, \text{Label})\).

Columns: \(w_1\), \(w_2\), \(w_3\)
Initial: 2 -0.5
Observing (0, 0, 1, +1):
Observing (1, 0, 1, -1):

(b) [2 pts] What convergence guarantees can you give for the Perceptron algorithm applied to this data set?

(c) [2 pts] Linear classifiers are often insufficient to represent a dataset using a given set of features. However, it is often possible to find new features using nonlinear functions of our existing features which do allow linear classifiers to separate the data. Nonlinear features result in more expressive linear classifiers. For example, consider the following data set, where +'s represent positive examples and -'s represent negative examples.

[Figure: positive examples at \(x_1 = -1\) and \(x_1 = +1\), negative example at \(x_1 = 0\)]

No linear classifier can separate the positive examples (-1, 0) and (1, 0) from the negative example (0, 0). Rather than using a single feature, if we perform a nonlinear mapping \(\phi(x_1, x_2) = (x_1^2, 1)\), the positive examples are both mapped to (1, 1) and the negative example is mapped to (0, 1), and we see the data can be separated by a linear classifier. One example is the line \(w = [1, -0.5]\), i.e. the classifier \(w^\top\phi(x) = x_1^2 - 0.5 \ge 0\).

For what values of the weight vector \(w = (w_1, w_2)\) does the classifier \(w^\top\phi(x) \ge 0\) separate the given data?

(d) [3 pts] Which of the following feature sets allows a linear classifier \(w = (w_1, w_2, w_3)\) to separate the original interrogation data set? Justify your answer briefly.

(i) [1 pt] \(\phi' = (\phi_1 + \phi_2,\ \phi_1 - \phi_2,\ 1)\)
(ii) [1 pt] \(\phi' = (\phi_1\phi_2,\ \phi_2,\ 1)\)
(iii) [1 pt] \(\phi' = ((\phi_1 \text{ xor } \phi_2),\ \phi_2,\ 1)\), where a xor b is 1 if either a = 1 or b = 1 but not both.

(e) [2 pts] Given the features \(\phi(x) = [x^2, x, 1]\), how many data points are we guaranteed to be able to separate with zero error using a linear classifier \(w^\top\phi(x) = w_1 x^2 + w_2 x + w_3\)? Assume that a data point x cannot have conflicting labels. Justify your answer briefly.

(f) [2 pts] In general, if we use features \(\phi(x) = [x^{N-1}, x^{N-2}, \ldots, 1]\), i.e. an (N-1)th order polynomial, how many points can we separate with zero error using a linear classifier \(w = [w_1, \ldots, w_N]\)? Justify your answer briefly.

(g) [2 pts] Assume we have N labeled training data points, which we would like to use for classification, i.e. to predict the labels of unseen test data points. What are the disadvantages of using an Nth order polynomial to fit this data?
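The decision rule and mistake-driven update described in the question can be sketched in a few lines of Python. This is a minimal illustration, not the exam's solution: the helper names `predict`, `perceptron_update`, and `phi` are made up here, and the starting weights `[0, 0, 0]` are a hypothetical choice rather than the initial weights from the question's table. The second half checks the part (c) example, where the weights [1, -0.5] applied to the mapped features (x1^2, 1) separate the three points.

```python
from typing import List, Sequence


def predict(w: Sequence[float], f: Sequence[float]) -> int:
    """Return +1 if the weighted feature sum is >= 0, else -1 (rule from the question)."""
    score = sum(wi * fi for wi, fi in zip(w, f))
    return 1 if score >= 0 else -1


def perceptron_update(w: Sequence[float], f: Sequence[float], label: int) -> List[float]:
    """Standard perceptron step: leave w unchanged on a correct prediction,
    otherwise add label * f to w."""
    if predict(w, f) == label:
        return list(w)
    return [wi + label * fi for wi, fi in zip(w, f)]


def phi(x1: float, x2: float) -> List[float]:
    """Nonlinear map from part (c): (x1, x2) -> (x1^2, 1)."""
    return [x1 ** 2, 1.0]


# ASSUMPTION: hypothetical starting weights, NOT the initial weights from the exam table.
w = [0.0, 0.0, 0.0]
w = perceptron_update(w, [0, 0, 1], +1)  # score 0 >= 0, predicts +1: correct, no change
w = perceptron_update(w, [1, 0, 1], -1)  # score 0 >= 0, predicts +1: wrong, subtract features
print(w)

# Part (c) example: w = [1, -0.5] on the mapped features.
w_c = [1.0, -0.5]
mapped_predictions = [predict(w_c, phi(x1, x2)) for x1, x2 in [(-1, 0), (1, 0), (0, 0)]]
print(mapped_predictions)
```

With other starting weights the same two calls may or may not trigger an update; only the rule itself is taken from the question.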
Expert Answer:
Question 1, Naive Bayes: Using the training data, find the maximum likelihood estimate of the parameters; they will be the class-conditional relative frequen...