Q2. Perceptrons Instead of using Naive Bayes, you decide to try applying Perceptron to the interrogation...
Question:
Transcribed Image Text:
Q2. Perceptrons

Instead of using Naive Bayes, you decide to try applying Perceptron to the interrogation data. You generate features from the training data as follows:
• $\phi_1(x) = n$, where "not" appears $n$ times.
• $\phi_2(x) = m$, where "swear" appears $m$ times.
• $\phi_3(x) = 1$, a bias term.

We use the labels +1 for G and -1 for I. Given a weight vector $w = (w_1, w_2, w_3)$, our classifier returns +1 if $w_1\phi_1(x) + w_2\phi_2(x) + w_3 \ge 0$ and -1 otherwise. Our training set from part 1 yields the following features and labels:

| Training Statement | φ1 | φ2 | φ3 | Label |
|---|---|---|---|---|
| I am definitely innocent, officer | 0 | 0 | 1 | +1 |
| Officer, I swear I am not lying | 1 | 1 | 1 | +1 |
| I am not lying, I swear | 1 | 1 | 1 | +1 |
| I am innocent, officer, I swear | 0 | 1 | 1 | -1 |
| Officer, I am definitely not lying | 1 | 0 | 1 | -1 |

(a) [2 pts] Compute the first two updates of the Perceptron algorithm and fill in the following table, using the given initial Perceptron weights $w = (w_1, w_2, w_3)$ and data points $(\phi_1, \phi_2, \phi_3, \text{Label})$.

Initial: 2, -0.5 [third value illegible]
Observing (0, 0, 1, +1): w1 = __, w2 = __, w3 = __
Observing (1, 0, 1, -1): w1 = __, w2 = __, w3 = __

(b) [2 pts] What convergence guarantees can you give for the Perceptron algorithm applied to this data set?

(c) [2 pts] Linear classifiers are often insufficient to represent a dataset using a given set of features. However, it is often possible to find new features using nonlinear functions of our existing features that do allow linear classifiers to separate the data. Nonlinear features result in more expressive linear classifiers. For example, consider the following data set, where +'s represent positive examples and -'s represent negative examples.

[Figure: the three examples plotted on the x1-axis, with ticks at -1 and +1.]

No linear classifier can separate the positive examples (-1, 0) and (1, 0) from the negative example (0, 0). Rather than using a single feature, if we perform a nonlinear mapping $\phi(x_1, x_2) = (x_1^2, 1)$, the positive examples are both mapped to (1, 1) and the negative example is mapped to (0, 1), and we see the data can be separated by a linear classifier. One example is the line $w = [1, -0.5]$, i.e. the classifier $w^\top\phi(x) = x_1^2 - 0.5 \ge 0$.
For what values of the weight vector $w = (w_1, w_2)$ does the classifier $w^\top\phi(x) \ge 0$ separate the given data?

(d) [3 pts] Which of the following feature sets allows a linear classifier $w = (w_1, w_2, w_3)$ to separate the original interrogation data set? Justify your answer briefly.
(i) [1 pt] $\phi' = (\phi_1 + \phi_2, \phi_1 - \phi_2, 1)$
(ii) [1 pt] $\phi' = (\phi_1\phi_2, \phi_2, 1)$
(iii) [1 pt] $\phi' = (\phi_1 \operatorname{xor} \phi_2, \phi_2, 1)$, where $a \operatorname{xor} b$ is 1 if either $a = 1$ or $b = 1$, but not both.

(e) [2 pts] Given the features $\phi(x) = [x^2, x, 1]$, how many data points are we guaranteed to be able to separate with zero error using a linear classifier $w^\top\phi(x) = w_1x^2 + w_2x + w_3$? Assume that a data point $x$ cannot have conflicting labels. Justify your answer briefly.

(f) [2 pts] In general, if we use features $\phi(x) = [x^{N-1}, x^{N-2}, \ldots, 1]$, i.e. an $(N-1)$th-order polynomial, how many points can we separate with zero error using a linear classifier $w = [w_1, \ldots, w_N]$? Justify your answer briefly.

(g) [2 pts] Assume we have N labeled training data points, which we would like to use for classification, i.e. to predict the labels of unseen test data points. What are the disadvantages of using an Nth-order polynomial to fit this data?
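Two perceptron mechanics drive this question: the single mistake-driven update asked for in part (a), and running that update until an epoch has no mistakes, which part (d) implicitly relies on when asking whether a feature set is separable. The Python below is a minimal sketch under stated assumptions, not an official solution. The labels +1, +1, +1, -1, -1 are our reading of the garbled training table, the feature maps (i) to (iii) follow our reading of the garbled formulas, and because only two of the three initial weights ("2" and "-0.5") are legible, the part (a) demo treats them as w2 and w3 and uses a hypothetical placeholder w1 = 0.0. All function names are illustrative.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Weights = Tuple[float, ...]


def predict(w: Sequence[float], phi: Sequence[float]) -> int:
    """+1 if w . phi >= 0, else -1 (a score of exactly 0 counts as +1)."""
    score = sum(wi * fi for wi, fi in zip(w, phi))
    return 1 if score >= 0 else -1


def perceptron_update(w: Sequence[float], phi: Sequence[float], label: int) -> Weights:
    """One binary perceptron step: keep w if correct, else return w + label * phi."""
    if predict(w, phi) == label:
        return tuple(w)
    return tuple(wi + label * fi for wi, fi in zip(w, phi))


def train_perceptron(xs: Sequence[Sequence[float]], ys: Sequence[int],
                     max_epochs: int = 100) -> Optional[Weights]:
    """Cycle through the data from w = 0 until one full pass makes no mistakes.

    Returns the separating weights, or None if max_epochs passes were not enough.
    """
    w: Weights = tuple(0.0 for _ in xs[0])
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(xs, ys):
            if predict(w, x) != y:
                mistakes += 1
            w = perceptron_update(w, x, y)
        if mistakes == 0:
            return w
    return None


# (phi1, phi2) = (count of "not", count of "swear") for the five statements, in order.
RAW: List[Tuple[int, int]] = [(0, 0), (1, 1), (1, 1), (0, 1), (1, 0)]
# ASSUMPTION: labels read from the garbled table as +1, +1, +1, -1, -1.
LABELS: List[int] = [1, 1, 1, -1, -1]

# Candidate feature maps from part (d); the last component is the bias.
FEATURE_MAPS: Dict[str, Callable[[int, int], Tuple[int, int, int]]] = {
    "original": lambda p1, p2: (p1, p2, 1),
    "(i)": lambda p1, p2: (p1 + p2, p1 - p2, 1),
    "(ii)": lambda p1, p2: (p1 * p2, p2, 1),
    "(iii)": lambda p1, p2: (p1 ^ p2, p2, 1),  # ^ is XOR on 0/1 integers
}


def separability_report(max_epochs: int = 100) -> Dict[str, Optional[Weights]]:
    """Run the perceptron on each feature map; None means no separator was found."""
    return {name: train_perceptron([fmap(p1, p2) for p1, p2 in RAW], LABELS, max_epochs)
            for name, fmap in FEATURE_MAPS.items()}


if __name__ == "__main__":
    # Part (a). HYPOTHETICAL start: w1 = 0.0 is a placeholder for the illegible value.
    w: Weights = (0.0, 2.0, -0.5)
    for phi, label in [((0, 0, 1), 1), ((1, 0, 1), -1)]:
        w = perceptron_update(w, phi, label)
        print(f"after observing {phi + (label,)}: w = {w}")
    # Part (d).
    for name, sep in separability_report().items():
        print(name, "->", sep if sep is not None else "no separator found")
```

A returned weight vector proves that a feature set is linearly separable on this data; `None` only means that no separator was found within `max_epochs` passes, so a claim of non-separability still needs an argument of its own, for example two identical feature vectors carrying opposite labels.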
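One way to reason about parts (e) and (f): with N points at distinct x values, the system $V w = y$, where row $i$ of $V$ is $[x_i^{N-1}, \ldots, x_i, 1]$, has an invertible (Vandermonde) matrix, so some $w$ satisfies $w^\top\phi(x_i) = y_i$ exactly and the sign of the score then matches every label. The Python below is our own illustrative sketch of that argument, not part of the question; the function names are hypothetical, and it solves the system exactly with `fractions.Fraction` and Gauss-Jordan elimination, assuming distinct x values (which the "no conflicting labels" assumption lets us arrange by merging duplicate points).

```python
from fractions import Fraction
from typing import List, Sequence


def fit_polynomial_weights(xs: Sequence[int], ys: Sequence[int]) -> List[Fraction]:
    """Solve V w = y exactly, where row i of V is [x_i^(N-1), ..., x_i, 1]."""
    n = len(xs)
    if len(set(xs)) != n:
        raise ValueError("x values must be distinct")
    # Augmented matrix [V | y] in exact rational arithmetic.
    rows = [[Fraction(x) ** (n - 1 - k) for k in range(n)] + [Fraction(y)]
            for x, y in zip(xs, ys)]
    # Gauss-Jordan elimination; V is invertible, so a nonzero pivot always exists.
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pv = rows[col][col]
        rows[col] = [v / pv for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]


def classify(w: Sequence[Fraction], x: int) -> int:
    """+1 if w . [x^(N-1), ..., x, 1] >= 0, else -1."""
    n = len(w)
    score = sum(wk * Fraction(x) ** (n - 1 - k) for k, wk in enumerate(w))
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # x1 coordinates of the part (c) data: positives at -1 and +1, negative at 0.
    w = fit_polynomial_weights([-1, 0, 1], [1, -1, 1])
    print([str(v) for v in w])  # -> ['2', '0', '-1'], i.e. 2x^2 - 1
    print([classify(w, x) for x in (-1, 0, 1)])  # -> [1, -1, 1]
```

Because the fitted scores equal the labels exactly, any of the $2^N$ labelings of N distinct points is reproduced; that same flexibility is what makes high-order polynomial features prone to fitting noise, as part (g) asks about.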
Expert Answer:
Question 1, Naive Bayes: Using the training data, find the maximum likelihood estimates of the parameters; they will be the class-conditional relative frequen...