Question: Consider a binary classification problem where there is a single feature $X \in \mathbb{R}$ and the dependent variable $Y \in \{0, 1\}$. Let $P_{X,Y}$ ...

Consider a binary classification problem where there is a single feature $X \in \mathbb{R}$ and the dependent variable $Y \in \{0, 1\}$. Let $P_{X,Y}$ denote the joint distribution over pairs $(X, Y)$, and let $h : \mathbb{R} \to \{0, 1\}$ denote a generic classifier. We define the error rate of $h$ as $R(h) := \Pr(Y \neq h(X))$, where the probability is computed from the joint distribution $P_{X,Y}$. Suppose that we collect data $(x_1, y_1), \ldots, (x_n, y_n)$, which are assumed to be independent and identically distributed from the distribution $P_{X,Y}$, and that we train a classifier using this data.

Below we consider two precise specifications of the joint distribution $P_{X,Y}$. In both cases, derive the largest numerical value $\epsilon^*$ that you can for which it holds that $R(h) \geq \epsilon^*$. Carefully explain how you arrived at your specific value of $\epsilon^*$ in both cases.

a) (10 points) $X$ is restricted to the set $\{0, 1\}$ (i.e., is categorical) and $P_{X,Y}$ is given by the distribution in Table 1.

b) (10 points) The marginal distribution of $Y$ is given by $\Pr(Y = 0) = 0.3$ and $\Pr(Y = 1) = 0.7$. Given that $Y = 0$, the distribution of $X$ is normal with mean 5 and variance 2. Given that $Y = 1$, the distribution of $X$ is normal with mean 3 and variance 2. (It is OK for the final answer to be written in terms of the CDF of a normal distribution and/or to numerically approximate this number up to 5 digits.)

Table 1

Outcome           Probability
(X = 0, Y = 0)    0.1
(X = 0, Y = 1)    0.2
(X = 1, Y = 0)    0.4
(X = 1, Y = 1)    0.3

Step by Step Solution


To find the largest numerical value $\epsilon^*$ for which the error rate of the classifier satisfies $R(h) \geq \epsilon^*$, we need to calculate the error rate under two different ...
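One natural reading of the question, not confirmed by the truncated answer above, is that $\epsilon^*$ is the Bayes error rate: the error of the classifier that predicts the more probable label at each $x$, which no classifier $h$ can beat. The sketch below computes that quantity for both parts under this assumption. The function names are illustrative, and the Gaussian case assumes equal variances with the class-0 mean above the class-1 mean, as in part b, so the decision rule is a single threshold.

```python
import math


def normal_cdf(x, mean, var):
    """CDF of a normal distribution with the given mean and variance."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))


def bayes_error_discrete(joint):
    """Sum over x of min(P(x, 0), P(x, 1)): the less likely label at each x
    is the probability mass any classifier must get wrong at that x."""
    xs = {x for (x, _) in joint}
    return sum(min(joint[(x, 0)], joint[(x, 1)]) for x in xs)


def gaussian_threshold(p0, mu0, p1, mu1, var):
    """Point t where p0 * f0(t) == p1 * f1(t).
    Assumes equal variances and mu0 > mu1, so predicting 1 is optimal for x < t."""
    return (mu0 + mu1) / 2.0 + var * math.log(p1 / p0) / (mu0 - mu1)


def bayes_error_gaussian(p0, mu0, p1, mu1, var):
    """Error of the threshold rule 'predict 1 if x < t, else predict 0'."""
    t = gaussian_threshold(p0, mu0, p1, mu1, var)
    miss_class_1 = p1 * (1.0 - normal_cdf(t, mu1, var))  # Y = 1 but x >= t
    miss_class_0 = p0 * normal_cdf(t, mu0, var)          # Y = 0 but x < t
    return miss_class_1 + miss_class_0


# Part a: joint distribution from Table 1, keyed by (x, y).
TABLE_1 = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.4, (1, 1): 0.3}
eps_a = bayes_error_discrete(TABLE_1)

# Part b: Pr(Y=0) = 0.3 with X ~ N(5, 2); Pr(Y=1) = 0.7 with X ~ N(3, 2).
t_b = gaussian_threshold(0.3, 5.0, 0.7, 3.0, 2.0)
eps_b = bayes_error_gaussian(0.3, 5.0, 0.7, 3.0, 2.0)

print(f"part a: eps* = {eps_a:.5f}")
print(f"part b: threshold = {t_b:.5f}, eps* = {eps_b:.5f}")
```

For part a this gives $\min(0.1, 0.2) + \min(0.4, 0.3) = 0.4$. For part b the threshold works out to $t = 4 + \ln(7/3)$, and the bound can be written as $0.7\,(1 - \Phi((t - 3)/\sqrt{2})) + 0.3\,\Phi((t - 5)/\sqrt{2})$, where $\Phi$ is the standard normal CDF.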
