Consider a binary classification problem where there is a single feature $X \in \mathbb{R}$ and the depen-...

Question:

Consider a binary classification problem where there is a single feature $X \in \mathbb{R}$ and the dependent variable $Y \in \{0, 1\}$. Let $P_{X,Y}$ denote the joint distribution over pairs $(X, Y)$, and let $h : \mathbb{R} \to \{0, 1\}$ denote a generic classifier. We define the error rate of $h$ as $R(h) := \Pr(Y \neq h(X))$, where the probability is computed from the joint distribution $P_{X,Y}$. Suppose that we collect data $(x_1, y_1), \ldots, (x_n, y_n)$, which are assumed to be independent and identically distributed from the distribution $P_{X,Y}$, and that we train a classifier $\hat{h}$ using this data.

Below we consider two precise specifications of the joint distribution $P_{X,Y}$. In both cases, derive the largest numerical value $\varepsilon$ that you can for which it holds that $R(h) \geq \varepsilon$. Carefully explain how you arrived at your specific value of $\varepsilon$ in both cases.

a) (10 points) $X$ is restricted to the set $\{0, 1\}$ (i.e., is categorical) and $P_{X,Y}$ is given by the distribution in Table 1.

b) (10 points) The marginal distribution of $Y$ is given by $\Pr(Y = 0) = 0.3$ and $\Pr(Y = 1) = 0.7$. Given that $Y = 0$, the distribution of $X$ is normal with mean 5 and variance 2. Given that $Y = 1$, the distribution of $X$ is normal with mean −3 and variance 2. (It is OK for the final answer to be written in terms of the CDF of a normal distribution and/or to numerically approximate this number up to 5 digits.)

Table 1

Outcome           Probability
(X = 0, Y = 0)    0.1
(X = 0, Y = 1)    0.2
(X = 1, Y = 0)    0.4
(X = 1, Y = 1)    0.3

Posted Date: Oct 03, 2023 02:39 AM

Expert Answer:

To find the largest numerical value $\varepsilon$ for which the error rate of the classifier is at least $\varepsilon$, we need to calculate the error rate under two different ...
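One standard way to get such a lower bound, which may or may not be the route the full answer takes, is the Bayes error rate: no classifier can do better than the rule that predicts the more probable label at each value of X, so its error rate is a valid value of ε in both cases. The sketch below computes that quantity under this assumption. The helper `normal_cdf` and all variable names are illustrative and do not come from the original post.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Part a: joint probabilities from Table 1, keyed by (x, y).
joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.4, (1, 1): 0.3}

# At each x the best rule predicts the more probable label, so the
# unavoidable error at that x is the smaller of the two joint probabilities.
bayes_error_a = sum(min(joint[(x, 0)], joint[(x, 1)]) for x in (0, 1))

# Part b: class priors and class-conditional normal parameters.
p0, p1 = 0.3, 0.7           # Pr(Y = 0), Pr(Y = 1)
mu0, mu1 = 5.0, -3.0        # mean of X given Y = 0, given Y = 1
var = 2.0                   # shared variance
sigma = math.sqrt(var)

# With equal variances the best rule is a single threshold t: predict 0
# when x > t. Setting p0 * f0(t) = p1 * f1(t) and solving for t gives:
t = (mu0 + mu1) / 2.0 + var * math.log(p1 / p0) / (mu0 - mu1)

# Errors: a Y = 0 point falling below t, or a Y = 1 point falling above t.
bayes_error_b = (p0 * normal_cdf((t - mu0) / sigma)
                 + p1 * (1.0 - normal_cdf((t - mu1) / sigma)))

print("part a:", bayes_error_a)
print("part b: threshold", t, "error", bayes_error_b)
```

For part a this gives min(0.1, 0.2) + min(0.4, 0.3) = 0.4. For part b the threshold is 1 + ln(7/3)/4, and the resulting error is small (on the order of 0.002) because the two class means are far apart relative to the standard deviation.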
