6 Risk Minimization with Doubt
Suppose we have a classification problem with classes labeled $1, \dots, c$ and an additional "doubt" category labeled $c+1$. Let $f : \mathbb{R}^d \to \{1, \dots, c+1\}$ be a decision rule. Define the loss function

$$L(f(x), y) = \begin{cases} 0 & \text{if } f(x) = y \text{ and } f(x) \in \{1, \dots, c\}, \\ \lambda_c & \text{if } f(x) \neq y \text{ and } f(x) \in \{1, \dots, c\}, \\ \lambda_d & \text{if } f(x) = c + 1, \end{cases}$$
where $\lambda_c \geq 0$ is the loss incurred for making a misclassification and $\lambda_d \geq 0$ is the loss incurred for choosing doubt. In words, this means the following:
- When you are correct, you incur no loss.
- When you are incorrect, you incur a penalty $\lambda_c$ for making the wrong choice.
- When you are unsure about what to choose, you may select the "doubt" category and incur a penalty $\lambda_d$.
In lecture, you saw risk defined as an expectation over data points. We can also define the risk of classifying a new individual data point $x$ as class $f(x) \in \{1, 2, \dots, c+1\}$:

$$R(f(x) \mid x) = \sum_{i=1}^{c} L(f(x), i)\, P(Y = i \mid x).$$
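To make this definition concrete, here is a minimal sketch that evaluates the conditional risk of every possible action on a toy example; the posterior values and the penalties $\lambda_c = 1$ and $\lambda_d = 0.3$ are assumptions chosen purely for illustration.

```python
# Sketch: evaluate the conditional risk R(f(x) = k | x) of each action k
# directly from the loss definition above. All numbers are illustrative.

posterior = [0.6, 0.3, 0.1]  # assumed P(Y = i | x) for classes 1..c (here c = 3)
lam_c = 1.0                  # misclassification penalty (assumed)
lam_d = 0.3                  # doubt penalty (assumed)
c = len(posterior)

def conditional_risk(k):
    """Risk of predicting class k in 1..c, or doubt when k = c + 1."""
    if k == c + 1:                     # doubt: pay lam_d regardless of Y
        return lam_d * sum(posterior)  # = lam_d, since the posterior sums to 1
    # non-doubt: pay lam_c exactly when the true class differs from k
    return lam_c * sum(p for i, p in enumerate(posterior, start=1) if i != k)

for k in range(1, c + 2):
    label = "doubt" if k == c + 1 else f"class {k}"
    print(f"R(f(x) = {label} | x) = {conditional_risk(k):.2f}")
```

For this toy posterior the risks come out to 0.40, 0.70, 0.90, and 0.30, so doubt is the risk-minimizing action here, which agrees with the threshold rule derived in part (b): the largest posterior, 0.6, falls below $1 - \lambda_d/\lambda_c = 0.7$.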
(a) First, we will simplify the risk function using our specific loss function, treating separately the cases where $f(x)$ is or is not the doubt category.
i. Prove that $R(f(x) = i \mid x) = \lambda_c (1 - P(Y = i \mid x))$ when $i \neq c + 1$.
ii. Prove that $R(f(x) = c + 1 \mid x) = \lambda_d$.
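One way to see where these identities come from (a sketch, not a substitute for the requested proof): expand the sum in the risk definition. For (i), $L(i, i) = 0$ and $L(i, j) = \lambda_c$ for every $j \neq i$, so

$$R(f(x) = i \mid x) = \lambda_c \sum_{j \neq i} P(Y = j \mid x) = \lambda_c \bigl(1 - P(Y = i \mid x)\bigr),$$

using the fact that the posterior probabilities sum to one. For (ii), $L(c+1, j) = \lambda_d$ for every $j$, so the sum collapses to $\lambda_d \sum_{j=1}^{c} P(Y = j \mid x) = \lambda_d$.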
(b) Show that the following policy $f_{\mathrm{opt}}(x)$ obtains the minimum risk:
(R1) Find the non-doubt class $i$ such that $P(Y = i \mid x) \geq P(Y = j \mid x)$ for all $j$, meaning you pick the class with the highest posterior probability given $x$.
(R2) Choose class $i$ if $P(Y = i \mid x) \geq 1 - \lambda_d / \lambda_c$.
(R3) Choose doubt otherwise.
Hint: To prove that $f_{\mathrm{opt}}(x)$ minimizes risk, consider proof techniques that show $f_{\mathrm{opt}}(x)$ "stays ahead" of all other policies that don't follow these rules. For example, you could take a proof-by-contradiction approach: assume there exists some other policy, say $f'(x)$, that achieves lower risk than $f_{\mathrm{opt}}(x)$. In which scenarios might the predictions made by $f_{\mathrm{opt}}(x)$ and $f'(x)$ differ? In those scenarios, and based on the rules above that $f_{\mathrm{opt}}(x)$ follows, why would $f'(x)$ not be able to beat $f_{\mathrm{opt}}(x)$ in risk minimization?
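As a numerical sanity check on the claim (an illustration, not a proof), here is a minimal sketch that implements the policy (R1)-(R3) and compares it against a brute-force search over all actions on random posteriors; the penalty values are assumptions for illustration.

```python
import random

# Sketch: check on random posteriors that the policy (R1)-(R3) matches the
# action found by brute-force minimization of the conditional risk.
# The penalty values below are illustrative assumptions.

lam_c, lam_d = 1.0, 0.3

def risks(posterior):
    """Conditional risks of all actions: classes 1..c, then doubt last."""
    rs = [lam_c * (1.0 - p) for p in posterior]  # part (a)i: lam_c * (1 - P(Y=i|x))
    rs.append(lam_d)                             # part (a)ii: doubt always costs lam_d
    return rs

def f_opt(posterior):
    """The policy (R1)-(R3); returns a class in 1..c, or c + 1 for doubt."""
    c = len(posterior)
    i = max(range(c), key=lambda j: posterior[j])  # (R1): most probable class
    if posterior[i] >= 1.0 - lam_d / lam_c:        # (R2): confident enough?
        return i + 1
    return c + 1                                   # (R3): choose doubt

random.seed(0)
for _ in range(10_000):
    raw = [random.random() for _ in range(4)]
    post = [x / sum(raw) for x in raw]             # random 4-class posterior
    rvec = risks(post)
    chosen = f_opt(post) - 1                       # 0-based index of the action
    best = min(range(len(rvec)), key=rvec.__getitem__)
    # Ties are possible, so compare risks rather than indices.
    assert abs(rvec[chosen] - rvec[best]) < 1e-12
print("f_opt matched the brute-force minimum risk on all 10,000 trials")
```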
(c) How would you modify your optimal decision rule if $\lambda_d = 0$? What happens if $\lambda_d > \lambda_c$? Explain why this is or is not consistent with what one would expect intuitively.
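As a pointer toward the intuition (the specific numbers are illustrative, not part of the problem), consider how the threshold in (R2) behaves at these extremes: when $\lambda_d = 0$ the threshold is $1 - 0/\lambda_c = 1$, while $\lambda_d > \lambda_c$ makes it negative; for instance, $\lambda_c = 1$ and $\lambda_d = 2$ give a threshold of $1 - 2/1 = -1$, which any posterior probability $P(Y = i \mid x) \geq 0$ trivially satisfies.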