Question:

We have so far been concerned with predicting numerical quantities, like salaries. Now suppose we want to predict which dining option on campus a UCSD student will select for lunch (e.g., Burger King, Tahini, FanFan, Dirty Birds, Taco Villa, etc.). Predicting a discrete category (as opposed to a number) is an important machine learning task called classification. We can use empirical risk minimization to make classifications, too.

Suppose we have gathered a data set of $n$ lunch preferences of students over the past week:

Burger King, Dirty Birds, Burger King, Tahini, Taco Villa, Burger King, Dirty Birds, Taco Villa

The first step is to choose a number which will uniquely represent each option. For instance:

Dirty Birds    1
Burger King    2
Tahini         3
Fan-Fan        4
Taco Villa     5

We then map each instance of a dining option to its corresponding number, giving us a new data set of numbers. For instance, the data above becomes:

2, 1, 2, 3, 5, 2, 1, 5

a) Now that we have converted the data to a list of numbers, we can make a prediction by minimizing the mean absolute loss. Explain why this is not a good idea.

b) Let's use the zero-one loss, which is defined as follows:

$$L_{01}(h, y) = \begin{cases} 0, & h = y \\ 1, & h \neq y \end{cases}$$

As usual, define the risk to be the average loss:

$$R_{01}(h) = \frac{1}{n} \sum_{i=1}^{n} L_{01}(h, y_i)$$

Notice that $R_{01}(h)$ can be interpreted as the misclassification rate. That is, if $R_{01}(h) = 0.7$, then predicting $h$ would result in the wrong answer for 70% of the data points. Given the data set $\{4, 2, 4, 1, 3, 4, 4, 3, 2, 5\}$, plot the empirical risk $R_{01}(h)$ for $h \in [0, 5]$. Hint: the function should have point discontinuities.

c) Is gradient descent useful for minimizing the risk with zero-one loss? Why or why not? Make reference to your plot of the risk in your answer. Hint: the risk is indeed non-convex, but gradient descent can still be useful for minimizing non-convex functions. Is there some other reason
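The encoding step and the zero-one risk defined in the question can be sketched in a few lines of Python. This is only a minimal illustration, not part of the original problem: the names `CODES`, `encode`, `zero_one_loss`, and `zero_one_risk` are hypothetical helpers, and the number assignment is copied from the table in the question.

```python
# Assumed encoding, copied from the table in the question.
CODES = {
    "Dirty Birds": 1,
    "Burger King": 2,
    "Tahini": 3,
    "Fan-Fan": 4,
    "Taco Villa": 5,
}


def encode(choices):
    """Map each dining option to its numeric code."""
    return [CODES[c] for c in choices]


def zero_one_loss(h, y):
    """Return 0 when the prediction h equals the label y, otherwise 1."""
    return 0 if h == y else 1


def zero_one_risk(h, ys):
    """Average zero-one loss of predicting h for every label in ys."""
    return sum(zero_one_loss(h, y) for y in ys) / len(ys)


# The week of lunch choices from the question, converted to numbers.
lunches = [
    "Burger King", "Dirty Birds", "Burger King", "Tahini",
    "Taco Villa", "Burger King", "Dirty Birds", "Taco Villa",
]
encoded = encode(lunches)

# The data set from part (b), evaluated at the integer predictions 0..5.
data = [4, 2, 4, 1, 3, 4, 4, 3, 2, 5]
risks = {h: zero_one_risk(h, data) for h in range(6)}

for h, r in risks.items():
    print(f"h = {h}: R01 = {r}")
```

Evaluating `zero_one_risk` at a value that does not appear in the data, such as `zero_one_risk(2.5, data)`, is one way to see how the risk behaves between the integer codes.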
