Question:
In this problem we will show that the existence of an efficient mistake-bounded learner for a class $C$ implies an efficient PAC learner for $C$. Concretely, let $C$ be a function class with domain $X = \{-1, 1\}^n$ and binary labels $Y = \{-1, 1\}$. Assume that $C$ can be learned by algorithm/learner $A$ with some mistake bound $t$. You may assume you know the value $t$. You may also assume that at each iteration, $A$ runs in time polynomial in $n$ and that $A$ only updates its state when it gets an example wrong.

The concrete goal of this problem is to create a PAC-learning algorithm, $B$, that can PAC-learn the concept class $C$ with respect to an arbitrary distribution $D$ over $\{-1, 1\}^n$, using algorithm $A$ as a subroutine. In order to prove that learner $B$ can PAC-learn the concept class $C$, we must show that there exists a finite number of examples, $m$, that we can draw from $D$ such that $B$ produces a hypothesis whose true error is more than $\varepsilon$ with probability at most $\delta$.

First, fix some distribution $D$ on $X$, and we will assume that the examples are labeled by an unknown $c \in C$. Additionally, for a hypothesis (i.e. function) $h : X \to Y$, let $\mathrm{err}(h) = \Pr_{x \sim D}[h(x) \neq c(x)]$. Formally, we will need to bound $m$ such that the following condition holds:

\[
\forall\, \varepsilon, \delta \in [0, 1],\ \exists\, m \in \mathbb{N} \ \text{such that}\ \Pr_{x \sim D}\big[\mathrm{err}(B(\{x\}^m)) > \varepsilon\big] \le \delta, \tag{1}
\]

where $B(\{x\}^m)$ denotes the hypothesis produced by $B$ from $m$ random draws of $x$ from an arbitrary distribution $D$. To find this $m$, we will first decompose it into blocks of examples of size $k$ and make use of results based on a single block to find the bound on $m$ that satisfies condition (1).

Note: Using the identity $\Pr[\mathrm{err}(h) > \varepsilon] + \Pr[\mathrm{err}(h) \le \varepsilon] = 1$, we can see that $\Pr[\mathrm{err}(h) > \varepsilon] \le \delta \iff \Pr[\mathrm{err}(h) \le \varepsilon] \ge 1 - \delta$, which makes the connection to the definition of PAC-learning discussed in lecture explicit.
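Before part (a), it may help to pin down the interface the problem assumes for $A$. The sketch below is a minimal, hypothetical rendering of that interface in Python; the class and method names (`MistakeBoundedLearner`, `predict`, `update`) are illustrative choices, not part of the problem statement.

```python
from typing import Protocol, Sequence

Label = int  # labels live in {-1, +1}

class MistakeBoundedLearner(Protocol):
    """Hypothetical interface for the mistake-bounded learner A.

    Assumptions carried over from the problem statement: each call runs
    in time polynomial in n, A makes at most t mistakes on any example
    sequence labeled by some c in C, and A changes state only after a
    mistake.
    """

    def predict(self, x: Sequence[int]) -> Label:
        """Evaluate A's current hypothesis on the example x."""
        ...

    def update(self, x: Sequence[int], y: Label) -> None:
        """Revise A's state; called only when predict(x) != y."""
        ...
```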
(a) Fix a single arbitrary hypothesis $h' : X \to Y$ produced by $A$ and determine a lower bound on the number of examples, $k$, such that if $h'$ correctly classifies a block of $k$ i.i.d. examples drawn from $D$, then $\Pr[\mathrm{err}(h') > \varepsilon] \le \delta$. (The contrapositive view would be: with probability at least $1 - \delta$, it must be the case that $\mathrm{err}(h') \le \varepsilon$. Make sure this makes sense.)
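For readers checking their answer to (a), here is one standard calculation, written as a sketch rather than the required write-up: if $\mathrm{err}(h') > \varepsilon$, then $h'$ survives $k$ i.i.d. draws only with exponentially small probability.

```latex
% Sketch of one standard route (an assumption-laden hint, not the
% intended proof). If err(h') > eps, each i.i.d. draw x ~ D is
% classified correctly by h' with probability less than 1 - eps, so
% over k independent draws:
\[
\Pr\big[\, h' \text{ correct on all } k \text{ examples} \,\big]
  \le (1 - \varepsilon)^k \le e^{-\varepsilon k}.
\]
% Requiring this failure probability to be at most delta and solving:
\[
e^{-\varepsilon k} \le \delta
  \iff k \ge \frac{1}{\varepsilon}\ln\frac{1}{\delta}.
\]
```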
(b) From part (a) we know that as long as a block is of size at least $k$, then if that block is classified correctly by a fixed arbitrary hypothesis $h'$, we can upper bound the probability of the 'bad event' (i.e., $A$ outputs $h'$ such that $\mathrm{err}(h') > \varepsilon$) by $\delta$. However, our bound must apply to every $h'$ that our algorithm $B$ could output for an arbitrary distribution $D$ over examples. With this in mind, how large should $m$ be so that we can bound all hypotheses that could be output? (You may assume that algorithm $B$ knows the mistake bound $t$ throughout the question.)
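One common way to handle "every hypothesis $B$ could output" is a union bound. The sketch below assumes, as the problem states, that $A$ updates only on mistakes and therefore passes through at most $t + 1$ distinct hypotheses.

```latex
% Union bound sketch (assumes B sees at most t+1 hypotheses
% h_1, ..., h_{t+1}, one per block, since A changes state only on
% mistakes and makes at most t of them):
\[
\Pr\Big[\exists\, i \le t+1 :\ h_i \text{ consistent with its block and }
  \operatorname{err}(h_i) > \varepsilon\Big]
  \le (t+1)\, e^{-\varepsilon k} \le \delta
  \quad\text{provided}\quad
  k \ge \frac{1}{\varepsilon}\ln\frac{t+1}{\delta}.
\]
% With t+1 blocks of size k, m = (t+1)k total examples suffice.
```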
(c) Put everything together and fully describe (with proof) a PAC learner that is able to output a hypothesis with true error at most $\varepsilon$ with probability at least $1 - \delta$, given a mistake-bounded learner $A$. To do this, you should first describe your pseudocode for algorithm $B$, which will use $A$ as a subroutine (no need for minute details or code; broad strokes will suffice). Then, prove there exists a finite number of examples $m$ for $B$ to PAC-learn $C$ for all values of $\varepsilon$ and $\delta$ by lower bounding $m$ by a function of $\varepsilon$, $\delta$, and $t$ (i.e., finding a finite lower bound for $m$ such that the PAC-learning requirements in (1) are satisfied).
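As a concrete (and deliberately non-authoritative) illustration of such a learner $B$, the Python sketch below feeds $A$ blocks of $k$ fresh examples and outputs the first hypothesis that survives an entire block. Here `draw_example` is a hypothetical sampling oracle returning a labeled draw $(x, c(x))$ from $D$, and `A` is assumed to follow the interface sketched earlier.

```python
import math
from typing import Callable, Sequence, Tuple

def pac_learner_B(
    A,                                   # learner following the interface sketched above
    draw_example: Callable[[], Tuple[Sequence[int], int]],  # labeled draw (x, c(x)) from D
    t: int,                              # known mistake bound for A
    eps: float,
    delta: float,
) -> Callable[[Sequence[int]], int]:
    """Sketch of B: present A with up to t + 1 blocks of k fresh examples
    and output the first hypothesis that classifies a whole block correctly.
    """
    # Block size from parts (a)/(b): per-hypothesis failure prob. delta/(t+1).
    k = math.ceil((1.0 / eps) * math.log((t + 1) / delta))
    for _ in range(t + 1):
        clean = True
        for _ in range(k):
            x, y = draw_example()
            if A.predict(x) != y:
                A.update(x, y)           # A updates its state only on mistakes
                clean = False
        if clean:
            # The hypothesis held at the start of this block survived all k
            # fresh draws; by part (a) plus the union bound from part (b),
            # its true error is at most eps with probability >= 1 - delta.
            return A.predict
    # Every dirty block witnesses at least one mistake and A makes at most
    # t, so at most t of the t+1 blocks can be dirty and the loop returns
    # above; this fallback only guards the boundary case.
    return A.predict
```

Under these assumptions, $m = (t+1)\,\lceil \frac{1}{\varepsilon}\ln\frac{t+1}{\delta} \rceil$ draws from $D$ always suffice, which is the finite bound on $m$ the proof in (c) should establish.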