3 Learning DNFs with kernel perceptron
Suppose that we have $S = \{(x^{(i)}, y^{(i)})\}_{i=1}^{n}$ with $x^{(i)} \in \{0,1\}^d$ and $y^{(i)} \in \{-1,1\}$. Let $\psi : \{0,1\}^d \to \{0,1\}$ be a "target function" which "labels" the points. Additionally, assume that $\psi$ is a DNF formula (i.e. $\psi$ is a disjunction of conjunctions, or a boolean "or" of a bunch of boolean "and"s). The fact that it "labels" the points simply means that $\mathbb{1}[y^{(i)} = 1] = \psi(x^{(i)})$.
For example, let $\psi(x) = (x_1 \wedge x_2) \vee (x_1 \wedge \bar{x}_2 \wedge x_3)$ (where $x_i$ denotes the $i$-th entry of $x$), $x^{(i)} = (1, 0, 1)^\top$, and $x^{(j)} = (1, 0, 0)^\top$. Then we would have $\psi(x^{(i)}) = 1$ and $\psi(x^{(j)}) = 0$, and thus $y^{(i)} = 1$ and $y^{(j)} = -1$.
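As a quick sanity check (our own illustrative sketch, not part of the original problem), evaluating this example $\psi$ in Python on the two points above reproduces the labels:

```python
def psi(x):
    # psi(x) = (x1 AND x2) OR (x1 AND NOT x2 AND x3), with x a 0/1 list
    # and x[0] denoting x_1, etc.
    return (x[0] and x[1]) or (x[0] and (1 - x[1]) and x[2])

print(psi([1, 0, 1]))  # 1, so y^(i) = +1
print(psi([1, 0, 0]))  # 0, so y^(j) = -1
```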
(i) Give an example target function $\psi$ (make sure it is a DNF formula) and set $S$ such that the data is not linearly separable.
Part (i) clearly shows that running the perceptron algorithm on $S$ cannot work in general, since the data need not be linearly separable. However, we can try to use a feature transformation and the kernel trick to linearize the data, and thus run the kernelized version of the perceptron algorithm on these datasets.
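For reference, the kernelized perceptron never materializes a weight vector in the feature space: it keeps one dual coefficient per training point and predicts through kernel evaluations alone. A minimal sketch (the names are ours, with `K` standing for whatever kernel gets plugged in):

```python
import numpy as np

def kernel_perceptron(X, y, K, epochs=10):
    # Dual coefficients: the implicit weight vector is
    # w = sum_i alpha[i] * y[i] * phi(X[i]), but phi is never computed.
    n = len(X)
    alpha = np.zeros(n)
    for _ in range(epochs):
        for t in range(n):
            score = sum(alpha[i] * y[i] * K(X[i], X[t]) for i in range(n))
            if y[t] * score <= 0:  # mistake: bump this point's coefficient
                alpha[t] += 1
    return alpha
```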
Consider the feature transformation $\phi : \{0,1\}^d \to \{0,1\}^{3^d}$ which maps a vector $x$ to the vector of all the conjunctions of its entries or of their negations (each coordinate is either omitted, included as $x_i$, or included as $\bar{x}_i$, hence $3^d$ conjunctions). So, for example, if $d = 2$ then $\phi(x)$ is the $9$-dimensional vector whose entries are the values of the conjunctions $1$, $x_1$, $\bar{x}_1$, $x_2$, $\bar{x}_2$, $x_1 \wedge x_2$, $x_1 \wedge \bar{x}_2$, $\bar{x}_1 \wedge x_2$, and $\bar{x}_1 \wedge \bar{x}_2$, listed in some fixed order (note that $1$ can be viewed as the empty conjunction, i.e. the conjunction of zero literals).
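To make the definition concrete, here is an illustrative way to enumerate all $3^d$ conjunctions and compute $\phi(x)$ explicitly (the ordering of the entries is our own choice; the problem does not fix one):

```python
from itertools import product

def phi(x):
    # Per coordinate: 0 = omit it, 1 = include x_i, 2 = include NOT x_i.
    # Each of the 3**d patterns is one conjunction; the all-zeros pattern
    # is the empty conjunction, whose value is always 1.
    d = len(x)
    feats = []
    for pattern in product((0, 1, 2), repeat=d):
        val = 1
        for choice, xi in zip(pattern, x):
            if choice == 1:
                val &= xi
            elif choice == 2:
                val &= 1 - xi
        feats.append(val)
    return feats

print(phi([1, 0]))  # d = 2: a 9-dimensional 0/1 vector with 2**d ones
```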
Let $K : \{0,1\}^d \times \{0,1\}^d \to \mathbb{R}$ be the kernel function associated with $\phi$ (i.e. for $a, b \in \{0,1\}^d$: $K(a,b) = \phi(a) \cdot \phi(b)$). Note that the naive approach of calculating $K(a,b)$ (simply calculating $\phi(a)$ and $\phi(b)$ and taking the dot product) takes time $\Theta(3^d)$.
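The naive $\Theta(3^d)$ computation the problem refers to looks like this (reusing the illustrative `phi` sketch above):

```python
def K_naive(a, b):
    # Builds both 3**d-dimensional feature vectors explicitly and takes
    # their dot product: Theta(3**d) time and space.
    return sum(fa * fb for fa, fb in zip(phi(a), phi(b)))

print(K_naive([1, 0], [1, 1]))  # 2: the empty conjunction and x1
```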
Also let $w^* \in \mathbb{R}^{3^d}$ be such that $w^*_1 = -0.5$ (this is the entry which corresponds to the empty conjunction, i.e. $\forall x \in \{0,1\}^d : \phi(x)_1 = 1$) and, for all $i > 1$, $w^*_i = 1$ iff the $i$-th conjunction is one of the conjunctions of $\psi$ (and $w^*_i = 0$ otherwise). So, for example, letting $\psi(x) = (x_1 \wedge x_2) \vee \bar{x}_1$, we would have:
$$w^* = (-0.5,\, 0,\, 0,\, 1,\, 0,\, 1,\, 0,\, 0,\, 0)^\top$$
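As a concrete (and purely illustrative) construction, the following builds $w^*$ under the conjunction ordering of our `phi` sketch above; note that this ordering differs from the one used in the displayed $w^*$, so the nonzero entries land in different positions:

```python
from itertools import product

def w_star(dnf_patterns, d):
    # dnf_patterns: the conjunctions of psi, each encoded per coordinate as
    # 0 = omit, 1 = include x_i, 2 = include NOT x_i (same encoding as phi).
    w = []
    for pattern in product((0, 1, 2), repeat=d):
        if pattern == (0,) * d:
            w.append(-0.5)  # entry for the empty conjunction
        elif pattern in dnf_patterns:
            w.append(1.0)   # this conjunction appears in psi
        else:
            w.append(0.0)
    return w

# psi(x) = (x1 AND x2) OR (NOT x1): patterns (1, 1) and (2, 0).
w = w_star({(1, 1), (2, 0)}, d=2)
for x in product((0, 1), repeat=2):
    score = sum(wi * fi for wi, fi in zip(w, phi(list(x))))
    print(x, score)  # 0.5 whenever psi(x) = 1, and -0.5 otherwise
```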
(ii) Find a way to compute $K(a,b)$ in $O(d)$ time.
(iii) Show that $w^*$ linearly separates $\phi(S)$ (where $\phi(S)$ is just a shorthand for $\{(\phi(x^{(i)}), y^{(i)})\}_{i=1}^{n}$), and find a lower bound for the margin $\gamma$ with which it separates the data. Remember that $\gamma = \min_{(\phi(x^{(i)}), y^{(i)}) \in \phi(S)} y^{(i)} \left( \frac{w^*}{\|w^*\|} \cdot \phi(x^{(i)}) \right)$. Your lower bound should depend on $s$, the number of conjunctions in $\psi$.
(iv) Find an upper bound on the radius $R$ of the dataset $\phi(S)$. Remember that $R = \max_{(\phi(x^{(i)}), y^{(i)}) \in \phi(S)} \|\phi(x^{(i)})\|$.
(v) Use parts (ii), (iii), and (iv) to show that we can run kernel perceptron efficiently on this transformed space, in which our data is linearly separable (show that each iteration takes only $O(nd)$ time per point), but that unfortunately the mistake bound is very bad (show that it is $O(s \cdot 2^d)$).
There are ways to get a better mistake bound in this same kernel space, but the running time then becomes very bad (exponential). It is open whether one can achieve both a polynomial mistake bound and polynomial running time.