3 Learning DNFs with kernel perceptron
Suppose that we have $S = \{(x^{(i)}, y^{(i)})\}_{i=1}^{n}$ with $x^{(i)} \in \{0,1\}^d$ and $y^{(i)} \in \{-1,1\}$. Let $\psi : \{0,1\}^d \to \{0,1\}$ be a "target function" which "labels" the points. Additionally, assume that $\psi$ is a DNF formula (i.e. $\psi$ is a disjunction of conjunctions, or a boolean "or" of a bunch of boolean "and"s). The fact that it "labels" the points simply means that $\mathbb{1}[y^{(i)} = 1] = \psi(x^{(i)})$.
For example, let $\psi(x) = (x_1 \wedge x_2) \vee (x_1 \wedge \bar{x}_2 \wedge x_3)$ (where $x_i$ denotes the $i$-th entry of $x$), $x^{(i)} = (1, 0, 1)^\top$, and $x^{(j)} = (1, 0, 0)^\top$. Then we would have $\psi(x^{(i)}) = 1$ and $\psi(x^{(j)}) = 0$, and thus $y^{(i)} = 1$ and $y^{(j)} = -1$.
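As a quick sanity check (our own illustrative sketch, not part of the original problem), evaluating this example $\psi$ in Python on the two points above reproduces the labels:

```python
def psi(x):
    # psi(x) = (x1 AND x2) OR (x1 AND NOT x2 AND x3), with x a 0/1 list
    # and x[0] denoting x_1, etc.
    return (x[0] and x[1]) or (x[0] and (1 - x[1]) and x[2])

print(psi([1, 0, 1]))  # 1, so y^(i) = +1
print(psi([1, 0, 0]))  # 0, so y^(j) = -1
```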
(i) Give an example target function $\psi$ (make sure it is a DNF formula) and set $S$ such that the data is not linearly separable.
Part (i) clearly shows that running the perceptron algorithm on $S$ cannot work in general, since the data need not be linearly separable. However, we can try to use a feature transformation and the kernel trick to linearize the data, and thus run the kernelized version of the perceptron algorithm on these datasets.
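For reference, the kernelized perceptron never materializes a weight vector in the feature space: it keeps one dual coefficient per training point and predicts through kernel evaluations alone. A minimal sketch (the names are ours, with `K` standing for whatever kernel gets plugged in):

```python
import numpy as np

def kernel_perceptron(X, y, K, epochs=10):
    # Dual coefficients: the implicit weight vector is
    # w = sum_i alpha[i] * y[i] * phi(X[i]), but phi is never computed.
    n = len(X)
    alpha = np.zeros(n)
    for _ in range(epochs):
        for t in range(n):
            score = sum(alpha[i] * y[i] * K(X[i], X[t]) for i in range(n))
            if y[t] * score <= 0:  # mistake: bump this point's coefficient
                alpha[t] += 1
    return alpha
```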
Consider the feature transformation $\phi : \{0,1\}^d \to \{0,1\}^{3^d}$ which maps a vector $x$ to the vector of all the conjunctions of its entries or of their negations (each coordinate is either omitted, included as $x_i$, or included as $\bar{x}_i$, hence $3^d$ conjunctions). So, for example, if $d = 2$ then $\phi(x)$ is the $9$-dimensional vector whose entries are the values of the conjunctions $1$, $x_1$, $\bar{x}_1$, $x_2$, $\bar{x}_2$, $x_1 \wedge x_2$, $x_1 \wedge \bar{x}_2$, $\bar{x}_1 \wedge x_2$, and $\bar{x}_1 \wedge \bar{x}_2$, listed in some fixed order (note that $1$ can be viewed as the empty conjunction, i.e. the conjunction of zero literals).
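To make the definition concrete, here is an illustrative way to enumerate all $3^d$ conjunctions and compute $\phi(x)$ explicitly (the ordering of the entries is our own choice; the problem does not fix one):

```python
from itertools import product

def phi(x):
    # Per coordinate: 0 = omit it, 1 = include x_i, 2 = include NOT x_i.
    # Each of the 3**d patterns is one conjunction; the all-zeros pattern
    # is the empty conjunction, whose value is always 1.
    d = len(x)
    feats = []
    for pattern in product((0, 1, 2), repeat=d):
        val = 1
        for choice, xi in zip(pattern, x):
            if choice == 1:
                val &= xi
            elif choice == 2:
                val &= 1 - xi
        feats.append(val)
    return feats

print(phi([1, 0]))  # d = 2: a 9-dimensional 0/1 vector with 2**d ones
```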
Let $K : \{0,1\}^d \times \{0,1\}^d \to \mathbb{R}$ be the kernel function associated with $\phi$ (i.e. for $a, b \in \{0,1\}^d$: $K(a,b) = \phi(a) \cdot \phi(b)$). Note that the naive approach of calculating $K(a,b)$ (simply calculating $\phi(a)$ and $\phi(b)$ and taking the dot product) takes time $\Theta(3^d)$.
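The naive $\Theta(3^d)$ computation the problem refers to looks like this (reusing the illustrative `phi` sketch above):

```python
def K_naive(a, b):
    # Builds both 3**d-dimensional feature vectors explicitly and takes
    # their dot product: Theta(3**d) time and space.
    return sum(fa * fb for fa, fb in zip(phi(a), phi(b)))

print(K_naive([1, 0], [1, 1]))  # 2: the empty conjunction and x1
```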
Also let $w^* \in \mathbb{R}^{3^d}$ be such that $w^*_1 = -0.5$ (this is the entry which corresponds to the empty conjunction, i.e. $\forall x \in \{0,1\}^d : \phi(x)_1 = 1$) and, for all $i > 1$, $w^*_i = 1$ iff the $i$-th conjunction is one of the conjunctions of $\psi$ (and $w^*_i = 0$ otherwise). So, for example, letting $\psi(x) = (x_1 \wedge x_2) \vee \bar{x}_1$, we would have:
$$w^* = (-0.5,\, 0,\, 0,\, 1,\, 0,\, 1,\, 0,\, 0,\, 0)^\top$$
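As a concrete (and purely illustrative) construction, the following builds $w^*$ under the conjunction ordering of our `phi` sketch above; note that this ordering differs from the one used in the displayed $w^*$, so the nonzero entries land in different positions:

```python
from itertools import product

def w_star(dnf_patterns, d):
    # dnf_patterns: the conjunctions of psi, each encoded per coordinate as
    # 0 = omit, 1 = include x_i, 2 = include NOT x_i (same encoding as phi).
    w = []
    for pattern in product((0, 1, 2), repeat=d):
        if pattern == (0,) * d:
            w.append(-0.5)  # entry for the empty conjunction
        elif pattern in dnf_patterns:
            w.append(1.0)   # this conjunction appears in psi
        else:
            w.append(0.0)
    return w

# psi(x) = (x1 AND x2) OR (NOT x1): patterns (1, 1) and (2, 0).
w = w_star({(1, 1), (2, 0)}, d=2)
for x in product((0, 1), repeat=2):
    score = sum(wi * fi for wi, fi in zip(w, phi(list(x))))
    print(x, score)  # 0.5 whenever psi(x) = 1, and -0.5 otherwise
```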
(ii) Find a way to compute $K(a,b)$ in $O(d)$ time.
(iii) Show that $w^*$ linearly separates $\phi(S)$ (where $\phi(S)$ is just a shorthand for $\{(\phi(x^{(i)}), y^{(i)})\}_{i=1}^{n}$), and find a lower bound for the margin $\gamma$ with which it separates the data. Remember that $\gamma = \min_{(\phi(x^{(i)}), y^{(i)}) \in \phi(S)} y^{(i)} \left( \frac{w^*}{\|w^*\|} \cdot \phi(x^{(i)}) \right)$. Your lower bound should depend on $s$, the number of conjunctions in $\psi$.
(iv) Find an upper bound on the radius $R$ of the dataset $\phi(S)$. Remember that $R = \max_{(\phi(x^{(i)}), y^{(i)}) \in \phi(S)} \|\phi(x^{(i)})\|$.
(v) Use parts (ii), (iii), and (iv) to show that we can run kernel perceptron efficiently on this transformed space, in which our data is linearly separable (show that each iteration takes only $O(nd)$ time per point), but that unfortunately the mistake bound is very bad (show that it is $O(s \cdot 2^d)$).
There are ways to get a better mistake bound in this same kernel space, but the running time then becomes very bad (exponential). It is open whether one can achieve both a polynomial mistake bound and polynomial running time.