Question:

In object recognition, translating an image by a few pixels in some direction should not affect the category recognized. Suppose that we consider images with an object in the foreground on top of a uniform background. Suppose also that the objects of interest are always at least 10 pixels away from the borders of the image. Are the following neural networks invariant to translations of at most 10 pixels in some direction?

Here the translation is applied only to the foreground object, while the background is kept fixed. If your answer is yes, show that the neural network will necessarily produce the same output for two images in which the foreground object is translated by at most 10 pixels. If your answer is no, provide a counterexample by describing a situation in which the output of the neural network differs for two images where the foreground object is translated by at most 10 pixels.

(a) Neural network with one hidden layer consisting of convolutions (5 x 5 patches with a stride of 1 in each direction) and a softmax output layer.

(b) Neural network with two hidden layers consisting of convolutions (5 x 5 patches with a stride of 1 in each direction) followed by max pooling (4 x 4 patches with a stride of 4 in each direction) and a softmax output layer.
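
One way to probe both architectures numerically is a small NumPy experiment. The sketch below is not part of the original question and rests on several assumptions chosen purely for illustration: a 40 x 40 image, an 8 x 8 square object on a zero background, a single 5 x 5 averaging filter, a ReLU nonlinearity, three output classes, and random softmax weights. The helper names (conv2d, max_pool, net_a, net_b) are hypothetical. It checks whether the convolutional feature map simply shifts with the object, and whether the softmax output of each network changes when the object moves 10 pixels to the right.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d(image, kernel):
    """Valid cross-correlation with stride 1 (the usual deep-learning 'convolution')."""
    patches = sliding_window_view(image, kernel.shape)  # shape (H-4, W-4, 5, 5)
    return np.einsum("ijkl,kl->ij", patches, kernel)


def max_pool(fmap, size=4):
    """Non-overlapping max pooling: size x size patches with a stride of size."""
    h, w = fmap.shape
    h, w = h - h % size, w - w % size
    blocks = fmap[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def make_image(col, size=40, obj=8, row=12):
    """Uniform (zero) background with a square object; values are assumptions."""
    img = np.zeros((size, size))
    img[row:row + obj, col:col + obj] = 1.0
    return img


rng = np.random.default_rng(0)
kernel = np.full((5, 5), 1.0 / 25.0)                 # assumed averaging filter
n_classes = 3                                        # assumed number of categories
w_a = 0.1 * rng.standard_normal((n_classes, 36 * 36))  # softmax weights for (a)
w_b = 0.1 * rng.standard_normal((n_classes, 9 * 9))    # softmax weights for (b)


def net_a(img):
    """(a) convolution -> ReLU -> softmax."""
    hidden = np.maximum(conv2d(img, kernel), 0.0)
    return softmax(w_a @ hidden.ravel())


def net_b(img):
    """(b) convolution -> ReLU -> 4 x 4 max pooling (stride 4) -> softmax."""
    hidden = np.maximum(conv2d(img, kernel), 0.0)
    return softmax(w_b @ max_pool(hidden).ravel())


original = make_image(col=12)   # object stays at least 10 pixels from every border
shifted = make_image(col=22)    # same object moved 10 pixels to the right

# The feature map moves with the object (equivariance), which is not invariance.
equivariant = np.allclose(
    np.roll(conv2d(original, kernel), 10, axis=1), conv2d(shifted, kernel)
)
diff_a = np.abs(net_a(original) - net_a(shifted)).max()
diff_b = np.abs(net_b(original) - net_b(shifted)).max()

print("conv feature map shifts with the object:", equivariant)
print("max change in softmax output, network (a):", diff_a)
print("max change in softmax output, network (b):", diff_b)
```

A nonzero change in the printed softmax outputs for these particular weights is only a single numerical instance, not a proof, but it indicates the kind of counterexample the question asks for: the softmax layer weights each spatial position separately, so a shifted feature map generally produces different logits, and 4 x 4 pooling can only absorb shifts that stay within one pooling cell.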