Question:

The TPU uses fixed-point arithmetic (sometimes also called quantized arithmetic, with overlapping and conflicting definitions), where integers are used to represent values on the real number line. There are a number of different schemes for fixed-point arithmetic, but they share the common theme that there is an affine projection from the integer used by the hardware to the real number that the integer represents. An affine projection has the form r = i*s + b, where i is the integer, r is the represented real value, and s and b are a scale and a bias. You can of course write the projection in either direction, from integers to reals or vice versa (although you need to round when converting from reals to integers).
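
As a rough illustration of the two directions of that projection (a minimal sketch in Python; the function names are illustrative, not from the book):

    # Affine projection r = i*s + b, in both directions.
    def dequantize(i, s, b):
        return i * s + b            # integer -> represented real (exact)

    def quantize(r, s, b):
        return round((r - b) / s)   # real -> integer; rounding is required
                                    # (Python's round() breaks ties to even)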

a. The simplest activation function supported by the TPU is “ReLUX,” which is a rectified linear unit with a maximum of X. For example, ReLU6 is defined by ReLU6(x) = 0 when x < 0, x when 0 <= x <= 6, and 6 when x > 6 (equivalently, ReLU6(x) = min(max(x, 0), 6)). So 0.0 and 6.0 on the real number line are the minimum and maximum values that ReLU6 might produce. Assume that you use an 8-bit unsigned integer in hardware, and that you want to make 0 map to 0.0 and 255 map to 6.0. Solve for s and b.
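
A minimal sketch of part (a), assuming the endpoint mapping stated above (variable names are illustrative):

    # Solve r = i*s + b from the two endpoint constraints.
    # i = 0 represents 0.0:   0.0 = 0*s + b  =>  b = 0.0
    # i = 255 represents 6.0: 6.0 = 255*s    =>  s = 6/255
    s = 6.0 / 255.0   # ~0.023529 per integer step
    b = 0.0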

b. How many values on the real number line are exactly representable by an 8-bit quantized representation of ReLU6 output? What is the real-number spacing between them?
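
A sketch of part (b) under the part (a) mapping: each of the 256 integers maps to a distinct real, so 256 reals are exactly representable, spaced 6/255 apart:

    s = 6.0 / 255.0
    representable = [i * s for i in range(256)]    # 0.0, s, 2s, ..., 6.0
    spacing = representable[1] - representable[0]  # 6/255, ~0.0235 (one ulp)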

c. The difference between representable values is sometimes called a “unit in the last place,” or ulp, when performing numerical analysis. If you map a real number to its fixed-point representation, then map back, you only rarely get back the original real number. The difference between the original number and its representation is called the quantization error. When mapping a real number in the range [0.0, 6.0] to an 8-bit integer, show that the worst-case quantization error is one-half of an ulp (make sure you round to the nearest representable value). You might consider graphing the errors as a function of the original real number.
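
A sketch for part (c), sweeping the range and measuring the round-trip error (the quantize() helper here rounds half up; the names are illustrative):

    import math

    s = 6.0 / 255.0                                # one ulp
    def quantize(r):
        return math.floor(r * 255.0 / 6.0 + 0.5)  # round to nearest integer
    def dequantize(i):
        return i * s

    # Sweep [0.0, 6.0] on a fine grid; the error is a sawtooth that ramps
    # from 0 up to +/- s/2 between neighboring representable values.
    worst = max(abs(r - dequantize(quantize(r)))
                for r in (k * 6.0 / 999_999 for k in range(1_000_000)))
    # worst approaches s / 2, i.e. one-half of an ulp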

d. Keep the real-number range [0.0, 6.0] and the 8-bit unsigned representation from the previous part. What 8-bit unsigned integer represents 1.0? What is the quantization error for 1.0? Suppose that you ask the TPU to add 1.0 to 1.0. What answer do you get back, and what is the error in that result?
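
A sketch of part (d), reusing the half-up rounding from the part (c) sketch:

    import math

    s = 6.0 / 255.0
    def quantize(r):
        return math.floor(r * 255.0 / 6.0 + 0.5)

    i_one = quantize(1.0)   # 1.0/s = 42.5 is exactly half-way; rounds up to 43
    r_one = i_one * s       # ~1.01176: quantization error ~0.01176 (half an ulp)
    i_sum = i_one + i_one   # the hardware adds the integers: 43 + 43 = 86
    r_sum = i_sum * s       # ~2.02353: the two half-ulp errors add to a full ulp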

e. If you pick a random number uniformly in the range [0.0, 6.0], then quantize it to an 8-bit unsigned integer, what distribution would you expect to see for the 256 integer values?
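
A simulation sketch for part (e) (helper names illustrative; counts approximate):

    import random
    from collections import Counter

    def quantize(r):
        return int(r * 255.0 / 6.0 + 0.5)   # round half up (r is non-negative)

    counts = Counter(quantize(random.uniform(0.0, 6.0))
                     for _ in range(1_000_000))
    # Codes 1..254 each capture a full ulp of the range, so each gets ~1/255
    # of the samples; codes 0 and 255 capture only half an ulp, so ~1/510 each.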

f. The hyperbolic tangent function, tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x)), is another commonly used activation function in deep learning. Tanh also has a bounded range, mapping the entire real number line to the interval (-1.0, 1.0). Solve for s and b for this range, using an 8-bit unsigned representation. Then solve for s and b using an 8-bit two's complement representation. For both cases, what real number does the integer 0 represent? Which integer represents the real number 0.0? Can you imagine any issues that might result from the quantization error incurred when representing 0.0?
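
A sketch of part (f), mapping the extreme integers to the interval endpoints (variable names are illustrative):

    # Unsigned: i = 0 -> -1.0 and i = 255 -> +1.0.
    s_u = 2.0 / 255.0
    b_u = -1.0                           # so integer 0 represents -1.0
    i_for_zero_u = (0.0 - b_u) / s_u     # = 127.5: no integer is exactly 0.0

    # Two's complement: i = -128 -> -1.0 and i = 127 -> +1.0.
    s_s = 2.0 / 255.0
    b_s = -1.0 + 128 * s_s               # = 1/255: integer 0 represents ~0.00392
    i_for_zero_s = (0.0 - b_s) / s_s     # = -0.5: again 0.0 is not representable

    # In both schemes 0.0 falls half-way between codes, so tanh's fixed point
    # at zero picks up a half-ulp bias, which can accumulate across layers.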

Related Book: Computer Architecture: A Quantitative Approach, 6th Edition, by John L. Hennessy and David A. Patterson. ISBN: 9780128119051.
