Neural Nets
1. (2 pts) Why should one use bipolar encodings of binary values instead of standard binary encoding?
2. (2 pts) Why is momentum often used in combination with stochastic gradient descent (SGD)?
3. (2 pts) Why does SGD have trouble converging when the gradient has small magnitude values?
4. (2 pts) Why does SGD have trouble converging when the gradient has large magnitude values?
5. (2 pts) What can happen if the learning rate is set too high?
6. (2 pts) What can happen if the learning rate is set too low?
7. (2 pts) What is the name of the matrix of second order partial derivatives of the error function?
8. (2 pts) Why is the matrix of second order partial derivatives not commonly used to help train neural networks?
9. (2 pts) Why is L-BFGS not commonly used for optimization of error functions with neural networks?
10. (2 pts) What is the name of one non-linear activation function which helps prevent gradients from approaching zero magnitude in deep neural networks?
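
Questions 2, 5 and 6 can be explored numerically. The sketch below is only an illustration, not a model answer: it runs plain (non-stochastic) gradient descent with an optional heavy-ball momentum term on the toy error function f(w) = 0.5 * w**2. The function name `descend`, the toy objective, and the particular learning rates are all assumptions chosen for the demonstration.

```python
def descend(lr, momentum=0.0, steps=50, w0=1.0):
    """Gradient descent with optional momentum on f(w) = 0.5 * w**2.

    Hypothetical helper for illustration; the minimum is at w = 0.
    """
    w, v = w0, 0.0
    for _ in range(steps):
        grad = w                      # derivative of 0.5 * w**2
        v = momentum * v - lr * grad  # velocity accumulates past gradients
        w = w + v                     # parameter update
    return w


too_low = descend(lr=0.001)                       # tiny steps: w barely moves from 1.0
reasonable = descend(lr=0.5)                      # shrinks steadily toward 0
too_high = descend(lr=2.5)                        # overshoots each step and diverges
with_momentum = descend(lr=0.001, momentum=0.9)   # same tiny lr, but faster progress

for name, w in [("too_low", too_low), ("reasonable", reasonable),
                ("too_high", too_high), ("with_momentum", with_momentum)]:
    print(f"{name:>14}: |w| = {abs(w):.3e}")
```

On this toy problem, a learning rate that is too high makes each update overshoot the minimum so the iterates grow, while one that is too low leaves the parameter almost unchanged after 50 steps. Adding momentum to the small learning rate lets consecutive gradients pointing the same way build up speed. Real networks with noisy mini-batch gradients behave less cleanly, so treat this as intuition only.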
