Question: Show that the gradient of a loss function $L(\vu,\gamma,\beta)$ with respect to $\vu$ can be written in the form $\nabla_\vu L=s\mW^\perp\nabla_\vw L$ for some $s$, where $\mW^\perp=\left(\mI-\frac{\vu\vu^\top}{\|\vu\|^2}\right)$.
Note that $\mW^\perp\vu=\mathbf{0}$.\footnote{As a side note: $\mW^\perp$ is an orthogonal-complement projector that projects the gradient away from the direction of $\vw$, which is usually (empirically) close to a dominant eigenvector of the covariance of the gradient. This helps to condition the landscape of the objective that we want to optimize.}
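The question does not restate how $\vw$ depends on $\vu$. A minimal numerical sanity check is sketched below, assuming the weight-normalization form $\vw=\gamma\,\vu/\|\vu\|$ (an assumption, not given in the text above). Under that assumption the chain rule suggests $s=\gamma/\|\vu\|$, since the Jacobian of $\vw$ with respect to $\vu$ is $\frac{\gamma}{\|\vu\|}\mW^\perp$. The quadratic loss, the `target` vector, and all function names are hypothetical and chosen only for illustration. The sketch compares the closed form against central finite differences.

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def w_of_u(u, gamma):
    # ASSUMPTION (not stated in the question): w = gamma * u / ||u||
    n = norm(u)
    return [gamma * x / n for x in u]

def loss_w(w, target):
    # Hypothetical quadratic loss in w, used only as a test function
    return 0.5 * sum((a - b) ** 2 for a, b in zip(w, target))

def grad_w(w, target):
    # Gradient of the hypothetical loss with respect to w
    return [a - b for a, b in zip(w, target)]

def project_perp(u, g):
    # W_perp g = g - u (u . g) / ||u||^2
    coeff = sum(a * b for a, b in zip(u, g)) / sum(a * a for a in u)
    return [b - coeff * a for a, b in zip(u, g)]

def grad_u_closed_form(u, gamma, target):
    # Candidate form: grad_u L = s * W_perp * grad_w L with s = gamma / ||u||
    s = gamma / norm(u)
    g = grad_w(w_of_u(u, gamma), target)
    return [s * x for x in project_perp(u, g)]

def grad_u_finite_diff(u, gamma, target, eps=1e-6):
    # Central finite differences of L(w(u)) with respect to each u_i
    grad = []
    for i in range(len(u)):
        up = list(u)
        up[i] += eps
        dn = list(u)
        dn[i] -= eps
        diff = loss_w(w_of_u(up, gamma), target) - loss_w(w_of_u(dn, gamma), target)
        grad.append(diff / (2 * eps))
    return grad

u = [1.0, -2.0, 0.5]
gamma = 1.7
target = [0.3, 0.1, -0.4]

closed = grad_u_closed_form(u, gamma, target)
numeric = grad_u_finite_diff(u, gamma, target)
max_err = max(abs(a - b) for a, b in zip(closed, numeric))
# The projected gradient should be orthogonal to u, since W_perp u = 0
dot_with_u = sum(a * b for a, b in zip(closed, u))
print(max_err, dot_with_u)
```

If the assumed parameterization matches the intended one, `max_err` should be near zero and `dot_with_u` should vanish up to rounding, which reflects $\mW^\perp\vu=\mathbf{0}$: the gradient with respect to $\vu$ has no component along $\vu$.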
