Question: Show that the gradient of a loss function $L(\mathbf{u}, \gamma, \beta)$ with respect to $\mathbf{u}$ can be written in the form $\nabla_{\mathbf{u}} L = s \, \mathbf{W}^{\perp} \nabla_{\mathbf{w}} L$ for some $s$, where $\mathbf{W}^{\perp} = \left( \mathbf{I} - \frac{\mathbf{u} \mathbf{u}^{\top}}{\|\mathbf{u}\|^{2}} \right)$.
Note that $\mathbf{W}^{\perp} \mathbf{u} = \mathbf{0}$.
Footnote: As a side note, $\mathbf{W}^{\perp}$ is an orthogonal complement that projects the gradient away from the direction of $\mathbf{w}$, which is usually empirically close to a dominant eigenvector of the covariance of the gradient. This helps to condition the landscape of the objective that we want to optimize.
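The identity can be sanity-checked numerically. The sketch below is not part of the original question: it assumes the weight-normalization reparameterization $\mathbf{w} = \gamma \, \mathbf{u} / \|\mathbf{u}\|$, which the question does not state, and under that assumption the chain rule gives $s = \gamma / \|\mathbf{u}\|$. The quadratic loss and the names `weight`, `loss_of_w`, `grad_w` and `target` are hypothetical, chosen only to have something to differentiate. The closed form is compared against central finite differences.

```python
import numpy as np

def projector(u):
    """W_perp = I - u u^T / ||u||^2 (projects onto the complement of u)."""
    return np.eye(u.size) - np.outer(u, u) / np.dot(u, u)

def weight(u, gamma):
    """ASSUMPTION (not stated in the question): w = gamma * u / ||u||."""
    return gamma * u / np.linalg.norm(u)

def loss_of_w(w, target):
    """Hypothetical toy loss used only for this check: 0.5 * ||w - target||^2."""
    return 0.5 * np.sum((w - target) ** 2)

def grad_w(w, target):
    """Gradient of the toy loss with respect to w."""
    return w - target

def grad_u_closed_form(u, gamma, target):
    """s * W_perp * grad_w L, with s = gamma / ||u|| under the assumption above."""
    s = gamma / np.linalg.norm(u)
    return s * projector(u) @ grad_w(weight(u, gamma), target)

def grad_u_numeric(u, gamma, target, eps=1e-6):
    """Central finite-difference gradient of the loss with respect to u."""
    g = np.zeros_like(u)
    for i in range(u.size):
        step = np.zeros_like(u)
        step[i] = eps
        plus = loss_of_w(weight(u + step, gamma), target)
        minus = loss_of_w(weight(u - step, gamma), target)
        g[i] = (plus - minus) / (2 * eps)
    return g

rng = np.random.default_rng(0)
u = rng.normal(size=5)
target = rng.normal(size=5)
gamma = 1.7

closed = grad_u_closed_form(u, gamma, target)
numeric = grad_u_numeric(u, gamma, target)

# Closed form versus finite differences, then the property W_perp u = 0.
print(np.allclose(closed, numeric, atol=1e-6))
print(np.allclose(projector(u) @ u, 0.0))
```

Because $\mathbf{W}^{\perp} \mathbf{u} = \mathbf{0}$ and $\mathbf{W}^{\perp}$ is symmetric, the resulting gradient is orthogonal to $\mathbf{u}$. A different definition of $\mathbf{w}$ would change the value of $s$ but not the projector structure.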
