
Question:

Projection pursuit is a network with one hidden layer that can be written as:

\[ g(\boldsymbol{x})=S\left(\boldsymbol{\omega}^{\top} \boldsymbol{x}\right) \]

where \(S\) is a univariate smoothing cubic spline. If we use squared-error loss with \(\tau_{n}=\left\{y_{i}, \boldsymbol{x}_{i}\right\}_{i=1}^{n}\), we need to minimize the training loss:
\[ \frac{1}{n} \sum_{i=1}^{n}\left(y_{i}-S\left(\boldsymbol{\omega}^{\top} \boldsymbol{x}_{i}\right)\right)^{2} \]
with respect to \(\boldsymbol{\omega}\) and the cubic smoothing spline \(S\). Training this network is typically tackled iteratively, in a manner similar to the EM algorithm. In particular, we iterate the following steps (\(t=1,2,\ldots\)) until convergence.

(a) Given the missing data \(\boldsymbol{\omega}_{t}\), compute the spline \(S_{t}\) by training a cubic smoothing spline on \(\left\{y_{i}, \boldsymbol{\omega}_{t}^{\top} \boldsymbol{x}_{i}\right\}_{i=1}^{n}\). The smoothing coefficient of the spline may be determined as part of this step.

(b) Given the spline function \(S_{t}\), compute the next projection vector via iteratively reweighted least squares:
\[ \boldsymbol{\omega}_{t+1}=\underset{\boldsymbol{\beta}}{\arg \min }\left(\boldsymbol{e}_{t}-\mathbf{X} \boldsymbol{\beta}\right)^{\top} \boldsymbol{\Sigma}_{t}\left(\boldsymbol{e}_{t}-\mathbf{X} \boldsymbol{\beta}\right) \tag{9.11} \]

where \[ e_{t, i}:=\boldsymbol{\omega}_{t}^{\top} \boldsymbol{x}_{i}+\frac{y_{i}-S_{t}\left(\boldsymbol{\omega}_{t}^{\top} \boldsymbol{x}_{i}\right)}{S_{t}^{\prime}\left(\boldsymbol{\omega}_{t}^{\top} \boldsymbol{x}_{i}\right)}, \quad i=1, \ldots, n \]
is the adjusted response, and \(\boldsymbol{\Sigma}_{t}^{1 / 2}=\operatorname{diag}\left(S_{t}^{\prime}\left(\boldsymbol{\omega}_{t}^{\top} \boldsymbol{x}_{1}\right), \ldots, S_{t}^{\prime}\left(\boldsymbol{\omega}_{t}^{\top} \boldsymbol{x}_{n}\right)\right)\) is a diagonal matrix.
Apply Taylor's Theorem B.1 to the function \(S_{t}\) and derive the iteratively reweighted least-squares optimization program (9.11).
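
For concreteness, here is a minimal numerical sketch of one (a)-(b) iteration. The helper name projection_pursuit_step, the use of SciPy's UnivariateSpline as the cubic smoothing spline, the near-zero-derivative guard, and the unit-norm normalization of \(\boldsymbol{\omega}\) are all illustrative assumptions, not prescribed by the exercise.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def projection_pursuit_step(X, y, omega, smoothing=None):
    """One (a)-(b) iteration of the alternating scheme (illustrative sketch)."""
    z = X @ omega                          # current projections omega_t^T x_i
    order = np.argsort(z)                  # spline fitting expects sorted abscissae
    # Step (a): cubic smoothing spline S_t trained on {y_i, omega_t^T x_i};
    # 'smoothing' plays the role of the smoothing coefficient.
    S = UnivariateSpline(z[order], y[order], k=3, s=smoothing)
    dS = S.derivative()(z)                 # S_t'(omega_t^T x_i)
    dS[np.abs(dS) < 1e-8] = 1e-8           # guard against near-zero slopes (assumption)
    e = z + (y - S(z)) / dS                # adjusted response e_{t,i}
    # Step (b): weighted least squares (9.11) with Sigma_t^{1/2} = diag(S_t'(.)),
    # implemented by scaling the rows of X and e (the sign cancels in the square).
    sw = np.abs(dS)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], e * sw, rcond=None)
    return beta / np.linalg.norm(beta), S  # unit-norm projection (common convention)

# Usage sketch on synthetic data with a single-index ground truth.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
w_true = np.array([0.6, -0.8, 0.0])
y = np.sin(X @ w_true) + 0.1 * rng.standard_normal(200)
omega = np.ones(3) / np.sqrt(3.0)
for _ in range(20):                        # EM-like outer loop
    omega, S = projection_pursuit_step(X, y, omega)
```

Iterating the step until \(\boldsymbol{\omega}_{t}\) stabilizes gives the EM-like training loop described above; in practice the smoothing coefficient can be chosen by cross-validation within step (a).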


Step by Step Answer:
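
A sketch of one way to carry out the derivation, writing \(z_{t, i}:=\boldsymbol{\omega}_{t}^{\top} \boldsymbol{x}_{i}\) for the current projections (the symbol \(z_{t, i}\) is introduced here only for brevity):

By Taylor's Theorem B.1, a first-order expansion of \(S_{t}\) around \(z_{t, i}\) gives, for \(\boldsymbol{\beta}\) close to \(\boldsymbol{\omega}_{t}\),
\[ S_{t}\left(\boldsymbol{\beta}^{\top} \boldsymbol{x}_{i}\right) \approx S_{t}\left(z_{t, i}\right)+S_{t}^{\prime}\left(z_{t, i}\right)\left(\boldsymbol{\beta}^{\top} \boldsymbol{x}_{i}-z_{t, i}\right). \]
Substituting this into the training loss and dropping the constant factor \(1 / n\),
\[ \sum_{i=1}^{n}\left(y_{i}-S_{t}\left(\boldsymbol{\beta}^{\top} \boldsymbol{x}_{i}\right)\right)^{2} \approx \sum_{i=1}^{n}\left(y_{i}-S_{t}\left(z_{t, i}\right)-S_{t}^{\prime}\left(z_{t, i}\right)\left(\boldsymbol{\beta}^{\top} \boldsymbol{x}_{i}-z_{t, i}\right)\right)^{2}. \]
Factoring \(S_{t}^{\prime}\left(z_{t, i}\right)\) out of each summand,
\[ =\sum_{i=1}^{n} S_{t}^{\prime}\left(z_{t, i}\right)^{2}\left(z_{t, i}+\frac{y_{i}-S_{t}\left(z_{t, i}\right)}{S_{t}^{\prime}\left(z_{t, i}\right)}-\boldsymbol{\beta}^{\top} \boldsymbol{x}_{i}\right)^{2}=\sum_{i=1}^{n} S_{t}^{\prime}\left(z_{t, i}\right)^{2}\left(e_{t, i}-\boldsymbol{\beta}^{\top} \boldsymbol{x}_{i}\right)^{2}. \]
With \(\boldsymbol{\Sigma}_{t}^{1 / 2}=\operatorname{diag}\left(S_{t}^{\prime}\left(z_{t, 1}\right), \ldots, S_{t}^{\prime}\left(z_{t, n}\right)\right)\), this sum is the quadratic form \(\left(\boldsymbol{e}_{t}-\mathbf{X} \boldsymbol{\beta}\right)^{\top} \boldsymbol{\Sigma}_{t}\left(\boldsymbol{e}_{t}-\mathbf{X} \boldsymbol{\beta}\right)\), and minimizing it over \(\boldsymbol{\beta}\) is exactly the iteratively reweighted least-squares program (9.11).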

Related Book:

Data Science and Machine Learning: Mathematical and Statistical Methods, 1st Edition

Authors: Dirk P. Kroese, Thomas Taimre, Radislav Vaisman, Zdravko Botev

ISBN: 9781118710852
