
Question:

The purpose of this exercise is to derive the dual program (7.21) from the primal program (7.20). The starting point is to introduce a vector of auxiliary variables \(\xi:=\left[\xi_{1}, \ldots, \xi_{n}\right]^{\top}\) and write the primal program as

\[ \begin{equation*} \min _{\boldsymbol{\alpha}, \alpha_{0}, \boldsymbol{\xi}} \sum_{i=1}^{n} \xi_{i}+\frac{\gamma}{2} \boldsymbol{\alpha}^{\top} \mathbf{K} \boldsymbol{\alpha} \tag{7.30} \end{equation*} \]

subject to: \(\boldsymbol{\xi} \geqslant \mathbf{0}\)

\[ y_{i}\left(\alpha_{0}+\{\mathbf{K} \boldsymbol{\alpha}\}_{i}\right) \geqslant 1-\xi_{i}, \quad i=1, \ldots, n \]

(a) Apply the Lagrangian optimization theory from Section B.2.2 to obtain the Lagrangian function \(\mathscr{L}\left(\left\{\alpha_{0}, \boldsymbol{\alpha}, \boldsymbol{\xi}\right\},\{\boldsymbol{\lambda}, \boldsymbol{\mu}\}\right)\), where \(\boldsymbol{\mu}\) and \(\boldsymbol{\lambda}\) are the Lagrange multipliers corresponding to the first and second inequality constraints, respectively.
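For orientation, here is a sketch of the function that (a) should produce, assuming the constraints are first rewritten in the standard form \(\boldsymbol{\xi} \geqslant \mathbf{0}\) and \(y_{i}\left(\alpha_{0}+\{\mathbf{K} \boldsymbol{\alpha}\}_{i}\right)-1+\xi_{i} \geqslant 0\) before attaching the multipliers \(\boldsymbol{\mu} \geqslant \mathbf{0}\) and \(\boldsymbol{\lambda} \geqslant \mathbf{0}\):

\[ \mathscr{L}\left(\left\{\alpha_{0}, \boldsymbol{\alpha}, \boldsymbol{\xi}\right\},\{\boldsymbol{\lambda}, \boldsymbol{\mu}\}\right)=\sum_{i=1}^{n} \xi_{i}+\frac{\gamma}{2} \boldsymbol{\alpha}^{\top} \mathbf{K} \boldsymbol{\alpha}-\boldsymbol{\mu}^{\top} \boldsymbol{\xi}-\sum_{i=1}^{n} \lambda_{i}\left(y_{i}\left(\alpha_{0}+\{\mathbf{K} \boldsymbol{\alpha}\}_{i}\right)-1+\xi_{i}\right). \]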


(b) Show that the Karush-Kuhn-Tucker conditions (see Theorem B.2) for optimizing \(\mathscr{L}\) are:

\[ \begin{gather*} \boldsymbol{\lambda}^{\top} \boldsymbol{y}=0 \tag{7.31}\\ \boldsymbol{\alpha}=\boldsymbol{y} \odot \boldsymbol{\lambda} / \gamma \\ \mathbf{0} \leqslant \boldsymbol{\lambda} \leqslant \mathbf{1} \\ (\mathbf{1}-\boldsymbol{\lambda}) \odot \boldsymbol{\xi}=\mathbf{0}, \quad \lambda_{i}\left(y_{i} g\left(\boldsymbol{x}_{i}\right)-1+\xi_{i}\right)=0, \quad i=1, \ldots, n \\ \boldsymbol{\xi} \geqslant \mathbf{0}, \quad y_{i} g\left(\boldsymbol{x}_{i}\right)-1+\xi_{i} \geqslant 0, \quad i=1, \ldots, n \end{gather*} \]

Here \(\odot\) stands for componentwise multiplication; e.g., \(\boldsymbol{y} \odot \boldsymbol{\lambda}=\left[y_{1} \lambda_{1}, \ldots, y_{n} \lambda_{n}\right]^{\top}\), and we have abbreviated \(\alpha_{0}+\{\mathbf{K} \boldsymbol{\alpha}\}_{i}\) to \(g\left(\boldsymbol{x}_{i}\right)\), in view of (7.19). [Hint: one of the KKT conditions is \(\boldsymbol{\lambda}=\mathbf{1}-\boldsymbol{\mu}\); thus we can eliminate \(\boldsymbol{\mu}\).]
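As a sketch of where the first three lines of (7.31) come from (assuming the Lagrangian form sketched under (a)), setting the partial derivatives of \(\mathscr{L}\) with respect to \(\boldsymbol{\xi}\), \(\alpha_{0}\), and \(\boldsymbol{\alpha}\) to zero gives

\[ \mathbf{1}-\boldsymbol{\mu}-\boldsymbol{\lambda}=\mathbf{0}, \qquad \boldsymbol{\lambda}^{\top} \boldsymbol{y}=0, \qquad \gamma \mathbf{K} \boldsymbol{\alpha}-\mathbf{K}(\boldsymbol{y} \odot \boldsymbol{\lambda})=\mathbf{0}. \]

The first equation yields the hint \(\boldsymbol{\lambda}=\mathbf{1}-\boldsymbol{\mu}\), and together with \(\boldsymbol{\lambda}, \boldsymbol{\mu} \geqslant \mathbf{0}\) it gives \(\mathbf{0} \leqslant \boldsymbol{\lambda} \leqslant \mathbf{1}\); the third equation is satisfied by \(\boldsymbol{\alpha}=\boldsymbol{y} \odot \boldsymbol{\lambda} / \gamma\). The complementary slackness and feasibility conditions for the two sets of inequality constraints supply the remaining lines of (7.31).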

(c) Using the KKT conditions (7.31), reduce the Lagrange dual function \(\mathscr{L}^{*}(\boldsymbol{\lambda}):=\min _{\alpha_{0}, \boldsymbol{\alpha}, \boldsymbol{\xi}} \mathscr{L}\left(\left\{\alpha_{0}, \boldsymbol{\alpha}, \boldsymbol{\xi}\right\},\{\boldsymbol{\lambda}, \mathbf{1}-\boldsymbol{\lambda}\}\right)\) to

\[ \mathscr{L}^{*}(\boldsymbol{\lambda})=\sum_{i=1}^{n} \lambda_{i}-\frac{1}{2 \gamma} \sum_{i=1}^{n} \sum_{j=1}^{n} \lambda_{i} \lambda_{j} y_{i} y_{j} \kappa\left(\boldsymbol{x}_{i}, \boldsymbol{x}_{j}\right) \]
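A sketch of the reduction in (c), assuming the conditions in (7.31): with \(\boldsymbol{\mu}=\mathbf{1}-\boldsymbol{\lambda}\) the terms involving \(\boldsymbol{\xi}\) cancel, with \(\boldsymbol{\lambda}^{\top} \boldsymbol{y}=0\) the term involving \(\alpha_{0}\) vanishes, and substituting \(\boldsymbol{\alpha}=\boldsymbol{y} \odot \boldsymbol{\lambda} / \gamma\) leaves

\[ \mathscr{L}^{*}(\boldsymbol{\lambda})=\sum_{i=1}^{n} \lambda_{i}+\left(\frac{1}{2 \gamma}-\frac{1}{\gamma}\right)(\boldsymbol{y} \odot \boldsymbol{\lambda})^{\top} \mathbf{K}(\boldsymbol{y} \odot \boldsymbol{\lambda})=\sum_{i=1}^{n} \lambda_{i}-\frac{1}{2 \gamma}(\boldsymbol{y} \odot \boldsymbol{\lambda})^{\top} \mathbf{K}(\boldsymbol{y} \odot \boldsymbol{\lambda}), \]

which is the stated double sum once \(\{\mathbf{K}\}_{i j}=\kappa\left(\boldsymbol{x}_{i}, \boldsymbol{x}_{j}\right)\) is written out.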

(d) As a consequence of (7.19) and (a)-(c), show that the optimal prediction function \(g_{\tau}\) is given by \[ \begin{equation*} g_{\tau}(\boldsymbol{x})=\alpha_{0}+\frac{1}{\gamma} \sum_{i=1}^{n} y_{i} \lambda_{i} \kappa\left(\boldsymbol{x}_{i}, \boldsymbol{x}\right) \tag{7.33} \end{equation*} \]
where \(\boldsymbol{\lambda}\) is the solution to \[ \begin{align*} \max _{\boldsymbol{\lambda}} & \mathscr{L}^{*}(\boldsymbol{\lambda}) \tag{7.34}\\ \text { subject to: } & \quad \boldsymbol{\lambda}^{\top} \boldsymbol{y}=0, \quad \mathbf{0} \leqslant \boldsymbol{\lambda} \leqslant \mathbf{1} \end{align*} \]
and \(\alpha_{0}=y_{j}-\frac{1}{\gamma} \sum_{i=1}^{n} y_{i} \lambda_{i} \kappa\left(\boldsymbol{x}_{i}, \boldsymbol{x}_{j}\right)\) for any \(j\) such that \(\lambda_{j} \in(0,1)\).
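To check the derivation numerically, the following is a minimal sketch (not code from the book) that solves the dual program (7.34) with an off-the-shelf constrained optimizer and then evaluates the prediction function (7.33). The Gaussian kernel, the toy two-class data, and names such as `kernel` and `neg_dual` are assumptions made purely for illustration.

```python
# Minimal numerical sketch of the dual program (7.34); not code from the book.
# Assumptions: a Gaussian kernel, toy two-class data, and scipy's SLSQP solver.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (20, 2)), rng.normal(1.0, 1.0, (20, 2))])
y = np.hstack([-np.ones(20), np.ones(20)])
gamma = 1.0  # regularization parameter from (7.30)

def kernel(a, b, bandwidth=1.0):
    # Gaussian kernel kappa(a, b); any positive semidefinite kernel works here.
    return np.exp(-np.sum((a - b) ** 2) / (2.0 * bandwidth ** 2))

K = np.array([[kernel(xi, xj) for xj in X] for xi in X])

def neg_dual(lam):
    # Negative of L*(lambda); minimizing it maximizes the dual (7.34).
    v = y * lam
    return -(lam.sum() - v @ K @ v / (2.0 * gamma))

constraints = {"type": "eq", "fun": lambda lam: lam @ y}  # lambda^T y = 0
bounds = [(0.0, 1.0)] * len(y)                            # 0 <= lambda <= 1
lam = minimize(neg_dual, np.full(len(y), 0.5), method="SLSQP",
               bounds=bounds, constraints=constraints).x

# Recover alpha_0 from a j with lambda_j strictly inside (0, 1); picking the
# lambda_j closest to 0.5 is a simple heuristic for finding such an index.
j = int(np.argmin(np.abs(lam - 0.5)))
alpha0 = y[j] - (y * lam) @ K[:, j] / gamma

def g(x):
    # Prediction function (7.33); its sign is the predicted class label.
    return alpha0 + (y * lam) @ np.array([kernel(xi, x) for xi in X]) / gamma

print("training accuracy:", np.mean(np.sign([g(x) for x in X]) == y))
```

In practice one would use a dedicated SVM solver (e.g., SMO); the general-purpose SLSQP routine is used here only because it handles the equality constraint \(\boldsymbol{\lambda}^{\top} \boldsymbol{y}=0\) and the box constraints \(\mathbf{0} \leqslant \boldsymbol{\lambda} \leqslant \mathbf{1}\) directly.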



Reference: Dirk P. Kroese, Thomas Taimre, Radislav Vaisman, and Zdravko Botev, Data Science and Machine Learning: Mathematical and Statistical Methods, 1st Edition. ISBN: 9781118710852.
