Question:

The purpose of this exercise is to prove the following Vapnik–Chervonenkis bound: for any finite class \(\mathscr{G}\) (containing only a finite number \(|\mathscr{G}|\) of possible functions) and a general bounded loss function, \(l \leqslant \operatorname{Loss} \leqslant u\), the expected statistical error is bounded from above according to:

\[ \begin{equation*} \mathbb{E}\, \ell\left(g_{\mathscr{T}_{n}}^{\mathscr{G}}\right)-\ell\left(g^{\mathscr{G}}\right) \leqslant \frac{(u-l) \sqrt{2 \ln \left(2|\mathscr{G}|\right)}}{\sqrt{n}} \tag{2.57} \end{equation*} \]

Note how this bound conveniently does not depend on the distribution of the training set \(\mathscr{T}_{n}\) (which is typically unknown), but only on the complexity (i.e., cardinality) of the class \(\mathscr{G}\). We can break up the proof of (2.57) into the following four parts:
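Before diving into the proof, it can help to see how the right-hand side of (2.57) behaves numerically. The following sketch (assuming, purely for illustration, a 0–1 loss with \(l=0\), \(u=1\) and \(|\mathscr{G}|=100\)) shows that the bound grows only logarithmically in \(|\mathscr{G}|\) but shrinks like \(1/\sqrt{n}\):

```python
import math

def vc_bound(u, l, g_size, n):
    """Right-hand side of (2.57): (u - l) * sqrt(2 * ln(2|G|) / n)."""
    return (u - l) * math.sqrt(2 * math.log(2 * g_size) / n)

# Illustrative values: 0-1 loss (l = 0, u = 1), |G| = 100, growing n.
for n in (100, 1000, 10000):
    print(n, round(vc_bound(1, 0, 100, n), 4))
```

Note that squaring the class size (from \(|\mathscr{G}|\) to \(|\mathscr{G}|^2\)) at most doubles the logarithm inside the square root, so the bound is rather insensitive to class complexity compared with sample size.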

(a) For a general function class \(\mathscr{G}\), training set \(\mathscr{T}\), risk function \(\ell\), and training loss \(\ell_{\mathscr{T}}\), we have, by definition, \(\ell\left(g^{\mathscr{G}}\right) \leqslant \ell(g)\) and \(\ell_{\mathscr{T}}\left(g_{\mathscr{T}}^{\mathscr{G}}\right) \leqslant \ell_{\mathscr{T}}(g)\) for all \(g \in \mathscr{G}\). Show that \[ \ell\left(g_{\mathscr{T}}^{\mathscr{G}}\right)-\ell\left(g^{\mathscr{G}}\right) \leqslant \sup _{g \in \mathscr{G}}\left|\ell_{\mathscr{T}}(g)-\ell(g)\right|+\ell_{\mathscr{T}}\left(g^{\mathscr{G}}\right)-\ell\left(g^{\mathscr{G}}\right), \]
where we used the notation sup (supremum) for the least upper bound. Since the training loss is an unbiased estimator of the risk, \(\mathbb{E}\, \ell_{\mathscr{T}}\left(g^{\mathscr{G}}\right)=\ell\left(g^{\mathscr{G}}\right)\), we obtain, after taking expectations on both sides of the inequality above:
\[ \mathbb{E}\, \ell\left(g_{\mathscr{T}}^{\mathscr{G}}\right)-\ell\left(g^{\mathscr{G}}\right) \leqslant \mathbb{E} \sup _{g \in \mathscr{G}}\left|\ell_{\mathscr{T}}(g)-\ell(g)\right| \]

(b) If \(X\) is a zero-mean random variable taking values in the interval \([l, u]\), then Hoeffding's inequality states that the moment generating function satisfies \[ \begin{equation*} \mathbb{E}\, \mathrm{e}^{t X} \leqslant \exp \left(\frac{t^{2}(u-l)^{2}}{8}\right), \quad t \in \mathbb{R} \tag{2.58} \end{equation*} \]

Prove this result by using the fact that the line segment joining the points \((l, \mathrm{e}^{t l})\) and \((u, \mathrm{e}^{t u})\) lies above the graph of the convex function \(x \mapsto \mathrm{e}^{t x}\) for \(x \in[l, u]\); that is:
\[ \mathrm{e}^{t x} \leqslant \mathrm{e}^{t l} \frac{u-x}{u-l}+\mathrm{e}^{t u} \frac{x-l}{u-l}, \quad x \in[l, u] \]
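As a numerical sanity check (not a substitute for the proof), one can compare the exact moment generating function of a worst-case zero-mean variable on \([-1, 1]\) — the symmetric Rademacher variable, whose MGF is \(\cosh t\) — against the Hoeffding bound (2.58), which with \(u - l = 2\) reads \(\exp(t^{2}/2)\):

```python
import math

# Rademacher X (P(X = 1) = P(X = -1) = 1/2) is zero-mean on [l, u] = [-1, 1];
# its MGF is E e^{tX} = cosh(t).  Hoeffding's lemma (2.58) gives
# exp(t^2 * (u - l)^2 / 8) = exp(t^2 / 2), which must dominate cosh(t).
for t in (0.1, 0.5, 1.0, 2.0, 5.0):
    mgf = math.cosh(t)
    bound = math.exp(t ** 2 / 2)
    assert mgf <= bound
    print(f"t={t}: cosh(t)={mgf:.4f} <= bound={bound:.4f}")
```

The Rademacher case is the natural test because it puts all mass on the endpoints of the interval, which is exactly the configuration the chord argument in the hint is built around.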

(c) Let \(Z_{1}, \ldots, Z_{n}\) be (possibly dependent and non-identically distributed) zero-mean random variables with moment generating functions that satisfy \(\mathbb{E} \exp \left(t Z_{k}\right) \leqslant \exp \left(t^{2} \eta^{2} / 2\right)\) for all \(k\) and some parameter \(\eta\). Use Jensen's inequality (2.56) to prove that for any \(t>0\), \[ \mathbb{E} \max _{k} Z_{k}=\frac{1}{t} \mathbb{E} \ln \max _{k} \mathrm{e}^{t Z_{k}} \leqslant \frac{1}{t} \ln n+\frac{t \eta^{2}}{2} \]
From this derive that \[ \mathbb{E} \max _{k} Z_{k} \leqslant \eta \sqrt{2 \ln n} \]
Finally, show that this last inequality implies that \[ \begin{equation*} \mathbb{E} \max _{k}\left|Z_{k}\right| \leqslant \eta \sqrt{2 \ln (2 n)} \tag{2.59} \end{equation*} \]
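A small Monte Carlo check of (2.59) (illustrative only; the values of \(\eta\), \(n\), and the trial count are arbitrary choices) uses Gaussian variables, since \(Z \sim N(0, \eta^{2})\) attains the sub-Gaussian MGF condition \(\mathbb{E} \exp(tZ) = \exp(t^{2}\eta^{2}/2)\) with equality:

```python
import math
import random

random.seed(1)
eta, n, trials = 1.0, 50, 2000

# Estimate E max_k |Z_k| for n i.i.d. N(0, eta^2) variables and compare
# with the maximal inequality (2.59): eta * sqrt(2 * ln(2n)).
total = 0.0
for _ in range(trials):
    total += max(abs(random.gauss(0.0, eta)) for _ in range(n))
emp_mean = total / trials
bound = eta * math.sqrt(2 * math.log(2 * n))

print(f"empirical E max|Z_k| ~ {emp_mean:.3f}, bound {bound:.3f}")
assert emp_mean <= bound
```

Note that (2.59) makes no independence assumption, so it cannot be tight in general; for independent Gaussians the empirical mean sits visibly below the bound.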

(d) Returning to the objective of this exercise, denote the elements of \(\mathscr{G}\) by \(g_{1}, \ldots, g_{|\mathscr{G}|}\), and let \(Z_{k}=\ell_{\mathscr{T}_{n}}\left(g_{k}\right)-\ell\left(g_{k}\right)\). By part (a) it is sufficient to bound \(\mathbb{E} \max _{k}\left|Z_{k}\right|\). Show that the \(\left\{Z_{k}\right\}\) satisfy the conditions of part (c) with \(\eta=(u-l) / \sqrt{n}\). For this you will need to apply part (b) to the random variable \(\operatorname{Loss}(g(\boldsymbol{X}), Y)-\ell(g)\), where \((\boldsymbol{X}, Y)\) is a generic data point. Now complete the proof of (2.57).

Related book: Data Science and Machine Learning: Mathematical and Statistical Methods, 1st Edition, by Dirk P. Kroese, Thomas Taimre, Radislav Vaisman, and Zdravko Botev. ISBN: 9781118710852.
