Question:

In Exercise 11 we viewed the multivariate Student \(\mathbf{t}_{\alpha}\) distribution as a scale-mixture of the \(\mathscr{N}\left(\mathbf{0}, \mathbf{I}_{d}\right)\) distribution. In this exercise, we consider a similar transformation, but now \(\boldsymbol{\Sigma}^{1 / 2} \boldsymbol{Z} \sim \mathscr{N}\left(\mathbf{0}, \boldsymbol{\Sigma}\right)\) is not divided but multiplied by \(\sqrt{S}\), with \(S \sim \operatorname{Gamma}(\alpha / 2, \alpha / 2)\):

\[ \begin{equation*} \boldsymbol{X}=\boldsymbol{\mu}+\sqrt{S} \boldsymbol{\Sigma}^{1 / 2} \boldsymbol{Z} \tag{4.53} \end{equation*} \]

where \(S\) and \(\boldsymbol{Z}\) are independent and \(\alpha>0\).
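As a concrete illustration of the transformation (4.53), here is a minimal NumPy sketch of the sampling step; the function name sample_bessel and the choice of a Cholesky factor for \(\boldsymbol{\Sigma}^{1/2}\) are ours, not part of the exercise (any matrix square root of \(\boldsymbol{\Sigma}\) would do).

```python
import numpy as np

def sample_bessel(n, mu, Sigma, alpha, rng=None):
    """Draw n samples of X = mu + sqrt(S) * Sigma^{1/2} Z, as in (4.53),
    with S ~ Gamma(alpha/2, alpha/2) and Z ~ N(0, I_d) independent."""
    rng = np.random.default_rng(rng)
    d = len(mu)
    L = np.linalg.cholesky(Sigma)                  # one valid square root of Sigma
    S = rng.gamma(shape=alpha / 2, scale=2 / alpha, size=n)   # rate alpha/2 -> scale 2/alpha
    Z = rng.standard_normal((n, d))
    return mu + np.sqrt(S)[:, None] * (Z @ L.T)

# example: 10^5 draws from Bessel_3(0, I_2)
X = sample_bessel(100_000, np.zeros(2), np.eye(2), alpha=3.0)
```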

BESSEL DISTRIBUTION

(a) Show, using Exercise 12, that for \(\boldsymbol{\Sigma}^{1 / 2}=\mathbf{I}_{d}\) and \(\boldsymbol{\mu}=\mathbf{0}\), the random vector \(\boldsymbol{X}\) has a \(d\)-dimensional Bessel distribution, with density:

\[ \kappa_{\alpha}(\boldsymbol{x}):=\frac{2^{1-(\alpha+d) / 2} \alpha^{(\alpha+d) / 4}\|\boldsymbol{x}\|^{(\alpha-d) / 2}}{\pi^{d / 2} \Gamma(\alpha / 2)} K_{(\alpha-d) / 2}(\|\boldsymbol{x}\| \sqrt{\alpha}), \quad \boldsymbol{x} \in \mathbb{R}^{d} \]

where \(K_{p}\) is the modified Bessel function of the second kind given in (4.49). We write \(\boldsymbol{X} \sim \operatorname{Bessel}_{\alpha}\left(\mathbf{0}, \mathbf{I}_{d}\right)\). A random vector \(\boldsymbol{X}\) is said to have a \(\operatorname{Bessel}_{\alpha}(\boldsymbol{\mu}, \boldsymbol{\Sigma})\) distribution if it can be written in the form (4.53). By the transformation rule (C.23), its density is given by \(\frac{1}{\sqrt{|\boldsymbol{\Sigma}|}} \kappa_{\alpha}\left(\boldsymbol{\Sigma}^{-1 / 2}(\boldsymbol{x}-\boldsymbol{\mu})\right)\). Special instances of the Bessel pdf include:

\[ \begin{aligned} \kappa_{2}(x) & =\frac{\exp (-\sqrt{2}|x|)}{\sqrt{2}}, \\ \kappa_{4}(x) & =\frac{1+2|x|}{2} \exp (-2|x|), \\ \kappa_{4}\left(x_{1}, x_{2}, x_{3}\right) & =\frac{1}{\pi} \exp \left(-2 \sqrt{x_{1}^{2}+x_{2}^{2}+x_{3}^{2}}\right), \\ \kappa_{d+1}(\boldsymbol{x}) & =\frac{((d+1) / 2)^{d / 2} \sqrt{\pi}}{(2 \pi)^{d / 2} \Gamma((d+1) / 2)} \exp (-\sqrt{d+1}\|\boldsymbol{x}\|), \quad \boldsymbol{x} \in \mathbb{R}^{d} . \end{aligned} \]
Note that \(\kappa_{2}\) is the (scaled) pdf of the double-exponential or Laplace distribution.
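A quick numerical sanity check of the density in part (a) can be done with SciPy; the sketch below (the helper name bessel_pdf is ours) evaluates \(\kappa_{\alpha}\) via scipy.special.kv and compares it with the \(\kappa_{2}\) special case listed above.

```python
import numpy as np
from scipy.special import kv, gammaln

def bessel_pdf(x, alpha):
    """Evaluate the Bessel_alpha(0, I_d) density kappa_alpha at the rows of x."""
    x = np.atleast_2d(x)
    d = x.shape[1]
    r = np.linalg.norm(x, axis=1)                       # ||x||, assumed nonzero here
    log_c = ((1 - (alpha + d) / 2) * np.log(2) + (alpha + d) / 4 * np.log(alpha)
             - d / 2 * np.log(np.pi) - gammaln(alpha / 2))
    return np.exp(log_c) * r ** ((alpha - d) / 2) * kv((alpha - d) / 2, r * np.sqrt(alpha))

# compare the general formula with kappa_2 (Laplace-type) from the list above
x = np.array([[0.3], [1.5]])
print(bessel_pdf(x, alpha=2.0))
print(np.exp(-np.sqrt(2) * np.abs(x.ravel())) / np.sqrt(2))
```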

(b) Given the data \(\tau=\left\{\boldsymbol{x}_{1}, \ldots, \boldsymbol{x}_{n}\right\}\) in \(\mathbb{R}^{d}\), we wish to fit a Bessel pdf to the data by employing the EM algorithm, augmenting the data with the vector \(\boldsymbol{S}=\left[S_{1}, \ldots, S_{n}\right]^{\top}\) of missing data. We assume that \(\alpha\) is known and \(\alpha>d\). Show that conditional on \(\tau\) (and given \(\boldsymbol{\theta}\)), the missing data vector \(\boldsymbol{S}\) has independent components, with \(S_{i} \sim \operatorname{GIG}\left(\alpha, b_{i},(\alpha-d) / 2\right)\), where \(b_{i}:=\left\|\boldsymbol{\Sigma}^{-1 / 2}\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}\right)\right\|^{2}, i=1, \ldots, n\).
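As a hedged hint for part (b) (our one-line sketch, not the book's worked solution): since the pairs \(\left(\boldsymbol{X}_{i}, S_{i}\right)\) are independent across \(i\), Bayes' formula gives, for each \(i\),

\[ g\left(s \mid \boldsymbol{x}_{i}, \boldsymbol{\theta}\right) \propto \underbrace{s^{-d / 2} \exp \left(-\frac{b_{i}}{2 s}\right)}_{\text{from } \boldsymbol{X}_{i} \mid S_{i}=s \,\sim\, \mathscr{N}(\boldsymbol{\mu}, s \boldsymbol{\Sigma})} \; \underbrace{s^{\alpha / 2-1} \exp \left(-\frac{\alpha s}{2}\right)}_{\text{from } S_{i} \sim \operatorname{Gamma}(\alpha / 2, \alpha / 2)}=s^{(\alpha-d) / 2-1} \exp \left(-\frac{\alpha s+b_{i} / s}{2}\right), \]

which is the kernel of the \(\operatorname{GIG}\left(\alpha, b_{i},(\alpha-d) / 2\right)\) density.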

(c) At iteration \(t\) of the EM algorithm, let \(g^{(t)}(\boldsymbol{s})=g\left(\boldsymbol{s} \mid \tau, \boldsymbol{\theta}^{(t-1)}\right)\) be the density of the missing data, given the observed data \(\tau\) and the current parameter guess \(\boldsymbol{\theta}^{(t-1)}\). Show that the expected complete-data log-likelihood is given by:
\[ \begin{equation*} Q^{(t)}(\boldsymbol{\theta}):=\mathbb{E}_{g^{(t)}} \ln g(\tau, \boldsymbol{S} \mid \boldsymbol{\theta})=-\frac{1}{2} \sum_{i=1}^{n} b_{i}(\boldsymbol{\theta}) w_{i}^{(t-1)}+\text { constant } \tag{4.54} \end{equation*} \]
where \(b_{i}(\boldsymbol{\theta})=\left\|\boldsymbol{\Sigma}^{-1 / 2}\left(\boldsymbol{x}_{i}-\boldsymbol{\mu}\right)\right\|^{2}\) and \[ w_{i}^{(t-1)}:=\frac{\sqrt{\alpha}\, K_{(\alpha-d+2) / 2}\left(\sqrt{\alpha b_{i}\left(\boldsymbol{\theta}^{(t-1)}\right)}\right)}{\sqrt{b_{i}\left(\boldsymbol{\theta}^{(t-1)}\right)}\, K_{(\alpha-d) / 2}\left(\sqrt{\alpha b_{i}\left(\boldsymbol{\theta}^{(t-1)}\right)}\right)}-\frac{\alpha-d}{b_{i}\left(\boldsymbol{\theta}^{(t-1)}\right)}, \quad i=1, \ldots, n . \]
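For concreteness, a minimal NumPy/SciPy sketch of these weights (the name e_step_weights is ours; a Cholesky factor plays the role of \(\boldsymbol{\Sigma}^{1/2}\), which leaves each \(b_{i}\) unchanged):

```python
import numpy as np
from scipy.special import kv

def e_step_weights(X, mu, Sigma, alpha):
    """Compute the weights w_i for all data points, assuming alpha > d.
    X is an (n, d) array; mu and Sigma are the current parameter guesses."""
    d = X.shape[1]
    L = np.linalg.cholesky(Sigma)
    Y = np.linalg.solve(L, (X - mu).T).T      # rows play the role of Sigma^{-1/2}(x_i - mu)
    b = np.sum(Y ** 2, axis=1)                # b_i(theta), assumed strictly positive
    z = np.sqrt(alpha * b)
    p = (alpha - d) / 2
    return np.sqrt(alpha / b) * kv(p + 1, z) / kv(p, z) - (alpha - d) / b
```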

(d) From (4.54) derive the M-step of the EM algorithm. That is, show how \(\boldsymbol{\theta}^{(t)}\) is updated from \(\boldsymbol{\theta}^{(t-1)}\).
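The sketch below shows one plausible way the resulting EM updates could be organized in code, reusing e_step_weights from the sketch above. It assumes (as is standard for normal scale mixtures) that the complete-data log-likelihood also contributes a \(-\frac{n}{2} \ln |\boldsymbol{\Sigma}|\) term to \(Q^{(t)}\), so that the M-step reduces to a weighted mean and a weighted scatter update; this is our assumption for illustration, not necessarily the book's exact derivation.

```python
import numpy as np

def em_bessel(X, alpha, iters=100):
    """Hedged EM sketch for fitting Bessel_alpha(mu, Sigma), assuming alpha > d."""
    n, d = X.shape
    mu, Sigma = X.mean(axis=0), np.cov(X, rowvar=False)   # simple starting guess
    for _ in range(iters):
        w = e_step_weights(X, mu, Sigma, alpha)            # E-step (see sketch above)
        mu = w @ X / w.sum()                               # M-step: weighted mean
        Xc = X - mu
        Sigma = (w[:, None] * Xc).T @ Xc / n               # M-step: weighted scatter
    return mu, Sigma
```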

Related Book:

Dirk P. Kroese, Thomas Taimre, Radislav Vaisman, and Zdravko Botev, Data Science and Machine Learning: Mathematical and Statistical Methods, 1st Edition, ISBN: 9781118710852.
