DS-GA 1018: Probabilistic Time Series Analysis

Homework 3

Course: DS-GA 1018: Probabilistic Time Series Analysis
Due: Friday, October 31st at 5:00 pm

Complete your answers on a separate sheet (handwritten or LaTeX) and submit on Gradescope.


Problem 1 (20 points)

In lecture we discussed the importance of the initial parameter values for the results of the expectation-maximization algorithm. In this problem we will quantitatively demonstrate that importance. Consider the EM algorithm for a linear dynamical system (LDS) like the one discussed in class. Imagine that we have an initial guess for our parameters:

\boldsymbol{\theta}_0 = \{ \boldsymbol{\mu}_{0,0}, \boldsymbol{\Sigma}_{0,0}, \mathbf{A}_{0}, \mathbf{Q}_0, \mathbf{C}_0, \mathbf{R}_0\}

We have also set our distribution q_1(\mathbf{Z}) = p(\mathbf{Z}|\mathbf{X},\boldsymbol{\theta}_0). In other words, q_1 is the distribution determined by the Kalman smoothing means and covariances (the standard choice).
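As a reminder of where each parameter in \boldsymbol{\theta}_0 enters the E-step, here is a minimal sketch of the Kalman filter forward pass (hypothetical function and variable names; the full q_1 also requires the smoothing backward pass, omitted here). The prediction step is where \mathbf{A}_0 produces \boldsymbol{\mu}_{t|t-1}, and \mathbf{C}_0 enters through the Kalman gain:

```python
import numpy as np

def kalman_filter(X, mu0, Sigma0, A, Q, C, R):
    """Forward (filtering) pass; returns filtered means and covariances."""
    T = X.shape[0]
    d = mu0.shape[0]
    mus, Sigmas = [], []
    mu, Sigma = mu0, Sigma0
    for t in range(T):
        if t > 0:
            # Prediction step: mu_{t|t-1} = A mu_{t-1|t-1}
            mu = A @ mu
            Sigma = A @ Sigma @ A.T + Q
        # Update step: the gain K_t vanishes whenever C = 0
        S = C @ Sigma @ C.T + R
        K = Sigma @ C.T @ np.linalg.inv(S)
        mu = mu + K @ (X[t] - C @ mu)
        Sigma = (np.eye(d) - K @ C) @ Sigma
        mus.append(mu)
        Sigmas.append(Sigma)
    return np.array(mus), np.array(Sigmas)
```

Running this with \mathbf{A}_0 = 0 and \mathbf{C}_0 = 0 is a quick way to check your answers to the subproblems against what the recursions actually produce.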

(i) (3 points)

Consider the case where \mathbf{A}_0 = 0 \cdot \mathbb{I} and \mathbf{C}_0 = 0 \cdot \mathbb{I}. What will the value of \boldsymbol{\mu}_{0,1} be in terms of the data in \mathbf{X} and the parameters in \boldsymbol{\theta}_0? For all the subproblems, don’t forget to consider how our choice of \boldsymbol{\theta}_0 affects terms like \boldsymbol{\mu}_{t|t-1}.

(ii) (3 points)

Still considering the case where \mathbf{A}_0 = 0 \cdot \mathbb{I} and \mathbf{C}_0 = 0 \cdot \mathbb{I}, what will the value of \boldsymbol{\Sigma}_{0,1} be in terms of the data in \mathbf{X} and the parameters in \boldsymbol{\theta}_0?

(iii) (4 points)

Still considering the case where \mathbf{A}_0 = 0 \cdot \mathbb{I} and \mathbf{C}_0 = 0 \cdot \mathbb{I}, what will the value of \mathbf{A}_{1} be in terms of the data in \mathbf{X} and the parameters in \boldsymbol{\theta}_0?

(iv) (4 points)

Still considering the case where \mathbf{A}_0 = 0 \cdot \mathbb{I} and \mathbf{C}_0 = 0 \cdot \mathbb{I}, what will the value of \mathbf{Q}_{1} be in terms of the data in \mathbf{X} and the parameters in \boldsymbol{\theta}_0?

(v) (4 points)

Still considering the case where \mathbf{A}_0 = 0 \cdot \mathbb{I} and \mathbf{C}_0 = 0 \cdot \mathbb{I}, what will the value of \mathbf{C}_{1} be in terms of the data in \mathbf{X} and the parameters in \boldsymbol{\theta}_0?

(vi) (2 points)

Given these results, what are the solutions for \boldsymbol{\mu}_{0,n}, \boldsymbol{\Sigma}_{0,n}, \mathbf{A}_{n}, \mathbf{C}_{n}, and \mathbf{Q}_{n} at an arbitrary iteration n?


Problem 2 (8 points)

We have shown that the initialization of the EM algorithm will determine the final result. We have largely presented this as a limitation of EM, but in this problem we will show that, for discrete Hidden Markov Models, it can be a strength. Imagine that we have an initial guess for our dHMM parameters:

\boldsymbol{\theta}_0 = \{ \boldsymbol{\pi}_0, \mathbf{A}_0 \}

We also have access to our observation probability distribution function p(\mathbf{x}_t|\mathbf{z}_t). Our goal is to show that if \mathbf{A}_{ij,0} = 0 then \mathbf{A}_{ij,1} = 0. Each subproblem will walk us through the steps.
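To see the claim numerically before deriving it, here is a hedged sketch of one Baum-Welch update of \mathbf{A} for a discrete-observation HMM (hypothetical helper name; standard scaled forward-backward recursions are assumed). Note that the expected transition count carries a multiplicative factor of \mathbf{A}_{ij,0}, which is the mechanism the subproblems formalize:

```python
import numpy as np

def baum_welch_A_update(X, pi, A, B):
    """One EM (Baum-Welch) update of the transition matrix.

    X: (T,) observation indices; pi: (K,) initial distribution;
    A: (K, K) transitions, A[i, j] = p(z_{t+1} = j | z_t = i);
    B: (K, M) emissions, B[k, m] = p(x_t = m | z_t = k).
    """
    T, K = len(X), len(pi)
    # scaled forward pass (alpha-hat) with scaling factors c_t
    alpha = np.zeros((T, K)); c = np.zeros(T)
    a = pi * B[:, X[0]]
    c[0] = a.sum(); alpha[0] = a / c[0]
    for t in range(1, T):
        a = (alpha[t - 1] @ A) * B[:, X[t]]
        c[t] = a.sum(); alpha[t] = a / c[t]
    # scaled backward pass (beta-hat)
    beta = np.ones((T, K))
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[:, X[t + 1]] * beta[t + 1])) / c[t + 1]
    # expected transition counts: each term in xi(i, j) is multiplied
    # by A[i, j], so a zero entry of A forces the count to stay zero
    xi = np.zeros((K, K))
    for t in range(T - 1):
        xi += np.outer(alpha[t], B[:, X[t + 1]] * beta[t + 1]) * A / c[t + 1]
    gamma = alpha * beta  # p(z_t | x_{1:T})
    return xi / gamma[:-1].sum(axis=0)[:, None]
```

Running this with any \mathbf{A}_0 containing a zero entry should return an updated matrix with that same entry exactly zero.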

(i) (3 points)

Assume that \mathbf{A}_{ij,0} = 0 and that the other values of \mathbf{A}_{jk,0} have arbitrary values. Solve for p(z_{t,i},z_{t+1,j}|\mathbf{x}_{1:T}).

(ii) (3 points)

Assume that \mathbf{A}_{ij,0} = 0 and that the other values of \mathbf{A}_{mn,0} for m,n \neq i,j have arbitrary values. Solve for \mathbf{A}_{ij,1}.

(iii) (2 points)

What can you say about the solution for \mathbf{A}_{ij,n}? When might this behavior be considered a benefit of the EM algorithm?


Problem 3 (14 points)

Consider a discrete HMM model with K unique classes for the latent state. Imagine that the transition matrix has the form:

A_{ij} = \frac{1}{K} \quad \text{for all } i, j

Assume that the initial state vector \boldsymbol{\pi} has arbitrary values and that we have access to our observation probability distribution function p(\mathbf{x}_t|\mathbf{z}_t).
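As a sanity check for your simplifications, here is a minimal sketch of the generic scaled forward recursion (hypothetical names; no simplification applied). You can run it with A_{ij} = 1/K and compare the output against your closed form:

```python
import numpy as np

def scaled_forward(X, pi, A, B):
    """Scaled forward recursion: returns alpha-hat (T, K) and scaling factors c (T,).

    X: (T,) observation indices; pi: (K,) initial distribution;
    A: (K, K) transitions; B: (K, M) emission probabilities.
    """
    T, K = len(X), len(pi)
    alpha = np.zeros((T, K)); c = np.zeros(T)
    a = pi * B[:, X[0]]
    c[0] = a.sum(); alpha[0] = a / c[0]
    for t in range(1, T):
        # alpha-hat_t is proportional to p(x_t | z_t) * sum_z A * alpha-hat_{t-1}
        a = (alpha[t - 1] @ A) * B[:, X[t]]
        c[t] = a.sum(); alpha[t] = a / c[t]
    return alpha, c
```

One telling consequence of the uniform transition matrix, which you can verify numerically: for t \geq 1, \hat{\alpha}(\mathbf{z}_t) no longer depends on \boldsymbol{\pi} at all.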

(i) (4 points)

For t>0, simplify the recursive equation for \hat{\alpha}(\mathbf{z}_t) as much as possible.

(ii) (6 points)

For t<T, simplify the recursive equation for \hat{\beta}(\mathbf{z}_t) as much as possible. Hint: Start from t=T-1 and use that to derive your solution for the remaining t. Don’t forget your solution to c_t.

(iii) (2 points)

Solve for p(\mathbf{z}_t|\mathbf{x}_{1:T}).

(iv) (2 points)

What is the intuition behind this result? Hint: the values of \mathbf{A} should factor into your argument.