Complete your answers on a separate sheet (handwritten or LaTeX) and submit on Gradescope.
Problem 1 (12 points)
This problem explores the approximation error of the linearization in the Extended Kalman Filter (EKF). You will derive and analyze a second-order approximation that is more accurate than the first-order approximation we saw in lecture.
(i) (4 points)
The second-order Taylor expansion of f(\cdot) around \boldsymbol{\mu} is: f(\boldsymbol{z}_{t}) \approx f(\boldsymbol{\mu}) + \mathbf{G}(\boldsymbol{z}_{t} - \boldsymbol{\mu}) + \frac{1}{2} \sum_{i} (\boldsymbol{z}_{t} - \boldsymbol{\mu})^T \mathbf{H}_i (\boldsymbol{z}_{t} - \boldsymbol{\mu}) \boldsymbol{e}_i where \boldsymbol{\mu} = \boldsymbol{\mu}_{t \mid t-1}, \mathbf{G} = \nabla f(\boldsymbol{\mu}) is the Jacobian, \mathbf{H}_i is the Hessian matrix for the i-th component of f, and \boldsymbol{e}_i is the i-th standard basis vector.
Assume \boldsymbol{z}_t \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma}). By taking the expectation \mathbb{E}[\cdot] of this expansion, show that the second-order approximation for the true mean \boldsymbol{\mu}_{\text{true}} = \mathbb{E}[f(\boldsymbol{z}_{t})] is: \boldsymbol{\mu}_{\text{true}} \approx f(\boldsymbol{\mu}) + \frac{1}{2} \boldsymbol{b} where \boldsymbol{b} is a vector whose i-th component is \text{tr}(\mathbf{H}_i \boldsymbol{\Sigma}). (Hint: You will need the property \mathbb{E}[(\boldsymbol{z} - \boldsymbol{\mu})^T \mathbf{A} (\boldsymbol{z} - \boldsymbol{\mu})] = \text{tr}(\mathbf{A} \boldsymbol{\Sigma}).)
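If the trace identity in the hint is unfamiliar, you can convince yourself of it numerically before using it in the proof. The sketch below checks it by Monte Carlo; the particular \mathbf{A}, \boldsymbol{\mu}, and \boldsymbol{\Sigma} are illustrative choices, not from the problem.

```python
# Monte Carlo sanity check (not a proof) of the hint:
# E[(z - mu)^T A (z - mu)] = tr(A Sigma) for z ~ N(mu, Sigma).
# The matrices below are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
A = np.array([[1.0, 0.3],
              [0.3, 2.0]])

z = rng.multivariate_normal(mu, Sigma, size=500_000)
d = z - mu
quad = np.einsum('ni,ij,nj->n', d, A, d)  # (z - mu)^T A (z - mu) per sample

print(quad.mean())          # Monte Carlo estimate of the expectation
print(np.trace(A @ Sigma))  # exact value tr(A Sigma)
```

The two printed numbers should agree to a few decimal places for this sample size.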
Now, consider a simple 1D system to analyze the approximation error. Let the non-linear function be f(z) = z^2 and the prior be z_t \sim \mathcal{N}(\mu, \sigma^2). We will assume there is no observation noise (\boldsymbol{v}_t = 0, \boldsymbol{R} = 0) to isolate the error from the linearization itself.
(ii) (2 points)
Calculate the second-order EKF predicted mean using your result from part (i). (Hint: For a 1D function, the Hessian H_1 is just the scalar f''(z).)
(iii) (2 points)
Calculate the true predicted mean, \mathbb{E}[x_t] = \mathbb{E}[f(z_t)]. (Hint: For z \sim \mathcal{N}(\mu, \sigma^2), \mathbb{E}[z^2] = \mu^2 + \sigma^2.)
(iv) (2 points)
Calculate the true predicted variance, \text{Var}(x_t) = \text{Var}(f(z_t)). (Hint: You will also need \mathbb{E}[z^4] = \mu^4 + 6\mu^2\sigma^2 + 3\sigma^4.)
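The Gaussian moment formulas in the hints for parts (iii) and (iv) can be verified numerically. The sketch below uses Gauss-Hermite quadrature, which is exact for polynomial integrands of this degree; the values of \mu and \sigma are arbitrary illustrative choices.

```python
# Sanity check (illustrative, not part of the solution) of the hinted moments:
# E[z^2] = mu^2 + sigma^2 and E[z^4] = mu^4 + 6 mu^2 sigma^2 + 3 sigma^4
# for z ~ N(mu, sigma^2), via Gauss-Hermite quadrature.
import numpy as np

mu, sigma = 0.7, 1.3  # arbitrary illustrative values
nodes, weights = np.polynomial.hermite.hermgauss(10)
z = mu + np.sqrt(2.0) * sigma * nodes  # change of variables for N(mu, sigma^2)

def gauss_mean(g):
    """E[g(z)] for z ~ N(mu, sigma^2) via Gauss-Hermite quadrature."""
    return (weights * g(z)).sum() / np.sqrt(np.pi)

print(gauss_mean(lambda z: z**2), mu**2 + sigma**2)
print(gauss_mean(lambda z: z**4), mu**4 + 6 * mu**2 * sigma**2 + 3 * sigma**4)
```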
(v) (2 points)
Compare the standard EKF mean (from Eq. 8.7 in Lecture 8) with the second-order mean (your result from part ii) and the true mean (your result from part iii). Briefly explain why the standard EKF mean is incorrect, and why the second-order approximation is so accurate in this specific case.
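After working parts (ii)-(v), you can check the comparison numerically. The sketch below (with arbitrary illustrative values for \mu and \sigma) computes the true mean \mathbb{E}[f(z)] by Gauss-Hermite quadrature and compares it against the standard EKF mean f(\mu) and the second-order mean f(\mu) + \frac{1}{2} f''(\mu) \sigma^2 for f(z) = z^2.

```python
# A sketch for checking your Problem 1 answers numerically, for f(z) = z^2.
# mu and sigma are arbitrary illustrative values.
import numpy as np

f = lambda z: z**2
mu, sigma = 1.5, 0.8
nodes, weights = np.polynomial.hermite.hermgauss(20)
z = mu + np.sqrt(2.0) * sigma * nodes

true_mean = (weights * f(z)).sum() / np.sqrt(np.pi)  # E[f(z)] by quadrature
ekf_mean = f(mu)                                     # standard (first-order) EKF mean
second_order = f(mu) + 0.5 * 2.0 * sigma**2          # f(mu) + (1/2) f''(mu) sigma^2

print(ekf_mean, second_order, true_mean)
```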
Problem 2 (10 points)
In lecture we showed how we can compose different Gaussian Process kernels to generate a new Gaussian Process kernel. When we do this composition, we are implicitly asserting that, given two valid kernels \kappa_1 and \kappa_2, the functions:
\begin{align} \kappa_+(\mathbf{x}, \mathbf{x}') &= \kappa_1(\mathbf{x}, \mathbf{x}') + \kappa_2(\mathbf{x}, \mathbf{x}') \\ \kappa_\times(\mathbf{x}, \mathbf{x}') &= \kappa_1(\mathbf{x}, \mathbf{x}') \times \kappa_2(\mathbf{x}, \mathbf{x}') \end{align}
are both valid kernel functions.
(i) (5 points)
Prove that if \kappa_1 and \kappa_2 are valid kernel functions, then \kappa_+ is a valid kernel function.
(ii) (5 points)
Prove that if \kappa_1 and \kappa_2 are valid kernel functions, then \kappa_\times is a valid kernel function.
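Before (or after) writing the proofs, it can help to see the claim numerically. The sketch below picks two known-valid kernels, an RBF kernel as \kappa_1 and a linear kernel as \kappa_2 (illustrative choices, not from the problem), and checks that the Gram matrices of \kappa_+ and \kappa_\times on random inputs have no significantly negative eigenvalues. This is evidence, not a proof.

```python
# Numerical illustration (not a proof): Gram matrices of kappa_+ and
# kappa_x stay positive semi-definite when kappa_1 (RBF) and kappa_2
# (linear) are. Kernel choices are illustrative.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 2))  # 8 random 2-D inputs

sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K1 = np.exp(-0.5 * sq)  # RBF Gram matrix (PSD)
K2 = X @ X.T            # linear-kernel Gram matrix (PSD)

Kplus = K1 + K2   # Gram matrix of kappa_+
Ktimes = K1 * K2  # Gram matrix of kappa_x (elementwise/Hadamard product)

print(np.linalg.eigvalsh(Kplus).min())   # >= 0 up to floating-point rounding
print(np.linalg.eigvalsh(Ktimes).min())  # >= 0 up to floating-point rounding
```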
Problem 3 (10 points)
Imagine that we are doing GP regression using the kernel function:
\kappa(x,x') = \cos \left( \frac{2 \pi |x-x'|}{P} \right)
We are also making the default choice for the mean function: \mu(x) = 0. We have two observations at x_1 and x_2 with values y_1 and y_2. Also assume that:
x_2 = x_1 + P
Finally, assume that our model includes independent Gaussian observation noise with variance \sigma_w^2. Using this GP and these observations, answer the following questions:
(i) (5 points)
Derive the posterior predictive mean and variance for a new observation at x_\star. Write your answer in terms of:
k_\star = \kappa(x_\star, x_1)
(ii) (2 points)
Now assume that x_\star = x_1 + 3P. Substitute in the correct value of k_\star and write the mean and variance prediction for the observation at x_\star.
(iii) (3 points)
Still assuming that x_\star = x_1 + 3P, take the limit of \sigma_w \to 0. What is the new answer for the mean and variance? Explain the intuition behind this result.
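Once you have closed-form answers, a short numerical sketch can be used to check them. The standard GP posterior predictive equations are assumed; the values of x_1, the observations, P, and \sigma_w below are arbitrary illustrative choices.

```python
# A sketch for checking your Problem 3 derivations numerically, using the
# standard GP regression posterior. All numeric values are illustrative.
import numpy as np

def kappa(x, xp, P):
    return np.cos(2 * np.pi * np.abs(x - xp) / P)

P, sigma_w = 2.0, 0.3
x1 = 0.5
xs = np.array([x1, x1 + P])      # observation inputs x1 and x2 = x1 + P
y = np.array([1.0, -0.4])        # observed values y1, y2
x_star = x1 + 3 * P

K = kappa(xs[:, None], xs[None, :], P) + sigma_w**2 * np.eye(2)
k_star = kappa(x_star, xs, P)

mean = k_star @ np.linalg.solve(K, y)
var = kappa(x_star, x_star, P) - k_star @ np.linalg.solve(K, k_star)
print(mean, var)
```

Try shrinking sigma_w toward 0 to see the behavior asked about in part (iii); note that at exactly sigma_w = 0 the matrix K becomes singular.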