Complete your answers on a separate sheet (handwritten or LaTeX) and submit on Gradescope.
Problem 1 (12 points)
This problem explores the approximation error of the linearization in the Extended Kalman Filter (EKF). You will derive and analyze a second-order approximation that is more accurate than the first-order approximation we saw in lecture.
(i) (4 points)
The second-order Taylor expansion of f(\cdot) around \boldsymbol{\mu} is: f(\boldsymbol{z}_{t}) \approx f(\boldsymbol{\mu}) + \mathbf{G}(\boldsymbol{z}_{t} - \boldsymbol{\mu}) + \frac{1}{2} \sum_{i} (\boldsymbol{z}_{t} - \boldsymbol{\mu})^T \mathbf{H}_i (\boldsymbol{z}_{t} - \boldsymbol{\mu}) \boldsymbol{e}_i where \boldsymbol{\mu} = \boldsymbol{\mu}_{t \mid t-1}, \mathbf{G} = \nabla f(\boldsymbol{\mu}) is the Jacobian, \mathbf{H}_i is the Hessian matrix for the i-th component of f, and \boldsymbol{e}_i is the i-th standard basis vector.
Assume \boldsymbol{z}_t \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma}). By taking the expectation \mathbb{E}[\cdot] of this expansion, show that the second-order approximation for the true mean \boldsymbol{\mu}_{\text{true}} = \mathbb{E}[f(\boldsymbol{z}_{t})] is: \boldsymbol{\mu}_{\text{true}} \approx f(\boldsymbol{\mu}) + \frac{1}{2} \boldsymbol{b} where \boldsymbol{b} is a vector whose i-th component is \text{tr}(\mathbf{H}_i \boldsymbol{\Sigma}). (Hint: You will need the property \mathbb{E}[(\boldsymbol{z} - \boldsymbol{\mu})^T \mathbf{A} (\boldsymbol{z} - \boldsymbol{\mu})] = \text{tr}(\mathbf{A} \boldsymbol{\Sigma}).)
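If the trace identity in the hint is unfamiliar, you can convince yourself of it numerically before using it in the proof. The sketch below checks it by Monte Carlo; the particular \mathbf{A}, \boldsymbol{\mu}, and \boldsymbol{\Sigma} are illustrative choices, not from the problem.

```python
# Monte Carlo sanity check (not a proof) of the hint:
# E[(z - mu)^T A (z - mu)] = tr(A Sigma) for z ~ N(mu, Sigma).
# The matrices below are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
A = np.array([[1.0, 0.3],
              [0.3, 2.0]])

z = rng.multivariate_normal(mu, Sigma, size=500_000)
d = z - mu
quad = np.einsum('ni,ij,nj->n', d, A, d)  # (z - mu)^T A (z - mu) per sample

print(quad.mean())          # Monte Carlo estimate of the expectation
print(np.trace(A @ Sigma))  # exact value tr(A Sigma)
```

The two printed numbers should agree to a few decimal places for this sample size.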
Now, consider a simple 1D system to analyze the approximation error. Let the non-linear function be f(z) = z^2 and the prior be z_t \sim \mathcal{N}(\mu, \sigma^2). We will assume there is no observation noise (\boldsymbol{v}_t = 0, \boldsymbol{R} = 0) to isolate the error from the linearization itself.
(ii) (2 points)
Calculate the second-order EKF predicted mean using your result from part (i). (Hint: For a 1D function, the Hessian H_1 is just the scalar f''(z).)
(iii) (2 points)
Calculate the true predicted mean, \mathbb{E}[x_t] = \mathbb{E}[f(z_t)]. (Hint: For z \sim \mathcal{N}(\mu, \sigma^2), \mathbb{E}[z^2] = \mu^2 + \sigma^2.)
(iv) (2 points)
Calculate the true predicted variance, \text{Var}(x_t) = \text{Var}(f(z_t)). (Hint: You will also need \mathbb{E}[z^4] = \mu^4 + 6\mu^2\sigma^2 + 3\sigma^4.)
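The Gaussian moment formulas in the hints for parts (iii) and (iv) can be verified numerically. The sketch below uses Gauss-Hermite quadrature, which is exact for polynomial integrands of this degree; the values of \mu and \sigma are arbitrary illustrative choices.

```python
# Sanity check (illustrative, not part of the solution) of the hinted moments:
# E[z^2] = mu^2 + sigma^2 and E[z^4] = mu^4 + 6 mu^2 sigma^2 + 3 sigma^4
# for z ~ N(mu, sigma^2), via Gauss-Hermite quadrature.
import numpy as np

mu, sigma = 0.7, 1.3  # arbitrary illustrative values
nodes, weights = np.polynomial.hermite.hermgauss(10)
z = mu + np.sqrt(2.0) * sigma * nodes  # change of variables for N(mu, sigma^2)

def gauss_mean(g):
    """E[g(z)] for z ~ N(mu, sigma^2) via Gauss-Hermite quadrature."""
    return (weights * g(z)).sum() / np.sqrt(np.pi)

print(gauss_mean(lambda z: z**2), mu**2 + sigma**2)
print(gauss_mean(lambda z: z**4), mu**4 + 6 * mu**2 * sigma**2 + 3 * sigma**4)
```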
(v) (2 points)
Compare the standard EKF mean (from Eq. 8.7 in Lecture 8) with the second-order mean (your result from part ii) and the true mean (your result from part iii). Briefly explain why the standard EKF mean is incorrect, and why the second-order approximation is so accurate in this specific case.
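After working parts (ii)-(v), you can check the comparison numerically. The sketch below (with arbitrary illustrative values for \mu and \sigma) computes the true mean \mathbb{E}[f(z)] by Gauss-Hermite quadrature and compares it against the standard EKF mean f(\mu) and the second-order mean f(\mu) + \frac{1}{2} f''(\mu) \sigma^2 for f(z) = z^2.

```python
# A sketch for checking your Problem 1 answers numerically, for f(z) = z^2.
# mu and sigma are arbitrary illustrative values.
import numpy as np

f = lambda z: z**2
mu, sigma = 1.5, 0.8
nodes, weights = np.polynomial.hermite.hermgauss(20)
z = mu + np.sqrt(2.0) * sigma * nodes

true_mean = (weights * f(z)).sum() / np.sqrt(np.pi)  # E[f(z)] by quadrature
ekf_mean = f(mu)                                     # standard (first-order) EKF mean
second_order = f(mu) + 0.5 * 2.0 * sigma**2          # f(mu) + (1/2) f''(mu) sigma^2

print(ekf_mean, second_order, true_mean)
```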
Problem 2 (10 points)
In lecture we showed how we can compose different Gaussian Process kernels to generate a new Gaussian Process kernel. When we do this composition, we are implicitly asserting that, given two valid kernels \kappa_1 and \kappa_2, the functions:
\begin{align} \kappa_+(\mathbf{x}, \mathbf{x}') &= \kappa_1(\mathbf{x}, \mathbf{x}') + \kappa_2(\mathbf{x}, \mathbf{x}') \\ \kappa_\times(\mathbf{x}, \mathbf{x}') &= \kappa_1(\mathbf{x}, \mathbf{x}') \times \kappa_2(\mathbf{x}, \mathbf{x}') \end{align}
are both valid kernel functions.
(i) (5 points)
Prove that if \kappa_1 and \kappa_2 are valid kernel functions, then \kappa_+ is a valid kernel function.
(ii) (5 points)
Prove that if \kappa_1 and \kappa_2 are valid kernel functions, then \kappa_\times is a valid kernel function.
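Before (or after) writing the proofs, it can help to see the claim numerically. The sketch below picks two known-valid kernels, an RBF kernel as \kappa_1 and a linear kernel as \kappa_2 (illustrative choices, not from the problem), and checks that the Gram matrices of \kappa_+ and \kappa_\times on random inputs have no significantly negative eigenvalues. This is evidence, not a proof.

```python
# Numerical illustration (not a proof): Gram matrices of kappa_+ and
# kappa_x stay positive semi-definite when kappa_1 (RBF) and kappa_2
# (linear) are. Kernel choices are illustrative.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 2))  # 8 random 2-D inputs

sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K1 = np.exp(-0.5 * sq)  # RBF Gram matrix (PSD)
K2 = X @ X.T            # linear-kernel Gram matrix (PSD)

Kplus = K1 + K2   # Gram matrix of kappa_+
Ktimes = K1 * K2  # Gram matrix of kappa_x (elementwise/Hadamard product)

print(np.linalg.eigvalsh(Kplus).min())   # >= 0 up to floating-point rounding
print(np.linalg.eigvalsh(Ktimes).min())  # >= 0 up to floating-point rounding
```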
Problem 3 (10 points)
Imagine that we are doing GP regression using the kernel function:
\kappa(x,x') = \cos \left( \frac{2 \pi |x-x'|}{P} \right)
We are also making the default choice for the mean function: \mu(x) = 0. We have two observations at x_1 and x_2 with values y_1 and y_2. Also assume that:
x_2 = x_1 + P
Finally, assume that our model includes independent Gaussian observation noise with variance \sigma_w^2. Using this GP and these observations, answer the following questions:
(i) (5 points)
Derive the posterior predictive mean and variance for a new observation at x_\star. Write your answer in terms of:
k_\star = \kappa(x_\star, x_1)
(ii) (2 points)
Now assume that x_\star = x_1 + 3P. Substitute in the correct value of k_\star and write the mean and variance prediction for the observation at x_\star.
(iii) (3 points)
Still assuming that x_\star = x_1 + 3P, take the limit of \sigma_w \to 0. What is the new answer for the mean and variance? Explain the intuition behind this result.
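Once you have closed-form answers, a short numerical sketch can be used to check them. The standard GP posterior predictive equations are assumed; the values of x_1, the observations, P, and \sigma_w below are arbitrary illustrative choices.

```python
# A sketch for checking your Problem 3 derivations numerically, using the
# standard GP regression posterior. All numeric values are illustrative.
import numpy as np

def kappa(x, xp, P):
    return np.cos(2 * np.pi * np.abs(x - xp) / P)

P, sigma_w = 2.0, 0.3
x1 = 0.5
xs = np.array([x1, x1 + P])      # observation inputs x1 and x2 = x1 + P
y = np.array([1.0, -0.4])        # observed values y1, y2
x_star = x1 + 3 * P

K = kappa(xs[:, None], xs[None, :], P) + sigma_w**2 * np.eye(2)
k_star = kappa(x_star, xs, P)

mean = k_star @ np.linalg.solve(K, y)
var = kappa(x_star, x_star, P) - k_star @ np.linalg.solve(K, k_star)
print(mean, var)
```

Try shrinking sigma_w toward 0 to see the behavior asked about in part (iii); note that at exactly sigma_w = 0 the matrix K becomes singular.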