Auto-correlation, cross-correlation, and AR modeling

Lab 3

Course: DS-GA 1018: Probabilistic Time Series Analysis
Date: September 22, 2025 (Monday)
Name:
NYU NetID:

Instructions: Submit your completed lab worksheet to Gradescope by 1:30 PM on the Monday of the next lab.

You can find a Colab version of the required code, including plotting functions, here.


This lab is meant to improve your understanding of the ARMA model and the correlation functions we’ve been discussing in class.

By the end of the lab, students should:
1. Be able to generate a time series drawn from a white noise process and an AR process.
2. Be able to calculate the auto-correlation function and interpret the partial auto-correlation function.
3. Be able to compute a Bayesian maximum a posteriori (MAP) estimate of AR process parameters.

Part I: Auto-Correlation Function

Start by implementing the auto-correlation function from scratch.

from typing import Any, Optional

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.stats import norm, uniform
from scipy.optimize import minimize
from statsmodels.tsa.stattools import acf, pacf

def acf_impl(x_t: np.ndarray, n_lags: int) -> np.ndarray:
    """Calculate the estimated ACF function for a time series."""
    # TODO: write your code here.
    return
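
If you get stuck, here is one hedged sketch of a possible approach (the helper name acf_sketch is only illustrative): it uses the biased sample autocovariance, normalized by the lag-0 value, which should match the default convention of statsmodels' acf.

def acf_sketch(x_t: np.ndarray, n_lags: int) -> np.ndarray:
    """One possible ACF estimator: biased sample autocovariance, normalized by lag 0."""
    x_centered = x_t - np.mean(x_t)
    n = len(x_t)
    # gamma_hat(h) = (1 / n) * sum_t (x_{t+h} - x_bar)(x_t - x_bar)
    acov = np.array([np.sum(x_centered[h:] * x_centered[:n - h]) / n
                     for h in range(n_lags + 1)])
    # rho_hat(h) = gamma_hat(h) / gamma_hat(0)
    return acov / acov[0]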

Now we can run our function on white noise and make sure we get the right result.

W_t \sim \mathcal{N}(0, \sigma^2)

To do this you will have to:

  1. Generate white noise with \sigma = 1 and sample size n = 500 from the process above.
  2. Calculate the sample ACF up to lag = 20.
  3. Calculate the analytical ACF (one possible approach is sketched below).
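
If you would like a hint for steps 1 and 3, here is a hedged sketch using placeholder names and literal values that mirror the cell below; for white noise the analytical ACF is 1 at lag 0 and 0 at every other lag.

# Hedged sketch only; placeholder names are used so they don't clash with the
# variables you are asked to fill in below.
w_t_example = np.random.normal(loc=0.0, scale=1.0, size=500)  # Step 1: white noise draws.
acf_analytical_example = np.zeros(20 + 1)                     # Step 3: rho(|h|) = 0 for |h| > 0 ...
acf_analytical_example[0] = 1.0                               # ... and rho(0) = 1.
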
# Use this variable name for the test to work
n_lags = 20
mean = 0.0
std = 1.0
n = 500

# 1 - Generate realizations from the white noise process.
w_t = #TODO

# 2 - Calculate the sample ACF with lag of 20 (you already did the hard work).
acf_estim_yours = # TODO

# 3 - Calculate the analytical ACF. This shouldn't be too many lines of code.
acf_analytical = # TODO

# Let's check your implementation
acf_estim_statsm = acf(w_t, nlags=n_lags)
np.testing.assert_array_almost_equal(acf_estim_statsm, acf_estim_yours)
# Now sit back and let the plotting functions visualize the different quantities.
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(15, 5), dpi=200)
fontsize = 15
colors = ['#7fcdbb', '#2c7fb8']

# Plot our white noise.
ax1.plot(w_t, label=r'$W_t$', color=colors[0])
ax1.set_title('Data', fontsize=fontsize)
ax1.set_xlabel('Time Index', fontsize=fontsize)
ax1.set_ylabel('Signal', fontsize=fontsize)
ax1.legend(fontsize=fontsize)

# Plot comparison between analytic and estimated.
ax2.stem(acf_estim_yours, linefmt=colors[0], markerfmt=colors[0], basefmt='k', label='Estimated ACF')
ax2.stem(acf_analytical, linefmt=colors[1], markerfmt=colors[1], basefmt='k', label='Analytic ACF')
ax2.set_ylim([1.3*min(acf_estim_yours), 1.1*(std**2)])
ax2.set_title('ACF Comparison', fontsize=fontsize)
ax2.set_xlabel(r'$|h|$', fontsize=fontsize)
ax2.set_ylabel(r'$\rho(|h|)$', fontsize=fontsize)
ax2.legend(fontsize=fontsize)

# Plot comparison between our estimate and the library estimate.
acf_val_impl = acf_impl(x_t=w_t, n_lags=n_lags)
ax3.plot(acf_estim_statsm, 'o', color='grey', label='Package ACF')
ax3.plot(acf_estim_yours, 'x', ms=6, mew=3, color=colors[0], label='Your ACF')
ax3.legend(fontsize=fontsize)
ax3.set_xlabel(r'$|h|$', fontsize=fontsize)
ax3.set_ylabel(r'$\rho(|h|)$', fontsize=fontsize)
ax3.set_title('Implementation Comparison', fontsize=fontsize)

plt.show()

Intuition building. Now imagine that instead of having a time series with 500 data points, we only had 50. How would you expect the estimated ACF to change?

The true ACF does not change, but the estimate of the ACF will become much noisier.

Now that you’ve registered your guess, let’s try it out. Repeat the steps from above:

  1. Generate white noise with \sigma = 1 and sample size n = 50 from the process.
  2. Calculate the sample ACF up to lag = 20.
# Use this variable name for the test to work
n_lags = 20
mean = 0.0
std = 1.0
n = 50

# 1 - Generate realizations from the white noise process.
w_t_short = #TODO

# 2 - Calculate the sample ACF with lag of 20 (you already did the hard work).
acf_estim_yours = #TODO

# Let's check your implementation again.
acf_estim_statsm = acf(w_t_short, nlags=n_lags)
np.testing.assert_array_almost_equal(acf_estim_statsm, acf_estim_yours)
# Now sit back and let the plotting functions visualize the different quantities.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5), dpi=100)
fontsize = 15
colors = ['#7fcdbb', '#2c7fb8']

# Plot our white noise.
ax1.plot(w_t_short, label=r'$W_t$', color=colors[0])
ax1.set_title('Data', fontsize=fontsize)
ax1.set_xlabel('Time Index', fontsize=fontsize)
ax1.set_ylabel('Signal', fontsize=fontsize)
ax1.legend(fontsize=fontsize)

# Plot comparison between analytic and estimated.
ax2.stem(acf_estim_yours, linefmt=colors[0], markerfmt=colors[0], basefmt='k', label='Estimated ACF')
ax2.stem(acf_analytical, linefmt=colors[1], markerfmt=colors[1], basefmt='k', label='Analytic ACF')
ax2.set_ylim([1.3*min(acf_estim_yours), 1.1])
ax2.set_title('ACF Comparison', fontsize=fontsize)
ax2.set_xlabel(r'$|h|$', fontsize=fontsize)
ax2.set_ylabel(r'$\rho(|h|)$', fontsize=fontsize)
ax2.legend(fontsize=fontsize)

plt.show()

What is happening in the plot above?

Fewer samples leads to a noisier estimate of the ACF.

Part II: Auto-Correlation Function for AR(2)

The auto-correlation function of white noise is somewhat trivial; what about a more interesting process? Let's take an AR(2) process of the form:

X_{t} = \phi_1 X_{t-1} + \phi_2 X_{t-2} + W_t

We can generate the same plots as before, but we will have to:

  1. Generate samples for the AR(2) process. Assume the boundary condition that X_0 = W_0 and X_1 = \phi_1 X_0 + W_1. Also assume that W_t is the same white noise process from above, \phi_1 = 0.5, and \phi_2 = 0.2. Use a sample size of n = 500.
  2. Calculate the sample ACF up to lag = 20.
  3. You don’t need to calculate the analytic ACF. We will save that for the homework.
def ar_two(phi_one: float, phi_two: float, n_samps: int, 
           sigma_w: float = 1.0) -> np.ndarray:
    """Generate samples from an AR(2) process."""
    # TODO: write your code here.
    return
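
In case it helps, here is one hedged sketch of the recursion described above (the name ar_two_sketch is only illustrative). It draws all of the noise up front with np.random.normal, which follows the same random stream as drawing one value at a time, and applies the stated boundary conditions.

def ar_two_sketch(phi_one: float, phi_two: float, n_samps: int,
                  sigma_w: float = 1.0) -> np.ndarray:
    """Possible AR(2) generator with boundary conditions X_0 = W_0 and X_1 = phi_1 X_0 + W_1."""
    w_t = np.random.normal(loc=0.0, scale=sigma_w, size=n_samps)
    x_t = np.zeros(n_samps)
    x_t[0] = w_t[0]
    if n_samps > 1:
        x_t[1] = phi_one * x_t[0] + w_t[1]
    for t in range(2, n_samps):
        x_t[t] = phi_one * x_t[t - 1] + phi_two * x_t[t - 2] + w_t[t]
    return x_t
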
# Let's check your function against a reference value:
np.random.seed(1)
test_implementation = ar_two(0.1, 0.1, 5)
reference_value = np.array([1.62434536, -0.44932188, -0.4106694, -1.15896775, 0.70844391])
np.testing.assert_array_almost_equal(test_implementation, reference_value, decimal=5)
n_lags = 20
np.random.seed(2)

# 1. Generate the samples using the function above.
x_t = # TODO

# 2. Calculate the sample ACF.
acf_estim_yours = # TODO
# Now sit back and let the plotting functions visualize the different quantities.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5), dpi=100)
fontsize = 15
colors = ['#7fcdbb', '#2c7fb8']

# Plot our white noise.
ax1.plot(w_t, label=r'$W_t$', color=colors[0]) # This is not the same white noise realization as was used for X_t, it's just here for illustration.
ax1.plot(x_t, label=r'$X_t$', color=colors[1])
ax1.set_title('Data', fontsize=fontsize)
ax1.set_xlabel('Time Index', fontsize=fontsize)
ax1.set_ylabel('Signal', fontsize=fontsize)
ax1.legend(fontsize=fontsize)

# Plot comparison between analytic and estimated.
ax2.stem(acf_estim_yours, linefmt=colors[0], markerfmt=colors[0], basefmt='k', label='Estimated ACF')
ax2.set_ylim([-0.2, 1.1])
ax2.set_title('ACF Estimate', fontsize=fontsize)
ax2.set_xlabel(r'$|h|$', fontsize=fontsize)
ax2.set_ylabel(r'$\rho(|h|)$', fontsize=fontsize)
ax2.legend(fontsize=fontsize)

plt.show()

What is happening in the plot above?

Part III: Partial Auto-Correlation Function for AR(2)

While it’s clear that the auto-correlation function is more complex for AR(2) than for white noise, it would be hard for us to identify this as an AR(2) process from the auto-correlation function alone. As we discussed in lecture, we can instead use the partial auto-correlation function to achieve this. Implementing the PACF is more involved than what is appropriate for this lab, but we can use the library implementation.

n_lags = 20

pacf_estim = pacf(x_t, n_lags)

# Now sit back and let the plotting functions visualize the different quantities.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5), dpi=100)
fontsize = 15
colors = ['#7fcdbb', '#2c7fb8']

# Plot our white noise.
ax1.plot(w_t, label=r'$W_t$', color=colors[0]) # This is not the same white noise realization as was used for X_t, it's just here for illustration.
ax1.plot(x_t, label=r'$X_t$', color=colors[1])
ax1.set_title('Data', fontsize=fontsize)
ax1.set_xlabel('Time Index', fontsize=fontsize)
ax1.set_ylabel('Signal', fontsize=fontsize)
ax1.legend(fontsize=fontsize)

# Plot comparison between analytic and estimated.
ax2.stem(pacf_estim, linefmt=colors[0], markerfmt=colors[0], basefmt='k', label='Estimated PACF')
ax2.set_ylim([-0.2, 1.1])
ax2.set_title('PACF Estimate', fontsize=fontsize)
ax2.set_xlabel(r'$|h|$', fontsize=fontsize)
ax2.set_ylabel(r'$\rho(|h|)$', fontsize=fontsize)
ax2.legend(fontsize=fontsize)

plt.show()

What about this PACF plot suggests that the process is AR(2)?

The PACF is consistent with zero for |h| > 2, suggesting that the AR process is of order 2.

Part IV: Modeling an AR(2) Process

The plots above help us understand what type of model would be appropriate for our data. Of course, in practice, we know that this data is AR(2). The next question is how to model the data and use that model to make predictions about future time steps. In this section you will:

  1. Implement the posterior functions for an AR(2) process given a prior on the parameters.
  2. Use the posterior function to fit the maximum a posteriori (MAP) estimate of the AR(2) process parameters.
  3. Visualize how well those parameters predict future data.

The overall goal will be to conduct Bayesian inference. Remember from Lecture 2 that in Bayesian inference we have one quantity of interest, the posterior of the parameters given the data, and to calculate it we will need the prior on the parameters and the likelihood of the data. The marginal of the data also shows up in our equation, but it is a normalizing factor that does not depend on the parameters of interest. When we are trying to find the maximum a posteriori estimate of our parameter values we can ignore it.
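
Written out for our model parameters, Bayes' rule is:

p(\phi_1, \phi_2, \sigma_w | X_{1:t}) = \frac{p(X_{1:t} | \phi_1, \phi_2, \sigma_w) \, p(\phi_1, \phi_2, \sigma_w)}{p(X_{1:t})} \propto p(X_{1:t} | \phi_1, \phi_2, \sigma_w) \, p(\phi_1, \phi_2, \sigma_w),

so the MAP estimate maximizes the product of the likelihood and the prior, or equivalently the sum of their logarithms.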

First we’ll need to specify a prior for our AR(2) parameters. There are three free parameters to our version of the model, \phi_1, \phi_2, and \sigma_w. We know that large values of \phi_1 and \phi_2 are unlikely (they will violate causality), so we will set a Gaussian prior for both of them of the form: p(\phi_1) = p(\phi_2) = \mathcal{N}(0 , \sigma_\phi^2).

For the standard deviation of the white noise, we have less intuition, but we do know that it must be positive. Therefore, a reasonable distribution that captures our prior understanding is a uniform distribution that is bounded below by 0: p(\sigma_w) = \mathrm{Unif}[0,\sigma_\mathrm{max}].

  1. Implement a prior function in the class below that returns the log prior: \log p(\phi_1, \phi_2, \sigma_w) = \log[p(\phi_1) p(\phi_2) p(\sigma_w)] = \log[p(\phi_1)] + \log[p(\phi_2)] + \log[p(\sigma_w)].

  2. Implement the log likelihood function of the data given the parameters.

  3. Implement the log posterior function of the parameters given the data.

class ARTwoModel:
    """Class implementing prior, likelihood, posterior, and predictions for an AR(2) model."""
    # TODO: write your code here. You will need an __init__ that takes sigma_phi and sigma_max,
    # plus log_prior, log_likelihood, and log_posterior methods (see the tests below).
# Let's run some tests to make sure the implementations are correct.
a_model = ARTwoModel(2.0, 5.0)

phi_one, phi_two = 0.7, 0.2
sigma_w = 2.0
params_test = np.array([phi_one, phi_two, sigma_w])
x_t = np.array([3.52810469, 3.2699877, 4.9520883, 8.60224575, 10.74710566,
                7.28886735, 9.15180511, 7.56132263, 6.91684916, 7.17525595])

np.testing.assert_almost_equal(a_model.log_prior([0.3, 0.4, 2.0]), -4.864859339963337)
np.testing.assert_almost_equal(a_model.log_prior([0.3, 0.4, 6.0]), -np.inf)
np.testing.assert_almost_equal(a_model.log_likelihood(x_t, params_test), -23.520459141643073)
np.testing.assert_almost_equal(a_model.log_posterior(params_test, x_t), -28.420318481606408)
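
If you want to sanity-check your approach, below is one hedged sketch that is consistent with the priors described above, the ar_two boundary conditions, and the reference values in the tests (the class name ARTwoModelSketch is only to avoid clashing with your implementation). Treat it as an illustration of one reasonable set of conventions rather than the unique answer.

class ARTwoModelSketch:
    """Hedged sketch of the log prior, log likelihood, and log posterior for an AR(2) model."""

    def __init__(self, sigma_phi: float, sigma_max: float):
        self.sigma_phi = sigma_phi
        self.sigma_max = sigma_max

    def log_prior(self, params: np.ndarray) -> float:
        """log p(phi_1) + log p(phi_2) + log p(sigma_w)."""
        phi_one, phi_two, sigma_w = params
        log_p = norm.logpdf(phi_one, loc=0.0, scale=self.sigma_phi)
        log_p += norm.logpdf(phi_two, loc=0.0, scale=self.sigma_phi)
        log_p += uniform.logpdf(sigma_w, loc=0.0, scale=self.sigma_max)
        return log_p

    def log_likelihood(self, x_t: np.ndarray, params: np.ndarray) -> float:
        """Conditional Gaussian likelihood with X_0 ~ N(0, sigma_w^2) and X_1 ~ N(phi_1 X_0, sigma_w^2)."""
        phi_one, phi_two, sigma_w = params
        means = np.zeros(len(x_t))
        means[1] = phi_one * x_t[0]
        means[2:] = phi_one * x_t[1:-1] + phi_two * x_t[:-2]
        return np.sum(norm.logpdf(x_t, loc=means, scale=sigma_w))

    def log_posterior(self, params: np.ndarray, x_t: np.ndarray) -> float:
        """Unnormalized log posterior: log prior plus log likelihood."""
        log_p = self.log_prior(params)
        if np.isinf(log_p):
            # Skip the likelihood when sigma_w falls outside the uniform prior's support.
            return log_p
        return log_p + self.log_likelihood(x_t, params)

Note that the log posterior drops the marginal p(X_{1:t}); that constant does not affect the MAP estimate.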

With our posterior in hand, we can now use it to extract our maximum a posteriori estimate. We will use a numerical optimizer from scipy and compare the estimates we get along two axes:

  1. The choice of prior. In reality you would only ever select one prior, but this will give us intuition into how important the prior can be in Bayesian inference. Our prior choices will be \sigma_\mathrm{max} = 50.0, \sigma_\phi = 20.0 (a wide prior) and \sigma_\mathrm{max} = 5.0, \sigma_\phi = 0.2 (a narrow prior).

  2. The length of the data. We will simulate 10000 datapoints, but we will compare the results for subsets of the data.

# Let's start with our first prior. Let's make it fairly broad in both parameters.
sigma_phi = 20.0
sigma_max = 50.0
a_model_wide = ARTwoModel(sigma_phi, sigma_max)

# Generate our data
n_samps_data = 10000
phi_one, phi_two = 0.7, 0.2
sigma_w = 0.3
np.random.seed(4)
x_t = ar_two(phi_one, phi_two, n_samps_data, sigma_w)

# We want the maximum but have a minimizer, so we will just minimize the negative of the log
# posterior (often called the negative log posterior). Start with an initial guess of zero
# for the phis and 1 for sigma_w.
params_zero = np.array([0.0, 0.0, 1.0])
data_lengths = [5, 25, 125, 625, 3125, 10000]
results_wide = []
for data_length in data_lengths:
    res = minimize(lambda params: -a_model_wide.log_posterior(params, x_t[:data_length]), 
                   params_zero, method='nelder-mead',
                   options={'xatol': 1e-8, 'disp': False})
    results_wide.append(res.x)
results_wide = np.array(results_wide)

# Let's repeat the process for our narrow prior.
sigma_phi = 0.2
sigma_max = 5.0
a_model_narrow = ARTwoModel(sigma_phi, sigma_max)
params_zero = np.array([0.0, 0.0, 1.0])
results_narrow = []
for data_length in data_lengths:
    res = minimize(lambda params: -a_model_narrow.log_posterior(params, x_t[:data_length]), 
                   params_zero, method='nelder-mead',
                   options={'xatol': 1e-8, 'disp': False, 'maxiter': 10000})
    results_narrow.append(res.x)
results_narrow = np.array(results_narrow)

# Now sit back and let the plotting functions visualize the different quantities.
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(20, 5), dpi=100)
fontsize = 15
colors = ['#7fcdbb', '#2c7fb8']

# Plot our results
ax1.plot(data_lengths, results_wide[:,0], label='Wide Prior', color=colors[0])
ax1.plot(data_lengths, results_narrow[:,0], label='Narrow Prior', color=colors[1])
ax1.set_xscale('log')
ax1.axhline(phi_one, c='k', label='True Value')
ax1.set_title(r'$\phi_1$', fontsize=fontsize)
ax1.set_xlabel(r'Size of Dataset', fontsize=fontsize)
ax1.set_ylabel(r'$\phi_1$', fontsize=fontsize)
ax1.legend(fontsize=fontsize)

# Plot our results
ax2.plot(data_lengths, results_wide[:,1], label='Wide Prior', color=colors[0])
ax2.plot(data_lengths, results_narrow[:,1], label='Narrow Prior', color=colors[1])
ax2.set_xscale('log')
ax2.axhline(phi_two, c='k', label='True Value')
ax2.set_title(r'$\phi_2$', fontsize=fontsize)
ax2.set_xlabel(r'Size of Dataset', fontsize=fontsize)
ax2.set_ylabel(r'$\phi_2$', fontsize=fontsize)
ax2.legend(fontsize=fontsize)

# Plot our results
ax3.plot(data_lengths, results_wide[:,2], label='Wide Prior', color=colors[0])
ax3.plot(data_lengths, results_narrow[:,2], label='Narrow Prior', color=colors[1])
ax3.set_xscale('log')
ax3.axhline(sigma_w, c='k', label='True Value')
ax3.set_title(r'$\sigma_w$', fontsize=fontsize)
ax3.set_xlabel(r'Size of Dataset', fontsize=fontsize)
ax3.set_ylabel(r'$\sigma_w$', fontsize=fontsize)
ax3.legend(fontsize=fontsize)

plt.show()

How much does our choice of prior matter? Where does it matter the most?

The prior matters most when there is less data. As the size of our dataset grows, the effect of our prior on the MAP estimate of the parameters becomes small.

While we didn't expect you to mention this, the prior also regularizes our inference. For example, it stops our MAP estimate from trending towards non-causal values when the amount of data is small (which is what happens with the wide prior in this example).

Let’s use our fit model for prediction! Our distribution of interest is: p(X_{t+1:t+n}|X_{1:t},\phi_1,\phi_2,\sigma_w).
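
Because the model is AR(2), each future value depends only on the two values immediately before it, so sampling from this distribution amounts to chaining the one-step-ahead conditional

X_{t+1} | X_{1:t}, \phi_1, \phi_2, \sigma_w \sim \mathcal{N}(\phi_1 X_t + \phi_2 X_{t-1}, \sigma_w^2)

forward in time, feeding each new draw back in as if it were an observation.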

To visualize our predictions, we can either sample from this distribution or plot its mean and variance. The mean and variance of an AR(2) process are a bit involved to compute, so here we'll just focus on drawing samples. You will:

  1. Write code to sample from the conditional distribution.
def sample_x_future(data: np.ndarray, params: np.ndarray, n_future: int) -> np.ndarray:
    """Sample from the posterior of X_{t+1:t+n} given X_{1:t} and the parameters
    of our AR(2) process."""
    # TODO: write your code here.
    return
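
Here is one hedged sketch of such a sampler (the helper name sample_x_future_sketch is only illustrative): it rolls the AR(2) recursion forward from the last two observed values, drawing fresh noise at every step.

def sample_x_future_sketch(data: np.ndarray, params: np.ndarray, n_future: int) -> np.ndarray:
    """Draw one sample path of X_{t+1:t+n} by iterating the AR(2) recursion forward."""
    phi_one, phi_two, sigma_w = params
    # Seed the recursion with the last two observed values.
    history = [data[-2], data[-1]]
    samples = np.zeros(n_future)
    for i in range(n_future):
        mean = phi_one * history[-1] + phi_two * history[-2]
        samples[i] = np.random.normal(loc=mean, scale=sigma_w)
        history.append(samples[i])
    return samples
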
# Let's visualize the predictions being output by our model
n_future = 100
n_samples_posterior = 5000
samples = np.zeros((n_samples_posterior, n_future))
for i in range(n_samples_posterior):
    # Use the full data prediction with the narrow prior as our parameters.
    samples[i] = sample_x_future(x_t, results_narrow[-1], n_future)

t_data = np.arange(n_samps_data)
t_samples = np.arange(n_samps_data, n_samps_data + n_future)
fig = plt.figure(figsize=(8,8), dpi=100)
plt.plot(t_data, x_t, color=colors[1], label='Data')
plt.plot(t_samples, np.mean(samples, axis=0), color=colors[0], label='Sample Mean')
plt.fill_between(t_samples, np.mean(samples, axis=0) - np.std(samples, axis=0), 
                 np.mean(samples, axis=0) + np.std(samples, axis=0),
                 color=colors[0], label='Sample Standard Deviation', alpha=0.2)
plt.xlim([n_samps_data-200, n_samps_data + n_future])
plt.ylabel(r'$X_t$', fontsize=fontsize)
plt.xlabel('Time Index', fontsize=fontsize)
plt.title('Posterior Sampling', fontsize=fontsize)
plt.legend(fontsize=fontsize)
plt.show()

Why does the mean behave the way that it does? What about the standard deviation of the sample? How does this relate to the auto-correlation function?

The mean starts tightly correlated to the previous time steps and then trends towards 0 as we move farther from the observed data. Similarly, we start with a small variance, which increases as we move farther from the observed data. This is a reflection of our decaying auto-correlation function: the farther we move from our observations, the more our prediction trends towards the unconditional mean and variance of the AR(2) process.