Introduction to ARMA models

Lecture 2

Course: DS-GA 1018: Probabilistic Time Series Analysis
Date: September 10, 2025 (Wednesday)
Name:
NYU NetID:

With the fundamentals in place, we can start digging into our first model, the Autoregressive Moving Average Model (ARMA).


Autoregressive process

Consider a random process of the form:

\begin{equation*} X_{t}=\phi X_{t-1}+W_{t}~, \tag{2.1} \end{equation*}

where W_{t} is drawn from \mathcal{N}\left(0, \sigma_{W}^{2}\right) and |\phi|<1. This is known as an autoregressive process because each random variable in the series depends on other random variables in the series. This autoregressive process is of order 1 because the current random variable has a direct dependence only on the previous time-step’s random variable. Let’s calculate the statistics of this random process by expanding the recursion:

\begin{align*} X_{t} & =\phi X_{t-1}+W_{t} \tag{2.2}\\ & =W_{t}+\phi W_{t-1}+\phi^{2} W_{t-2}+\ldots \tag{2.3} \end{align*}

For now, we are ignoring the boundary conditions of our problem (i.e., what happens when we get to t=0 ). The mean of this process is given by:

\begin{align*} \mu_{X} & =\mathbb{E}\left[\sum_{h=0}^{\infty} \phi^{h} W_{t-h}\right] \tag{2.4}\\ & =0 \tag{2.5} \end{align*}

We can show (by expanding the terms as infinite sums of white noise, recognizing that the covariance of two independent terms is zero, and applying the formula for a geometric series) that the covariance is given by:

\begin{align*} \gamma_{X}(h) & =\mathbb{E}\left[\left(\sum_{j=0}^{\infty} \phi^{j} W_{t+h-j}\right)\left(\sum_{k=0}^{\infty} \phi^{k} W_{t-k}\right)\right] \tag{2.6}\\ & =\sigma_{W}^{2} \sum_{j=0}^{\infty} \phi^{h+j} \phi^{j} \tag{2.7}\\ & =\sigma_{W}^{2} \phi^{|h|} \frac{1}{1-\phi^{2}} . \tag{2.8} \end{align*}
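As a quick numerical check of Equations 2.5 and 2.8, here is a minimal sketch (assuming numpy is available; the values of \phi, \sigma_{W}, and the series length are arbitrary choices, and sample_autocov is a small helper written just for this illustration) that simulates a long AR(1) series and compares its sample autocovariance to the formula above.

```python
import numpy as np

rng = np.random.default_rng(0)
phi, sigma_w, n = 0.7, 1.0, 100_000  # arbitrary illustrative values

# Simulate X_t = phi * X_{t-1} + W_t, then drop a burn-in stretch so the
# arbitrary boundary condition at t = 0 is forgotten.
w = rng.normal(0.0, sigma_w, size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + w[t]
x = x[1_000:]

def sample_autocov(series, h):
    """Sample autocovariance at lag h (mean subtracted)."""
    centered = series - series.mean()
    return np.mean(centered[h:] * centered[: len(centered) - h])

print(x.mean())  # close to mu_X = 0
for h in range(4):
    theory = sigma_w**2 * phi**h / (1 - phi**2)
    print(h, sample_autocov(x, h), theory)  # sample vs. Equation 2.8
```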

Notice that this process is stationary. (Why?) We can generalize this into the broader concept of an AR(p) process:

AR(p) process: An autoregressive model of order p - AR(p) - is a random process with the form:

\begin{equation*} X_{t}=\phi_{1} X_{t-1}+\phi_{2} X_{t-2}+\ldots+\phi_{p} X_{t-p}+W_{t} \tag{2.9} \end{equation*}

where W_{t} is drawn from \mathcal{N}\left(0, \sigma_{W}^{2}\right) and \phi_{1}, \ldots, \phi_{p} are constants. For the model to be order p, it must be true that \phi_{p} \neq 0.

As written, X_{t} is a zero-mean process (and, under conditions on the \phi coefficients that we will make precise later in this lecture, a stationary one). For cases where the mean of X_{t} is some nonzero \mu, we can recast our AR(p) relation as:

\begin{align*} X_{t}-\mu & =\phi_{1}\left(X_{t-1}-\mu\right)+\phi_{2}\left(X_{t-2}-\mu\right)+\ldots+\phi_{p}\left(X_{t-p}-\mu\right)+W_{t} \tag{2.10}\\ X_{t} & =\alpha+\phi_{1} X_{t-1}+\phi_{2} X_{t-2}+\ldots+\phi_{p} X_{t-p}+W_{t} \tag{2.11}\\ \alpha & =\mu\left(1-\phi_{1}-\ldots-\phi_{p}\right) \tag{2.12} \end{align*}
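A small numerical check of Equation 2.12 (a sketch with arbitrary parameter choices, assuming numpy; shown for p=1): simulate an AR(1) process with intercept \alpha and confirm that the sample mean lands near \mu=\alpha /\left(1-\phi_{1}\right).

```python
import numpy as np

rng = np.random.default_rng(1)
phi1, mu, sigma_w, n = 0.6, 3.0, 1.0, 100_000  # arbitrary illustrative values
alpha = mu * (1 - phi1)                         # Equation 2.12 with p = 1

w = rng.normal(0.0, sigma_w, size=n)
x = np.full(n, mu)                              # start the recursion at the mean
for t in range(1, n):
    x[t] = alpha + phi1 * x[t - 1] + w[t]

print(x.mean(), mu)  # the sample mean should be close to mu
```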

Note that, by construction, we assume that our autoregressive process is causal: a random variable only depends on random variables that came before it. In the slides you can find some examples of AR processes.


Moving average

Consider a random process of the form:

\begin{equation*} X_{t}=W_{t}+\theta W_{t-1} \tag{2.13} \end{equation*}

where W_{t} is drawn from \mathcal{N}\left(0, \sigma_{W}^{2}\right) and |\theta|<1. This is known as a moving average process, with this particular process being order 1. Let’s once again calculate our statistics of interest, starting with the mean:

\begin{align*} \mu_{X} & =\mathbb{E}\left[W_{t}+\theta W_{t-1}\right] \tag{2.14}\\ & =0 \tag{2.15} \end{align*}

The covariance is given by:

\begin{align*} \gamma_{X}(h) & =\mathbb{E}\left[\left(W_{t+h}+\theta W_{t+h-1}\right)\left(W_{t}+\theta W_{t-1}\right)\right] \tag{2.16}\\ & = \begin{cases}\sigma_{W}^{2}\left(1+\theta^{2}\right) & h=0 \\ \sigma_{W}^{2} \theta & |h|=1 \\ 0 & |h|>1\end{cases} \tag{2.17} \end{align*}

The covariances are much more localized than in the AR case. This is also an example of a stationary process.
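The cutoff in Equation 2.17 is easy to see in simulation. Below is a minimal sketch (assuming numpy; \theta, \sigma_{W}, and the series length are arbitrary choices) comparing sample autocovariances of an MA(1) series to the three cases above.

```python
import numpy as np

rng = np.random.default_rng(2)
theta, sigma_w, n = 0.5, 2.0, 100_000  # arbitrary illustrative values

w = rng.normal(0.0, sigma_w, size=n + 1)
x = w[1:] + theta * w[:-1]             # X_t = W_t + theta * W_{t-1}

def sample_autocov(series, h):
    """Sample autocovariance at lag h (mean subtracted)."""
    centered = series - series.mean()
    return np.mean(centered[h:] * centered[: len(centered) - h])

print(sample_autocov(x, 0), sigma_w**2 * (1 + theta**2))  # h = 0
print(sample_autocov(x, 1), sigma_w**2 * theta)           # |h| = 1
print(sample_autocov(x, 2), 0.0)                          # |h| > 1: near zero
```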

We can generalize this into the broader concept of an MA(q) process.

MA(q) process: A moving average model of order q - MA(q) - is a random process with the form:

\begin{equation*} X_{t}=W_{t}+\theta_{1} W_{t-1}+\theta_{2} W_{t-2}+\ldots+\theta_{q} W_{t-q} \tag{2.18} \end{equation*}

where W_{t} is drawn from \mathcal{N}\left(0, \sigma_{W}^{2}\right) and \theta_{1}, \ldots, \theta_{q} are constants. For the model to be order q, it must be true that \theta_{q} \neq 0.


Autoregressive moving average model

We can combine these two concepts together to build what’s known as an Autoregressive Moving Average Model (ARMA):

ARMA(p,q) process: An Autoregressive Moving Average process of order p, q - ARMA(p, q) - is a process with the form:

\begin{align*} &X_{t}-\phi_{1} X_{t-1}-\ldots-\phi_{p} X_{t-p}\\ &=W_{t}+\theta_{1} W_{t-1}+\theta_{2} W_{t-2}+\ldots+\theta_{q} W_{t-q} \tag{2.19} \end{align*}

where W_{t} is drawn from \mathcal{N}\left(0, \sigma_{W}^{2}\right), \phi_{1}, \ldots, \phi_{p} are constants, \theta_{1}, \ldots, \theta_{q} are constants, and both \theta_{q} \neq 0 and \phi_{p} \neq 0.
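To make the definition concrete, here is a minimal sketch (assuming numpy; the orders and coefficient values are arbitrary illustrative choices) that simulates an ARMA(2, 1) series directly from the recursion in Equation 2.19.

```python
import numpy as np

rng = np.random.default_rng(3)
phi = [0.5, -0.25]   # AR coefficients phi_1, phi_2 (arbitrary)
theta = [0.4]        # MA coefficient theta_1 (arbitrary)
sigma_w, n = 1.0, 1_000

w = rng.normal(0.0, sigma_w, size=n)
x = np.zeros(n)
for t in range(n):
    # AR part: sum of phi_i * X_{t-i}; MA part: sum of theta_j * W_{t-j}
    ar = sum(c * x[t - i] for i, c in enumerate(phi, start=1) if t - i >= 0)
    ma = sum(c * w[t - j] for j, c in enumerate(theta, start=1) if t - j >= 0)
    x[t] = ar + w[t] + ma
```

Packages such as statsmodels ship a ready-made simulator for this (statsmodels.tsa.arima_process.ArmaProcess), but the bare recursion above is all the definition requires.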


Backshift, causality, and invertibility

Now that we have our full model (of which we will introduce a further extension later in this lecture), we want to understand how to quantify its statistics, measure the model parameters from data, and make predictions for new data given the parameters of the model (“forecasting”). In this pursuit it will be useful to reframe our AR and MA processes slightly. Let’s return to the definition of our AR(p) process:

\begin{equation*} X_{t}=\phi_{1} X_{t-1}+\phi_{2} X_{t-2}+\ldots+\phi_{p} X_{t-p}+W_{t} . \tag{2.20} \end{equation*}

We can trivially rearrange the terms as follows:

\begin{equation*} X_{t}-\phi_{1} X_{t-1}-\phi_{2} X_{t-2}-\ldots-\phi_{p} X_{t-p}=W_{t} \tag{2.21} \end{equation*}

We will also want to introduce the backshift operator, B. We will define our backshift operator such that:

\begin{equation*} B X_{t}=X_{t-1} \tag{2.22} \end{equation*}

This leads us to the autoregressive operator.

(Autoregressive Operator): The autoregressive operator, P(B), for an AR(p) model is defined as:

\begin{equation*} P(B)=1-\phi_{1} B-\phi_{2} B^{2}-\ldots-\phi_{p} B^{p} \tag{2.23} \end{equation*}

such that:

\begin{equation*} P(B) X_{t}=W_{t} . \tag{2.24} \end{equation*}
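In code, the backshift operator is just an index shift, so applying P(B) amounts to a few array slices. A small sketch (assuming numpy, with arbitrary AR(2) coefficients): applying P(B) to a simulated AR(2) series recovers the white noise that generated it.

```python
import numpy as np

rng = np.random.default_rng(4)
phi1, phi2, n = 0.5, 0.3, 10_000  # arbitrary illustrative values

w = rng.normal(0.0, 1.0, size=n)
x = np.zeros(n)
for t in range(2, n):
    x[t] = phi1 * x[t - 1] + phi2 * x[t - 2] + w[t]

# P(B) X_t = X_t - phi1 * (B X_t) - phi2 * (B^2 X_t), with B X_t = X_{t-1}
recovered = x[2:] - phi1 * x[1:-1] - phi2 * x[:-2]
print(np.allclose(recovered, w[2:]))  # True: P(B) X_t = W_t
```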

Before, we derived the expectation value and covariance of an \operatorname{AR}(1) process by expanding out the \operatorname{AR}(1) equation:

\begin{align*} X_{t} & =\phi X_{t-1}+W_{t} \tag{2.25}\\ & =W_{t}+\phi W_{t-1}+\phi^{2} W_{t-2}+\ldots . \tag{2.26} \end{align*}

The equation on the second line looks like an MA process. In fact, it is an MA process of infinite order. This means that we can write:

\begin{align*} P(B) X_{t} & =W_{t} \tag{2.27}\\ X_{t} & =\psi(B) W_{t} \tag{2.28}\\ P(B)^{-1} & =\psi(B) . \tag{2.29} \end{align*}

We found that for |\phi|<1 our AR(1) process was causal. It’s natural to wonder how we can extend this notion of causality to a higher-order process or an ARMA process. As we will see soon, P(B)^{-1} will be vital to this definition. Being able to write an AR process as an MA process also blurs the distinction between the two. To make matters more complicated, we can go down the same rabbit hole with our MA process:

\begin{equation*} X_{t}=W_{t}+\theta_{1} W_{t-1}+\theta_{2} W_{t-2}+\ldots+\theta_{q} W_{t-q} \tag{2.30} \end{equation*}

We can also write this in terms of backshift operators and introduce the moving average operator:

Moving Average Operator: The moving average operator for an MA(q) process is defined as:

\begin{equation*} \Theta(B)=1+\theta_{1} B+\theta_{2} B^{2}+\ldots+\theta_{q} B^{q} \tag{2.31} \end{equation*}

such that:

\begin{equation*} X_{t}=\Theta(B) W_{t} \tag{2.32} \end{equation*}

As with the AR process, we can reframe our MA process as:

\begin{align*} X_{t} & =\Theta(B) W_{t} \tag{2.33}\\ \Theta(B)^{-1} X_{t} & =W_{t} \tag{2.34} \end{align*}

It turns out that just as we were concerned with causality for our AR process, we should be concerned with invertibility for our MA process. Specifically, consider two MA(1) processes:

\begin{align*} X_{t} & =W_{t}+\frac{1}{5} W_{t-1}, \quad p\left(W_{t}\right)=\mathcal{N}(0,25) \tag{2.35}\\ Y_{t} & =V_{t}+5 V_{t-1}, \quad p\left(V_{t}\right)=\mathcal{N}(0,1) \tag{2.36} \end{align*}

In every statistic we can measure on X_{t} and Y_{t}, the two processes are indistinguishable^{1}. The only way we could tell the difference would be to have access to their white noise processes, and we do not have that. The only difference is that one of these two processes can be written as an infinite AR process and the other cannot. Specifically, we can write:

\begin{align*} X_{t} & =W_{t}+\theta W_{t-1} \tag{2.37}\\ \sum_{j=0}^{\infty}(-\theta)^{j} X_{t-j} & =W_{t} \tag{2.38} \end{align*}
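Before stating the general condition, a quick numerical sketch (assuming numpy; the two processes are exactly those of Equations 2.35 and 2.36) shows what goes right and wrong: truncating the sum in Equation 2.38 at J terms recovers W_{t} when \theta=1/5, but blows up when \theta=5.

```python
import numpy as np

rng = np.random.default_rng(5)
n, J = 5_000, 50
t = n - 1  # try to reconstruct the white noise at the last time step

for theta, sigma in [(0.2, 5.0), (5.0, 1.0)]:  # (X_t, W_t) and (Y_t, V_t)
    w = rng.normal(0.0, sigma, size=n)
    x = w.copy()
    x[1:] += theta * w[:-1]                     # X_t = W_t + theta * W_{t-1}
    partial = sum((-theta) ** j * x[t - j] for j in range(J))
    print(theta, partial, w[t])  # close to W_t only for theta = 0.2
```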

Indeed, the sum on the left-hand side diverges if |\theta|>1, so only the model with \theta=\frac{1}{5} is invertible. We will also define this more broadly using \Theta(B)^{-1} in a moment. Before we do that, let me convince you of the value of these operators one more time. Imagine we have a simple white noise process:

\begin{equation*} X_{t}=W_{t} \tag{2.39} \end{equation*}

We can trivially also write:

\begin{align*} 0.5 X_{t-1} & =0.5 W_{t-1} \tag{2.40}\\ X_{t}-0.5 X_{t-1} & =W_{t}-0.5 W_{t-1} \tag{2.41}\\ X_{t} & =0.5 X_{t-1}+W_{t}-0.5 W_{t-1} \tag{2.42} \end{align*}

This should be alarming, because I just made a white noise process look like an ARMA(1, 1) process.
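To see that the disguise is purely notational, here is a small sketch (assuming numpy) that simulates Equation 2.42 with fresh white noise and checks that the sample autocorrelations at nonzero lags are consistent with white noise.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100_000
w = rng.normal(0.0, 1.0, size=n)

x = np.zeros(n)
x[0] = w[0]  # matching the boundary condition makes X_t = W_t exactly
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + w[t] - 0.5 * w[t - 1]

centered = x - x.mean()
for h in range(1, 4):
    rho = np.mean(centered[h:] * centered[:-h]) / np.var(x)
    print(h, rho)  # all close to zero, as for white noise
```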

Let’s resolve these issues in reverse order. First, we can write any ARMA process as:

\begin{equation*} P(B) X_{t}=\Theta(B) W_{t} \tag{2.43} \end{equation*}

We will demand that an ARMA(p, q) process has the property that P(B) and \Theta(B) do not share any roots. That is, if z_{i} is any root of P (i.e., P\left(z_{i}\right)=0) and z_{j} is any root of \Theta (i.e., \Theta\left(z_{j}\right)=0), then z_{i} \neq z_{j}. Returning to the white noise example that we made look like ARMA(1, 1), we have:

\begin{align*} & P(B)=1-0.5 B \tag{2.44}\\ & \Theta(B)=1-0.5 B \tag{2.45} \end{align*}

We can simplify both equations by dividing out the common factor of 1-0.5 B (eliminating the common root of 2) and get:

\begin{align*} & P(B)=1 \tag{2.46}\\ & \Theta(B)=1 \tag{2.47} \end{align*}

thereby demonstrating that this process is in fact white noise. Next we can return to the invertibility issue for the MA process:

Invertibility: An ARMA(p, q) process is invertible if the time series can be written as:

\begin{equation*} \pi(B) X_{t}=\sum_{j=0}^{\infty} \pi_{j} X_{t-j}=W_{t} \tag{2.48} \end{equation*}

with the infinite sum \sum_{j=0}^{\infty}\left|\pi_{j}\right|<\infty. We can determine \pi(B) as:

\begin{equation*} \pi(B)=\frac{P(B)}{\Theta(B)} . \tag{2.49} \end{equation*}

The conditions for invertibility hold so long as the roots of \Theta(z) lie outside the unit circle.

See Shumway and Stoffer for the proof of this statement.

We can make a similar definition for causality:

Causality: An ARMA(p, q) process is causal if the time series can be written as:

\begin{equation*} X_{t}=\psi(B) W_{t}=\sum_{j=0}^{\infty} \psi_{j} W_{t-j} \tag{2.50} \end{equation*}

with the infinite sum \sum_{j=0}^{\infty}\left|\psi_{j}\right|<\infty. We can determine \psi(B) as:

\begin{equation*} \psi(B)=\frac{\Theta(B)}{P(B)} . \tag{2.51} \end{equation*}

The conditions for causality hold so long as the roots of P(z) lie outside the unit circle.

Again, see Shumway and Stoffer for the proof of this statement.
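One practical way to compute the \psi_{j} coefficients is to expand \Theta(B)=P(B) \psi(B) and match powers of B, which gives \psi_{0}=1 and \psi_{j}=\theta_{j}+\sum_{i=1}^{\min (j, p)} \phi_{i} \psi_{j-i} (with \theta_{j}=0 for j>q). The sketch below implements this recursion in plain Python; psi_weights is a helper name invented for this illustration, and the coefficient values are arbitrary.

```python
def psi_weights(phi, theta, n_terms=50):
    """Coefficients of psi(B) = Theta(B) / P(B), found by matching
    powers of B in Theta(B) = P(B) * psi(B)."""
    psi = [1.0]  # psi_0 = 1
    for j in range(1, n_terms):
        theta_j = theta[j - 1] if j <= len(theta) else 0.0
        ar_part = sum(phi[i - 1] * psi[j - i]
                      for i in range(1, min(j, len(phi)) + 1))
        psi.append(theta_j + ar_part)
    return psi

# Arbitrary illustrative ARMA(1, 1): phi_1 = 0.5, theta_1 = 0.3
print(psi_weights([0.5], [0.3])[:5])  # [1.0, 0.8, 0.4, 0.2, 0.1]
# With no MA terms this reduces to the AR(1) answer psi_j = phi^j
print(psi_weights([0.5], [])[:5])     # [1.0, 0.5, 0.25, 0.125, 0.0625]
```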

Let’s consider the process defined by:

\begin{equation*} X_{t}=0.4 X_{t-1}+0.45 X_{t-2}+W_{t}+W_{t-1}+0.25 W_{t-2} . \tag{2.52} \end{equation*}

We want to know what order of ARMA process it is and whether or not it is causal and invertible. It looks like an ARMA(2, 2) process, but looks can be deceiving. Let’s get started by finding our P(B) and \Theta(B) operators:

\begin{align*} & P(B)=1-0.4 B-0.45 B^{2} \tag{2.53}\\ & \Theta(B)=1+B+0.25 B^{2} . \tag{2.54} \end{align*}

If we factor both equations we find:

\begin{align*} & P(B)=(1+0.5 B)(1-0.9 B) \tag{2.55}\\ & \Theta(B)=(1+0.5 B)^{2} . \tag{2.56} \end{align*}

We can cancel out the shared root to get:

\begin{align*} & P(B)=(1-0.9 B) \tag{2.57}\\ & \Theta(B)=(1+0.5 B) \tag{2.58} \end{align*}

and find that our process is ARMA(1,1) and can be written as:

\begin{equation*} X_{t}=0.9 X_{t-1}+W_{t}+0.5 W_{t-1} \tag{2.59} \end{equation*}

The process is invertible and causal because both remaining roots (10/9 and -2) lie outside the unit circle.
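The same bookkeeping can be done numerically. A short sketch (assuming numpy) builds the polynomials of Equations 2.53 and 2.54, finds their roots, and confirms both the shared root at z=-2 and the causality/invertibility conditions for the reduced ARMA(1, 1) form.

```python
from numpy.polynomial import Polynomial

# Coefficients are listed in increasing powers of z (Equations 2.53 and 2.54).
P = Polynomial([1.0, -0.4, -0.45])    # P(z) = 1 - 0.4 z - 0.45 z^2
Theta = Polynomial([1.0, 1.0, 0.25])  # Theta(z) = 1 + z + 0.25 z^2

print(P.roots())      # the roots are -2 and 10/9 (approximately 1.111)
print(Theta.roots())  # a double root at -2

# The shared root at z = -2 cancels. The remaining roots, 10/9 for P(z)
# and -2 for Theta(z), both lie outside the unit circle, so the reduced
# ARMA(1, 1) process is causal and invertible.
```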