Chapter 9 Markov switching models

9.1 Principle

Following the definitions of autoregressive models and hidden Markov models, a natural extension combines both: a time-series model where the observed variable \(y_t\) is explained by a hidden state \(z_t\) and by a regression on its own previous value \(y_{t-1}\). These models are called autoregressive hidden Markov models (AR-HMM) (Murphy 2002) or Markov switching models (MSM).

Figure 9.1: Markov switching model

Similar to an HMM, an MSM is defined by a matrix of transition probabilities \(\left(a_{ij}(t)\right) = p\left(z_t=j |z_{t-1}=i\right)\), whose terms can be conditioned on explanatory variables (time, day, weather…), and by emission probabilities. Rather than being conditioned on \(z_t\) only, the emission probability can also depend on previous observations. For example, with an AR(1) process in each state, the observation given \(z_t=j\) is: \[\begin{equation} y_t = \alpha_j + \phi_j y_{t-1} + w_{t,j} \tag{9.1} \end{equation}\] where the intercept \(\alpha_j\), the slope \(\phi_j\) and the distribution of the noise \(w_{t,j}\) depend on the state \(z_t=j\), so each can take as many distinct values as there are possible states. The emission probability \(p(y_t | z_t=j, y_{t-1})\) is then the density of \(w_{t,j}\) evaluated at \(y_t - \alpha_j - \phi_j y_{t-1}\). In a more complicated example, one could implement a whole ARMAX model (Eq. ) into the observation probability of a Markov switching model.
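To make Eq. (9.1) concrete, the sketch below simulates a two-state Markov switching AR(1) series with NumPy. The Gaussian noise, the constant transition matrix, the parameter values and the function name `simulate_msm` are illustrative assumptions, not part of the model definition.

```python
import numpy as np

def simulate_msm(A, alpha, phi, sigma, n_steps, y0=0.0, z0=0, seed=0):
    """Simulate a Markov switching AR(1) series.

    A      : (K, K) transition matrix, A[i, j] = p(z_t = j | z_{t-1} = i)
    alpha  : (K,) state-dependent intercepts
    phi    : (K,) state-dependent AR(1) slopes
    sigma  : (K,) state-dependent noise standard deviations
             (Gaussian noise is an assumption made for this sketch)
    """
    rng = np.random.default_rng(seed)
    n_states = A.shape[0]
    z = np.empty(n_steps, dtype=int)
    y = np.empty(n_steps)
    z_prev, y_prev = z0, y0
    for t in range(n_steps):
        # Draw the hidden state from the row of A given the previous state
        z[t] = rng.choice(n_states, p=A[z_prev])
        # Emit y_t from the AR(1) regression of the current state
        noise = rng.normal(0.0, sigma[z[t]])
        y[t] = alpha[z[t]] + phi[z[t]] * y_prev + noise
        z_prev, y_prev = z[t], y[t]
    return z, y

# Hypothetical parameters: a calm regime (0) and a high, persistent regime (1)
A = np.array([[0.95, 0.05],
              [0.10, 0.90]])
alpha = np.array([0.0, 2.0])
phi = np.array([0.5, 0.9])
sigma = np.array([0.3, 0.6])

z, y = simulate_msm(A, alpha, phi, sigma, n_steps=500)
```

Time-varying transition probabilities \(a_{ij}(t)\) could be handled by passing one matrix per time step instead of a single `A`.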

An MSM can be trained with the same Baum-Welch algorithm and decoded with the same Viterbi algorithm as an HMM. The only difference lies in the expression of the emission probabilities, which does not change the structure of the algorithms because \(y_t\) is conditionally independent of \(z_{t-1}\) given \(z_t\) and \(y_{t-1}\).
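The following sketch illustrates this point: a standard log-space Viterbi decoder is left unchanged, and only the emission matrix is computed from the AR(1) regression. It assumes Gaussian noise and a constant transition matrix; the function names and the toy data are hypothetical. The first observation serves only as the regressor for the second one, so the decoded path has one element fewer than the series.

```python
import numpy as np

def ar1_log_emission(y, alpha, phi, sigma):
    """log p(y_t | z_t = j, y_{t-1}) for t = 1..T-1, shape (T-1, K).

    Assumes Gaussian noise with state-dependent standard deviation sigma[j].
    """
    mean = alpha[None, :] + phi[None, :] * y[:-1, None]
    resid = y[1:, None] - mean
    return -0.5 * np.log(2 * np.pi * sigma**2)[None, :] - 0.5 * (resid / sigma[None, :])**2

def viterbi(log_B, log_A, log_pi):
    """Standard Viterbi decoding; identical to the plain HMM case."""
    T, K = log_B.shape
    delta = np.empty((T, K))            # best log-score ending in each state
    psi = np.zeros((T, K), dtype=int)   # back-pointers
    delta[0] = log_pi + log_B[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A   # scores[i, j]: from i to j
        psi[t] = np.argmax(scores, axis=0)
        delta[t] = np.max(scores, axis=0) + log_B[t]
    path = np.empty(T, dtype=int)
    path[-1] = np.argmax(delta[-1])
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path

# Toy data: three blocks of 20 steps in states 0, 1, 0 (hypothetical values)
rng = np.random.default_rng(1)
z_true = np.repeat([0, 1, 0], 20)
alpha = np.array([0.0, 5.0])
phi = np.array([0.5, 0.5])
sigma = np.array([0.2, 0.2])
y = np.zeros(z_true.size)
for t in range(1, y.size):
    j = z_true[t]
    y[t] = alpha[j] + phi[j] * y[t - 1] + rng.normal(0.0, sigma[j])

A = np.array([[0.95, 0.05],
              [0.05, 0.95]])
log_B = ar1_log_emission(y, alpha, phi, sigma)
path = viterbi(log_B, np.log(A), np.log(np.array([0.5, 0.5])))
```

With regimes this well separated, `path` should coincide with `z_true[1:]`. The forward-backward recursions used by Baum-Welch could take the same `log_B` matrix as input, with the M-step replaced by a weighted regression of \(y_t\) on \(y_{t-1}\) in each state.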

9.2 Example

I am currently working on the use of time series models for Bayesian forecasting of building energy use. A tutorial will be added here once I have some results.


Murphy, Kevin Patrick. 2002. “Dynamic Bayesian Networks: Representation, Inference and Learning.”