# Chapter 8 Hidden Markov models

## 8.1 Principles

In a hidden Markov model (HMM), the sequence of $$T$$ output variables $${y_t}$$ is generated by a parallel sequence of latent categorical state variables $$z_t \in \left\{ 1,\cdots , K\right\}$$. These hidden states are a Markov chain, so that $$z_t$$ is conditionally independent of other variables given $$z_{t-1}$$.

The primary use of HMMs in building energy simulation is for identifying and modelling occupancy (Candanedo, Feldheim, and Deramaix (2017)). The hidden variable $$z_t$$ may denote a binary state of occupancy or a finite number of occupants; the observed output $$y_t$$ may denote any measurement impacted by occupancy: environmental sensors, smart meters, cameras, infrared sensors… There is a vast literature on occupancy detection and estimation from different types of data and methods. The reader is referred to the review of Chen et al (Chen, Jiang, and Xie (2018)) for a more comprehensive insight.

A HMM is defined by:

• A sequence of hidden states $$\left\{z_t\right\} \in \mathbb{N}^T$$, each of which can take a finite number of values: $$z_t \in \left[1,...,K\right]$$
• An observed variable $$\left\{y_t\right\} \in \mathbb{R}^T$$
• An initial probability $$\pi_0$$ which is the likelihood of the state at time $$t=0$$. $$\pi_0$$ is a $$K$$-simplex, i.e. a $$K$$-vector which components sum to 1.
• A $$K\times K$$ one-step transition probability matrix $$\mathbf{A}(t) = \left(a_{ij}(t)\right)$$ so that $$$a_{ij}(t)=p\left(z_t=j |z_{t-1}=i\right) \tag{8.1}$$$ is the probability at time $$t$$ for the hidden variable to switch from state $$i$$ to state $$j$$. Like $$\pi_0$$, each row of $$\mathbf{A}(t)$$ must sum to 1.
• Emission probabilities $$b_{i}(y_t) = p(y_t | z_t=i)$$, i.e. the probability distribution of the output variable given each possible state.

The transition probabilities $$\left(a_{ij}(t)\right)$$ are shown here as functions of time because they can be formulated as parametric expressions of external observed variables, such as the time of the day or weather variables. The Markov chain is then called inhomogeneous. Inhomogeneous transition probability matrices can capture occupancy properties at different time instances and thus encode occupancy dynamics: there can a higher probability of people entering the building at a certain time of the day, or if the outdoor temperature is high, etc.

Training an HMM means finding the set of parameters $$\theta$$ which best explain the observations. This can be done by in least two ways: the first option is to compute the likelihood function with the forward algorithm, explained below, and to perform its direct numerical maximization. The second option is the Baum-Welch algorithm, a special case of the expectation-maximization (EM) algorithm: it alternates between a forward-backward algorithm for the expectation step, and an updating phase for the maximization step.

A trained HMM can then be used to predict states and future values of the outcome variable. The estimation of the most likely state sequence given the observations is called decoding. $$$z^*_{0:T} = \mathrm{arg max}_z \; p(z_{0:T} | y_{0:T}) \tag{8.2}$$$ and can be done by the Viterbi algorithm described below.

### 8.1.1 The forward algorithm

The forward algorithm computes the likelihood function $$p(y_{1:T}|\theta)$$ of a hidden Markov model, given the values of its parameters $$\theta$$. These parameters come from the parametric expressions of $$\pi_0$$, $$a_{ij}(t)$$ and $$b_{i}$$.

The algorithm works by calculating the components of a $$[T \times K]$$ matrix $$\alpha$$ of forward variables defined by: $$$\alpha_{ij} = p(y_{0:i},z_i=j) \tag{8.3}$$$ The forward variable $$\alpha_{ij}$$ is the joint probability of observations up to time $$i$$, and of the current hidden variable $$z_i$$ being in state $$j$$. The $$\alpha$$ matrix is filled row by row, and the total likelihood of the data is the sum of its last row.

### 8.1.2 The Viterbi algorithm

Supposing a trained HMM with known parameters in the initial probability $$\pi_0$$, transition matrix $$\mathbf{A}(t)$$ and emission probabilities $$\mathrm{b}(y)$$; the Viterbi algorithm finds the best sequence of hidden states, so that their probability conditioned on the data $$y_{0:T}$$ is at a maximum: $$$z^*_{0:T} = \mathrm{arg max}_z \; p(z_{0:T} | y_{0:T}) \tag{8.4}$$$ This process is called global decoding, and one of the possible inferences from HMMs.

The Viterbi algorithm looks similar to the backward algorithm, with an additional backtracking phase. It produces two $$[T \times K]$$ matrices $$\delta$$ and $$\psi$$ and fills them row by row in a forward phase. Then, a backtracking phase computes $$z^*_{0:T}$$ from the $$\psi$$ matrix. The algorithm is described with more detail here.

## 8.2 Example

I am currently working on the use of time series models for a Bayesian forecasting of building energy use. There will be a tutorial here after I get some results. In the meantime, hidden Markov models are also covered in the Stan user’s guide.

### References

Candanedo, Luis M, Véronique Feldheim, and Dominique Deramaix. 2017. “A Methodology Based on Hidden Markov Models for Occupancy Detection and a Case Study in a Low Energy Residential Building.” Energy and Buildings 148: 327–41.

Chen, Zhenghua, Chaoyang Jiang, and Lihua Xie. 2018. “Building Occupancy Estimation and Detection: A Review.” Energy and Buildings 169: 260–70.