Chapter 10 Principles of SSMs
Unless specified otherwise, the main source for the descriptions in this section is Simo Särkkä's book Bayesian Filtering and Smoothing (Särkkä (2013)).
10.1 Description
State-space models (SSM) are the continuous-state equivalent of hidden Markov models: the sequence of \(T\) output variables \(\mathbf{y}_t\) is generated by a parallel sequence of latent continuous state vectors \(\mathbf{x}_t \in \mathbb{R}^{n_x}\). Since the letter \(x\) is used here to denote the state variable, we denote the explanatory inputs by \(u_t \in \mathbb{R}^{n_u}\). A Bayesian SSM is fully defined by four distributions: a prior distribution of the parameters \(\theta\); a prior distribution of the first hidden state \(\mathbf{x}_0\); a dynamic model which describes the system dynamics and their uncertainties in terms of a transition probability distribution; and a measurement model which describes how the measurements depend on the current state: \[\begin{align} \theta & \sim p(\theta) \nonumber \\ \mathbf{x}_0 & \sim p(\mathbf{x}_0) \nonumber \\ \mathbf{x}_t & \sim p(\mathbf{x}_t | \mathbf{x}_{t-1}, \theta) \nonumber \\ \mathbf{y}_t & \sim p(\mathbf{y}_t | \mathbf{x}_t, \theta) \tag{10.1} \end{align}\]
Given the measurements \(\mathbf{y}_{1:T}\), we are then interested in computing:
- Filtering distributions, i.e. the marginal distribution of each state given the current and previous measurements: \[ p(\mathbf{x}_t | \mathbf{y}_{1:t}, \theta), \quad t = 1, \dots ,T \]
- Prediction distributions, the marginal distributions of future states: \[ p(\mathbf{x}_{t+k} | \mathbf{y}_{1:t}, \theta), \quad t = 1, \dots ,T, \quad k=1,2,\dots \]
- Smoothing distributions, the marginal distributions of each state given all measurements: \[p(\mathbf{x}_t | \mathbf{y}_{1:T}, \theta), \quad t = 1, \dots ,T \]
- Parameter estimation, i.e. the posterior density of parameters given all measurements: \[ p(\theta | \mathbf{y}_{1:T}) \]
- On-line parameter estimation, i.e. the marginal distribution of parameters given the current and previous measurements: \[ p(\theta | \mathbf{y}_{1:t}), \quad t = 1, \dots ,T \]
Filtering, prediction and smoothing are variations of state estimation. Filtering and smoothing a linear Gaussian state-space model are done with the Kalman filter (KF) and the Rauch-Tung-Striebel smoother (RTSS), respectively. These methods are not applicable to non-linear or non-Gaussian models, for which alternatives exist: the extended Kalman filter (EKF) and unscented Kalman filter (UKF), along with their smoother counterparts (ERTSS and URTSS); sequential Monte Carlo methods, i.e. particle filters and smoothers; etc.
Parameter estimation is meant here as the computation of \(p(\theta | \mathbf{y}_{1:T})\), the marginal posterior of \(\theta\) given all measurements. A standard way to do this is to perform filtering, which produces the likelihood function \(p(\mathbf{y}_{1:T} | \theta)\) as a by-product. The maximum likelihood estimate of \(\theta\) is then obtainable by numerical maximization of this likelihood; alternatively, the full posterior density can be approximated with one of the Markov chain Monte Carlo (MCMC) methods. As opposed to batch estimation, on-line estimation means computing \(p(\theta | \mathbf{y}_{1:t})\) at every time step \(t\), i.e. the probability distribution of \(\theta\) given the current and previous measurements.
10.2 Linear state-space models
In the linear Gaussian state-space formulation, both the dynamic and the measurement models are linear with respect to the state \(\mathbf{x}_t\) and the inputs \(\mathbf{u}_t\): \[\begin{align} \mathbf{x}_t & = \mathbf{A}_\theta \, \mathbf{x}_{t-1} + \mathbf{B}_\theta \, \mathbf{u}_t + \mathbf{w}_t \tag{10.2} \\ \mathbf{y}_t & = \mathbf{C}_\theta \, \mathbf{x}_t + \mathbf{D}_\theta \, \mathbf{u}_t + \mathbf{v}_t \tag{10.3} \end{align}\] where the matrices \(\mathbf{A}_\theta\), \(\mathbf{B}_\theta\), \(\mathbf{C}_\theta\) and \(\mathbf{D}_\theta\) have coefficients as functions of \(\theta\) and are not necessarily time-invariant. \(\mathbf{w}_t \sim N\left(0, \mathbf{Q}\right)\) is the process noise and \(\mathbf{v}_t \sim N\left(0, \mathbf{R}\right)\) is the measurement noise.
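As an illustration, the following minimal sketch simulates one trajectory of the system of Eq. (10.2) and (10.3). The function name and interface are ours, not from the source; the matrices, noise covariances and inputs are left to the user.

```python
# A minimal sketch simulating the linear Gaussian SSM of Eq. (10.2)-(10.3).
import numpy as np

rng = np.random.default_rng(0)

def simulate_lgssm(A, B, C, D, Q, R, u, x0):
    """Draw one trajectory of states x_t and measurements y_t."""
    T = u.shape[0]
    nx, ny = A.shape[0], C.shape[0]
    x = np.zeros((T, nx))
    y = np.zeros((T, ny))
    x_prev = x0
    for t in range(T):
        w = rng.multivariate_normal(np.zeros(nx), Q)   # process noise w_t
        v = rng.multivariate_normal(np.zeros(ny), R)   # measurement noise v_t
        x[t] = A @ x_prev + B @ u[t] + w               # dynamic model, Eq. (10.2)
        y[t] = C @ x[t] + D @ u[t] + v                 # measurement model, Eq. (10.3)
        x_prev = x[t]
    return x, y
```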
As a practical example of linear state-space models, the simplified resistor-capacitor (RC) model structures are a popular choice for either parameter estimation or system identification. When written as a set of stochastic differential equations, they allow accounting for modelling approximations (Madsen and Holst (1995)) and offer a more reproducible parameter estimation than deterministic models that overlook modelling errors (Rouchier, Rabouille, and Oberlé (2018)). These models will be used in the next chapter for the estimation of thermal properties of building envelopes.
Consider the example of a simple building represented by a 2-resistor, 2-capacitor model structure (2R2C) as shown here. The equations of this model are:
\[\begin{align} C_i \, \mathrm{d} T_i & = \dfrac{1}{R_i}\left(T_e-T_i\right)\mathrm{d}t + \Phi_h \, \mathrm{d}t + A_i \Phi_s \mathrm{d}t + \sigma_i \,\mathrm{d}\omega_i \tag{10.4} \\ C_e \, \mathrm{d} T_e & = \dfrac{1}{R_i}\left(T_i-T_e\right)\mathrm{d}t + \frac{1}{R_e}\left(T_o-T_e\right)\mathrm{d}t + A_e \Phi_s \mathrm{d}t + \sigma_e \, \mathrm{d}\omega_e \tag{10.5} \end{align}\]
where \(T_i\), \(T_e\) and \(T_o\) are the indoor, envelope and outdoor temperatures. The envelope temperature is associated with the thermal mass of the opaque surfaces, and does not represent a specific coordinate within the envelope. The model has two states: \(T_e\) (unobserved) and \(T_i\) (observed); \(\Phi_h\) (W) is the indoor heating power; \(\Phi_s\) (W/m\(^2\)) is the global horizontal solar irradiance. \(R_i\) (K/W) is the thermal resistance between the indoor air and the envelope, and \(R_e\) the resistance between the envelope and the ambient air. \(C_i\) and \(C_e\) (J/K) are the heat capacitances of the interior and the envelope, respectively, and \(A_i\) and \(A_e\) (m\(^2\)) are their solar gain coefficients. \(\{\omega_i\}\) and \(\{\omega_e\}\) are standard Wiener processes, and \(\sigma_i^2\) and \(\sigma_e^2\) are their variances. This process noise is a way to account for modelling approximations, unrecognized inputs or noise-corrupted input measurements.
Eq. (10.4) and (10.5) can be written in matrix form: \[\begin{equation} \mathrm{d} \begin{bmatrix} T_i \\ T_e \end{bmatrix} = \begin{pmatrix} -\frac{1}{R_i \, C_i} & \frac{1}{R_i \, C_i} \\ \frac{1}{R_i \, C_e} & -\frac{1}{R_i \, C_e}-\frac{1}{R_e \, C_e}\end{pmatrix} \begin{bmatrix} T_i \\ T_e \end{bmatrix}\, \mathrm{d}t + \begin{pmatrix} 0 & \frac{1}{C_i} & \frac{A_i}{C_i} \\ \frac{1}{R_e \, C_e} & 0 & \frac{A_e}{C_e} \end{pmatrix} \begin{bmatrix} T_o \\ \Phi_h \\ \Phi_s \end{bmatrix} \, \mathrm{d}t + \mathbf{\sigma} \, \mathrm{d}\omega \tag{10.6} \end{equation}\] which is the dynamic model of the following stochastic state-space model, written in continuous-discrete form: \[\begin{align} \mathrm{d}\mathbf{x}(t) & = \mathbf{A}_\mathit{rc} \, \mathbf{x}(t) \, \mathrm{d}t + \mathbf{B}_\mathit{rc} \, \mathbf{u}(t)\,\mathrm{d}t + \mathbf{\sigma}_\theta \mathrm{d}\omega \tag{10.7} \\ \mathbf{y}_t & = \mathbf{C}_\theta \, \mathbf{x}_t + \mathbf{v}_t \tag{10.8} \end{align}\] The state vector \(\mathbf{x}\) includes the temperatures \(T_i\) and \(T_e\) calculated by the model, and \(\mathbf{u}=\left[T_o, \Phi_h, \Phi_s\right]\) is the input vector including boundary conditions and excitations. The second equation, Eq. (10.8), is the observation equation. It indicates that the measured quantity \(\mathbf{y}_t\) may differ from the output of the state equation: in our case, the observed temperature is only the first component of the state vector, and is affected by some measurement error \(\mathbf{v}_t\). In this equation, time is written as a subscript to indicate that observations come in a discrete sequence.
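For concreteness, here is a short sketch assembling the continuous-time matrices of Eq. (10.6); the parameter values are made up for illustration only.

```python
# A sketch building the matrices of Eq. (10.6) for the 2R2C model.
# State: [T_i, T_e]; inputs: [T_o, Phi_h, Phi_s]. Values are illustrative.
import numpy as np

Ri, Re = 0.01, 0.005        # thermal resistances (K/W)
Ci, Ce = 1e6, 1e7           # heat capacitances (J/K)
Ai, Ae = 2.0, 1.0           # solar gain coefficients (m^2)

A_rc = np.array([[-1/(Ri*Ci),           1/(Ri*Ci)],
                 [ 1/(Ri*Ce), -1/(Ri*Ce) - 1/(Re*Ce)]])
B_rc = np.array([[0.0,       1/Ci, Ai/Ci],
                 [1/(Re*Ce), 0.0,  Ae/Ce]])
C_obs = np.array([[1.0, 0.0]])   # only T_i is measured (Eq. (10.8))
```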
The model described by Eq. (10.7) must be discretized in order to specify its evolution between discrete time coordinates. Suppose a sampling interval of length \(\Delta t\) and assume that the inputs \(\mathbf{u}(t)\) are constant during each interval. Eq. (10.7) and (10.8) can then be discretized into the system of Eq. (10.2) and (10.3) through the following discretization equations: \[\begin{align} \mathbf{A}_\theta & = \mathrm{exp}\left( \mathbf{A}_\mathit{rc} \, \Delta t \right) \tag{10.9}\\ \mathbf{B}_\theta & = \mathbf{A}_\mathit{rc}^{-1} \, \left(\mathbf{A}_\theta-\mathbf{I}\right) \, \mathbf{B}_\mathit{rc} \tag{10.10} \\ \mathbf{Q} & = \int_0^{\Delta t} \mathrm{exp}\left( \mathbf{A}_\mathit{rc} \, \tau \right) \, \mathbf{\sigma}_\theta \, \mathbf{\sigma}_\theta^T \, \mathrm{exp}\left( \mathbf{A}_\mathit{rc}^T \, \tau \right) \mathrm{d}\tau \tag{10.11} \end{align}\]
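A possible implementation of this discretization is sketched below, reusing the matrices built above. Eq. (10.9) and (10.10) map directly to a matrix exponential and a linear solve; for the noise integral of Eq. (10.11) we use Van Loan's matrix-exponential method, a standard alternative to direct quadrature (this choice is ours, not the source's).

```python
# A sketch of the discretization Eq. (10.9)-(10.11).
import numpy as np
from scipy.linalg import expm

def discretize(A_rc, B_rc, sigma, dt):
    nx = A_rc.shape[0]
    Ad = expm(A_rc * dt)                                   # Eq. (10.9)
    Bd = np.linalg.solve(A_rc, (Ad - np.eye(nx)) @ B_rc)   # Eq. (10.10)
    # Van Loan's method: exponentiate the block matrix
    # [[-A, sigma sigma^T], [0, A^T]] * dt and read Q off the blocks
    S = sigma @ sigma.T
    M = np.block([[-A_rc, S], [np.zeros((nx, nx)), A_rc.T]]) * dt
    E = expm(M)
    Qd = E[nx:, nx:].T @ E[:nx, nx:]                       # Eq. (10.11)
    return Ad, Bd, Qd
```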
The Kalman filter is the typical method for computing the marginal distribution of each state \(p(\mathbf{x}_t | \mathbf{y}_{1:t}, \theta)\) and the likelihood function \(p(\mathbf{y}_{1:T}| \theta)\), from which \(\theta\) can be estimated.
10.3 The Kalman filter
Given a state transition probability \(p\left( \mathbf{x}_{t} | \theta, \mathbf{x}_{t-1}, \mathbf{u}_{t} \right)\) (Eq. (10.2)) and an observation probability \(p\left( \mathbf{y}_{t} | \mathbf{x}_{t}\right)\) (Eq. (10.3)), a Kalman filter produces \(p\left(\mathbf{x}_t|\mathbf{y}_{1:t}, \theta \right)\), the probability distribution function of each state \(\mathbf{x}_t\) given measurements and parameter values, and the marginal likelihood function \(L_y(\theta)=p\left(\mathbf{y}_{1:T} | \theta \right)\).
In the following, definitions adapted from the book of Shumway and Stoffer (Shumway and Stoffer (2000)) are used: \(\mathbf{x}_{t|s}\) is the expected state at time \(t\) given observations up to time \(s\), and \(\mathbf{P}_{t|s}\) is the covariance of the state \(\mathbf{x}_{t}\) around this expectation, i.e. its mean-squared error. \[\begin{align} \mathbf{x}_{t|s} & = \mathrm{E}\left(\mathbf{x}_t|\mathbf{y}_{1:s}, \theta \right) \\ \mathbf{P}_{t|s} & = \mathrm{Var}\left(\mathbf{x}_t|\mathbf{y}_{1:s}, \theta\right)= \mathrm{E}\left[(\mathbf{x}_t-\mathbf{x}_{t|s})(\mathbf{x}_t-\mathbf{x}_{t|s})^T|\mathbf{y}_{1:s}, \theta \right] \end{align}\]
The Kalman filter algorithm is described here and illustrated by Fig. 10.2:
- Set the initial states \(\mathbf{x}_{0|0}\) and their covariance \(\mathbf{P}_{0|0}\)
- for \(t=0...T-1\):
- Prediction step: given the previous state \(\mathbf{x}_{t|t}\) and its covariance \(\mathbf{P}_{t|t}\), the model estimates the one-step ahead prediction. \[\begin{align} \mathbf{x}_{t+1|t} & = \mathbf{A}_\theta \, \mathbf{x}_{t|t} + \mathbf{B}_\theta \, \mathbf{u}_{t+1}\\ \mathbf{P}_{t+1|t} & = \mathbf{A}_\theta \, \mathbf{P}_{t|t} \, \mathbf{A}_\theta^T + \mathbf{Q} \end{align}\]
- Innovations (prediction error) \(\varepsilon_{t+1}\) and their covariances \(\Sigma_{t+1}\) are then calculated, along with the Kalman gain \(\mathbf{K}_{t+1}\), by comparing measurements \(\mathbf{y}_{t+1}\) with the one-step ahead prediction \(\mathbf{x}_{t+1|t}\): \[\begin{align} \varepsilon_{t+1} & = \mathbf{y}_{t+1} - \mathbf{C}_\theta \, \mathbf{x}_{t+1|t}\\ \Sigma_{t+1} & = \mathbf{C}_\theta \, \mathbf{P}_{t+1|t} \, \mathbf{C}_\theta^T + \mathbf{R} \\ \mathbf{K}_{t+1} & = \mathbf{P}_{t+1|t} \, \mathbf{C}_\theta^T \, \Sigma_{t+1}^{-1} \end{align}\]
- Updating step: the new states at time \(t+1\) are updated, as a compromise between the one-step ahead prediction and the measurement. \[\begin{align} \mathbf{x}_{t+1|t+1} & = \mathbf{x}_{t+1|t} + \mathbf{K}_{t+1} \, \mathbf{\varepsilon}_{t+1} \\ \mathbf{P}_{t+1|t+1} & = \left( \mathbf{I}- \mathbf{K}_{t+1} \, \mathbf{C}_\theta \right) \, \mathbf{P}_{t+1|t} \end{align}\]
- The total (negative) log-likelihood can be calculated up to a normalizing constant: \[\begin{equation} -\ln L_y(\theta) = \frac{1}{2} \sum_{t=1}^{T} \ln \left|\Sigma_t(\theta)\right| + \frac{1}{2} \sum_{t=1}^{T} \varepsilon_t(\theta)^T \, \Sigma_t(\theta)^{-1} \, \varepsilon_t(\theta) \end{equation}\]
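The whole algorithm fits in a short routine. The sketch below (our own condensed implementation, with an assumed interface) returns the filtered means and covariances together with the negative log-likelihood above, up to its constant term.

```python
# A compact sketch of the Kalman filter described above.
import numpy as np

def kalman_filter(A, B, C, Q, R, u, y, x0, P0):
    T, nx = y.shape[0], A.shape[0]
    x, P = x0, P0
    xf = np.zeros((T, nx))
    Pf = np.zeros((T, nx, nx))
    nll = 0.0
    for t in range(T):
        # Prediction step: one-step-ahead state and covariance
        x = A @ x + B @ u[t]
        P = A @ P @ A.T + Q
        # Innovations, their covariance and the Kalman gain
        eps = y[t] - C @ x
        S = C @ P @ C.T + R
        K = P @ C.T @ np.linalg.inv(S)
        # Updating step: compromise between prediction and measurement
        x = x + K @ eps
        P = (np.eye(nx) - K @ C) @ P
        xf[t], Pf[t] = x, P
        # Accumulate the negative log-likelihood contribution
        _, logdet = np.linalg.slogdet(S)
        nll += 0.5 * (logdet + eps @ np.linalg.solve(S, eps))
    return xf, Pf, nll
```

The returned negative log-likelihood can then be wrapped in a function of \(\theta\) and minimized numerically, e.g. with scipy.optimize.minimize, to obtain the ML estimate mentioned in Section 10.1.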
Roughly speaking, the Kalman filter applies Bayes’ rule at each time step: the updated state \(p(\mathbf{x}_t|\mathbf{y}_{1:t})=N(\mathbf{x}_{t|t}, \mathbf{P}_{t|t})\) is a posterior distribution, obtained from a compromise between a prior output of the model \(p(\mathbf{x}_t|\mathbf{y}_{1:t-1})=N(\mathbf{x}_{t|t-1}, \mathbf{P}_{t|t-1})\) and the evidence brought by measurements \(\mathbf{y}_t\). Their relative weight is expressed by the Kalman gain \(\mathbf{K}_t\) that measures the relative confidence we put in both the model and the measurements.
10.4 Non-linear state-space models
State-space models are not necessarily linear or Gaussian. In a more generic situation, the system of equations (10.1) can take the form: \[\begin{align} \mathbf{x}_t & = F(\mathbf{x}_{t-1}, \mathbf{u}_t, \mathbf{w}_t) \tag{10.12} \\ \mathbf{y}_t & = G(\mathbf{x}_t, \mathbf{u}_t, \mathbf{v}_t) \tag{10.13} \end{align}\]
The most common extensions of the Kalman filter to non-linear systems are the extended Kalman filter (EKF) and the unscented Kalman filter (UKF). The EKF approximates the non-linear measurement and dynamic models by linearizing them around the nominal solution. The UKF approximates the propagation of densities through the non-linearities of the measurement and noise processes using the unscented transform. Both are Gaussian approximations.
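As a sketch of the EKF idea, one predict-update step under the common additive-noise assumption \(\mathbf{x}_t = f(\mathbf{x}_{t-1}, \mathbf{u}_t) + \mathbf{w}_t\), \(\mathbf{y}_t = g(\mathbf{x}_t) + \mathbf{v}_t\) might look as follows; f, g and their Jacobians jac_f, jac_g are user-supplied callables (hypothetical names, not from the source).

```python
# A minimal EKF step: propagate the mean through the non-linear model,
# propagate the covariance through its Jacobians.
import numpy as np

def ekf_step(f, jac_f, g, jac_g, Q, R, x, P, u, y):
    # Prediction: linearize the dynamic model at the current estimate
    x_pred = f(x, u)
    F = jac_f(x, u)
    P_pred = F @ P @ F.T + Q
    # Update: linearize the measurement model around the prediction
    G = jac_g(x_pred)
    eps = y - g(x_pred)
    S = G @ P_pred @ G.T + R
    K = P_pred @ G.T @ np.linalg.inv(S)
    x_new = x_pred + K @ eps
    P_new = (np.eye(len(x)) - K @ G) @ P_pred
    return x_new, P_new
```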
Non-linearity in a building energy state-space model can arise in different situations:
- The state vector can include more than the temperatures simulated by an RC model: by using fictitious process equations to augment it with model parameters, it is possible to perform on-line joint state-parameter estimation (see the sketch after this list).
- Time-varying uncertainty in some of the model inputs can be encoded by augmenting the state with additional variables describing these varying unknown parameters.
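As a sketch of the first situation, the 2R2C state can be augmented with \(\log R_i\), given random-walk dynamics. The transition function below (illustrative, with an Euler discretization for brevity) is then non-linear in the augmented state and would be handed to an EKF, UKF or particle filter.

```python
# State augmentation for joint state-parameter estimation (illustrative):
# the 2R2C state [T_i, T_e] is extended with log(Ri) as a third component.
import numpy as np

def f_augmented(x_aug, u, dt, Re, Ci, Ce, Ai, Ae):
    Ti, Te, log_Ri = x_aug
    Ri = np.exp(log_Ri)          # the parameter now lives in the state
    To, phi_h, phi_s = u
    # Euler discretization of Eq. (10.4)-(10.5)
    dTi = ((Te - Ti) / Ri + phi_h + Ai * phi_s) * dt / Ci
    dTe = ((Ti - Te) / Ri + (To - Te) / Re + Ae * phi_s) * dt / Ce
    # log_Ri follows a random walk: its process noise is added by the filter
    return np.array([Ti + dTi, Te + dTe, log_Ri])
```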
For non-linear non-Gaussian state-space models, particle filters, or sequential Monte Carlo (SMC) methods, are now a popular alternative to the EKF because of their suitability for parallel implementation and for on-line inference (Cappé, Godsill, and Moulines (2007)). SMC filters and smoothers represent the posterior distribution of states as a weighted set of Monte Carlo samples. They are applicable to joint state-parameter estimation and have already been applied to the characterization of building envelope properties (Rouchier, Jiménez, and Castaño (2019)).
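A minimal bootstrap particle filter illustrates the idea; the propagate and loglik callables (hypothetical names) encapsulate the dynamic and measurement models.

```python
# A sketch of a bootstrap particle filter: propagate particles through the
# dynamic model, weight them by the measurement likelihood, and resample.
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_pf(propagate, loglik, y, x0_particles):
    """propagate(X) samples x_t | x_{t-1} for each particle;
    loglik(X, y) returns log p(y_t | x_t) for each particle."""
    X = x0_particles.copy()       # shape (N, nx)
    N = X.shape[0]
    means = []
    for t in range(y.shape[0]):
        X = propagate(X)                     # sample from p(x_t | x_{t-1})
        logw = loglik(X, y[t])               # weight by p(y_t | x_t)
        w = np.exp(logw - logw.max())        # normalize weights stably
        w /= w.sum()
        means.append(w @ X)                  # filtered mean estimate
        idx = rng.choice(N, size=N, p=w)     # multinomial resampling
        X = X[idx]
    return np.array(means)
```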
In a non-Bayesian framework, particle methods can approximate the likelihood function, which may then be maximized for an off-line or on-line ML estimation of parameters (Kantas et al. (2015)). In Bayesian joint state-parameter estimation, the target distribution \(p(\theta, x_{1:T} | y_{1:T})\) can be approached in several ways: particle MCMC methods (Andrieu, Doucet, and Holenstein (2010)) sample \(\theta\) with an MCMC method, within which particle filters are called to approximate the marginal state distributions; the SMC\(^2\) algorithm (Chopin, Jacob, and Papaspiliopoulos (2013)) couples an SMC algorithm in the \(\theta\)-dimension, which propagates and resamples many particle filters in the \(x\)-dimension.
10.5 Switching state-space models
The last form of statistical model we will mention here is the state-space model with regime switching, also called switching state-space model (SSSM) or switching dynamical system. These models contain both categorical and continuous hidden variables, and are thus a hybrid of hidden Markov models and state-space models.
One of the forms of SSSMs is shown here: the continuous state variable \(X\) is conditioned on a categorical latent variable \(S\) called the switch. As in an HMM or MSM, the switch refers to one of several possible regimes the system can be in. The solutions of such models with a fixed number of modes of operation can be approximated by several methods:
- Generalized pseudo-Bayesian methods, and the interacting multiple model (IMM) algorithm (Blom and Bar-Shalom (1988)), based on forming a mixture of Kalman filters;
- Expectation propagation (Zoeter and Heskes (2011));
- Rao-Blackwellised particle filters (Doucet et al. (2000)) use closed form integration (Kalman filters) for some of the state variables and Monte Carlo integration for others (Särkkä (2013)).