Chapter 4 Ordinary linear regression

4.1 Introduction to OLR

Linear regression models are often the first example shown in statistical learning courses. They are a popular introduction to statistical modelling because of their simplicity, while their structure remains flexible enough to apply to a wide range of physical systems.

We consider an output variable \(y\) and a set of explanatory variables \(x=(x_1,...,x_k)\), and assume that \(n\) observations of \(y_i\) and \(x_{i1},...,x_{ik}\) have been recorded. The ordinary linear regression model states that the distribution of \(y\) given the \(n\times k\) matrix of predictors \(X\) is normal, with a mean that is a linear function of \(X\):
\[\begin{equation} E(y_i|\theta, X) = \beta_1 x_{i1} + ... + \beta_k x_{ik} \tag{4.1} \end{equation}\]
The parameter \(\theta = (\beta_1,...,\beta_k)\) is a vector of \(k\) coefficients whose distribution is to be determined. Ordinary linear regression assumes a normal linear model in which observation errors are independent and have equal variance \(\sigma^2\). Under these assumptions, and with the standard noninformative prior distribution that is uniform on \((\theta, \log\sigma)\), i.e. \(p(\theta, \sigma^2|X) \propto \sigma^{-2}\), the posterior distribution of \(\theta\) conditional on \(\sigma\) can be formulated explicitly:
\[\begin{align} \theta | \sigma, y & \sim N\left( \hat{\theta} , V_\theta \sigma^2\right) \tag{4.2} \\ \hat{\theta} & = (X^T \, X)^{-1} X^T \, y \tag{4.3}\\ V_\theta & = (X^T \, X)^{-1} \tag{4.4} \end{align}\]
and the marginal posterior distribution of \(\sigma^2\) is a scaled inverse-\(\chi^2\) distribution:
\[\begin{align} \sigma^2|y & \sim \mathrm{Inv-}\chi^2(n-k, s^2 ) \tag{4.5} \\ s^2 & = \frac{1}{n-k}(y-X\hat{\theta})^T (y-X\hat{\theta}) \tag{4.6} \end{align}\]
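Equations (4.2) to (4.6) can be turned directly into a posterior sampler: draw \(\sigma^2\) from its marginal (4.5), then draw \(\theta\) conditional on that \(\sigma\) from (4.2). The sketch below assumes NumPy and uses synthetic data; the helper names `posterior_ols` and `sample_posterior` are hypothetical, and a column of ones in \(X\) is assumed to play the role of an intercept.

```python
import numpy as np


def posterior_ols(X, y):
    """Closed-form quantities of Eqs. (4.3), (4.4) and (4.6)."""
    n, k = X.shape
    V_theta = np.linalg.inv(X.T @ X)        # Eq. (4.4)
    theta_hat = V_theta @ X.T @ y           # Eq. (4.3)
    resid = y - X @ theta_hat
    s2 = resid @ resid / (n - k)            # Eq. (4.6)
    return theta_hat, V_theta, s2


def sample_posterior(X, y, n_draws, rng):
    """Draw (theta, sigma^2) from the joint posterior:
    first sigma^2 | y (Eq. 4.5), then theta | sigma, y (Eq. 4.2)."""
    n, k = X.shape
    theta_hat, V_theta, s2 = posterior_ols(X, y)
    # Scaled inverse chi-square draw: sigma^2 = (n - k) s^2 / chi2(n - k)
    sigma2 = (n - k) * s2 / rng.chisquare(n - k, size=n_draws)
    # theta = theta_hat + sigma * L z, with L L^T = V_theta and z ~ N(0, I)
    L = np.linalg.cholesky(V_theta)
    z = rng.standard_normal((n_draws, k))
    theta = theta_hat + np.sqrt(sigma2)[:, None] * (z @ L.T)
    return theta, sigma2


# Synthetic example (hypothetical values): first column of ones acts as an intercept
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.uniform(0.0, 10.0, n)])
y = X @ np.array([1.0, 2.5]) + rng.normal(0.0, 0.5, n)

theta, sigma2 = sample_posterior(X, y, n_draws=4000, rng=rng)
print("posterior mean of theta:", theta.mean(axis=0))
print("95% posterior interval of sigma:", np.sqrt(np.percentile(sigma2, [2.5, 97.5])))
```

Sampling \(\sigma^2\) first and then \(\theta|\sigma\) gives exact, independent draws from the joint posterior, so no Markov chain is needed in this conjugate case.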

In the words of Gelman et al. (2013): “in the normal linear model framework, the first key statistical modelling issue is defining the variables \(x\) and \(y\), possibly using transformations, so that the conditional expectation of \(y\) is reasonably linear as a function of the columns of \(X\) with approximately normal errors.” The second main issue, specific to the Bayesian framework, is the proper specification of the prior distribution on the model parameters.

Despite their simplicity, linear regression models can offer a useful first insight into the heat balance of a building: they allow a quick assessment of which measured variables have an impact on the global balance, and guide the choice of more detailed models. Moreover, if enough data are available, the estimates of some coefficients, such as the heat transfer coefficient (HTC), often turn out to be quite reliable.
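As an illustration of how the HTC can appear as a regression coefficient, the sketch below assumes a simplified steady-state daily heat balance \(\Phi = \mathrm{HTC}\,\Delta T - A_{sol} I_{sol}\), where \(\Phi\) is the heating power, \(\Delta T\) the indoor minus outdoor temperature, \(I_{sol}\) the solar irradiance and \(A_{sol}\) a solar aperture. This model form, the variable names and all numerical values are assumptions made for illustration; the data are synthetic, not measurements.

```python
import numpy as np

# Synthetic daily averages, generated for illustration only (hypothetical values)
rng = np.random.default_rng(1)
n_days = 60
delta_T = rng.uniform(5.0, 20.0, n_days)    # indoor minus outdoor temperature (K)
I_sol = rng.uniform(0.0, 150.0, n_days)     # solar irradiance (W/m2)
HTC_true, A_sol_true = 150.0, 3.0           # assumed "true" HTC (W/K) and aperture (m2)
Phi = HTC_true * delta_T - A_sol_true * I_sol + rng.normal(0.0, 100.0, n_days)  # heating power (W)

# Linear model Phi = theta_1 * delta_T + theta_2 * (-I_sol), without intercept
X = np.column_stack([delta_T, -I_sol])
theta_hat, *_ = np.linalg.lstsq(X, Phi, rcond=None)

# Standard errors sqrt(s^2 * diag(V_theta)), from Eqs. (4.4) and (4.6)
n, k = X.shape
resid = Phi - X @ theta_hat
s2 = resid @ resid / (n - k)
sd = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))

HTC_hat, A_sol_hat = theta_hat
print(f"HTC   = {HTC_hat:.1f} +/- {sd[0]:.1f} W/K")
print(f"A_sol = {A_sol_hat:.2f} +/- {sd[1]:.2f} m2")
```

In real data the explanatory variables are correlated (cold days are often dull days), which widens the uncertainty on each coefficient; the standard errors above make that effect visible.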

The ordinary linear regression model is sufficient to explain the variability of the data if the regression errors \(y_i - E(y_i|\theta, X)\) are independent and identically distributed, following a normal distribution with constant variance \(\sigma^2\). If that is not the case, the model can be extended in several ways:

  • The expected value \(E(y_i|\theta, X)\) may be non-linear or include non-linear transformations of the explanatory variables.
  • Unequal variances and correlated errors can be included by allowing a data covariance matrix \(\Sigma_y\) that is not necessarily proportional to the identity matrix: \(y \sim N(X\theta, \Sigma_y)\).
  • A non-normal probability distribution can be used.
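For the second extension, when \(\Sigma_y\) is fully known and the prior on \(\theta\) is uniform, the posterior of \(\theta\) is still normal, with mean \((X^T \Sigma_y^{-1} X)^{-1} X^T \Sigma_y^{-1} y\) and covariance \((X^T \Sigma_y^{-1} X)^{-1}\) (generalised least squares). The sketch below assumes NumPy; the helper `gls_posterior` and the autoregressive covariance used in the example are hypothetical choices for illustration.

```python
import numpy as np


def gls_posterior(X, y, Sigma_y):
    """Posterior mean and covariance of theta for y ~ N(X theta, Sigma_y),
    with Sigma_y known and a uniform prior on theta."""
    L = np.linalg.cholesky(Sigma_y)      # Sigma_y = L L^T
    Xw = np.linalg.solve(L, X)           # whitened predictors L^{-1} X
    yw = np.linalg.solve(L, y)           # whitened data L^{-1} y
    V = np.linalg.inv(Xw.T @ Xw)         # (X^T Sigma_y^{-1} X)^{-1}
    theta_hat = V @ Xw.T @ yw            # V X^T Sigma_y^{-1} y
    return theta_hat, V


# Example with correlated errors: Sigma_ij = sigma^2 * rho^|i - j| (assumed AR(1) form)
rng = np.random.default_rng(2)
n = 100
X = np.column_stack([np.ones(n), np.linspace(0.0, 1.0, n)])
rho, sigma = 0.8, 0.3
idx = np.arange(n)
Sigma_y = sigma**2 * rho ** np.abs(idx[:, None] - idx[None, :])
y = X @ np.array([1.0, 2.0]) + np.linalg.cholesky(Sigma_y) @ rng.standard_normal(n)

theta_hat, V = gls_posterior(X, y, Sigma_y)
print("posterior mean:", theta_hat)
print("posterior std:", np.sqrt(np.diag(V)))
```

Whitening with the Cholesky factor reduces the problem to ordinary least squares on transformed data, which avoids forming \(\Sigma_y^{-1}\) explicitly. When \(\Sigma_y\) is unknown or must itself be estimated, this closed form no longer applies directly.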

The extensions listed above generally invalidate the closed-form solutions of Eqs. (4.2) to (4.6), but we will see that Bayesian inference can handle them seamlessly.
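To show that a non-normal error distribution is no obstacle once closed forms are abandoned, the sketch below samples the posterior of a linear regression with Student-t errors, assuming a uniform prior on \((\theta, \log\sigma)\) and a known number of degrees of freedom \(\nu\). It uses a plain random-walk Metropolis algorithm written with NumPy, which is simpler than, and not the same as, the Hamiltonian Monte Carlo sampler used by Stan. The function names, step sizes and synthetic data are assumptions for illustration.

```python
import numpy as np


def log_posterior(params, X, y, nu):
    """Unnormalised log posterior for y_i = x_i theta + sigma * t_nu noise,
    with a uniform prior on (theta, log sigma)."""
    theta, log_sigma = params[:-1], params[-1]
    r = (y - X @ theta) / np.exp(log_sigma)
    # Student-t log density summed over observations, constants dropped
    return np.sum(-0.5 * (nu + 1.0) * np.log1p(r**2 / nu)) - len(y) * log_sigma


def metropolis(X, y, nu, n_iter, step, rng):
    """Random-walk Metropolis sampler over (theta, log sigma)."""
    k = X.shape[1]
    theta0, *_ = np.linalg.lstsq(X, y, rcond=None)   # start at least squares
    current = np.append(theta0, np.log(np.std(y - X @ theta0)))
    lp = log_posterior(current, X, y, nu)
    chain = np.empty((n_iter, k + 1))
    for i in range(n_iter):
        proposal = current + step * rng.standard_normal(k + 1)
        lp_prop = log_posterior(proposal, X, y, nu)
        # Accept with probability min(1, exp(lp_prop - lp))
        if np.log(rng.uniform()) < lp_prop - lp:
            current, lp = proposal, lp_prop
        chain[i] = current
    return chain


# Synthetic data with heavy-tailed (Student-t, 3 dof) errors; x is centred
rng = np.random.default_rng(3)
n = 200
X = np.column_stack([np.ones(n), rng.uniform(-5.0, 5.0, n)])
y = X @ np.array([1.0, 2.5]) + 0.5 * rng.standard_t(3, n)

chain = metropolis(X, y, nu=3.0, n_iter=20000, step=np.array([0.05, 0.02, 0.08]), rng=rng)
samples = chain[5000:]                               # discard burn-in
print("posterior mean of theta:", samples[:, :2].mean(axis=0))
print("posterior mean of sigma:", np.exp(samples[:, 2]).mean())
```

Sampling \(\log\sigma\) rather than \(\sigma\) keeps the proposals unconstrained, and centring \(x\) reduces the correlation between intercept and slope, which helps a random-walk sampler with independent proposal steps.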

4.2 Simple linear regression with R

Example of ordinary linear regression with the standard R libraries

4.3 Bayesian linear regression with STAN

Example of ordinary linear regression with STAN


Gelman, Andrew, John B Carlin, Hal S Stern, David B Dunson, Aki Vehtari, and Donald B Rubin. 2013. Bayesian Data Analysis. CRC Press.