Linear Mixed Models

In the last section we considered models with correlated responses of the form \(\pmb{y=X\beta+\epsilon}\), \(E[\pmb{\epsilon}]=0\) and \(cov(\pmb{\epsilon})=\sigma^2\pmb{D}\), where \(\pmb{D}\) was assumed to be a known covariance matrix. In practice we often encounter cases where \(\pmb{D}\) is not known but has to be estimated. Unfortunately \(\pmb{D}\) has \(n(n+1)/2\) distinct elements, and with only \(n\) observations there is nowhere near enough information to estimate them all. However, in some cases one knows something about the structure of \(\pmb{D}\), which lowers the number of unknowns sufficiently to allow estimation.

Dependencies between observations can arise in many ways. For example, if observations are taken on meteorological phenomena (temperature, pressure, etc.) each day, measurements from one day to the next are likely correlated. In surveys of people, those living in the same geographical area (maybe even in the same household) are likely to give correlated responses.

A model for experiments of this kind can often be written in the form

\[\pmb{y=X\beta+Z_1a_1+..+Z_ma_m+\epsilon}\]

where \(E[\pmb{\epsilon}]=0\) and \(cov(\pmb{\epsilon})=\sigma^2\pmb{I}\). Here \(\pmb{X}\) is a (possibly less than full-rank) \(n\times p\) known matrix of fixed predictors, just as before. The \(\pmb{Z}_i\)’s are known \(n\times r_i\) full-rank matrices. The \(\pmb{a}_i\)’s are vectors of unknown quantities, similar to \(\pmb{\epsilon}\). We assume \(E[\pmb{a}_i]=0\), \(cov(\pmb{a}_i)=\sigma^2_i\pmb{I}\) and \(cov(\pmb{a}_i, \pmb{a}_j)=\pmb{O}\) for \(i\ne j\).

Compare this model to the random X model discussed in section 6.10. There it was the matrix \(\pmb{X}\) that was assumed to be random while the vector \(\pmb{\beta}\) was fixed. The present model, however, is fairly close to the Bayesian model, where \(\pmb{\beta}\) is also assumed to be random.

Models of this kind are (for obvious reasons) called mixed linear models. If \(\pmb{X=j}\) (so that the only fixed effect is an overall mean), the model is sometimes called a random model. The \(\sigma_i^2\)’s are called variance components.

Theorem (8.3.1)

Consider the model \(\pmb{y=X\beta+Z_1a_1+..+Z_ma_m+\epsilon}\), with elements as described above. Then \(E[\pmb{y}]=\pmb{X\beta}\) and

\[cov(\pmb{y})=\pmb{\Sigma}=\sum_{i=1}^m \sigma_i^2 \pmb{Z_iZ_i'}+\sigma^2\pmb{I}\]

proof

\[ \begin{aligned} &E[\pmb{y}] = E[\pmb{X\beta+Z_1a_1+..+Z_ma_m+\epsilon}] = \\ &E[\pmb{X\beta}]+\pmb{Z_1}E[\pmb{a_1}]+..+\pmb{Z_m}E[\pmb{a_m}]+E[\pmb{\epsilon}] = \pmb{X\beta} \end{aligned} \] \[ \begin{aligned} &cov(\pmb{y}) = cov(\pmb{X\beta+Z_1a_1+..+Z_ma_m+\epsilon}) = \\ &cov(\pmb{Z_1a_1})+..+cov(\pmb{Z_ma_m})+cov(\pmb{\epsilon}) = \\ &\sigma_1^2\pmb{Z_1Z_1'}+..+\sigma_m^2\pmb{Z_mZ_m'}+\sigma^2\pmb{I} \end{aligned} \]

because \(cov(\pmb{Z_ia_i},\pmb{Z_ja_j}) = \pmb{Z_i}cov(\pmb{a_i,a_j})\pmb{Z_j}'= \sigma_i^2\pmb{Z_iZ_i'}\) if \(i=j\) and \(\pmb{O}\) otherwise.
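
To make the theorem concrete, here is a small simulation check in R. This is a minimal sketch with made-up values: one random effect (\(m=1\)) with three groups of two observations, \(\beta=5\), \(\sigma^2=1\) and \(\sigma_1^2=2\). We generate many responses and compare the empirical covariance matrix of \(\pmb{y}\) with the formula from the theorem.

set.seed(1111)
n=6
sig=1; sig1=2                             # made-up values for sigma^2 and sigma_1^2
Z1=kronecker(diag(3), matrix(1, 2, 1))    # three groups of two observations
X=matrix(1, n, 1); beta=5
B=10000
ysim=matrix(0, B, n)
for(b in 1:B)
  ysim[b, ]=X%*%beta + Z1%*%rnorm(3, 0, sqrt(sig1)) + rnorm(n, 0, sqrt(sig))
round(cov(ysim), 1)                       # empirical covariance of y
sig1*Z1%*%t(Z1) + sig*diag(n)             # covariance according to the theorem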

Example (8.3.2)

Randomized Block design

We carry out a study on the effectiveness of a new drug. We have three treatments: no drug, placebo and new drug. We randomly choose 4 hospitals (out of a much larger list of possible hospitals) where the study will take place. If we average over the patients for each factor level combination a model would be

\[y_{ij}=\mu+\tau_i+\alpha_j+\epsilon_{ij}\] with \(i=1,2,3\); \(j=1,2,3,4\); \(\alpha_j\sim N(0,\sigma^2_1)\); \(\epsilon_{ij}\sim N(0,\sigma^2)\) and \(cov(\alpha_j,\epsilon_{ij})=0\), where \(\mu\) is the overall effect, \(\tau_i\) is the effect of the three treatments, which are fixed, and \(\alpha_j\) is the effect of the choice of hospital, which is random. Note that we did not include an interaction term, which is assumed not to exist.

So we have

\[ \pmb{X} = \begin{pmatrix} \pmb{j}_3 & \pmb{I}_3 \\ \pmb{j}_3 & \pmb{I}_3 \\ \pmb{j}_3 & \pmb{I}_3 \\ \pmb{j}_3 & \pmb{I}_3 \\ \end{pmatrix} \]

\[ \pmb{Z}_1 = \begin{pmatrix} \pmb{j}_3 & \pmb{0}_3& \pmb{0}_3& \pmb{0}_3 \\ \pmb{0}_3 & \pmb{j}_3& \pmb{0}_3& \pmb{0}_3 \\ \pmb{0}_3 & \pmb{0}_3& \pmb{j}_3& \pmb{0}_3 \\ \pmb{0}_3 & \pmb{0}_3& \pmb{0}_3& \pmb{j}_3 \\ \end{pmatrix} \]

Let’s use R to find \(\pmb{\Sigma} = \sigma^2\pmb{I}+\sigma_1^2\pmb{Z_1Z_1'}\):

I=3;J=4                         # 3 treatments, 4 hospitals
sig=2.5;sig1=1.9                # illustrative values for sigma^2 and sigma_1^2
Z=matrix(0, I*J, J)             # Z_1 has one column per hospital
for(j in 1:J) Z[(j-1)*I+1:I, j]=1
sig*diag(I*J)+sig1*Z%*%t(Z)     # Sigma = sigma^2 I + sigma_1^2 Z_1Z_1'
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
##  [1,]  4.4  1.9  1.9  0.0  0.0  0.0  0.0  0.0  0.0   0.0   0.0   0.0
##  [2,]  1.9  4.4  1.9  0.0  0.0  0.0  0.0  0.0  0.0   0.0   0.0   0.0
##  [3,]  1.9  1.9  4.4  0.0  0.0  0.0  0.0  0.0  0.0   0.0   0.0   0.0
##  [4,]  0.0  0.0  0.0  4.4  1.9  1.9  0.0  0.0  0.0   0.0   0.0   0.0
##  [5,]  0.0  0.0  0.0  1.9  4.4  1.9  0.0  0.0  0.0   0.0   0.0   0.0
##  [6,]  0.0  0.0  0.0  1.9  1.9  4.4  0.0  0.0  0.0   0.0   0.0   0.0
##  [7,]  0.0  0.0  0.0  0.0  0.0  0.0  4.4  1.9  1.9   0.0   0.0   0.0
##  [8,]  0.0  0.0  0.0  0.0  0.0  0.0  1.9  4.4  1.9   0.0   0.0   0.0
##  [9,]  0.0  0.0  0.0  0.0  0.0  0.0  1.9  1.9  4.4   0.0   0.0   0.0
## [10,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   4.4   1.9   1.9
## [11,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   1.9   4.4   1.9
## [12,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   1.9   1.9   4.4

and so

\[ \pmb{\Sigma} = \sigma^2\pmb{I}+\sigma_1^2\pmb{Z_1Z_1'} = \begin{pmatrix} \pmb{\Sigma_1} & 0 & 0 & 0 \\ 0 &\pmb{\Sigma_1} & 0 & 0 \\ 0 &0 &\pmb{\Sigma_1} & 0 \\ 0 &0 &0 &\pmb{\Sigma_1} \\ \end{pmatrix} \]

where

\[ \pmb{\Sigma}_1 = \begin{pmatrix} \sigma_1^2+\sigma^2 &\sigma_1^2 & \sigma_1^2 \\ \sigma_1^2 & \sigma_1^2+\sigma^2 & \sigma_1^2\\ \sigma_1^2 & \sigma_1^2 & \sigma_1^2+\sigma^2\\ \end{pmatrix} \]

Example (8.3.3)

Subsampling

Five batches were produced using each of two processes. Two samples were obtained and measured from each of the batches. Constraining the process effects to sum to zero, the model is

\[y_{ijk}=\mu+\tau_i+\alpha_{ij}+\epsilon_{ijk}\]

with \(i=1,2\); \(j=1,..,5\); \(k=1,2\); \(\tau_1+\tau_2=0\); \(\alpha_{ij}\sim N(0,\sigma^2_1)\); \(\epsilon_{ijk}\sim N(0,\sigma^2)\) and \(cov(\alpha_{ij},\epsilon_{ijk})=0\). Here \(\tau_i\) is the fixed effect of the process and \(\alpha_{ij}\) is the random effect of batch \(j\) within process \(i\).

\[ \pmb{X} = \begin{pmatrix} \pmb{j}_{10} & \pmb{j}_{10} \\ \pmb{j}_{10} & -\pmb{j}_{10} \\ \end{pmatrix} \]

\[ \pmb{Z}_1 = \begin{pmatrix} \pmb{j}_2 & \pmb{0}_2& ... & \pmb{0}_2 \\ \pmb{0}_2 & \pmb{j}_2& ...& \pmb{0}_2 \\ \vdots & \vdots& & \vdots \\ \pmb{0}_2 & \pmb{0}_2& ...& \pmb{j}_2 \\ \end{pmatrix} \]

and again we can find

\[ \pmb{\Sigma} = \sigma^2\pmb{I}+\sigma_1^2\pmb{Z_1Z_1'} = \begin{pmatrix} \pmb{\Sigma_1} & 0 & ... & 0 \\ 0 &\pmb{\Sigma_1} & ... & 0 \\ \vdots &\vdots & & \vdots \\ 0 &0 &... &\pmb{\Sigma_1} \\ \end{pmatrix} \]

where

\[ \pmb{\Sigma}_1 = \begin{pmatrix} \sigma_1^2+\sigma^2 &\sigma_1^2 \\ \sigma_1^2 & \sigma_1^2+\sigma^2\\ \end{pmatrix} \]
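
As a quick check we can again build \(\pmb{Z}_1\) and \(\pmb{\Sigma}\) in R, analogous to the previous example (the variance values are again made up):

I=2; J=5; K=2                              # processes, batches per process, samples per batch
sig=2.5; sig1=1.9
Z1=kronecker(diag(I*J), matrix(1, K, 1))   # 20x10, one column per batch
Sigma=sig*diag(I*J*K) + sig1*Z1%*%t(Z1)
Sigma[1:4, 1:4]                            # upper-left corner shows the 2x2 blocks Sigma_1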

Example (8.3.4)

Split-Plot Studies

A \(3\times 2\) factorial experiment (with factors A and B, respectively) was carried out using six main units, each of which was subdivided into two subunits. The levels of A were each randomly assigned to two of the main units, and the levels of B were randomly assigned to subunits within main units.

An appropriate model is

\[y_{ijk}=\mu+\tau_i+\delta_j+\theta_{ij}+\alpha_{ik}+\epsilon_{ijk}\]

with \(i=1,2,3\); \(j=1,2\); \(k=1,2\); \(\alpha_{ik}\sim N(0,\sigma^2_1)\); \(\epsilon_{ijk}\sim N(0,\sigma^2)\) and \(cov(\alpha_{ik},\epsilon_{ijk})=0\). Here \(\tau_i\) and \(\delta_j\) are the fixed main effects of A and B, \(\theta_{ij}\) is their interaction, and \(\alpha_{ik}\) is the random effect of the main units.

Here are the matrices, created with R:

I=3; J=2; K=2                 # levels of A, levels of B, main units per level of A
XA=matrix(0, I*J*K, I)        # indicator columns for the three levels of A (blocks of 4 rows)
for(i in 1:I) XA[(i-1)*J*K+1:(J*K), i]=1
tmp=rbind(cbind(diag(2),diag(2), 0*diag(2), 0*diag(2)), 
      cbind(diag(2),diag(2), 0*diag(2), 0*diag(2)),
      cbind(diag(2),0*diag(2), diag(2), 0*diag(2)),
      cbind(diag(2),0*diag(2), diag(2), 0*diag(2)),
      cbind(diag(2),0*diag(2), 0*diag(2), diag(2)),
      cbind(diag(2),0*diag(2), 0*diag(2), diag(2)))   # columns for B and the A:B interaction
X=cbind(1, XA, tmp)
X
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
##  [1,]    1    1    0    0    1    0    1    0    0     0     0     0
##  [2,]    1    1    0    0    0    1    0    1    0     0     0     0
##  [3,]    1    1    0    0    1    0    1    0    0     0     0     0
##  [4,]    1    1    0    0    0    1    0    1    0     0     0     0
##  [5,]    1    0    1    0    1    0    0    0    1     0     0     0
##  [6,]    1    0    1    0    0    1    0    0    0     1     0     0
##  [7,]    1    0    1    0    1    0    0    0    1     0     0     0
##  [8,]    1    0    1    0    0    1    0    0    0     1     0     0
##  [9,]    1    0    0    1    1    0    0    0    0     0     1     0
## [10,]    1    0    0    1    0    1    0    0    0     0     0     1
## [11,]    1    0    0    1    1    0    0    0    0     0     1     0
## [12,]    1    0    0    1    0    1    0    0    0     0     0     1
j=matrix(1, 2, 1)   # j_2, a column of ones for the two subunits of a main unit
o=matrix(0, 2, 1)   # 0_2
Z=rbind(cbind(j, o, o, o , o, o),
        cbind(o, j, o, o , o, o),
        cbind(o, o, j, o , o, o),
        cbind(o, o, o, j , o, o),
        cbind(o, o, o, o , j, o),
        cbind(o, o, o, o , o, j))
Z
##       [,1] [,2] [,3] [,4] [,5] [,6]
##  [1,]    1    0    0    0    0    0
##  [2,]    1    0    0    0    0    0
##  [3,]    0    1    0    0    0    0
##  [4,]    0    1    0    0    0    0
##  [5,]    0    0    1    0    0    0
##  [6,]    0    0    1    0    0    0
##  [7,]    0    0    0    1    0    0
##  [8,]    0    0    0    1    0    0
##  [9,]    0    0    0    0    1    0
## [10,]    0    0    0    0    1    0
## [11,]    0    0    0    0    0    1
## [12,]    0    0    0    0    0    1
sig=2.5;sig1=1.9
sig*diag(I*J*K)+sig1*Z%*%t(Z)   # Sigma = sigma^2 I + sigma_1^2 Z_1Z_1'
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
##  [1,]  4.4  1.9  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0.0   0.0   0.0
##  [2,]  1.9  4.4  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0.0   0.0   0.0
##  [3,]  0.0  0.0  4.4  1.9  0.0  0.0  0.0  0.0  0.0   0.0   0.0   0.0
##  [4,]  0.0  0.0  1.9  4.4  0.0  0.0  0.0  0.0  0.0   0.0   0.0   0.0
##  [5,]  0.0  0.0  0.0  0.0  4.4  1.9  0.0  0.0  0.0   0.0   0.0   0.0
##  [6,]  0.0  0.0  0.0  0.0  1.9  4.4  0.0  0.0  0.0   0.0   0.0   0.0
##  [7,]  0.0  0.0  0.0  0.0  0.0  0.0  4.4  1.9  0.0   0.0   0.0   0.0
##  [8,]  0.0  0.0  0.0  0.0  0.0  0.0  1.9  4.4  0.0   0.0   0.0   0.0
##  [9,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  4.4   1.9   0.0   0.0
## [10,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  1.9   4.4   0.0   0.0
## [11,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0.0   4.4   1.9
## [12,]  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0   0.0   1.9   4.4

and so we find \(\pmb{\Sigma}\) to be the same as in the previous example: block-diagonal with \(2\times 2\) blocks \(\pmb{\Sigma}_1\), here corresponding to the two subunits of each main unit.

Example (8.3.5)

One-Way Random Effects Model

Let’s analyze a one-way model where the factor is random. So we have \(y_{ij}=\mu+\alpha_i+\epsilon_{ij}\), \(\alpha_i\sim N(0,\sigma_1^2)\), \(\epsilon_{ij}\sim N(0,\sigma^2)\), \(cov(\alpha_i,\epsilon_{ij})=0\).

For the purpose of illustration we will use \(I=3\), \(J=4\); then we have \(m=1\) and \(\pmb{X=j}\) (the vector for \(\mu\)). Analogous to the examples above we have

\[ \pmb{Z}_1 = \begin{pmatrix} \pmb{j}_4 & \pmb{0}_4 & \pmb{0}_4 \\ \pmb{0}_4 & \pmb{j}_4 & \pmb{0}_4\\ \pmb{0}_4 & \pmb{0}_4 & \pmb{j}_4 \\ \end{pmatrix} \] and

\[ \pmb{\Sigma}=\sigma_1^2\pmb{Z_1Z_1}'+\sigma^2\pmb{I} = \begin{pmatrix} \pmb{\Sigma}_1 & \pmb{O} & \pmb{O} \\ \pmb{O} & \pmb{\Sigma}_1 & \pmb{O}\\ \pmb{O} & \pmb{O} & \pmb{\Sigma}_1 \\ \end{pmatrix} \] where

\[ \pmb{\Sigma}_1 = \begin{pmatrix} \sigma_1^2+\sigma^2 & \sigma_1^2 & \sigma_1^2 & \sigma_1^2 \\ \sigma_1^2 & \sigma_1^2+\sigma^2 & \sigma_1^2 & \sigma_1^2 \\ \sigma_1^2 & \sigma_1^2 & \sigma_1^2+\sigma^2 & \sigma_1^2 \\ \sigma_1^2 & \sigma_1^2 & \sigma_1^2 & \sigma_1^2+\sigma^2 \\ \end{pmatrix} \]

Estimation of Variance Components

There are a number of different methods known for estimating the variance components in a mixed linear model. We will discuss the method known as residual (or restricted) maximum likelihood (REML). In the ordinary linear model the REML estimator of \(\sigma^2\) is the usual estimator \(s^2\).

We now add the normal assumption. So we have

\[\pmb{y}\sim N_n(\pmb{X\beta,\Sigma})\]

where \(\pmb{\Sigma}=\sum_{i=1}^m \sigma_i^2 \pmb{Z_iZ_i}'+\sigma^2\pmb{I}\). To simplify set \(\sigma_0^2=\sigma^2\) and \(\pmb{Z}_0=\pmb{I}\), so that \(\pmb{\Sigma}=\sum_{i=0}^m \sigma_i^2 \pmb{Z_iZ_i}'\).

The idea of REML is to apply maximum likelihood estimation to transformed data \(\pmb{Ky}\), where \(\pmb{K}\) is chosen so that the distribution of \(\pmb{Ky}\) involves only the variance components and not \(\pmb{\beta}\). Since \(E[\pmb{Ky}]=\pmb{KX\beta}\), this requires a matrix \(\pmb{K}\) with \(\pmb{KX=0}\), so that \(E[\pmb{Ky}]=\pmb{0}\).

Theorem (8.3.6)

\(\pmb{K}\) must be of the form \(\pmb{K=C(I-H)=C[I-X(X'X)^{-}X']}\), where \(\pmb{C}\) is some full-rank \((n-r)\times n\) matrix, i.e. the rows of \(\pmb{K}\) are \(n-r\) linearly independent linear combinations of the rows of \(\pmb{I-H}\).

proof The rows \(\pmb{k}_i\) of \(\pmb{K}\) must satisfy the equations \(\pmb{k_i'X=0}\), or equivalently \(\pmb{X'k_i=0}\). By (4.2.14) the solutions of this system are given by \(\pmb{k_i=[I-(X')^{-}X']c}\) for all possible \(n \times 1\) vectors \(\pmb{c}\). Choosing the generalized inverse \(\pmb{(X')^{-}=X(X'X)^{-}}\) gives \(\pmb{k_i=[I-X(X'X)^{-}X']c=(I-H)c}\), so the solutions are exactly the linear combinations of the columns (equivalently, by symmetry, the rows) of \(\pmb{I-H}\).

Now \(rank(\pmb{H})=rank(\pmb{X})=r\), and \(\pmb{I-H}\) is idempotent. Because of this idempotency, \(rank(\pmb{I-H})=tr(\pmb{I-H})=n-r\). Hence there are at most \(n-r\) linearly independent vectors \(\pmb{k}_i\) that satisfy \(\pmb{X'k_i=0}\), and thus the maximal number of rows of \(\pmb{K}\) is \(n-r\).

Since each \(\pmb{k_i'}\) is such a linear combination, \(\pmb{K=C(I-H)}\) for some full-rank \((n-r) \times n\) matrix \(\pmb{C}\) that specifies \(n-r\) linearly independent linear combinations of the rows of the symmetric matrix \(\pmb{I-H}\).
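
Here is a minimal sketch in R of one such choice of \(\pmb{K}\), for the case \(\pmb{X=j}\) with \(n=12\) that is used in the one-way example below. Taking the eigenvectors of \(\pmb{I-H}\) with eigenvalue 1 as the rows of \(\pmb{K}\) corresponds to one particular (orthonormal) choice of \(\pmb{C}\); we then verify that \(\pmb{KX=0}\) and that \(\pmb{K}\) has full rank \(n-r\).

n=12; r=1
X=matrix(1, n, 1)                          # X = j, so r = 1
H=X%*%solve(t(X)%*%X)%*%t(X)               # here H = J/12
K=t(eigen(diag(n)-H, symmetric=TRUE)$vectors[, 1:(n-r)])  # (n-r) x n
round(max(abs(K%*%X)), 12)                 # KX = 0
qr(K)$rank                                 # full rank n-r = 11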

Theorem (8.3.7)

\[\pmb{Ky}\sim N_{n-r}(\pmb{0},\pmb{K(\sum_{i=0}^m \sigma_i^2 \pmb{Z_iZ_i}')K'})\]

proof follows from (5.2.8)

So, as desired, the distribution of \(\pmb{Ky}\) depends only on the variance components, and we can estimate them by maximizing the likelihood of \(\pmb{Ky}\).

Theorem (8.3.8)

A set of m+1 estimating equations is given by

\[tr[\pmb{K'(K\Sigma K')^{-1}KZ_iZ_i'}]=\pmb{y'K'(K\Sigma K')^{-1}KZ_iZ_i'K'(K\Sigma K')^{-1}Ky}\] where \(\Sigma=\sum_{i=0}^m \sigma_i^2 \pmb{Z_iZ_i}'\)

proof Since \(E[\pmb{Ky}]=\pmb{0}\), the log-likelihood function is given by

\[ \begin{aligned} &\log L(\sigma_0^2,..,\sigma_m^2) = \\ &-\frac{n-r}2\log (2\pi)-\frac12\log\vert \pmb{K\Sigma K'} \vert -\frac12 \pmb{y'K'(K\Sigma K')^{-1}Ky} = \\ &-\frac{n-r}2\log (2\pi)-\frac12\log\left\vert \pmb{K}\left(\sum_{i=0}^m \sigma_i^2 \pmb{Z_iZ_i}'\right)\pmb{K'} \right\vert -\frac12 \pmb{y'K'}\left(\pmb{K}\left(\sum_{i=0}^m \sigma_i^2 \pmb{Z_iZ_i}'\right)\pmb{K'}\right)^{-1}\pmb{Ky} \\ \end{aligned} \]

and so, by the matrix derivative results in (4.3.28),

\[ \begin{aligned} &\frac{\partial}{\partial \sigma_i^2} \log L(\sigma_0^2,..,\sigma_m^2) = \\ &-\frac12 tr\left(\pmb{(K\Sigma K')}^{-1}\left[\frac{\partial}{\partial \sigma_i^2}\pmb{K\Sigma K'} \right] \right) + \\ &\frac12 \pmb{y'K'(K\Sigma K')^{-1}} \left[\frac{\partial}{\partial \sigma_i^2}\pmb{K\Sigma K'} \right] \pmb{(K\Sigma K')^{-1}Ky} \\ \end{aligned} \]

Now \(\frac{\partial}{\partial \sigma_i^2}\pmb{K\Sigma K'} = \pmb{KZ_iZ_i'K'}\), and setting these \(m+1\) derivatives equal to zero (using \(tr[\pmb{(K\Sigma K')^{-1}KZ_iZ_i'K'}]=tr[\pmb{K'(K\Sigma K')^{-1}KZ_iZ_i'}]\)) yields the estimating equations above.


In most cases these equations have to be solved numerically.
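
As an illustration, here is a sketch that maximizes the REML log-likelihood of \(\pmb{Ky}\) numerically for the one-way random effects model treated in the next example, using simulated data (all values are made up, and the additive constant of the log-likelihood is dropped). The variances are parametrized on the log scale to keep them positive.

set.seed(2025)
I=3; J=4; n=I*J
X=matrix(1, n, 1)
Z1=kronecker(diag(I), matrix(1, J, 1))
y=10 + rep(rnorm(I, 0, 2), each=J) + rnorm(n, 0, 1.5)   # true sigma_1^2 = 4, sigma^2 = 2.25
H=X%*%solve(t(X)%*%X)%*%t(X)
K=t(eigen(diag(n)-H, symmetric=TRUE)$vectors[, 1:(n-1)])
negloglik=function(theta) {                # theta = (log sigma^2, log sigma_1^2)
  Sigma=exp(theta[1])*diag(n) + exp(theta[2])*Z1%*%t(Z1)
  M=K%*%Sigma%*%t(K)
  Ky=K%*%y
  0.5*as.numeric(determinant(M, logarithm=TRUE)$modulus + t(Ky)%*%solve(M, Ky))
}
opt=optim(c(0, 0), negloglik)
exp(opt$par)                               # REML estimates of sigma^2 and sigma_1^2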

Example (8.3.9)

One-Way Random Effects Model

Let’s analyze a one-way model where the factor is random. So we have \(y_{ij}=\mu+\alpha_i+\epsilon_{ij}\), \(\alpha_i\sim N(0,\sigma_1^2)\), \(\epsilon_{ij}\sim N(0,\sigma^2)\), \(cov(\alpha_i,\epsilon_{ij})=0\).

We saw before that

\[ \pmb{\Sigma}= \begin{pmatrix} \pmb{\Sigma}_1 & \pmb{O} & \pmb{O} \\ \pmb{O} & \pmb{\Sigma}_1 & \pmb{O}\\ \pmb{O} & \pmb{O} & \pmb{\Sigma}_1 \\ \end{pmatrix} \] where

\[ \pmb{\Sigma}_1 = \begin{pmatrix} \sigma_1^2+\sigma^2 & \sigma_1^2 & \sigma_1^2 & \sigma_1^2 \\ \sigma_1^2 & \sigma_1^2+\sigma^2 & \sigma_1^2 & \sigma_1^2 \\ \sigma_1^2 & \sigma_1^2 & \sigma_1^2+\sigma^2 & \sigma_1^2 \\ \sigma_1^2 & \sigma_1^2 & \sigma_1^2 & \sigma_1^2+\sigma^2 \\ \end{pmatrix} \]

Now \(\pmb{X'X}=12\), so \(\pmb{X}(\pmb{X'X})^{-}\pmb{X}'=\frac1{12}\pmb{X}\pmb{X}'=\frac1{12}\pmb{J}\) and \(\pmb{I-X}(\pmb{X'X})^{-}\pmb{X}'=\pmb{I}-\frac1{12}\pmb{J}\).

As \(\pmb{C}\) we can choose the \(11\times 12\) matrix \(\pmb{C} = \begin{pmatrix} \pmb{I} & \pmb{0} \end{pmatrix}\), then

\[\pmb{K=C(I-H)} = \begin{pmatrix} \pmb{I} & \pmb{0} \end{pmatrix}\left(\pmb{I}-\frac1{12}\pmb{J}\right)\]

that is, \(\pmb{K}\) consists of the first 11 rows of \(\pmb{I}-\frac1{12}\pmb{J}\). Now we need \(\pmb{(K\Sigma K')^{-1}}\), which is quite some calculation. In the end we find the equations

\[ \begin{aligned} &9\sigma_0^2 =\pmb{y'(I-\frac14 Z_1Z_1')y} \\ &2(4\sigma_1^2+\sigma_0^2) =\pmb{y'(\frac14 Z_1Z_1'-\frac1{12}J)y} \end{aligned} \] which has the solution

\[ \begin{aligned} &\widehat{\sigma_0^2} =\pmb{y'(I-\frac14 Z_1Z_1')y}/9 \\ &\widehat{\sigma_1^2} =\pmb{y'(\frac14 Z_1Z_1'-\frac1{12}J)y}/8 -\widehat{\sigma_0^2}/4 \end{aligned} \] It is possible that \(\widehat{\sigma_1^2}<0\). In this case one usually sets \(\widehat{\sigma_1^2}=0\).
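
As a quick numerical check, continuing with the simulated data from the sketch above, the closed-form estimates should agree with the numerical REML fit (as long as \(\widehat{\sigma_1^2}>0\)):

A1=diag(n) - Z1%*%t(Z1)/4                  # I - Z1Z1'/4
A2=Z1%*%t(Z1)/4 - matrix(1, n, n)/12       # Z1Z1'/4 - J/12
sig0.hat=drop(t(y)%*%A1%*%y)/9
sig1.hat=drop(t(y)%*%A2%*%y)/8 - sig0.hat/4
c(sig0.hat, sig1.hat)                      # compare with exp(opt$par) above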

Rather than using this approach directly, one usually finds the solutions with one of a number of known iterative methods.

Example (8.3.10)

Let’s return to the hearing aid data. Here List is a fixed effect (those are all the lists of interest) but Subject is a random effect, a sample from all possible people with good hearing. So this is a randomized block design with Subject the blocking variable.

library(lme4)
fit=lmer(Score~List+(1|Subject), data=hearingaid)
summary(fit)
## Linear mixed model fit by REML ['lmerMod']
## Formula: Score ~ List + (1 | Subject)
##    Data: hearingaid
## 
## REML criterion at convergence: 646.1
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.1393 -0.6430 -0.0658  0.6349  2.7597 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Subject  (Intercept) 25.90    5.089   
##  Residual             36.92    6.076   
## Number of obs: 96, groups:  Subject, 24
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  34.7917     1.8402  18.907
## List         -2.5917     0.5546  -4.673
## 
## Correlation of Fixed Effects:
##      (Intr)
## List -0.754
library(car)
Anova(fit)
## Analysis of Deviance Table (Type II Wald chisquare tests)
## 
## Response: Score
##       Chisq Df Pr(>Chisq)
## List 21.834  1  2.973e-06

so we see that the REML method is used, and the routine finds \(\widehat{\sigma_1^2}=25.9\) and \(\widehat{\sigma^2}=36.9\).
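
The variance component estimates can also be extracted directly from the fit with lme4's VarCorr accessor:

VarCorr(fit)                  # standard deviations of the Subject effect and the residual
as.data.frame(VarCorr(fit))   # the same information, with the variances 25.9 and 36.9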

Notice that the estimates could not have been found using aov:

summary(aov(Score~List+Subject, data=hearingaid))
##             Df Sum Sq Mean Sq F value   Pr(>F)
## List         1    806   806.0  12.837 0.000543
## Subject      1     13    13.5   0.215 0.644213
## Residuals   93   5839    62.8

Inference for \(\pmb{\beta}\)

Estimates of the variance components can be plugged into \(\pmb{\Sigma}\) to obtain

\[\pmb{\hat{\Sigma}}=\sum_{i=0}^m \hat{\sigma}_i^2 \pmb{Z_iZ_i'}\]

Using this we find an estimator of \(\pmb{\beta}\) to be

\[\pmb{\hat{\beta}}=(\pmb{X'\hat{\Sigma}^{-1}X})^{-}\pmb{X'\hat{\Sigma}^{-1}y}\]

This is called the estimated generalized least-squares (EGLS) estimator. It is a non-linear estimator because \(\pmb{\hat{\Sigma}}\) is a non-linear function of \(\pmb{y}\). \(\pmb{\hat{\beta}}\) is not MVUE, but it can be shown to be asymptotically MVUE.

Similarly, a sensible estimator of the covariance matrix of \(\pmb{\hat{\beta}}\) is given by

\[cov(\pmb{\hat{\beta}})=(\pmb{X'\hat{\Sigma}^{-1}X})^{-}\pmb{X'\hat{\Sigma}^{-1}X}(\pmb{X'\hat{\Sigma}^{-1}X})^{-}\]

which, if \(\pmb{X}\) is full-rank, simplifies to

\[cov(\pmb{\hat{\beta}})=(\pmb{X'\hat{\Sigma}^{-1}X})^{-1}\]
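
Continuing with the simulated one-way data and the variance component estimates from the sketches above, here is a minimal sketch of the EGLS computation (assuming both estimates are positive so that \(\pmb{\hat{\Sigma}}\) is invertible):

Sigma.hat=sig0.hat*diag(n) + sig1.hat*Z1%*%t(Z1)   # plug-in estimate of Sigma
W=solve(Sigma.hat)
beta.hat=solve(t(X)%*%W%*%X)%*%t(X)%*%W%*%y        # EGLS estimate; here beta is just mu
cov.beta=solve(t(X)%*%W%*%X)                       # X is full rank, so the simple form applies
c(beta.hat, cov.beta, mean(y))                     # for this balanced design beta.hat is the overall mean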

Inference for Estimable Functions of \(\pmb{\beta}\)

Exact methods for inference for estimable functions of \(\pmb{\beta}\) in linear mixed models generally do not exist. However, a number of approximate methods are known.

Example (8.3.12)

Let’s return to the hearing aid data. Notice that the output of the lmer routine does not yield p values:

library(lme4)
fit=lmer(Score~List+(1|Subject), data=hearingaid)
summary(fit)
## Linear mixed model fit by REML ['lmerMod']
## Formula: Score ~ List + (1 | Subject)
##    Data: hearingaid
## 
## REML criterion at convergence: 646.1
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.1393 -0.6430 -0.0658  0.6349  2.7597 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Subject  (Intercept) 25.90    5.089   
##  Residual             36.92    6.076   
## Number of obs: 96, groups:  Subject, 24
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  34.7917     1.8402  18.907
## List         -2.5917     0.5546  -4.673
## 
## Correlation of Fixed Effects:
##      (Intr)
## List -0.754

we can get those as follows:

library(car)
Anova(fit)
## Analysis of Deviance Table (Type II Wald chisquare tests)
## 
## Response: Score
##       Chisq Df Pr(>Chisq)
## List 21.834  1  2.973e-06
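
Approximate confidence intervals for both the variance components (on the standard deviation scale) and the fixed effects can be obtained with confint, which by default profiles the REML criterion; packages such as lmerTest provide approximate (Satterthwaite) degrees of freedom and p values for the t statistics shown above.

confint(fit)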