Statistical inference is the process of drawing some conclusions about a population based on the sample data at hand. This is what we do when we fit a linear mixed model to our data, where some population parameters (e.g., treatment means) are estimated and used to draw conclusions about our study. Under statistical inference we want to expand our findings to the complete population from where our data originated.
The three most common types of statistical inference are: 1) point estimation, 2) interval estimation, and 3) hypothesis testing. Point estimation is where we estimate an unknown parameter that is presented as a single numeric value. In contrast, under interval estimation, we represent the estimated unknown parameter using a range of numeric values that is highly likely to contain the true population value. This range is known as the confidence interval. Finally, under hypothesis testing, we assess the evidence provided by the sample data in favor, or against, a given claim about the population. This is usually done using a statistical test.
All of the above types of statistical inference are commonly used for linear mixed models (LMM), and these can be implemented on fixed or random effects, functions of these effects (e.g., predicted means), and variance components or their functions (e.g., heritability).
In this tutorial we will present an example where statistical inference is done for a variance component. In this example, the aim of our LMM is to estimate a correlation variance component (\(\rho\)), and we are interested in: 1) knowing what is the value of this correlation (point estimate), 2) reporting a 95% confidence interval (interval estimation), and 3) evaluating if this parameter is significantly different from zero (hypothesis testing).
The dataset used originates from a study that evaluated IQ scores for twins (McClave and Sincich, 2009). In this research, IQ scores of 32 pairs of identical twins were measured, where one of the twins (\(\texttt{Twin A}\)) was reared by their natural parent, and the other (\(\texttt{Twin A}\)) was reared by a relative or another person. Studies with twins are ideal because they share identical genetics, and therefore, their differences will be due exclusively to nongenetic factors, like in this case the type of rearing. For this study, the main objective is to evaluate if there are statistically significant differences between the average IQ scores between these two rearing levels.
The dataset for this example can be found at the end of this document in the file \(\texttt{TWINS.zip}\). The first few lines are:
PairID  TwinR  IQ 

112  A  113 
112  B  109 
114  A  94 
114  B  100 
126  A  99 
126  B  86 
Here, we have a factor that identifies the pair of twins \(\texttt{PairID}\), and another that specifies the rearing type \(\texttt{TwinR}\)(\(\texttt{A}\) = reared by neutral person, \(\texttt{B}\) = reared by relative or other person). The raw means (and standard deviations) for groups \(\texttt{A}\) and \(\texttt{B}\) are 93.22 (15.17) and 96.13 (13.66), respectively. These values indicate small differences in IQ between these groups. In addition, we can also plot the IQ scores for each pair of twins, as shown in the below. We note a reasonable relationship between the IQ scores from the same twin pair, with a Pearson’s correlation of 0.816.
Now, let’s fit our LMM. First, we will define our model in a generic way:
\[y_{ij} = \mu + TwinR_i + \epsilon_{ij}\]
where \(y_{ij}\) is the IQ measurement for the ijth twin; \(\texttt{TwinR}\) is the fixed effect of rearing type (\(A\) or \(B\)), and \(\epsilon_{ij}\) is the residual error, with \(\epsilon_{ij} \sim N(0, \sigma^2)\).
For analysis, the above model needs to indicate that the errors, \(\epsilon_{ij}\), are not independent from each other. In this case we need to specify that observations belonging to the same twin pair are correlated. This will be done in ASRemlR by a using the variance component structure \(\texttt{corv}\). For a single twin pair this structure has the following form:
\[\begin{bmatrix} \sigma^2 & \rho \sigma^2 \\ \rho \sigma^2 & \sigma^2 \end{bmatrix} = \sigma^2 \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}\]
This variance structure is described by 2 variance components, one associated with the error variance, \(\sigma^2\), and the other, \(\rho\), representing the correlation between residual errors of the same twin pair. It is important to note that these parameters have natural constraints: \(\sigma^2\) can only be positive, and \(\rho\) has to be between 1 and +1.
In ASRemlR we can fit this model and produce basic output using the following code:

The above model has the fixed effect of \(\texttt{TwinR}\), but under \(\texttt{residual}\) we have defined a matrix \(\mathbf{R}\) of variancecovariance between all residuals. Here, we have an identity matrix of dimension 32 × 32 (\(\texttt{id}\)) multiplied (using the Kronoecker product, \(\otimes\)) by a 2 × 2 matrix (\(\texttt{corv}\)) defined as above. Therefore, this forms a \(\mathbf{R}\) matrix of dimension 64 × 64 corresponding to the number of observations in the dataset, as shown below.
\[\mathbf{R}=
\sigma^2
\begin{bmatrix}
1 & \rho & 0 & 0 & \cdots & 0 & 0 \\
\rho & 1 & 0 & 0 & \cdots & 0 & 0 \\
0 & 0 & 1 & \rho & \cdots & 0 & 0 \\
0 & 0 & \rho & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & \rho \\
0 & 0 & 0 & 0 & \cdots & \rho & 1 \\
\end{bmatrix}\]
After fitting this model, we obtain a loglikelihood value of 183.650, and the following means with their SEMs (standard error of the mean) for each of the levels of \(\texttt{TwinR}\):
TwinR  predicted.value  std.error  status 

A  93.21875  2.568317  Estimable 
B  96.12500  2.568317  Estimable 
The pvalue of 0.074 (\(F_{1,31}\) = 3.42) from ANOVA indicates that there is no significant difference (at a 5% level) between the mean IQ of the two rearing levels. And finally, we can also report the estimated variance components:
component  std.error  z.ratio  bound  %ch  

PairID:TwinR!R  1.0000000  NA  NA  F  0.0 
PairID:TwinR!TwinR!cor  0.8125831  0.0610009  13.320844  U  0.0 
PairID:TwinR!TwinR!var  211.0800432  48.7987205  4.325524  P  0.1 
In relation to our population parameter of interest, \(\rho\), we have a point estimate of 0.813 with a standard error of 0.061. These are both REML (restricted maximum likelihood) variance component estimates, and they will have an asymptotic Normal distribution. For interval estimation, we can calculate an approximated 95% confidence interval (CI) for this population parameter using: \[\hat{\rho} \pm z_{0.975} \times SE(\hat{\rho}) = 0.813 \pm 1.96 \times 0.061 = [ 0.693; 0.932 ]\]
In the above calculation we have used a zvalue (assuming a Normal distribution), and our 95% CI for \(\hat{\rho}\) is [0.693; 0.932]. This CI is symmetric and centered on the point estimate of 0.813, and it does not contain the value zero, indicating that zero might not be a likely value for this population parameter. In order to formally evaluate this, we define the following null and alternative hypotheses: \[H_0: \rho = 0\] \[H_1: \rho \ne 0\]
This is a twosided hypothesis, and we can use \(\hat{\rho}\) and its SE obtained from the table of variance components presented earlier, and perform a Wald Test, of the form: \[(\hat{\rho} – 0)/{SE}(\hat{\rho}) = 0.813 / 0.061 = 13.321\]
The above is a ztest and results in a pvalue of < 0.0001, implying that the null hypothesis should be rejected. This Wald Test is the most common way to evaluate parameters that are asymptotically normal, something that occurs with large number of observations. However, in this case, the correlation parameter, \(\rho\), is been estimated with a small number of points: only 32 pairs, resulting in a liberal and approximated CI.
Most specialized literature recommends the use of likelihoodbased statistical approaches, specifically the likelihood ratio test (LRT) to statistically compare contrasting models. In order to evaluate the significance of a parameter, the LRT compares two models: one with the variance component estimated from the data, and another where the variance component is fixed to the value specified under the null hypothesis. In our example, we will evaluate the full model (i.e., the one with the \(\rho\) component estimated), against a restricted model (i.e., the one that has \(\rho = 0\)). Our full model is the one presented earlier (\(\texttt{modT}\)), and we will need to represent our restricted model based on the following variance structure: \[\begin{bmatrix} \sigma^2 & 0 \\ 0 & \sigma^2 \end{bmatrix} = \sigma^2 \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\]
In this structure the correlation is assumed tobe exactly zero; hence, we are assuming that all residuals are independent of one another. This model can be fitted with the following ASRemlR code that replaces \(\texttt{cov}\) by \(\texttt{idv}\):

The LRT evaluates our hypothesis by comparing the loglikelihood values and using the formula \[d = 2 \times (logL_{restr} – logL_{full})\]
where \(logL_{restr}\) and \(logL_{full}\) are the loglikelihood values for the restricted and full models, respectively. The \(d\) statistic follows an approximated \(\chi^2\) distribution with degrees of freedom equal to the difference in the number of estimated variance components between both models (in this example 1 degree of freedom).
In addition, we can also calculate some goodnessoffit statistics to compare these models. The most common ones used for LMMs are the Akaike Information Criterion (AIC), and the Bayesian Information Criterion (BIC): \[AIC = 2 \times logL + 2 \times t\] \[BIC = 2 \times logL + 2 \times t \times log(v)\] where \(t\) is the number of variance components in the model, and \(v\) is the residual degrees of freedom. Both AIC and BIC are reasonable goodnessoffit statistics for comparing nested and nonnested models. However, the LRT can only be used to statistically compare nested models.
The loglikelihood, AIC and BIC values for our two models (\(\texttt{modT}\) and \(\texttt{modT0}\)) are presented in the following table.
Model  logL  AIC  BIC 

Restricted  200.385  402.77  404.897 
Full  183.65  371.301  375.555 
Based on the above figures, we find that the value \(d\) is 33.470. For a 5% significance level, the critical value for the \(\chi^2\) distribution with 1 degree of freedom is 3.841. Hence, we find that our full and restricted models are statistically different. This leads us to reject the null hypothesis that the correlation is zero. We can also compare the models using the AIC and BIC values, where the smaller the value the better the model. In this case, both criteria point to the full model as the better model.
We can implement the LRT within ASRemlR by using the following command:

That has the option \(\texttt{boundary=FALSE}\) indicating that this is a twosided test, and the output from this command is:
df  LRstatistic  Pr(Chisq)  

modT/modT0  1  33.46966  0 
As you can see, the \(d\) value of 33.47 is associated with a very small pvalue of < 0.0001. Hence, in summary, we have strong evidence to conclude that our population \(\rho\) parameter is very likely to be nonzero.
Recall that above we calculated a 95% CI for our correlation parameter based on the reported information from the variance components table, resulting in the interval [0.693; 0.932]. However, as with our Wald Test, this CI is also based on an asymptotic Normal distribution, which is a strong assumption given the low number of observations. An added difficulty with the approximated CIs is that in some cases they can have limits outside of the natural range of the true parameter, which in our case is [1; 1].
Alternatively, we can calculate a likelihoodbased confidence interval. In order to do this, we first need to fit a model with the parameter of interest fixed to a range of values. Then, we observe the loglikelihood values for all of these models and identify, based on the LRT, those that will not be significantly different from our full model.
In our example, we fit our LMM hundreds of times where the correlation parameter is fixed to values ranging from 0.01 to 0.99 in steps of 0.001. To start the process using ASRemlR, we first fit the base model as:

In the above code, we are not actually fitting the model. Rather, by using the option \(\texttt{start.values=TRUE}\) we are requesting the setup table of the variance components. This table can be seen using:

Component  Value  Constraint 

PairID:TwinR!R  1.0000  F 
PairID:TwinR!TwinR!cor  0.1000  U 
PairID:TwinR!TwinR!var  105.5401  P 
Here, we have the same variance components reported from the \(\texttt{summary(modT_\$varcomp)}\) command. However, we also have their starting values and if they have a constraint. These constraints are \(\texttt{F}\) (fixed) for a base varaince component, \(\texttt{U}\) (unconstrained) for the correlation component and \(\texttt{P}\) (positive) for the variance components.
We can change, for example, our \(\rho\) parameter’s starting value to 0.903 and its constraint to \(\texttt{F}\) using

The modified \(\texttt{vc.table}\) is incorporated within a model fit using:

This is the same as our original full model, but we have added the lines \(\texttt{G.param=vc.table}\) and \(\texttt{R.param=vc.table}\). After we fit the above model, we obtain the following variance component estimates:
component  std.error  z.ratio  bound  %ch  

PairID:TwinR!R  1.000  NA  NA  F  0 
PairID:TwinR!TwinR!cor  0.903  NA  NA  F  0 
PairID:TwinR!TwinR!var  304.443  54.67787  5.567938  P  0 
Notice from the above table that the correlation parameter \(\rho\) has been fixed at the value of 0.903. Also, this model results in a loglikelihood of 185.550, which is not as favorable as the one reported under the unconstrained model \(\texttt{modT}\) of 183.650. (Note: We want to maximize the loglikelihood and therefore a higher value is better.)
Finally, we let’s fit the above model over the complete range of \(\rho\) values (from 0.01 to 0.99). For our Twin dataset, the plot the correlation parameter against its loglikelihood value is presented below.
In the above plot the maximum loglikelihood value is associated with a correlation estimate of 0.816 (as given in \(\texttt{modT}\)). The red line represents the threshold line, below which we have significant differences compared to the full (unconstrained) model. This threshold line is calculated by:

where \(\texttt{modT\$loglik}\) is the loglikelihood value of the full model, and \(\texttt{chi.crit}\) corresponds to the critical \(\chi^2\) value for 1 df at the 5% significance level (= 3.841). Hence, any model that has a loglikelihood value smaller than 185.571 will be significantly different from our full model.
The dotted black vertical lines correspond to the places where the curve is intersected by our red threshold line. In this case, the fixed correlation values are 0.652 and 0.903, corresponding to our limits. Therefore, we can conclude that the 95% likelihoodbased confidence interval for \(\rho\) is [0.652; 0.903]. This approach provides us with a nonsymmetric CI, that also restricts the parameter of interest to its natural range.
In summary, we have performed different types of statistical inference in the context of linear mixed models, particularly associated with their variance component estimates. The use of likelihoodbased inferences in hypothesis testing and confidence intervals is always recommended, as it provides more accurate inferences, and it has been reported that LRT has higher power than other statistical tests. One disadvantage of this inferential approach is the need to fit several, even hundreds, of models, in contrast with the use of asymptotic distribution assumptions that only requires a single model fit.
In our full model (\(\texttt{modT}\)) we have used the \(\texttt{corv}\) variance structure, but we could have used more complex and flexible variance structures, such as \(\texttt{corh}\). For our data, this structure will have the following form:
\[\begin{bmatrix}
\sigma_A^2 & \rho \sigma_A \sigma_B \\
\rho \sigma_A \sigma_B & \sigma_B^2
\end{bmatrix}\]
This variance structure has extended \(\texttt{corv}\) from a single residual variance to specific variances to each of the levels of rearing. This might be a reasonable assumption, and it could be evaluated and tested using LRT by comparing these models. We leave this to the reader to explore, but this is an extension of many modelling options of linear mixed models under ASRemlR.
Author
Salvador A. Gezan
Reference
McClave, JT; Sincich, T. 2009. Statistics, 12th Edition. Prentice Hall. Upper Saddle River, NJ.
Files to download
Notes: SAG Feb2020