Overlaying Design Matrices for a Parental Model in ASRemlR
Most breeding programs plan several controlled crosses between outstanding parents to detect favorable alleles in their offspring. The progeny are later evaluated in a field experiment, and this information is used to assess the genetic worth of the parents by fitting parental linear mixed models (LMMs) and obtaining best linear unbiased predictions (BLUPs). These BLUPs, which are the general combining ability (GCA), or 1/2 of the breeding value (BV, with BV = 2 \(\times\) GCA) of each parent, are then used to select the best parents for future crosses or operational deployment.
In many plant breeding programs, a parent is considered in several crosses. In most cases it is easy to assign the sex of a given individual. However, several commercial plant species are monoecious, which means that a given genotype will bear both male and female flowers. In contrast, dioecious species have distinctive male and female plants. Some examples of monoecious species are corn, squash, banana, and many conifers, particularly those of the genus Pinus.
In quantitative genetic analyses, monoecious species present a particular challenge, as a given parent can contribute to the estimation of its breeding value (or GCA) as both a male and a female, something that needs to be taken into consideration when a statistical model is fitted.
This is done by overlaying design matrices of the factors associated with male and female parents. We will illustrate this here using an example of loblolly pine (Pinus taeda) in which some individuals, depending on the availability of pollen or flowers, were used in several artificial crosses as both male and female in the breeding program.
The data used here originates from a loblolly pine clonal study published by Resende et al. (2012). Parents were crossed in a circular mating design, constituting several fullsib families. Individuals from these families were vegetatively propagated (cloned) and established in a series of field trials. A subset of the full dataset, corresponding to diameter at breast height (DBH, inches) measured at 6 years since planting at the Nassau (Florida, USA) site, is used here. We will be using the adjusted mean values for this trait. In this dataset we have a total of 71 families that originated from 43 parents; however, 20 of those parents were used as both males and females in different crosses. There were no reciprocal or selfpollinated crosses planned, but these can occur in other crops and they might need to be identified and modelled properly. In this dataset, each family is formed by between 1 and 16 individuals (with an average of 12.2). We also have pedigree information for the 43 parents.
The first few lines of the phenotypic dataset \(\texttt{DAT_NASSAU.TXT}\) are presented below:
tree  mother  father  family  DBH 

280200  20012  44112  1017  9.410 
282200  20022  44090  1020  11.953 
284400  44126  142170  1049  10.785 
285210  50152  58040  1056  9.950 
285600  142024  44112  1006  10.325 
286600  44012  17766  1036  11.122 
And the first and last few lines of parental pedigree file \(\texttt{PEDPAR_NASSAU.TXT}\) are:
genotype  mother  father  

1  14006  0  0 
2  14046  0  0 
3  14060  0  0 
4  14070  0  0 
5  14104  0  0 
6  14106  0  0 
48  142076  14114  14104 
49  142170  14106  14046 
50  142194  14216  14060 
51  202060  20030  20052 
52  202096  20034  20052 
53  222056  22034  14006 
54  222060  22020  22034 
In the phenotypic data each row contains the id of the individual tree, together with information about their mother and father, followed by the adjusted phenotypic response DBH. In the pedigree file there is information about the mother and father of each of the parents used in the crosses. For example, note that tree \(\texttt{284400}\) has father \(\texttt{142170}\), and this genotype has as parents genotypes \(\texttt{14106}\) and \(\texttt{14046}\).
We are interested in fitting a parental model, where our objective is to estimate GCA values for each of the parents. (Note that this is different than an animal model, where we estimate a BV for each of the individuals (parents or offspring).) Our parental base model is:
\[y_{ijk} = \mu + f_i + m_j + fm_{ij} + \epsilon_{ijk}\]
where, \(y_{ijk}\) is the response for the kth individual originating from the cross between the ith female with the jth male; \(\mu\) is the overall mean; \(f_i\) is the random effect of the ith female, with \(f_i \sim N(0, \sigma^2_f)\); \(m_j\) is the random effect of the jth male, with \(m_j \sim N(0, \sigma^2_m)\); and \(fm_{ij}\) is the interaction or family effect of the cross between the ith female with the jth male, with \(fm_{ij} \sim N(0, \sigma^2_{fm})\); and \(\epsilon_{ijk}\) is the random residual with \(\epsilon_{ijk} \sim N(0, \sigma^2)\).
Note that for the parental effects \(f_i\) and \(m_j\) there is a pedigree available, which is used to obtain the numerator relationship matrix \(\mathbf{A}\), that is incorporated in our analyses. Hence, the vectors of parental effects have the assumptions \(\mathbf{f} \sim N(\mathbf{0},\sigma^2_f \mathbf{A})\) and \(\mathbf{m} \sim N(\mathbf{0},\sigma^2_m \mathbf{A})\).
In the above model, we are assuming that there are distinct variances for females and males, \(\sigma^2_f\) and \(\sigma^2_m\), and that these can take any positive value.
We can fit the above model with ASRemlR using the following code:

In this code we are generating the numerator relationship matrix \(\mathbf{A}\) using the command \(\texttt{ainverse()}\). The matrix \(\mathbf{A}\) is incorporated into the model using the function \(\texttt{vm()}\). The loglikelihood of the above model fitted to our data is \(\texttt{744.488}\) and the estimated variance components are:
component  std.error  z.ratio  bound  %ch  

vm(mother, ainv)  0.2774623  0.1121623  2.473759  P  0 
vm(father, ainv)  0.3815927  0.1348447  2.829868  P  0 
family  0.0000004  NA  NA  B  0 
units!units  1.8702273  0.0930707  20.094696  P  0 
units!R  1.0000000  NA  NA  F  0 
Note the difference in variance components between males (fathers, \(\texttt{0.277}\)) and females (mothers, \(\texttt{0.382}\)). Finding a difference in variance components by sex is common for plant species and it might be correct depending on the origin of the parents, but also other biological reasons might explain these differences. Also note that the family variance is bounded at almost zero.
In our case, we can calculate both the narrowsense heritability for females and males (\(h^2_f\) and \(h^2_m\)) and the dominance ratio (\(d^2\)), by using the following expressions:
\(h^2_f = 4 \times \sigma^2_f / (\sigma^2_f + \sigma^2_m + \sigma^2_{fm} + \sigma^2)\)
\(h^2_m = 4 \times \sigma^2_m / (\sigma^2_f + \sigma^2_m + \sigma^2_{fm} + \sigma^2)\)
\(d^2 = 4 \times \sigma^2_{fm} / (\sigma^2_f + \sigma^2_m + \sigma^2_{fm} + \sigma^2)\)
In some cases, particualrly when the estimations of heritability for both sexes might not be very different, these can be combined into a single estimate, as:
\(h^2 = 2 \times (\sigma^2_f +\sigma^2_m) / (\sigma^2_f + \sigma^2_m + \sigma^2_{fm} + \sigma^2)\)
All of the above expressions were obtained in ASRemlR using the command \(\texttt{vpredict()}\) as shown below:

with the results:
Estimate  SE  

h2m  0.4388000  0.1616208 
h2f  0.6034797  0.1858860 
d2  0.0000007  0.0000001 
h2  0.5211399  0.1043225 
As expected, our dominance ratio \(d^2\) is very small in this particular trait; hence, it could be ignored. Also, the narrowsense heritabilities are reasonable for this type of study, with a combined estimate of \(\texttt{0.521}\) but with a moderate approximated standard error (\(\texttt{0.104}\)).
Also, we can obtain the BLUPs for the model terms \(\texttt{mother}\) and \(\texttt{father}\), which in this case will correspond to the GCA values. If these are multiplied by two, we will have the BVs. The GCAs are obtained with the code:

In the output below, we can observe a few of the estimated GCA values for the \(\texttt{mother}\) (top) and \(\texttt{father}\) (bottom).
solution  std.error  z.ratio  

vm(mother, ainv)_14006  0.3600002  0.2841548  1.2669160 
vm(mother, ainv)_14046  0.0000000  0.5267472  0.0000000 
vm(mother, ainv)_14060  0.2286267  0.4864881  0.4699533 
vm(mother, ainv)_14070  0.0000000  0.5267472  0.0000000 
vm(mother, ainv)_14104  0.4620406  0.4208766  1.0978056 
vm(mother, ainv)_14106  0.0000000  0.5267472  0.0000000 
solution  std.error  z.ratio  

vm(father, ainv)_14006  0.0013612  0.2827143  0.0048148 
vm(father, ainv)_14046  0.5495441  0.5510751  0.9972219 
vm(father, ainv)_14060  0.0000000  0.6177321  0.0000000 
vm(father, ainv)_14070  0.3792472  0.3257064  1.1643837 
vm(father, ainv)_14104  0.2087355  0.5024186  0.4154613 
vm(father, ainv)_14106  0.5495441  0.5510751  0.9972219 
vm(father, ainv)_14114  0.1476378  0.3462045  0.4264466 
Note that for parent \(\texttt{14006}\) the estimates are different as a mother and a father (\(\texttt{0.360}\) against \(\texttt{0.001}\)), and their precision will depend on the amount of information a parent has (i.e., number of offspring), but also in its relationship with other genotypes on the dataset.
So far, we have not combined the information from a given parent when it is used as a male and as a female. Now, it is of interest to overlay the factors \(\texttt{mother}\) and \(\texttt{father}\). To do this in ASRemlR we require two elements: the use of the statement \(\texttt{and()}\) in the model and the specification of the command \(\texttt{equate.levels}\). This is illustrated in the following code for the same data as above:

First, letâ€™s focus on the \(\texttt{and()}\) function. This is used to request that the design matrices associated with \(\texttt{father}\) are overlaid on the design matrix associated with the immediately previous term (here, \(\texttt{mother}\)). Second, the expression \(\texttt{equate.levels}\) requests that the levels for the factor \(\texttt{mother}\) and \(\texttt{father}\) are treated as identical. That is, levels that have the same name in the factors \(\texttt{mother}\) and \(\texttt{father}\) will be treated as the same, and levels that are found in only one of the factors will be added to the new design matrix. Both of the above commands are required in ASRemlR to ensure that consistent BV (or GCA) estimates are obtained.
In order to illustrate the above overlay of matrices, we will use a simple example, in which we have eight individuals originating from crosses of four parents (P1, P2, P3 and P4). These parents were used as female or male, and in some cases we have reciprocal crosses and selfing. The crossing and design matrices for our base model (without overlay) are shown below:
\[
\begin{matrix}
indiv & female & male \\
A & P1 & P2 \\
B & P1 & P3 \\
C & P1 & P4 \\
D & P2 & P3 \\
E & P2 & P4 \\
F & P3 & P2 \\
G & P3 & P4 \\
H & P4 & P4 \\
\end{matrix}
\Rightarrow
\begin{matrix}
P1 \:\: P2 \:\: P3 \:\: P4 \\
\begin{bmatrix}
1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
\end{bmatrix}
\end{matrix}
\:\:
\begin{matrix}
P2 \:\: P3 \:\: P4 \\
\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1 \\
0 & 1 & 0 \\
0 & 0 & 1 \\
1 & 0 & 0 \\
0 & 0 & 1 \\
0 & 0 & 1 \\
\end{bmatrix}
\end{matrix}
\]
Once the LMM model is fitted based on the above matrices (as we did with our previous object \(\texttt{modP}\)), we will have a total of four GCAs for females and three for males. If we count the number of ones for each effect, we find that they have between one and three pieces of information to be used in their estimation.
Now, if we proceed to overlay the female and male incidence matrices, we obtain the following results:
\[
\begin{matrix}
indiv & female & male \\
A & P1 & P2 \\
B & P1 & P3 \\
C & P1 & P4 \\
D & P2 & P3 \\
E & P2 & P4 \\
F & P3 & P2 \\
G & P3 & P4 \\
H & P4 & P4 \\
\end{matrix}
\Rightarrow
\begin{matrix}
P1 \:\: P2 \:\: P3 \:\: P4 \\
\begin{bmatrix}
1 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 \\
0 & 1 & 1 & 0 \\
0 & 1 & 0 & 1 \\
0 & 1 & 1 & 0 \\
0 & 0 & 1 & 1 \\
0 & 0 & 0 & 2 \\
\end{bmatrix}
\end{matrix}
\]
You can observe that there is a single design matrix but only with four columns: one for each parent. In this case, we have more pieces of information to estimate each effect, ranging from three to six. Of special interest is the effect of parent P3, it now uses four pieces of information to estimate its single GCA, whereas before it was being estimated as two effects, each with two pieces of information.
There are other interesting things in the above matrix, such as the value 2 associated with individual H, that originated from the cross P4 \(\times\) P4, which corresponds to a selfing of parent P4. Also, individuals D and F are fullsibs, originating from reciprocal crosses, but in this case there is no distinction between their parental effects. Both, selfing and reciprocal crosses can be considered in the fitted linear mixed model as extensions with the aim of evaluating their importance or significance.
Finally, we can observe our results from fitting the above overlay model to our loblolly pine data. The estimated variance components are:
component  std.error  z.ratio  bound  %ch  

vm(mother, ainv)  0.4000453  0.1162417  3.441494  P  0 
units!units  1.8599439  0.0917484  20.272218  P  0 
units!R  1.0000000  NA  NA  F  0 
In this particular case, we are missing the variance associated with the factor \(\texttt{father}\). However, this is still part of the model. As the matrices were overlaid, we have as an output a single parental effect now called \(\texttt{mother}\), which represents both male and female parents. Also, note that we dropped the term associated with \(\texttt{family}\), as in the previous analysis this was bound to zero. The loglikelihood of this model is \(\texttt{737.328}\), a greater value indicating a better fit than the obtained from the original model (\(\texttt{744.488}\)), and probably a significant improvement.
We can now proceed to obtain our estimate of heritability using the expression:

which is the traditional formula; however, note that in the denominator we are considering twice the \(\texttt{mother}\) variance as we have a single parental variance component. The results from the above code are presented below:
Estimate  SE  

h2  0.601564  0.1248922 
As expected, these results are different to those from our previously fitted model because of the way this model is combining the available information. There is an increase in the heritability when we overlay the factors of mother and father compared to when we do not (\(h^2 = 0.521\)). This increase is expected as we now have more information to estimate each of the effects. Also, we expect a reduction in the standard errors of the estimated GCA values, which are presented below:
solution  std.error  z.ratio  

vm(mother, ainv)_14006  0.5174180  0.2116542  2.4446388 
vm(mother, ainv)_14046  0.5055163  0.5609615  0.9011606 
vm(mother, ainv)_14060  0.2843599  0.5668751  0.5016271 
vm(mother, ainv)_14070  0.3430721  0.3366609  1.0190434 
vm(mother, ainv)_14104  0.4849910  0.4781210  1.0143688 
vm(mother, ainv)_14106  0.5055163  0.5609615  0.9011606 
In this case we have only \(\texttt{mother}\) effects that identify each of the levels of the parent. In contrast with our previous model, parent \(\texttt{14006}\) has a single GCA value (\(\texttt{0.517}\)), and its standard error is smaller than before (\(\texttt{0.21}\) compared to \(\texttt{0.28}\)). Some GCA estimates will increase or decrease, and their precision might get better or worse depending on how information is combined.
In this example, we have explored a parental model where the effects of male and female were overlaid. This resulted, according to the loglikelihood value, in a statistically superior model that combines the data better. But more importantly, this model is more biologically accurate given our knowledge of monoecy of the species under study.
References
Resende, MFR, Munoz, P, Resende, MDV, Garrick, DJ, Fernando, RL, Davis, JM, Jokela, EJ, Martin, TA, Peter, GF and M. Kirst. (2012). Accuracy of genomic selection methods in a standard data set of loblolly pine (Pinus taeda L.) Genetics 190:15031510
File to download
Notes: SAG August2020