Predicting the performance of plant cultivars

Statistics - serving plant science for generations

Predicting the performance of novel plant genotypes in different environments is essential for developing cultivars with superior economically important traits (such as yield) and resilience to environmental stresses (such as drought).

Case Study

[Image: Pinus radiata (Starr 030419-0163)]

The Radiata Pine Breeding Company Ltd. runs breeding programmes to improve the productivity and quality of wood from radiata pine trees for the Australasian forestry industry. Candidate pine trees (i.e. different genotypes) are tested in a multi-environment trial (MET), a series of experiments conducted across a range of geographic locations in New Zealand and Australia over multiple years. The set of experiments within and across years is designed to provide a range of growing conditions, or “environments”. From these trials, the genetic and environmental effects on the performance of the candidate pine trees can be estimated. This information is then used by radiata pine breeders to select trees for crossing, with the aim of further genetic improvement, and by growers of radiata pine for selecting genotypes predicted to perform well at their site.

Plant breeders use multi-environment trial (MET) data to evaluate genotypes across a range of environments. However, the relative performance of the genotypes often varies between environments, a phenomenon known as the genotype by environment (G×E) interaction. The G×E interaction can be exploited to identify genotypes that perform well across all environments (i.e. are suited to broad use) and those that excel in specific environments (i.e. are suited to particular growing conditions).
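A small sketch may make the idea of a G×E interaction concrete. It assumes a hypothetical table of mean yields for three genotypes (`G1`–`G3`) in two environments (`E1`, `E2`); the numbers and helper names are invented purely for illustration and do not come from any real trial.

```python
# Hypothetical mean yields for three genotypes in two environments,
# invented only to illustrate a crossover G x E interaction.
means = {
    "G1": {"E1": 5.0, "E2": 3.0},
    "G2": {"E1": 4.0, "E2": 4.5},
    "G3": {"E1": 3.5, "E2": 3.8},
}


def ranking(env):
    """Genotypes ordered from best to worst within one environment."""
    return sorted(means, key=lambda g: means[g][env], reverse=True)


def overall_mean(genotype):
    """Average performance of a genotype across all environments."""
    values = list(means[genotype].values())
    return sum(values) / len(values)


def best_overall():
    """Genotype with the highest average across environments (broad adaptation)."""
    return max(means, key=overall_mean)


for env in ("E1", "E2"):
    print(env, ranking(env))
print("rankings change between environments:", ranking("E1") != ranking("E2"))
print("best on average:", best_overall())
```

With these toy numbers, `G1` ranks first in `E1` but last in `E2`, while `G2` has the best average: a breeder might favour `G2` for broad use and `G1` only for conditions resembling `E1`.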

Today, linear mixed models are widely used in the analysis of MET data. The linear mixed model framework accommodates the analysis of genetically and/or experimentally correlated data, heterogeneous variances and unbalanced data sets, enabling the accurate prediction of genotype performance within all environments in the data set. In addition, the ability to formally test statistical hypotheses provides greater insight into the nature of the G×E interaction.
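For reference, the textbook form of the linear mixed model can be written as below. The notation is a generic choice, not taken from the source: $\mathbf{y}$ holds the plot observations, $\boldsymbol{\tau}$ the fixed effects, $\mathbf{u}$ the random effects (for example, genotype and G×E effects) and $\mathbf{e}$ the residuals. Genetic correlation enters through $\mathbf{G}$, experimental correlation (e.g. spatial trends within a trial) and heterogeneous variances through $\mathbf{G}$ and $\mathbf{R}$, and unbalanced data are handled naturally because $\mathbf{y}$ contains only the plots actually observed. Assuming $\mathbf{X}$ has full column rank and the variance parameters are known:

```latex
\mathbf{y} = \mathbf{X}\boldsymbol{\tau} + \mathbf{Z}\mathbf{u} + \mathbf{e},
\qquad
\mathbf{u} \sim N(\mathbf{0}, \mathbf{G}),
\quad
\mathbf{e} \sim N(\mathbf{0}, \mathbf{R}),
\qquad
\mathbf{V} = \operatorname{var}(\mathbf{y}) = \mathbf{Z}\mathbf{G}\mathbf{Z}^{\top} + \mathbf{R}

\hat{\boldsymbol{\tau}} = \left(\mathbf{X}^{\top}\mathbf{V}^{-1}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}\mathbf{V}^{-1}\mathbf{y},
\qquad
\tilde{\mathbf{u}} = \mathbf{G}\mathbf{Z}^{\top}\mathbf{V}^{-1}\left(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\tau}}\right)
```

In practice the variance parameters in $\mathbf{G}$ and $\mathbf{R}$ are unknown; they are usually estimated by REML and substituted into these expressions, and the predicted genotype performance in each environment is then formed from $\hat{\boldsymbol{\tau}}$ and $\tilde{\mathbf{u}}$.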

An established method for analysing MET data involves a linear mixed model that adopts multiplicative factor analytic (FA) variance structures for the G×E effects. The FA model provides a parsimonious, yet flexible, method of describing the G×E interaction. For example, it allows the genetic variance to differ between trials and the genetic correlation to differ between each pair of trials. It can also be extended to include genetic relationship information (e.g. pedigrees) so that the genetic effects can be partitioned into additive and non-additive components. Furthermore, it typically has higher predictive accuracy than alternative models when there is substantial G×E interaction.
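In its usual form, an FA model with $k$ factors models the $t \times t$ between-trial genetic covariance matrix as $\boldsymbol{\Lambda}\boldsymbol{\Lambda}^{\top} + \boldsymbol{\Psi}$, where $\boldsymbol{\Lambda}$ is a $t \times k$ matrix of trial loadings and $\boldsymbol{\Psi}$ is a diagonal matrix of trial-specific variances. The sketch below assumes hypothetical loadings and specific variances for four trials (`loadings`, `specific_var` and the helper functions are illustrative names, not part of any library) and shows how heterogeneous variances and pair-specific correlations fall out of that one structure, and why it is parsimonious when there are many trials.

```python
import numpy as np

# Hypothetical FA(2) parameters for t = 4 trials; not estimates from any real MET.
loadings = np.array([
    [0.9, 0.3],
    [0.8, -0.2],
    [0.5, 0.6],
    [0.7, 0.0],
])                                                  # Lambda: t x k trial loadings
specific_var = np.array([0.10, 0.20, 0.15, 0.05])   # diagonal of Psi


def fa_covariance(lam: np.ndarray, psi: np.ndarray) -> np.ndarray:
    """Between-trial genetic covariance under an FA(k) model: Lambda Lambda' + Psi."""
    return lam @ lam.T + np.diag(psi)


def to_correlation(cov: np.ndarray) -> np.ndarray:
    """Convert a covariance matrix into a correlation matrix."""
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)


def n_params_fa(t: int, k: int) -> int:
    """Free parameters in FA(k) for t trials: t*k loadings plus t specific
    variances, minus k(k-1)/2 constraints needed for identifiability."""
    return t * k + t - k * (k - 1) // 2


def n_params_unstructured(t: int) -> int:
    """Free parameters in a fully unstructured t x t covariance matrix."""
    return t * (t + 1) // 2


G = fa_covariance(loadings, specific_var)
C = to_correlation(G)
print("genetic variance per trial:", np.round(np.diag(G), 2))
print("genetic correlations between trials:\n", np.round(C, 2))
print("parameters for 111 trials, FA(2) vs unstructured:",
      n_params_fa(111, 2), n_params_unstructured(111))
```

Each trial gets its own genetic variance (the diagonal of `G`) and each pair of trials its own correlation, yet for 111 trials an FA(2) structure needs 332 parameters compared with 6,216 for an unstructured matrix. Because every specific variance is positive here, `G` is also guaranteed to be positive definite.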

A Brief History of MET Analysis

  • Early methods used ANOVA on the two-way table of genotype by environment means. Here, the total variation is partitioned into sources due to genotype, environment and residual variation (a combination of the G×E interaction and within-trial error). This yields estimates of the average performance of each genotype across environments.
    Limitation: does not provide information on the nature of the G×E interaction.
  • A greater emphasis on understanding the G×E interaction led to the development of AMMI (additive main effects and multiplicative interaction) and GGE (genotype plus genotype by environment) biplots; these are descriptive tools for visualising the relationships between genotypes and environments.
    Limitation: does not provide simple numerical summaries to guide the selection of genotypes.
  • Today, linear mixed models are widely used for the analysis of MET data. In particular, a linear mixed model with a multiplicative factor analytic (FA) model for the G×E effects has been found to perform extremely well, both in parsimoniously describing the G×E interaction and in predictive accuracy.
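The two earlier approaches can be sketched on a small table of means. The code below assumes a hypothetical 4 × 3 genotype by environment table (`table`, `two_way_anova` and `ammi_terms` are illustrative names). It first performs the additive two-way ANOVA partition, then applies a singular value decomposition to the residual matrix, which is the multiplicative step behind an AMMI analysis. With only one mean per cell, that residual mixes G×E interaction with within-trial error, as noted in the list.

```python
import numpy as np

# Hypothetical table of genotype means (rows) in environments (columns).
table = np.array([
    [5.0, 3.0, 4.2],
    [4.0, 4.5, 4.1],
    [3.5, 3.8, 3.0],
    [4.6, 4.0, 4.4],
])


def two_way_anova(y: np.ndarray):
    """Additive two-way partition of a genotype x environment table of means.

    Returns the sums of squares, the genotype main effects and the residual
    matrix (G x E interaction confounded with error when n = 1 per cell)."""
    n_g, n_e = y.shape
    grand = y.mean()
    g_eff = y.mean(axis=1) - grand          # genotype main effects
    e_eff = y.mean(axis=0) - grand          # environment main effects
    resid = y - grand - g_eff[:, None] - e_eff[None, :]
    ss = {
        "genotype": n_e * float(np.sum(g_eff ** 2)),
        "environment": n_g * float(np.sum(e_eff ** 2)),
        "residual": float(np.sum(resid ** 2)),
        "total": float(np.sum((y - grand) ** 2)),
    }
    return ss, g_eff, resid


def ammi_terms(resid: np.ndarray, n_terms: int = 2):
    """Multiplicative (AMMI-style) terms from the SVD of the residual matrix.

    Returns genotype scores, environment scores (singular values split evenly
    between them) and the share of residual SS captured by each term."""
    u, s, vt = np.linalg.svd(resid, full_matrices=False)
    share = s ** 2 / np.sum(s ** 2)
    root = np.sqrt(s[:n_terms])
    return u[:, :n_terms] * root, vt[:n_terms].T * root, share[:n_terms]


ss, g_eff, resid = two_way_anova(table)
geno_scores, env_scores, share = ammi_terms(resid)
print({name: round(value, 3) for name, value in ss.items()})
print("genotype means across environments:", np.round(table.mean(axis=1), 2))
print("share of residual SS per AMMI term:", np.round(share, 3))
```

Plotting `geno_scores` and `env_scores` on the same axes gives an AMMI-style biplot. A GGE biplot instead applies the SVD to the environment-centred table, so that genotype main effects and G×E are displayed together. Neither output is a single number per genotype for selection, which is the limitation noted above.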

A powerful, efficient and reliable software solution for fitting linear mixed models is ASReml-R. It is particularly well suited to the analysis of MET data. With over 4500 citations, ASReml-R is a popular choice among MET data analysts due to:

  • flexible syntax that makes fitting complex linear mixed models straightforward
  • an efficient algorithm for fitting the linear mixed model, which makes it feasible to analyse large and complex data sets

Case Study Continued…

Biometricians from the University of Wollongong, Australia, have analysed MET data for the Radiata Pine Breeding Company Ltd. Over 360,000 progeny trees were grown in 111 trials. Pedigree information was available for the 2,948 parental lines. Their aim was to produce predicted breeding values (i.e. additive genetic effects) for the parental trees. A linear mixed model adopting a factor analytic (FA) model for the G×E effects was fitted using ASReml-R.
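Pedigree information typically enters such a model through the numerator (additive) relationship matrix A, which scales the covariance of the additive genetic effects. The sketch below is a minimal implementation of the classical tabular method for building A from a pedigree. It is not the code used in the case study; the function name and the five-tree toy pedigree are hypothetical, and it assumes parents are listed before their offspring, with unknown parents given as `None`.

```python
def additive_relationship(pedigree):
    """Numerator relationship matrix A by the tabular method.

    pedigree: list of (id, sire, dam) tuples, ordered so that parents appear
    before offspring; unknown parents are None.
    Returns the list of ids and A as a list of lists."""
    ids = [row[0] for row in pedigree]
    index = {ind: i for i, ind in enumerate(ids)}
    n = len(ids)
    A = [[0.0] * n for _ in range(n)]
    for i, (_, sire, dam) in enumerate(pedigree):
        s = index[sire] if sire is not None else None
        d = index[dam] if dam is not None else None
        # Diagonal: 1 + inbreeding coefficient, where F_i = 0.5 * A[sire][dam].
        A[i][i] = 1.0 + (0.5 * A[s][d] if s is not None and d is not None else 0.0)
        # Off-diagonal: average relationship of earlier individual j with i's parents.
        for j in range(i):
            a = 0.0
            if s is not None:
                a += 0.5 * A[j][s]
            if d is not None:
                a += 0.5 * A[j][d]
            A[i][j] = A[j][i] = a
    return ids, A


# Hypothetical pedigree: P5 is the offspring of two related parents (P3, P4).
toy_pedigree = [
    ("P1", None, None),
    ("P2", None, None),
    ("P3", "P1", "P2"),
    ("P4", "P1", None),
    ("P5", "P3", "P4"),
]
ids, A = additive_relationship(toy_pedigree)
for name, row in zip(ids, A):
    print(name, row)
```

Because `P3` and `P4` share the parent `P1`, their offspring `P5` is inbred, which shows up as a diagonal element above 1. For pedigrees as large as the one in the case study, software generally works with the sparse inverse of A, which can be built directly from the pedigree, rather than forming the dense matrix shown here.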

The breeding values predicted by the statistical model are now publicly available on the Radiata Pine Breeding Company Ltd. website for use by radiata pine breeders and growers.

For readers new to linear mixed models or ASReml-R, MMA (Mixed Model Academy) provides a simple and user-friendly tool to help you:

  • formulate an appropriate linear mixed model for your data set
  • learn ASReml-R coding syntax
  • connect with a community of linear mixed model practitioners.