PCO directive

Performs principal coordinates analysis; principal components analysis and canonical variates analysis (with different weighting from that used in CVA) are available as special cases.


Options

PRINT = strings
Printed output required (roots, scores, loadings, residuals, centroid, distances); default * i.e. no printing

NROOTS = scalar
Number of latent roots for printed output; default * requests them all to be printed

SMALLEST = string
Whether to print the smallest roots instead of the largest (yes, no); default no


Parameters

DATA = identifiers
Data for each analysis, specified as a symmetric matrix of similarities or transformed distances; or, for canonical variates analysis, as an SSPM containing within-group sums of squares and products etc.; or, for principal components analysis, either as a pointer containing the variates of the data matrix or as a matrix storing the variates by columns

LRV = LRVs
Latent vectors (i.e. coordinates or scores), roots and trace from each analysis

CENTROID = diagonal matrices
Squared distances of the units from their centroid

RESIDUALS = matrices or variates
Distances of the units from the fitted space

LOADINGS = matrices
Principal component loadings, or canonical variate loadings

DISTANCES = symmetric matrices
Computed inter-unit distances calculated from the variates of a data matrix, or inter-group Mahalanobis distances calculated from a within-group SSPM


Description

The PCO directive is used for principal coordinates analysis. This method includes principal components analysis and a form of canonical variates analysis as special cases.

   There are six sections of output from PCO, requested using the PRINT option:

    roots
prints the latent roots and trace;

    scores
prints the principal coordinate scores;

    loadings
prints the loadings, when the directive is being used for principal components analysis or canonical variates analysis;

    residuals
prints the residuals; this is relevant only if results are printed for only some of the latent roots;

    centroid
prints the distances (not squared distances) of the units from their overall centroid;

    distances
prints the matrix of inter-unit distances (not squared distances).

The NROOTS and SMALLEST options control the printed output of roots, scores, loadings and residuals. By default, results are printed for all the roots, but you can set the NROOTS option to request a smaller number. If option SMALLEST has the default setting no, these are taken to be the largest roots; if you set SMALLEST=yes, the results are for the smallest non-zero roots. The inter-unit distances are unaffected by the setting of the NROOTS option.

   The DATA parameter supplies the data. In its simplest form, PCO works on a symmetric matrix, with values giving the associations amongst a set of objects. This could, for example, be a similarity matrix produced by FSIMILARITY.

   Alternatively, the input to PCO can be a pointer whose values are the identifiers of a set of variates, or a matrix storing the variates by columns. In this case PCO constructs the matrix of inter-unit squared distances and bases the analysis on associations derived from it. This is equivalent to a principal components analysis, except that the results are derived by analysing the distance matrix rather than an SSPM. When there are more units than variates, PCO is less efficient for principal components analysis than the PCP directive; when there are more variates than units, PCO is more efficient. All the variates must be of the same length and none of their values may be missing; any restrictions on the variates are ignored.
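
   The equivalence with principal components analysis can be checked numerically. The following Python sketch (not GenStat code) uses made-up data and assumes the usual principal coordinates construction, in which the association matrix is -0.5 times the double-centred matrix of squared Euclidean distances; the exact internal form used by PCO is not stated here. All names in the block are illustrative.

```python
# Hypothetical data: 4 units measured on 2 variates (stored by columns).
variates = [[1.0, 2.0, 4.0, 5.0],   # variate 1
            [2.0, 1.0, 5.0, 4.0]]   # variate 2
n = len(variates[0])

# One row of values per unit.
units = [[v[i] for v in variates] for i in range(n)]

# Inter-unit squared Euclidean distances.
sqdist = [[sum((a - b) ** 2 for a, b in zip(units[i], units[j]))
           for j in range(n)] for i in range(n)]

# Assumed association matrix: -0.5 * squared distances, double-centred.
row_mean = [sum(row) / n for row in sqdist]
grand_mean = sum(row_mean) / n
assoc = [[-0.5 * (sqdist[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

# The same matrix built directly from the mean-centred variates,
# i.e. the cross-products between units that underlie a PCA.
means = [sum(v) / n for v in variates]
centred = [[u[k] - means[k] for k in range(len(means))] for u in units]
gram = [[sum(a * b for a, b in zip(centred[i], centred[j]))
         for j in range(n)] for i in range(n)]

# The two constructions agree to rounding error.
max_diff = max(abs(assoc[i][j] - gram[i][j])
               for i in range(n) for j in range(n))
```

   Because the two matrices coincide, their non-zero latent roots match those of the variates' sums of squares and products about the means, which is why the distance-based route reproduces a principal components analysis.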

   The third type of input to PCO is an SSPM structure. This must be a within-group SSPM: that is, you must have set the GROUP option of the SSPM directive when the SSPM was declared. In this case PCO calculates the Mahalanobis distances amongst the group means and bases the analysis on them. This gives results similar to a canonical variates analysis. PCO represents the distances better than CVA does, but CVA is preferable if you are interested in loadings for discriminatory purposes.
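
   As an illustration of the distances involved, the Python sketch below (not GenStat code) computes a squared Mahalanobis distance between two group means from within-group sums of squares and products. The data are made up, and the scaling is an assumption: the within-group matrix is divided by its degrees of freedom (number of units minus number of groups) before inversion, which may differ from the weighting PCO applies.

```python
from math import sqrt

# Hypothetical data: two groups, three units each, two variates.
groups = {
    "g1": [(1.0, 2.0), (2.0, 4.0), (3.0, 3.0)],
    "g2": [(5.0, 1.0), (6.0, 3.0), (7.0, 2.0)],
}

def group_mean(points):
    """Mean of each variate over the units in one group."""
    return tuple(sum(p[k] for p in points) / len(points) for k in range(2))

means = {name: group_mean(pts) for name, pts in groups.items()}

# Within-group sums of squares and products, pooled over groups.
W = [[0.0, 0.0], [0.0, 0.0]]
for name, pts in groups.items():
    m = means[name]
    for p in pts:
        dev = (p[0] - m[0], p[1] - m[1])
        for r in range(2):
            for c in range(2):
                W[r][c] += dev[r] * dev[c]

# Assumption: scale by the within-group degrees of freedom.
df = sum(len(pts) for pts in groups.values()) - len(groups)
S = [[w / df for w in row] for row in W]

# Inverse of the 2 x 2 matrix S.
det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
S_inv = [[S[1][1] / det, -S[0][1] / det],
         [-S[1][0] / det, S[0][0] / det]]

def mahalanobis_sq(a, b):
    """Squared Mahalanobis distance between two mean vectors."""
    d = (a[0] - b[0], a[1] - b[1])
    return sum(d[r] * S_inv[r][c] * d[c] for r in range(2) for c in range(2))

d2 = mahalanobis_sq(means["g1"], means["g2"])
distance = sqrt(d2)
```

   With these data the pooled within-group matrix has 4 on the diagonal and 2 off it, and the squared distance between the two means is 28.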

   The second and subsequent parameters of PCO allow you to save the results. The number of units, which determines the sizes of the output structures, depends on the input to PCO. For a matrix or a symmetric matrix, the number of units is the number of rows of the matrix; for a pointer, it is the number of values in each of the variates that the pointer contains; for an SSPM, it is the number of groups.

   The latent roots, scores and trace can be saved in an LRV structure using the LRV parameter. If you have declared the LRV already, its number of rows must equal the number of units.

   If the input to PCO is a pointer, a matrix, or an SSPM, the principal component or canonical variate loadings can be saved in a matrix using the LOADINGS parameter. The number of rows of the matrix is equal to the number of variates (either those specified by an input pointer or those specified in the SSPM directive for an input SSPM structure), or the number of columns in an input matrix.

   The number of columns of the LRV and of the LOADINGS matrix corresponds to the number of dimensions to be saved from the analysis, and this must be the same for both of them. If the structures have been declared already, GenStat will take the larger of the numbers of columns declared for either, and declare (or redeclare) the other one to match. If neither has been declared and option SMALLEST retains the default setting no, GenStat takes the number of columns from the setting of the NROOTS option. Otherwise, GenStat saves results for the full set of dimensions. The trace saved as the third component of the LRV structure, however, will contain the sum of all the latent roots, whether or not they have all been saved.

   The distances of the units from their centroid can be saved in a diagonal matrix using the CENTROID parameter. The diagonal matrix has the same number of rows as the number of units, defined above. The RESIDUALS parameter allows you to save residuals, formed from the dimensions that have not been saved, in a matrix with one column and number of rows equal to the number of units. Finally, the inter-unit distances can be saved in a symmetric matrix using the DISTANCES parameter. The number of rows of the symmetric matrix is again the same as the number of units.
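
   The relationship between the centroid distances, the trace and the residuals can be seen in a small example. The Python sketch below (not GenStat code) uses a made-up configuration that is already centred and aligned with its principal axes, so that its coordinates serve as the scores. It assumes the standard double-centring construction, and takes the residual of a unit to be its distance from the space spanned by the saved dimensions; neither is confirmed as the exact computation in PCO.

```python
from math import sqrt

# Hypothetical scores for 4 units in 2 dimensions (centred, axis-aligned).
scores = [(2.0, 0.0), (-2.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
n = len(scores)

# Inter-unit squared distances, as would be supplied to the analysis.
sqdist = [[sum((a - b) ** 2 for a, b in zip(scores[i], scores[j]))
           for j in range(n)] for i in range(n)]

# Squared distance of each unit from the centroid, recovered from the
# distances alone: the diagonal of the double-centred matrix.
row_mean = [sum(row) / n for row in sqdist]
grand_mean = sum(row_mean) / n
centroid_sq = [-0.5 * (sqdist[i][i] - 2.0 * row_mean[i] + grand_mean)
               for i in range(n)]

# Latent roots are the sums of squared scores in each dimension;
# the trace is their total, and equals the sum of centroid_sq.
roots = [sum(s[k] ** 2 for s in scores) for k in range(2)]
trace = sum(centroid_sq)

# Residuals when only the first dimension is saved: the distance left
# over after removing the saved part of each unit's centroid distance.
saved = 1
residuals = [sqrt(max(0.0, centroid_sq[i]
                      - sum(scores[i][k] ** 2 for k in range(saved))))
             for i in range(n)]
```

   Here the squared centroid distances are 4, 4, 1 and 1, the roots are 8 and 2 with trace 10, and the residuals after saving one dimension are 0, 0, 1 and 1. The trace is unchanged by the number of dimensions saved.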

   Having obtained an ordination, you may sometimes want to add points to it for additional units. If you know the squared distances of the new units from the old, you can do this with the technique of Gower (1968), which is available in GenStat through the ADDPOINTS directive.
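
   The idea behind adding a point can be sketched as follows. This Python fragment (not GenStat code, and not a description of how ADDPOINTS is implemented) applies the formula as it is commonly stated: the coordinate of the new unit in dimension k is the sum over old units of score times (squared centroid distance of the old unit minus squared distance from the new unit to the old unit), divided by twice the latent root for that dimension. The configuration and the new unit are made up, and the new unit is assumed to lie within the space of the saved dimensions.

```python
# Hypothetical existing ordination: 4 units, 2 dimensions
# (centred and aligned with the principal axes).
scores = [(2.0, 0.0), (-2.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
ndim = 2

# Latent root for each dimension: sum of squared scores.
roots = [sum(s[k] ** 2 for s in scores) for k in range(ndim)]

# Squared distance of each old unit from the centroid.
centroid_sq = [sum(c * c for c in s) for s in scores]

# Squared distances from the new unit to each old unit. These are the
# values for a unit located at (1.0, 0.5), which the code does not use.
new_sqdist = [1.25, 9.25, 1.25, 3.25]

# Difference between centroid distances and distances to the new unit.
diff = [b - d for b, d in zip(centroid_sq, new_sqdist)]

# Coordinates of the new unit in the existing ordination.
new_scores = [sum(s[k] * diff[i] for i, s in enumerate(scores))
              / (2.0 * roots[k])
              for k in range(ndim)]
```

   The calculation recovers the position (1.0, 0.5) from the distances alone. If the new unit lay outside the fitted space, the same formula would give its projection, leaving a residual distance.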

 

Options: PRINT, NROOTS, SMALLEST.

Parameters: DATA, LRV, CENTROID, RESIDUALS, LOADINGS, DISTANCES.


Action with RESTRICT

PCO ignores any restrictions on the DATA variates.


Reference

Gower, J.C. (1968). Adding a point to vector diagrams in multivariate analysis. Biometrika, 55, 582-585.