PCO directive
Performs principal coordinates analysis, with principal components analysis and canonical variates analysis (weighted differently from CVA) as special cases.
Options
Parameters
Description
The PCO directive is used for principal coordinates analysis. This method encompasses principal components analysis and a form of canonical variates analysis as special cases.
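There are six sections of output from PCO, requested using the PRINT option: the latent roots, the scores, the loadings, the distances of the units from their centroid, the residuals, and the inter-unit distances.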
The NROOTS and SMALLEST options control the printed output of roots, scores, loadings and residuals. By default, results are printed for all the roots, but you can set the NROOTS option to specify a smaller number. If option SMALLEST has the default setting no, these are taken to be the largest roots; if you set SMALLEST=yes, the results are for the smallest non-zero roots. The inter-unit distances are unaffected by the setting of the NROOTS option.
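The selection rule described above can be sketched in Python. This is an illustration only, not GenStat code: the function name select_roots, its arguments, the ordering of roots by algebraic value, and the tolerance used to decide that a root is zero are all assumptions made for the example.

```python
def select_roots(roots, nroots=None, smallest=False, tol=1e-9):
    """Pick the roots to report (hypothetical helper, not a GenStat function).

    nroots=None mimics the default of reporting every root.
    smallest=False mimics SMALLEST=no (largest roots first).
    smallest=True mimics SMALLEST=yes (smallest non-zero roots).
    """
    ordered = sorted(roots, reverse=True)
    if nroots is None:
        nroots = len(ordered)
    if not smallest:
        return ordered[:nroots]
    # Assumption: a root counts as zero when its magnitude is below tol.
    nonzero = [r for r in ordered if abs(r) > tol]
    start = max(len(nonzero) - nroots, 0)
    return nonzero[start:]


example_roots = [5.0, 0.0, 2.0, 1.0]
largest_two = select_roots(example_roots, nroots=2)
smallest_two = select_roots(example_roots, nroots=2, smallest=True)
```

Here largest_two holds the two largest roots, while smallest_two skips the zero root and holds the two smallest roots that remain.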
The DATA parameter supplies the data. In its simplest form, PCO works on a symmetric matrix, with values giving the associations amongst a set of objects. This could, for example, be a similarity matrix produced by FSIMILARITY.
Alternatively, the input to PCO can be a pointer whose values are the identifiers of a set of variates, or a matrix storing the variates by columns. In this case the PCO directive constructs the matrix of inter-unit squared distances and bases the analysis on associations derived from it. This is equivalent to a principal components analysis, except that the results are derived by analysing the distance matrix rather than an SSPM. When there are more units than variates, using PCO for principal components analysis is less efficient than using the PCP directive; when there are more variates than units, PCO is more efficient. When PCO is used for principal components analysis, all the variates must be of the same length and none of their values may be missing; any restrictions on the variates are ignored.
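To illustrate why an analysis of inter-unit squared distances reproduces a principal components analysis, the following Python sketch builds the squared distance matrix from a small set of units and applies the standard double-centring step of principal coordinates analysis. It shows the textbook construction under stated assumptions, not GenStat's internal algorithm; the function names and the data are hypothetical.

```python
def squared_distances(units):
    """Squared Euclidean distances between units (one row of values per unit)."""
    n = len(units)
    return [[sum((a - b) ** 2 for a, b in zip(units[i], units[j]))
             for j in range(n)] for i in range(n)]


def double_centre(d2):
    """Return B with b_ij = -0.5 * (d2_ij - rowmean_i - rowmean_j + grandmean)."""
    n = len(d2)
    row_means = [sum(row) / n for row in d2]   # d2 is symmetric
    grand_mean = sum(row_means) / n
    return [[-0.5 * (d2[i][j] - row_means[i] - row_means[j] + grand_mean)
             for j in range(n)] for i in range(n)]


def centred_inner_products(units):
    """Inner products of the units after subtracting each variate's mean."""
    n = len(units)
    means = [sum(col) / n for col in zip(*units)]
    centred = [[x - m for x, m in zip(row, means)] for row in units]
    return [[sum(a * b for a, b in zip(centred[i], centred[j]))
             for j in range(n)] for i in range(n)]


# Three units measured on two variates (made-up values).
units = [[1.0, 2.0], [3.0, 0.0], [5.0, 4.0]]
d2 = squared_distances(units)
b = double_centre(d2)
inner = centred_inner_products(units)
```

The matrices b and inner agree, so the latent roots and vectors of b describe the same configuration as a principal components analysis of the centred variates. The diagonal of b holds the squared distances of the units from their centroid, and its trace equals the sum of all the latent roots.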
The third type of input to PCO is an SSPM structure. This must be a within-group SSPM: that is, you must have set the GROUP option of the SSPM directive when the SSPM was declared. In this case the PCO directive calculates the Mahalanobis distances amongst the group means and bases the analysis on them. This gives results similar to a canonical variates analysis. PCO represents the distances better than CVA does, but CVA is preferable if you are interested in loadings for discriminatory purposes.
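As a rough illustration of the quantity involved, the Python sketch below computes a squared Mahalanobis distance between two group means from a within-group sums-of-squares-and-products matrix. It is restricted to two variates so that the matrix inverse can be written out explicitly. The function names, the data, and the choice of dividing the within-group matrix by (number of units minus number of groups) are assumptions for the example; the scaling that GenStat applies is not stated here.

```python
def within_group_summary(units, labels):
    """Return the group means and the within-group SSP matrix W."""
    by_group = {}
    for row, lab in zip(units, labels):
        by_group.setdefault(lab, []).append(row)
    means = {lab: [sum(col) / len(rows) for col in zip(*rows)]
             for lab, rows in by_group.items()}
    p = len(units[0])
    w = [[0.0] * p for _ in range(p)]
    for row, lab in zip(units, labels):
        dev = [x - m for x, m in zip(row, means[lab])]
        for a in range(p):
            for c in range(p):
                w[a][c] += dev[a] * dev[c]
    return means, w


def mahalanobis_sq_2d(mean_i, mean_j, w, df):
    """Squared Mahalanobis distance for two variates, using S = W / df."""
    s = [[w[a][c] / df for c in range(2)] for a in range(2)]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    dx = mean_i[0] - mean_j[0]
    dy = mean_i[1] - mean_j[1]
    # Quadratic form with the explicit inverse of a symmetric 2 x 2 matrix.
    return (s[1][1] * dx * dx - 2.0 * s[0][1] * dx * dy + s[0][0] * dy * dy) / det


# Six units in two groups, measured on two variates (made-up values).
units = [[0.0, 0.0], [2.0, 0.0], [1.0, 3.0],
         [4.0, 1.0], [6.0, 1.0], [5.0, 4.0]]
labels = ["A", "A", "A", "B", "B", "B"]
means, w = within_group_summary(units, labels)
df = len(units) - len(means)          # assumed within-group degrees of freedom
d2_ab = mahalanobis_sq_2d(means["A"], means["B"], w, df)
```

With several groups, the matrix of such squared distances between every pair of group means plays the role that the inter-unit squared distance matrix plays for the other input types.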
The second and subsequent parameters of PCO allow you to save the results. The number of units, which determines the sizes of the output structures, differs according to the input to PCO. For a matrix or a symmetric matrix, the number of units is the number of rows of the matrix; for a pointer, it is the number of values in the variates that the pointer contains; for an SSPM, it is the number of groups.
The latent roots, scores and trace can be saved in an LRV structure using the LRV parameter. If you have declared the LRV already, its number of rows must equal the number of units.
If the input to PCO is a pointer, a matrix, or an SSPM, the principal component or canonical variate loadings can be saved in a matrix using the LOADINGS parameter. The number of rows of the matrix is equal to the number of variates (either those specified by an input pointer or those specified in the SSPM directive for an input SSPM structure), or the number of columns in an input matrix.
The number of columns of the LRV and of the LOADINGS matrix corresponds to the number of dimensions to be saved from the analysis, and this must be the same for both of them. If the structures have been declared already, GenStat will take the larger of the numbers of columns declared for either, and declare (or redeclare) the other one to match. If neither has been declared and option SMALLEST retains the default setting no, GenStat takes the number of columns from the setting of the NROOTS option. Otherwise, GenStat saves results for the full set of dimensions. The trace saved as the third component of the LRV structure, however, will contain the sum of all the latent roots, whether or not they have all been saved.
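The rule for the number of saved dimensions can be restated as a small Python function. This is only a paraphrase of the paragraph above, not GenStat code; the function saved_dimensions and its arguments are hypothetical, and None stands for "not declared" or "not set".

```python
def saved_dimensions(full, nroots=None, smallest=False,
                     lrv_cols=None, loadings_cols=None):
    """Number of dimensions saved in the LRV and LOADINGS structures.

    full          -- size of the full set of dimensions
    nroots        -- setting of NROOTS, or None if unset (all roots)
    smallest      -- True for SMALLEST=yes, False for the default no
    lrv_cols      -- columns of a pre-declared LRV, or None
    loadings_cols -- columns of a pre-declared LOADINGS matrix, or None
    """
    declared = [c for c in (lrv_cols, loadings_cols) if c is not None]
    if declared:
        # The larger declared size wins; the other structure is matched to it.
        return max(declared)
    if not smallest and nroots is not None:
        return nroots
    return full
```

For example, saved_dimensions(5, lrv_cols=2, loadings_cols=3) gives 3, saved_dimensions(5, nroots=2) gives 2, and saved_dimensions(5, nroots=2, smallest=True) gives 5.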
The distances of the units from their centroid can be saved in a diagonal matrix using the CENTROID parameter. The diagonal matrix has the same number of rows as the number of units, defined above. The RESIDUALS parameter allows you to save residuals, formed from the dimensions that have not been saved, in a matrix with one column and number of rows equal to the number of units. Finally, the inter-unit distances can be saved in a symmetric matrix using the DISTANCES parameter. The number of rows of the symmetric matrix is again the same as the number of units.
Having obtained an ordination, you may sometimes want to add points to the ordination for additional units. If you know the squared distances of the new units from the old, the technique of Gower (1968) can be used to add points to the ordination for the new units. You can do this in GenStat by using the ADDPOINTS directive.
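The idea behind adding a point can be sketched in Python. Assuming the existing scores are centred principal coordinates, so that the columns of the score matrix are orthogonal with sums of squares equal to the latent roots, a new unit's coordinate on axis a is 0.5 * sum_i x_ia * (c_i - d_i) / root_a, where c_i is the squared distance of old unit i from the centroid and d_i is the squared distance of the new unit from old unit i. The function name and data below are hypothetical, and the sketch does not claim to reproduce what the ADDPOINTS directive does internally.

```python
def add_point(scores, roots, centroid_d2, new_d2):
    """Coordinates of a new unit from its squared distances to the old units.

    scores      -- principal coordinate scores, one row per old unit
    roots       -- latent roots for the retained axes (all non-zero)
    centroid_d2 -- squared distances of the old units from their centroid
    new_d2      -- squared distances of the new unit from each old unit
    """
    diff = [c - d for c, d in zip(centroid_d2, new_d2)]
    return [0.5 * sum(row[a] * diff[i] for i, row in enumerate(scores)) / roots[a]
            for a in range(len(roots))]


# Four old units at the corners of a rectangle: the columns are already
# centred and orthogonal, so they serve as principal coordinate scores.
scores = [[2.0, 1.0], [2.0, -1.0], [-2.0, 1.0], [-2.0, -1.0]]
roots = [sum(row[a] ** 2 for row in scores) for a in range(2)]
centroid_d2 = [sum(x * x for x in row) for row in scores]

# Squared distances from a new unit whose true position is known, for checking.
true_position = [1.0, 0.5]
new_d2 = [sum((x - y) ** 2 for x, y in zip(row, true_position)) for row in scores]

recovered = add_point(scores, roots, centroid_d2, new_d2)
```

Because the new unit here lies in the plane of the existing ordination, recovered matches true_position; a unit lying outside that space would be projected onto the retained axes.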
Options: PRINT, NROOTS, SMALLEST.
Parameters: DATA, LRV, CENTROID, RESIDUALS, LOADINGS, DISTANCES.
Action with RESTRICT
PCO ignores any restrictions on the DATA variates.
Reference
Gower, J.C. (1968). Adding a point to vector diagrams in multivariate analysis. Biometrika, 55, 582-585.