HDISPLAY directive

Displays results ancillary to hierarchical cluster analyses: a matrix of mean similarities between and within groups, a set of nearest neighbours for each unit, a minimum spanning tree, and the most typical elements from each group.


Option

PRINT = strings
Printed output required (neighbours, tree, typicalelements, gsimilarities); default tree


Parameters

SIMILARITY = symmetric matrices
Input similarity matrix for each cluster analysis

NNEIGHBOURS = scalars
Number of nearest neighbours to be printed

NEIGHBOURS = matrices
Matrix to store nearest neighbours of each unit

GROUPS = factors
Indicates the groupings of the units (for calculating typical elements and mean similarities between groups)

TREE = matrices
To store the minimum spanning tree (as a series of links and corresponding lengths)

GSIMILARITY = symmetric matrices
To store similarities between groups


Description

You can use the HDISPLAY directive to print ancillary information useful for interpreting cluster analyses, and to save information to use elsewhere in GenStat, for example for plotting.

   The SIMILARITY parameter specifies a list of symmetric similarity matrices. These are operated on, in turn, to produce the output requested by the PRINT option and to save the information specified by the other parameters. Since the interpretations of the remaining parameters are closely linked to the different settings of the PRINT option, each setting is discussed below with the relevant parameters.

   The NNEIGHBOURS parameter gives a list of scalars indicating how many neighbours will appear in the printed table of nearest neighbours.

   The NEIGHBOURS parameter can specify a list of identifiers to store details of nearest neighbours. These will be declared implicitly, if necessary, as matrices. The rows of the matrices correspond to the units; there should be an even number of columns. The values in the odd-numbered columns represent the neighbouring units in order of their similarity, while the values in the even-numbered columns are the corresponding similarities. If you have declared the matrix previously and it does not have enough columns, then NEIGHBOURS stores as many neighbours as possible. If there is an odd number of columns in the matrix, the last column is not filled. If the matrix is declared implicitly, the number of columns will be twice the value of the NNEIGHBOURS scalar.
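   The column layout described above can be illustrated outside GenStat. The following is a minimal Python sketch, not GenStat code and not GenStat's implementation: the function name nearest_neighbours and the small similarity matrix are hypothetical, and units are numbered from 1 as in the text.

```python
def nearest_neighbours(sim, n_neighbours):
    """Return one row per unit: [unit, similarity, unit, similarity, ...].

    sim is a full symmetric similarity matrix given as a list of lists.
    This mimics the layout described for the NEIGHBOURS matrix; it is an
    illustration only, not GenStat's own algorithm.
    """
    rows = []
    for i, sims in enumerate(sim):
        # Rank all other units by decreasing similarity to unit i.
        others = [j for j in range(len(sim)) if j != i]
        others.sort(key=lambda j: sims[j], reverse=True)
        row = []
        for j in others[:n_neighbours]:
            # Odd-numbered column: neighbouring unit (numbered from 1);
            # even-numbered column: the corresponding similarity.
            row.extend([j + 1, sims[j]])
        rows.append(row)
    return rows


# Hypothetical similarity matrix for four units.
sim = [
    [1.0, 0.9, 0.2, 0.4],
    [0.9, 1.0, 0.3, 0.5],
    [0.2, 0.3, 1.0, 0.8],
    [0.4, 0.5, 0.8, 1.0],
]
table = nearest_neighbours(sim, 2)
for row in table:
    print(row)
```

   With two neighbours requested, each row has four columns, which matches the rule that an implicitly declared matrix has twice as many columns as the value of NNEIGHBOURS.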

   If the PRINT option includes the setting neighbours, GenStat prints a table of nearest neighbours for every unit, together with their similarities. The number of neighbours printed is determined by the value of the NNEIGHBOURS scalar; if NNEIGHBOURS is not set, the table is not printed. This information is also useful for interpreting clusters and ordinations.

   The GROUPS parameter specifies a factor to divide the units of each similarity matrix into clusters. You may have formed the factor from a previous hierarchical cluster analysis, using HCLUSTER. This parameter must be set if the PRINT option includes the settings typicalelements or gsimilarities.

   If the PRINT option includes the setting typicalelements, GenStat prints the average similarity of each group member with the other members of its group. This is to help you identify typical members of each group: typical members will have relatively large average similarities compared to those of the other members. Within each group, members are printed in decreasing order of average similarity.
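   The calculation behind this listing can be sketched as follows. This is an illustrative Python example under stated assumptions, not GenStat code: the function name typical_elements, the similarity values and the group labels are all hypothetical, and skipping single-member groups is a choice made for the sketch.

```python
def typical_elements(sim, groups):
    """Map each group label to [(unit, average similarity), ...].

    Each unit's average is taken over the other members of its own group,
    and members are listed in decreasing order of that average.
    Illustration only; not GenStat's implementation.
    """
    members = {}
    for unit, label in enumerate(groups):
        members.setdefault(label, []).append(unit)

    result = {}
    for label, units in members.items():
        scored = []
        for i in units:
            others = [j for j in units if j != i]
            if not others:
                # A single-member group has no other members to average over.
                continue
            average = sum(sim[i][j] for j in others) / len(others)
            scored.append((i + 1, average))  # units numbered from 1
        scored.sort(key=lambda pair: pair[1], reverse=True)
        result[label] = scored
    return result


# Hypothetical similarity matrix for five units in two groups.
sim = [
    [1.0, 0.8, 0.6, 0.2, 0.1],
    [0.8, 1.0, 0.7, 0.3, 0.2],
    [0.6, 0.7, 1.0, 0.4, 0.3],
    [0.2, 0.3, 0.4, 1.0, 0.9],
    [0.1, 0.2, 0.3, 0.9, 1.0],
]
groups = [1, 1, 1, 2, 2]
typical = typical_elements(sim, groups)
for label, scored in typical.items():
    print(label, scored)
```

   In this example unit 2 heads the list for group 1, because its average similarity to units 1 and 3 is the largest in that group.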

   The GSIMILARITY parameter specifies a list of symmetric matrices in which you can save the mean between-group and within-group similarities. Any structure that you have not declared already will be declared implicitly to be a symmetric matrix with number of rows equal to the number of levels of the factor in the GROUPS parameter.

   If the PRINT option includes the setting gsimilarities, GenStat prints the mean similarities between groups and within groups. Self-similarities (the diagonal elements of the similarity matrix) are excluded.
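   As a rough illustration of these means, the Python sketch below averages the similarities over all pairs of distinct units, grouped by the labels of the two units. It is not GenStat code; the function name group_mean_similarities and the data are hypothetical, and returning None for a pair of groups with no contributing units is an assumption of the sketch.

```python
def group_mean_similarities(sim, groups):
    """Return {(a, b): mean similarity} for group labels a <= b.

    (a, a) holds the within-group mean and (a, b) the between-group mean.
    Only pairs of distinct units are used, so self-similarities on the
    diagonal never contribute. Illustration only.
    """
    labels = sorted(set(groups))
    totals = {(a, b): 0.0 for a in labels for b in labels if a <= b}
    counts = {key: 0 for key in totals}

    for i in range(len(sim)):
        for j in range(i):  # j < i, so the diagonal is skipped
            key = tuple(sorted((groups[i], groups[j])))
            totals[key] += sim[i][j]
            counts[key] += 1

    # A group with a single member has no within-group pairs: report None.
    return {
        key: (totals[key] / counts[key] if counts[key] else None)
        for key in totals
    }


# Hypothetical similarity matrix for five units in two groups.
sim = [
    [1.0, 0.8, 0.6, 0.2, 0.1],
    [0.8, 1.0, 0.7, 0.3, 0.2],
    [0.6, 0.7, 1.0, 0.4, 0.3],
    [0.2, 0.3, 0.4, 1.0, 0.9],
    [0.1, 0.2, 0.3, 0.9, 1.0],
]
groups = [1, 1, 1, 2, 2]
means = group_mean_similarities(sim, groups)
for key in sorted(means):
    print(key, means[key])
```

   The result has one entry per pair of groups, which corresponds to a symmetric matrix with as many rows as the grouping factor has levels.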

   The TREE parameter can specify a matrix to save the minimum spanning tree. The matrix is set up with two columns and number of rows equal to the number of units. For each unit, the value in the first column is the unit to which that unit is linked on its left; the second column is the corresponding similarity. The first unit is not linked to any unit on its left, as it is always the first unit on the tree; so the first row of the matrix contains missing values.

   Setting the PRINT option to tree prints the minimum spanning tree associated with the similarity matrix specified by the SIMILARITY parameter. The minimum spanning tree (MST) is not a GenStat structure, but it can be kept in the form described above: that is, in a matrix with two columns. An MST is a tree connecting the n points of a multidimensional representation of the sampling units. In a tree every unit is linked to a connected network and there are no closed loops; the special feature of the MST is that, of all trees with a sampling unit at every node, it is the one whose links have minimum total length. The links include all those that join nearest neighbours; the MST is closely related to single-linkage hierarchical trees. Minimum spanning trees are also useful if you superimpose them on ordinations to reveal regions in which distance is badly distorted (see procedure DMST); if neighbouring points, as given by the MST, are distant in the ordination then something is badly wrong.
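   To make the two-column representation concrete, here is a minimal Python sketch that builds a spanning tree with Prim's algorithm and stores it one row per unit. It is not GenStat code, and the source does not say which algorithm GenStat uses. The sketch assumes that a larger similarity means a shorter link, so it always adds the most similar outside unit; it also assumes that row i holds the unit already on the tree to which unit i was attached. The function name spanning_tree and the data are hypothetical.

```python
def spanning_tree(sim):
    """Return one row per unit: [linked unit, similarity].

    The tree is grown from unit 1 with Prim's algorithm, so the row for
    unit 1 holds missing values (None). Assumes larger similarity means
    a shorter link. Illustration only; not GenStat's implementation.
    """
    n = len(sim)
    tree = [[None, None] for _ in range(n)]
    in_tree = [False] * n
    in_tree[0] = True
    best_sim = list(sim[0])  # best similarity from each unit to the tree
    best_link = [0] * n      # tree unit that achieves that similarity

    for _ in range(n - 1):
        # Choose the outside unit most similar to some unit on the tree.
        nxt = max(
            (j for j in range(n) if not in_tree[j]),
            key=lambda j: best_sim[j],
        )
        tree[nxt] = [best_link[nxt] + 1, best_sim[nxt]]  # units from 1
        in_tree[nxt] = True
        # The new tree unit may offer a better link to the remaining units.
        for j in range(n):
            if not in_tree[j] and sim[nxt][j] > best_sim[j]:
                best_sim[j] = sim[nxt][j]
                best_link[j] = nxt
    return tree


# Hypothetical similarity matrix for four units.
sim = [
    [1.0, 0.9, 0.2, 0.4],
    [0.9, 1.0, 0.3, 0.5],
    [0.2, 0.3, 1.0, 0.8],
    [0.4, 0.5, 0.8, 1.0],
]
tree = spanning_tree(sim)
for unit, row in enumerate(tree, start=1):
    print(unit, row)
```

   Here units 1 and 2 are linked, as are units 3 and 4, because each pair consists of mutual nearest neighbours; the remaining link joins the two pairs through units 2 and 4.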


Option: PRINT.

Parameters: SIMILARITY, NNEIGHBOURS, NEIGHBOURS, GROUPS, TREE, GSIMILARITY.