DDENDROGRAM procedure

Draws dendrograms with control over structure and style (P.G.N. Digby).


Options

STYLE = string
Style to use for the links of the dendrogram (average, centroid, lower, full); default aver

ORDERING = strings
How to define the order of the units for the dendrogram (given, ziggurat, size, first); default zigg, size, firs

REVERSE = string
Whether to reverse the order of the units in the dendrogram (no, yes); default no

ORIENTATION = string
Specifies the orientation of a dendrogram produced by high-resolution graphics (north, south, east, west); default west

METHOD = string
Method used to represent the scale on which the amalgamations have been made: settings other than the default are relevant only for data not generated by HCLUSTER or HDISPLAY (similarities, percentages, distances); default simi

SCREEN = string
Setting to use for the SCREEN option of DGRAPH (clear, keep); default clea

CHANGE = string
If a dendrogram-save structure from a previous DDENDROGRAM is used as the DATA parameter then this option specifies the area of the process where the first changes occur: see the description of the SAVE parameter (order, dendrogram, display); default orde

GRAPHICS = string
Form of graphics to be used (lineprinter, highresolution); default high

DSIMILARITY = string
Whether to display an axis for the similarities in high-resolution graphics (no, yes); default no

LOWSIMILARITY = scalar
Lower value to be used for the axis showing the similarities; default * i.e. determined from the data

ENDACTION = string
Action to be taken after completing the plot (continue, pause); default * uses the current setting


Parameters

DATA = matrices or pointers
Data defining each dendrogram in the form of either a matrix saved using the AMALGAMATIONS parameter of HCLUSTER (methods other than single linkage), or a matrix from the TREE parameter of HDISPLAY, or a SAVE structure from a previous use of DDENDROGRAM

PERMUTATION = variates
Specify or save permutations of the units for drawing each dendrogram, according to the ORDERING option

LABELS = variates or texts
Supply labels to use for the units of each dendrogram; these should be in the natural order of the units, not in a permuted order

TITLE = texts
Titles for the dendrograms

WINDOW = scalars
Window to use for each dendrogram (window 1 if unset); if this is set to zero the dendrogram is not drawn, but results can still be saved using the PERMUTATION, ZIGGURAT and SAVE parameters

PENS = scalars, variates, strings or texts
Scalar or string specifying the graphics pen or symbol in which to draw each (high-resolution or line-printer) dendrogram; alternatively use of a variate or text allows the structure of each dendrogram to be highlighted by drawing different links with different graphics pens or symbols

ZIGGURAT = variates
Save the "ziggurat-degree" of the links in each dendrogram

SAVE = pointers
Save the information required to plot a dendrogram, for use as input for the DATA parameter in a subsequent call to DDENDROGRAM


Description

DDENDROGRAM draws dendrograms using line-printer or high-resolution graphics, as indicated by the GRAPHICS option.

   Dendrograms can be drawn in many ways, often with apparently quite different results, as illustrated by Digby (1985). The procedure allows the user considerable control over the way that a dendrogram is formed; in particular the order of the units and the style used for drawing the links of the dendrogram can be varied. If high-resolution graphics is to be used, a check should be made to ensure that this facility is present in the available version of GenStat. This can be done by seeing what happens when any of the relevant directives is used. Then directives DEVICE, FRAME and PEN should be used to change the default settings, if required; these can be ascertained using the statement

HELP ENVIRONMENT, PICTURES, CURRENT

   The input for the procedure is given by the DATA parameter. This should be a matrix containing the amalgamations information from hierarchical cluster analysis (from the AMALGAMATIONS parameter of HCLUSTER) or a matrix containing the minimum spanning tree information (from the TREE parameter of the HDISPLAY directive); alternatively a SAVE structure from a previous DDENDROGRAM can be used as input. However, in the current release of GenStat, the amalgamations matrix from HCLUSTER is unusable if the clustering has been produced by single linkage, so the minimum spanning tree information, which is equivalent, should be used as input.
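
   As an illustration of the two kinds of input matrix, the following is a minimal sketch rather than a definitive recipe. The data values and the identifiers X1, X2, Sim, Amalg and Mst are hypothetical, and the use of FSIMILARITY with TEST=euclidean to form the similarity matrix is an assumption made only for this example.

```genstat
" Hypothetical data: two measurements on eight units "
VARIATE [VALUES=1.0,1.2,3.1,3.3,5.0,5.4,8.0,8.1] X1
VARIATE [VALUES=2.0,2.1,1.0,1.4,4.0,4.2,0.5,0.9] X2
" Form a similarity matrix (assumed here to use Euclidean distance) "
FSIMILARITY [SIMILARITY=Sim] X1,X2; TEST=euclidean
" Average linkage: the amalgamations matrix can be used directly "
HCLUSTER [PRINT=*; METHOD=averagelink] Sim; AMALGAMATIONS=Amalg
DDENDROGRAM DATA=Amalg; TITLE='Average linkage'
" Single linkage: supply the minimum spanning tree instead "
HDISPLAY [PRINT=*] Sim; TREE=Mst
DDENDROGRAM DATA=Mst; TITLE='Single linkage'
```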

   The PERMUTATION parameter can be supplied with a variate, either to specify a permutation of the rows of the dendrogram or to save the permutation generated by DDENDROGRAM, as indicated by the ORDERING option. Setting ORDERING=given takes the ordering defined by the PERMUTATION variate. The other settings of ORDERING define partial orderings of the units, and are used in conjunction with each other to obtain the full ordering: ziggurat (Critchley 1983) is associated with ultrametric distances amongst the units; size specifies that when two groups merge the smaller is always placed before the larger in the order; first specifies that when two groups merge the group containing the lowest numbered unit is always placed before the other in the order. The orders given by settings ziggurat and size are not completely specified and recourse may be made to the other of these settings or to first. If ORDERING is not set to given then a list of settings may be specified, in which case the first in the list is used, the second is used to resolve indeterminacies in the order given by the first setting in the list, and so on. The default is the list of settings: ziggurat, size, first.

   Option REVERSE allows the ordering thus obtained to be reversed.
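
   The ordering options might be combined as in the sketch below. It assumes that Amalg is an amalgamations matrix already saved from HCLUSTER (methods other than single linkage); Amalg and Perm are hypothetical identifiers.

```genstat
" Assumes Amalg was saved earlier using the AMALGAMATIONS parameter of HCLUSTER "
" Form the order by size, resolving ties by first; save it without drawing "
DDENDROGRAM [ORDERING=size,first] DATA=Amalg; PERMUTATION=Perm; WINDOW=0
PRINT Perm; DECIMALS=0
" Reuse the saved permutation, this time drawing the units in reverse order "
DDENDROGRAM [ORDERING=given; REVERSE=yes] DATA=Amalg; PERMUTATION=Perm
```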

   The LABELS parameter can be given a variate or a text to supply labels for the rows of the dendrogram. Labelling can be suppressed altogether by using a text containing only spaces.

   The STYLE option controls the style to use in forming the links of the dendrogram: its setting indicates where the line representing each new cluster should be placed. Assuming that the dendrogram has the units on the left-hand side, the settings can be described as follows: average (the default), the new line is midway between the old lines; centroid, the new line is placed at the mid-point of all the units in the group it represents; lower, the new line is a continuation of the lower of the two old lines (comparable with dendrograms from HCLUSTER); full, the new line is a continuation of the upper or lower of the two old lines, so that each vertical line spans all the units in the group it represents.

   The ORIENTATION option is relevant to high-resolution graphics, when it controls the orientation of the dendrogram: for example the setting north results in a "hanging dendrogram" with the units across the top. The default setting is west, which gives a dendrogram with the units on the left-hand side; this is also how DDENDROGRAM draws dendrograms on the line-printer.
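
   For example, style and orientation could be varied as follows. This is a sketch only, assuming that Amalg (a hypothetical identifier) holds an amalgamations matrix saved from HCLUSTER.

```genstat
" Assumes Amalg was saved earlier using the AMALGAMATIONS parameter of HCLUSTER "
" Hanging dendrogram in high-resolution graphics, with links in the full style "
DDENDROGRAM [STYLE=full; ORIENTATION=north] DATA=Amalg; \
  TITLE='Hanging dendrogram, full style'
" Line-printer dendrogram with links in the lower style, units on the left "
DDENDROGRAM [STYLE=lower; GRAPHICS=lineprinter] DATA=Amalg; \
  TITLE='Lower style'
```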

   The METHOD option indicates the scale on which the amalgamations have been made. This option need be set only if the data have been obtained from a source other than HCLUSTER or HDISPLAY.

   The TITLE parameter specifies a title for each dendrogram. For high-resolution graphics, the WINDOW parameter defines the graphics window to use for each plot. With line-printer graphics, two "windows" are available: window 1 has a width of 101 characters, window 2 a width of 61 characters. If WINDOW is not set, window 1 is used. If it is set to zero, the dendrogram is not drawn but results can still be saved using the PERMUTATION, ZIGGURAT and SAVE parameters; however, if the SAVE structure is used later as input to DDENDROGRAM, the CHANGE option must not be set to display as the dendrogram stage will not have been completed.

   The LOWSIMILARITY option allows the lower value of the axis showing the similarities (or percentage similarities or distances, according to the setting of the METHOD option) to be set e.g. to zero. Otherwise, this is determined automatically from the minimum value in the data. By default the axis is not plotted, but this can be changed by setting option DSIMILARITY=yes. As in other graphics commands, the SCREEN option controls whether to clear the high-resolution graphics screen before plotting (default clear), and the ENDACTION option controls whether GenStat pauses or continues after completing the plot.
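
   A possible combination of these display options is sketched below, again assuming that Amalg (a hypothetical identifier) holds an amalgamations matrix saved from HCLUSTER.

```genstat
" Assumes Amalg was saved earlier using the AMALGAMATIONS parameter of HCLUSTER "
" Plot the similarity axis, starting it at zero, and pause after the plot "
DDENDROGRAM [DSIMILARITY=yes; LOWSIMILARITY=0; SCREEN=clear; ENDACTION=pause] \
  DATA=Amalg; TITLE='Dendrogram with similarity axis'
```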

   For high-resolution graphics, the PENS parameter can be supplied with a scalar indicating the graphics pen with which to draw the dendrogram. Alternatively, if required, a variate can be specified to highlight the structure of the dendrogram by drawing different links with different pens; the links are taken in the same order as the rows of the AMALGAMATIONS matrix from HCLUSTER or in increasing order of the links of the minimum spanning tree. DDENDROGRAM will use pen 1 if the PENS parameter is not set. Any pens used by DDENDROGRAM will be set to METHOD=line, SYMBOLS=0, JOIN=given. If a scalar is supplied or PENS is not set, the pen used will also have LINESTYLE set to 1. If a variate is used, appropriate settings of COLOUR and LINESTYLE should be set (using the PEN directive) prior to calling DDENDROGRAM. Similarly, with line-printer graphics, the PENS parameter can be set either to a string or to a text, according to whether the links are to be drawn with the same or different symbols; if the parameter is unset, the plus symbol (+) is used for all the links.

   The ZIGGURAT parameter can be used to save the "ziggurat-degree" (Critchley 1983) of each link. This could then be used to form the setting of the PENS parameter for a later dendrogram, in order to display particular aspects of the clustering more clearly.
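
   One way this might be done is sketched below. The identifiers Amalg, Zig and Linkpen are hypothetical, Amalg is assumed to hold an amalgamations matrix saved from HCLUSTER, and the threshold of 1 used to split the links between two pens is an arbitrary choice made for illustration; a suitable rule depends on the values actually saved in Zig.

```genstat
" Assumes Amalg was saved earlier using the AMALGAMATIONS parameter of HCLUSTER "
" Save the ziggurat-degree of each link without drawing anything "
DDENDROGRAM DATA=Amalg; WINDOW=0; ZIGGURAT=Zig
PRINT Zig
" Arbitrary illustrative rule: pen 2 for links with degree above 1, else pen 1 "
CALCULATE Linkpen = 1 + (Zig > 1)
" Colour and line style must be set beforehand when PENS is a variate "
PEN NUMBER=1,2; COLOUR='black','red'; LINESTYLE=1
DDENDROGRAM DATA=Amalg; PENS=Linkpen
```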

   The SAVE parameter can be used to save the various structures that control the drawing of a dendrogram in order to save computing time when drawing a similar dendrogram. The SAVE structure should then be used as the setting of the DATA parameter, and the CHANGE option used to indicate the stage at which to start changing aspects of the previous dendrogram. The various stages (in order) involve the following options and parameters:

order
ORDERING and PERMUTATION;

dendrogram
STYLE and METHOD;

display
REVERSE, ORIENTATION, SCREEN, LABELS, TITLE, WINDOW, PENS, DSIMILARITY and LOWSIMILARITY.
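
   The stages above suggest the following pattern of use, given here as a sketch. Amalg and Dsave are hypothetical identifiers, and Amalg is assumed to hold an amalgamations matrix saved from HCLUSTER.

```genstat
" Assumes Amalg was saved earlier using the AMALGAMATIONS parameter of HCLUSTER "
" First call: all stages are carried out, and the results are kept in Dsave "
DDENDROGRAM [STYLE=centroid] DATA=Amalg; TITLE='Units on the left'; SAVE=Dsave
" Only display-stage settings change, so start from the display stage "
DDENDROGRAM [CHANGE=display; ORIENTATION=north] DATA=Dsave; \
  TITLE='Units across the top'
" STYLE belongs to the dendrogram stage, so start from there; the order is reused "
DDENDROGRAM [CHANGE=dendrogram; STYLE=full] DATA=Dsave; TITLE='Full style'
```

   If the SAVE structure had been formed with WINDOW=0, the CHANGE option should not be set to display in later calls, as noted in the description of the WINDOW parameter.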

 

Options: STYLE, ORDERING, REVERSE, ORIENTATION, METHOD, SCREEN, CHANGE, GRAPHICS, DSIMILARITY, LOWSIMILARITY, ENDACTION.

Parameters: DATA, PERMUTATION, LABELS, TITLE, WINDOW, PENS, ZIGGURAT, SAVE.


Method

Dendrograms are constructed and drawn in four separate stages: firstly the amalgamations information is used to construct information on group sizes; secondly a permutation of the units is formed, if required, according to several possible ordering schemes; thirdly graphical information on each of the links of the dendrogram is formed; lastly this graphical information is used to display the dendrogram, subject to requirements over orientation, pens, etc. Separate procedures are used for each stage (for details see the source code of DDENDROGRAM, obtainable via LIBEXAMPLE). A preliminary stage is also needed to construct the amalgamations from information on a minimum spanning tree. Communication amongst the subsidiary procedures is obtained using a pointer, which the user may keep using the SAVE parameter. The algorithms used by the first three subsidiary procedures are similar to those described by Digby (1984a, 1984b).


Action with RESTRICT

None of the options or parameters should be restricted: if any of them are, unpredictable results may occur.


References

Critchley, F. (1983). Ziggurats and dendrograms. Report No. 43. Department of Statistics, University of Warwick.

Digby, P.G.N. (1984a). Drawing pretty dendrograms. Genstat Newsletter, 14, 18-26.

Digby, P.G.N. (1984b). Dendrograms and ziggurats. Genstat Newsletter, 14, 14-18.

Digby, P.G.N. (1985). Graphical displays for classification. PACT Journal of the European Study Group on Physical, Chemical and Mathematical Techniques Applied to Archaeology.