CLASSIFY procedure
Obtains a starting classification for non-hierarchical clustering (S.A. Harding).
No options
Parameters
Description
In non-hierarchical classification an initial classification is required, and it is advantageous to have these classes as homogeneous as possible. This reduces the risk of converging to a local optimum, and also encourages faster convergence of the iterative transfer algorithm used by the CLUSTER directive.
The attributes of the units to be formed into groups are specified in a set of variates; these should be placed into a pointer, which is supplied as the setting of the DATA parameter. The number of groups required is specified by the NGROUPS parameter; this must be less than the number of variates plus two, and less than the number of units plus one (that is, at most p+1 groups for p variates, and no more groups than there are units). The group allocations that are formed are stored in the factor indicated by the GROUPS parameter. This factor need not be declared in advance; it will be formed by the procedure.
Options: none. Parameters: DATA, NGROUPS, GROUPS.
Method
The CLASSIFY procedure tries to find a suitable classification into k classes by finding the k units that are furthest apart in p-dimensional space (p being the number of variates). These are then used as nuclei for the classes, with each of the remaining units being allocated to the class with the nearest nucleus.
The units defining the nuclei are found by first finding the two units that are furthest apart. The third unit is the one at the greatest distance from the line joining the first two units. The fourth is the unit at the greatest distance from the plane containing the first three units, and so on, until the kth unit is the one furthest from the (k-2)-dimensional space spanned by the k-1 units already found.
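The method described above can be sketched in Python as follows. This is an illustrative sketch only, not the Genstat source of the procedure: the function names (starting_classification, residual, norm) are hypothetical, the units are assumed to be supplied as equal-length tuples of variate values, at least two groups are assumed, and ties are broken by taking the first unit found, which the description above does not specify.

```python
from itertools import combinations
from math import dist, sqrt


def norm(v):
    """Euclidean length of a vector."""
    return sqrt(sum(x * x for x in v))


def residual(point, origin, basis):
    """Part of (point - origin) orthogonal to every vector of an orthonormal basis."""
    r = [a - b for a, b in zip(point, origin)]
    for e in basis:
        proj = sum(x * y for x, y in zip(r, e))
        r = [x - proj * y for x, y in zip(r, e)]
    return r


def starting_classification(units, k):
    """Return (nuclei, groups): indices of the k nuclei and a 1-based group per unit."""
    n, p = len(units), len(units[0])
    # Constraint from the Description: k < p + 2 and k < n + 1.
    # Requiring k >= 2 is an assumption of this sketch.
    if not (2 <= k < p + 2 and k < n + 1):
        raise ValueError("k must satisfy 2 <= k < p + 2 and k < n + 1")

    # First two nuclei: the pair of units that are furthest apart.
    i, j = max(combinations(range(n), 2),
               key=lambda ij: dist(units[ij[0]], units[ij[1]]))
    nuclei = [i, j]

    # The space through the nuclei is held as an origin plus an orthonormal basis.
    origin, basis = units[i], []
    while len(nuclei) < k:
        # Extend the basis with the direction contributed by the newest nucleus.
        r = residual(units[nuclei[-1]], origin, basis)
        length = norm(r)
        if length > 1e-12:
            basis.append([x / length for x in r])
        # Next nucleus: the unit furthest from the space spanned so far.
        candidates = [u for u in range(n) if u not in nuclei]
        nuclei.append(max(candidates,
                          key=lambda u: norm(residual(units[u], origin, basis))))

    # Each nucleus defines its own class; every other unit joins the nearest nucleus.
    groups = []
    for u in range(n):
        if u in nuclei:
            groups.append(nuclei.index(u) + 1)
        else:
            groups.append(1 + min(range(k),
                                  key=lambda g: dist(units[u], units[nuclei[g]])))
    return nuclei, groups


if __name__ == "__main__":
    # Six units measured on two variates, classified into three groups.
    data = [(0, 0), (10, 0), (5, 8), (1, 1), (9, 1), (5, 7)]
    print(starting_classification(data, 3))
```

In this small example the first two nuclei are the units at (0, 0) and (10, 0), and the third is the unit at (5, 8), the one furthest from the line joining them; the remaining units then join whichever nucleus is closest. The exhaustive search for the furthest pair compares every pair of units, which is acceptable for a sketch but grows quadratically with the number of units.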
Action with RESTRICT
The variates must not be restricted.