Non-hierarchical Cluster Analysis
Non-hierarchical cluster analysis groups a set of units into a pre-determined number of groups, using an iterative algorithm that optimizes a chosen criterion. Starting from an initial classification, units are transferred from one group to another, or swapped with units from other groups, until no further improvement can be made to the criterion value. The method is explained in more detail in The Guide to GenStat® Release 8 - Part 2: Statistics. There is no guarantee that the solution obtained will be globally optimal: starting from a different initial classification can sometimes produce a better classification. However, starting from a good initial classification (see Options) greatly increases the chances of producing an optimal or near-optimal solution.
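The transfer step described above can be illustrated with a minimal Python sketch. This is not GenStat's implementation: the function names (within_ss, transfer_cluster) are hypothetical, the criterion is assumed to be the within-group sum of squares, groups are assumed to be labelled 0 to k-1 and non-empty in the initial classification, and only single-unit transfers are tried (swaps are omitted for brevity).

```python
# Illustrative sketch only; names and structure are assumptions, not GenStat code.

def within_ss(data, groups, k):
    """Total within-group sum of squares over all variates."""
    total = 0.0
    for g in range(k):
        members = [row for row, lab in zip(data, groups) if lab == g]
        if not members:
            continue
        p = len(members[0])
        means = [sum(r[j] for r in members) / len(members) for j in range(p)]
        total += sum((r[j] - means[j]) ** 2 for r in members for j in range(p))
    return total


def transfer_cluster(data, initial, k, criterion=within_ss):
    """Transfer one unit at a time while the criterion keeps decreasing."""
    groups = list(initial)
    best = criterion(data, groups, k)
    improved = True
    while improved:
        improved = False
        for i in range(len(data)):
            current = groups[i]
            # The number of groups is fixed, so never empty a group.
            if groups.count(current) == 1:
                continue
            for g in range(k):
                if g == current:
                    continue
                groups[i] = g                      # trial transfer
                value = criterion(data, groups, k)
                if value < best - 1e-12:           # keep only strict improvements
                    best, current, improved = value, g, True
                else:
                    groups[i] = current            # undo the trial transfer
    return groups, best


# Two tight clusters, deliberately started from a poor initial classification.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
groups, best = transfer_cluster(data, initial=[0, 1, 0, 1, 0, 1], k=2)
print(groups, round(best, 4))
```

Because each accepted transfer strictly lowers the criterion, the loop must stop, but only at a classification that no single transfer can improve. That is why the result depends on the initial classification.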

Data values

Specifies the set of variates making up the data matrix. The names of the variates can be selected from Available Data. The button allows multiple selections to be copied.

Criterion

The criterion to be optimized by the clustering. This can be set to one of the following four choices:
Within-class dispersion: Minimizes the determinant of the pooled within-class dispersion matrix (W). Under the assumption that the data originated from a mixture of k multivariate Normal distributions, with equal variance-covariance matrix V, the MLE of V is obtained when the grouping into k classes minimizes det(W). Obtains compact groups.
Mahalanobis squared distance: Maximizes the total between-groups Mahalanobis squared distance. This will obtain separation of groups, possibly at the cost of compactness. Equivalent to the Within-class dispersion criterion when there are only two groups.
Between-group sum of squares: Minimizes the trace of the pooled within-class dispersion matrix (W). Equivalent to maximizing the total between-group sum of squares, or Euclidean distance between groups.
Maximal predictive classification: Suitable for binary data. Each group has a class predictor, a binary indicator for each variate set to 0 or 1 according to whichever value is more frequent in the group. The criterion to be maximized is the total number of agreements between units and their respective class predictors.
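As a rough illustration of how three of these criteria could be evaluated for a given classification, the following Python sketch computes the trace and determinant of W and the maximal predictive agreement count. All function names are hypothetical, W is assumed to be the unscaled pooled within-class sums of squares and products matrix (a constant divisor would not change which grouping is optimal), and the Mahalanobis criterion is left out. It is a sketch under these assumptions, not GenStat's own code.

```python
# Illustrative sketch only; names and scaling of W are assumptions.

def pooled_within(data, groups):
    """Pooled within-class sums of squares and products matrix W (unscaled)."""
    p = len(data[0])
    w = [[0.0] * p for _ in range(p)]
    for g in set(groups):
        members = [row for row, lab in zip(data, groups) if lab == g]
        means = [sum(r[j] for r in members) / len(members) for j in range(p)]
        for r in members:
            d = [r[j] - means[j] for j in range(p)]
            for a in range(p):
                for b in range(p):
                    w[a][b] += d[a] * d[b]
    return w


def trace(m):
    """Sum of the diagonal elements (between-group sum of squares criterion)."""
    return sum(m[i][i] for i in range(len(m)))


def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n, result = len(a), 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[pivot][c] == 0.0:
            return 0.0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            result = -result          # a row swap flips the sign
        result *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for j in range(c, n):
                a[r][j] -= f * a[c][j]
    return result


def predictive_agreements(binary_data, groups):
    """Total agreements between units and their group's class predictor."""
    total = 0
    for g in set(groups):
        members = [row for row, lab in zip(binary_data, groups) if lab == g]
        for j in range(len(members[0])):
            ones = sum(r[j] for r in members)
            # The predictor takes the more frequent value, so the agreements
            # for this variate equal the larger of the two counts.
            total += max(ones, len(members) - ones)
    return total


w = pooled_within([(0, 0), (2, 0), (0, 2), (2, 2)], [0, 0, 0, 0])
print(trace(w), det(w))   # to be minimized over candidate classifications

binary = [(1, 1, 0), (1, 0, 0), (1, 1, 1), (0, 0, 1), (0, 0, 0)]
print(predictive_agreements(binary, [0, 0, 0, 1, 1]))   # to be maximized
```

Either trace or det applied to W could be passed as the criterion of an iterative transfer procedure. The agreement count would be negated first if the procedure minimizes.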

Number of groups

Sets the number of groups to be formed.

Options

From the Options menu you can control various aspects of the clustering algorithm, specify the initial classification, and select which results are to be printed.
