| Microarray Cluster Probes/Genes |
A dendrogram can be plotted for a hierarchical cluster analysis, but with a large number of probes it is less useful because individual probes cannot be read. The responses of each probe across the targets/slides can also be displayed in a shade plot. This is slow for large numbers of probes, in which case the mean response for each group can be plotted instead. A spreadsheet containing the grouped data can also be generated using the Store button.
With large numbers of probes, the available RAM can be exhausted quickly, so an option is provided to cluster only the probes with the largest mean absolute response.
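The filtering idea can be sketched outside GenStat in a few lines of Python. This is only an illustration: the function name `top_probes_by_mean_abs`, the example data, and the reading of "mean absolute response" as the mean of the absolute values across slides are assumptions, not GenStat's own implementation.

```python
def top_probes_by_mean_abs(responses, n_keep):
    """Return the names of the n_keep probes with the largest mean absolute response.

    responses maps a probe name to its list of responses across slides.
    (Hypothetical helper; assumes mean of |response| is the ranking statistic.)
    """
    mean_abs = {
        probe: sum(abs(v) for v in values) / len(values)
        for probe, values in responses.items()
    }
    # Rank probes from largest to smallest mean absolute response.
    ranked = sorted(mean_abs, key=mean_abs.get, reverse=True)
    return ranked[:n_keep]


# Made-up example data: three probes measured on three slides.
responses = {
    "probe_a": [0.1, -0.2, 0.05],
    "probe_b": [2.0, -1.5, 1.8],
    "probe_c": [-0.9, 1.1, -1.0],
}
selected = top_probes_by_mean_abs(responses, 2)
```

Only the selected probes would then be passed on to the clustering step, which keeps the similarity matrix small.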
| Hierarchical | Hierarchical clustering using the method selected in the Link Method option. |
| K-Means | Non-hierarchical clustering using the k-means method. |
| Single Link | defines the similarity between two clusters as the maximum similarity between any two samples in those clusters |
| Nearest Neighbour | synonym for Single Link |
| Complete Link | defines the similarity between two clusters as the minimum similarity between any two samples in those clusters |
| Furthest Neighbour | synonym for Complete Link |
| Average Link | defines the similarity between a cluster and two merging clusters as the average of the similarities with each of the original clusters. It therefore replaces two merging clusters by their mean, unweighted by cluster size |
| Group Average | an average is taken over all the samples in the two merging clusters. Thus, the original clusters are replaced by their mean, weighted by cluster size |
| Median Sorting | can be thought of in terms of clusters being represented by points in a multidimensional space; when two clusters join, the new cluster is represented by the midpoint of the original cluster points |
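The link definitions above can be illustrated with a small Python sketch. It is not GenStat code: the function names, the pairwise-similarity lookup, and the example values are all hypothetical. The two update functions follow the descriptions of Average Link (unweighted by cluster size) and Group Average (weighted by cluster size).

```python
from itertools import product


def pair_similarity(sim, a, b):
    """Look up the similarity of two samples in a symmetric table."""
    return sim[frozenset((a, b))]


def single_link(sim, cluster_a, cluster_b):
    """Maximum similarity between any two samples, one from each cluster."""
    return max(pair_similarity(sim, a, b) for a, b in product(cluster_a, cluster_b))


def complete_link(sim, cluster_a, cluster_b):
    """Minimum similarity between any two samples, one from each cluster."""
    return min(pair_similarity(sim, a, b) for a, b in product(cluster_a, cluster_b))


def average_link_update(s_ki, s_kj):
    """Similarity of cluster k to the merger of i and j, ignoring cluster sizes."""
    return (s_ki + s_kj) / 2.0


def group_average_update(s_ki, n_i, s_kj, n_j):
    """Similarity of cluster k to the merger of i and j, weighted by cluster sizes."""
    return (n_i * s_ki + n_j * s_kj) / (n_i + n_j)


# Made-up similarities between samples p, q (cluster A) and r, s (cluster B).
sim = {
    frozenset(("p", "r")): 0.9,
    frozenset(("p", "s")): 0.4,
    frozenset(("q", "r")): 0.6,
    frozenset(("q", "s")): 0.2,
}
single = single_link(sim, ["p", "q"], ["r", "s"])
complete = complete_link(sim, ["p", "q"], ["r", "s"])
```

With these values the single-link similarity is driven by the closest pair (p, r) and the complete-link similarity by the most distant pair (q, s), which is why single link tends to chain clusters together while complete link favours tight groups.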
| Type | Contribution |
| Euclidean | 1 - {(xi - xj) / range}**2 |
| Cityblock | 1 - |xi - xj| / range |
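The two contributions in the table can be written out directly. The sketch below is an illustration in Python with hypothetical function names. It computes the contribution of a single variate for two samples with values xi and xj. How the per-variate contributions are combined into an overall similarity is not shown here.

```python
def euclidean_contribution(xi, xj, value_range):
    """Euclidean contribution: 1 - ((xi - xj) / range) ** 2."""
    return 1.0 - ((xi - xj) / value_range) ** 2


def cityblock_contribution(xi, xj, value_range):
    """Cityblock contribution: 1 - |xi - xj| / range."""
    return 1.0 - abs(xi - xj) / value_range


# Made-up example: two samples with values 2 and 5 on a variate whose range is 10.
euclid = euclidean_contribution(2.0, 5.0, 10.0)
city = cityblock_contribution(2.0, 5.0, 10.0)
```

Both contributions equal 1 for identical values and fall to 0 when the two values differ by the full range. Because the scaled difference is squared, the Euclidean form penalizes small differences less than the cityblock form.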
| Within-class dispersion | Minimizes the determinant of the pooled within-class dispersion matrix (W). Under the assumption that the data originated from a mixture of k multivariate Normal distributions, with equal variance-covariance matrix V, the MLE of V is obtained when the grouping into k classes minimizes det(W). Obtains compact groups. |
| Mahalanobis squared distance | Maximizes the total between-groups Mahalanobis squared distance. This will obtain separation of groups, possibly at the cost of compactness. Equivalent to the Within-class dispersion criterion when there are only two groups. |
| Between-group sum of squares | Minimizes the trace of the pooled within-class dispersion matrix (W). Equivalent to maximizing the total between-group sum of squares, or Euclidean distance between groups. |
| Maximal predictive classification | Suitable for binary data. Each group has a class predictor: a binary indicator for each variate, set to 0 or 1 according to whichever value is more frequent in the group. The criterion to be maximized is the total number of agreements between units and their respective class predictors. |
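Two of the criteria above are simple enough to evaluate by hand for a given grouping, which the following Python sketch does. It is illustrative only: the function names and data are hypothetical, it scores a fixed grouping rather than searching for the best one, and it takes trace(W) to be the within-group sum of squared deviations from the group means, with no divisor.

```python
def within_group_ss(data, groups):
    """Within-group sum of squares (assumed here to be trace(W)) for a grouping."""
    total = 0.0
    for g in set(groups):
        members = [row for row, label in zip(data, groups) if label == g]
        # Group mean for each variate.
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum(
            (x - c) ** 2 for row in members for x, c in zip(row, centroid)
        )
    return total


def predictive_agreements(data, groups):
    """Total agreements between binary units and their group's class predictor."""
    total = 0
    for g in set(groups):
        members = [row for row, label in zip(data, groups) if label == g]
        for col in zip(*members):
            ones = sum(col)
            zeros = len(col) - ones
            # The predictor takes the more frequent value, so the number of
            # agreements for this variate is the larger of the two counts.
            total += max(ones, zeros)
    return total


# Made-up continuous data: a compact grouping versus a mixed-up one.
points = [[1, 2], [3, 4], [10, 10], [12, 14]]
compact = within_group_ss(points, [0, 0, 1, 1])
mixed = within_group_ss(points, [0, 1, 0, 1])

# Made-up binary data: four units, three variates, two groups.
binary_units = [[1, 0, 1], [1, 1, 1], [0, 0, 0], [0, 1, 0]]
agreements = predictive_agreements(binary_units, [0, 0, 1, 1])
```

A k-means style search would try reallocating units between groups and keep a move whenever it lowers the within-group sum of squares (or raises the number of agreements for binary data).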
| Run | Run the analysis. |
| Cancel | Close the menu without further changes. |
| Options | Opens a dialog where additional options and settings can be specified for the analysis. |
| Defaults | Reset the menu to the default settings. Right-clicking this button produces a pop-up menu where you can choose to set the menu using either the currently stored defaults or the GenStat default settings. |
| Store | Opens a dialog to specify the names of the structures in which to store the results from the analysis. The names should be supplied before running the analysis. |
The options used were:
and the Store button was used to save the Group results back to a spreadsheet:
The resulting response for each group is shown below (by individual EST):
The spreadsheet generated by the Display in Spreadsheet option is shown: