KRUSKAL procedure
Carries out a Kruskal-Wallis one-way analysis of variance (S.J. Welham, N.M. Maclaren & H.R. Simpson).
Options
Parameters
Description
KRUSKAL carries out a Kruskal-Wallis one-way analysis of variance on the ranks (relative to the whole data set) of a set of samples. The samples can be stored in different variates and supplied as a list in the DATA pointer. Alternatively, they can all be placed in a single variate, and the GROUPS option set to a factor to indicate the sample to which each unit belongs. Output from the procedure is controlled by the PRINT option: test (the default setting) prints the relevant test statistics, and ranks prints the vector of ranks for each sample.
The test statistic, vector of mean ranks and degrees of freedom can be saved using the STATISTIC, MEANRANKS and DF options, respectively. Parameter RANKS can be set to a variate, or variates, to store the ranks of the data relative to the whole data set.
Options: PRINT, GROUPS, STATISTIC, MEANRANKS, DF.
Parameters: DATA, RANKS.
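The two ways of supplying the samples described above carry the same information. The following Python sketch is an illustration only: it is not Genstat syntax, and the helper name split_by_groups is hypothetical. It shows how a single vector with group labels corresponds to a list of separate samples.

```python
def split_by_groups(values, groups):
    """Split one vector into samples using a parallel vector of group labels."""
    if len(values) != len(groups):
        raise ValueError("values and groups must have the same length")
    samples = {}
    for value, label in zip(values, groups):
        # Labels are kept in order of first appearance
        samples.setdefault(label, []).append(value)
    return samples


# Layout 1: one list per sample (analogous to several variates in DATA)
separate = [[6.4, 6.8, 7.2], [2.5, 3.7, 4.9], [1.3, 4.1]]

# Layout 2: one vector plus labels (analogous to one variate with GROUPS)
values = [6.4, 6.8, 7.2, 2.5, 3.7, 4.9, 1.3, 4.1]
groups = ["a", "a", "a", "b", "b", "b", "c", "c"]

# Both layouts describe the same three samples
same = list(split_by_groups(values, groups).values()) == separate
```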
Method
The Kruskal-Wallis one-way analysis of variance is used to test the hypothesis that several (K) samples come from distributions with the same location (strictly, that they come from the same distribution). The test statistic H is formed by ranking the combined data set, then considering the sum of these ranks within each sample:
H = [ 12 / (N×(N+1)) ] × ∑j=1...K { Rj×Rj / nj } - 3×(N+1)
where Rj is the sum of ranks for the jth sample,
nj is the size of the jth sample, and
N is the size of the combined data set.
If ties are present in the data, then an adjustment to the statistic H is required:
adjusted H = H / ( 1 - ∑k { tk³ - tk } / (N³ - N) )
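where tk is the number of observations tied at the kth distinct value (and therefore sharing the same rank); untied values have tk = 1 and contribute nothing to the sum. (See, for example, Siegel 1956, pages 184-193.)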
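The calculation of H and its adjustment for ties can be sketched in Python as follows. This is an illustration of the formulae only, not the code used inside KRUSKAL; the function names midranks and kruskal_h are hypothetical, and tied values are assumed to receive the average of the ranks they occupy.

```python
from collections import Counter


def midranks(values):
    """Rank values 1..N; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def kruskal_h(samples):
    """Return (H, adjusted H, degrees of freedom) for a list of samples.

    Assumes every sample is non-empty and the data are not all equal
    (otherwise the tie adjustment would divide by zero).
    """
    combined = [x for sample in samples for x in sample]
    n_total = len(combined)
    ranks = midranks(combined)

    # Sum over samples of Rj*Rj/nj, where Rj is the rank sum of sample j
    total = 0.0
    start = 0
    for sample in samples:
        rank_sum = sum(ranks[start:start + len(sample)])
        total += rank_sum * rank_sum / len(sample)
        start += len(sample)
    h = 12.0 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)

    # Tie adjustment: tk is the count of each distinct value
    ties = sum(t ** 3 - t for t in Counter(combined).values())
    adjusted = h / (1 - ties / (n_total ** 3 - n_total))
    return h, adjusted, len(samples) - 1


h, adjusted_h, df = kruskal_h([[1, 2, 2], [2, 3, 4]])
```

In this small example the three observations equal to 2 share rank 3, so the rank sums are 7 and 14, and the adjustment divides H by 1 - 24/210.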
When there are at least five cases in each of the samples, H has approximately a Chi-square distribution on K-1 degrees of freedom. When this condition is not satisfied, and there are three samples, KRUSKAL uses a table of calculated values of the distribution of the statistic.
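As an illustration of the chi-square approximation, the Python sketch below computes the upper-tail probability of H on K-1 degrees of freedom, together with a check of the sample-size condition. It is a sketch under stated assumptions, not the method used inside KRUSKAL: the function names are hypothetical, the degrees of freedom are assumed to be a positive integer, and the small-sample table is not reproduced.

```python
import math


def chi_square_applies(sample_sizes):
    """True when every sample has at least five cases."""
    return all(n >= 5 for n in sample_sizes)


def chi2_upper_tail(h, df):
    """P(X >= h) for X distributed as chi-square on integer df >= 1.

    Uses the recurrence Q(a+1, y) = Q(a, y) + y**a * exp(-y) / gamma(a+1)
    for the regularized upper incomplete gamma function, with y = h/2,
    starting from Q(1, y) = exp(-y) or Q(1/2, y) = erfc(sqrt(y)).
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if h <= 0:
        return 1.0
    y = h / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1.0
    return q


# Three samples of five cases each: the approximation is used, with K-1 = 2
sizes = [5, 5, 5]
p_value = chi2_upper_tail(7.2, len(sizes) - 1) if chi_square_applies(sizes) else None
```

For two degrees of freedom the upper-tail probability reduces to exp(-H/2), which gives a quick check on the result.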
Action with RESTRICT
The variates in DATA can be restricted, and the restriction may differ from one variate to another. KRUSKAL uses only those units of each variate that are not excluded by its own restriction.
Reference
Siegel, S. (1956). Nonparametric Statistics for the Behavioral Sciences. McGraw-Hill, New York.