KOLMOG2 procedure

Performs a Kolmogorov-Smirnov two-sample test (S.J. Welham, N.M. Maclaren & H.R. Simpson).


Options

PRINT = strings
Output required (test, differences, ranks): test gives the test statistic, differences gives signed differences, and ranks produces the ranks for each sample; default test

GROUPS = factor
Defines the groups for a two-sample test if only the Y1 parameter is specified


Parameters

Y1 = variates
Identifier of the variate holding the first sample

Y2 = variates
Identifier of the variate holding the second sample

R1 = variates
Saves the ranks of the first sample

R2 = variates
Saves the ranks of the second sample

STATISTIC = scalars
Scalar to save the test statistic (the maximum absolute difference between the cumulative distribution functions)

CHISQUARE = scalars
Scalar to save the chi-square approximation to the test statistic

DIFFERENCES = variates
Variate to save the signed differences between the cumulative distribution functions


Description

The Kolmogorov-Smirnov test assesses the similarity between the underlying distributions of two samples by comparing their empirical cumulative distribution functions; the test statistic is the maximum absolute difference between the cumulative distribution functions. The samples can be specified in two separate variates using the parameters Y1 and Y2. Alternatively, they can be given in a single variate, supplied by Y1, with the GROUPS option set to a factor to identify the samples. The GROUPS option is ignored when the Y2 parameter is set.
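The two ways of supplying the samples can be illustrated with a short Python sketch. It is not Genstat code: the helper name split_by_groups is hypothetical, and the assumption that the first factor level (in sorted order) supplies the first sample is made only for illustration.

```python
def split_by_groups(y, groups):
    """Split one variate into two samples using a two-level grouping factor.

    Hypothetical helper: the mapping of levels to "first" and "second"
    sample (sorted order of the levels) is an assumption.
    """
    levels = sorted(set(groups))
    if len(levels) != 2:
        raise ValueError("a two-sample test needs exactly two groups")
    first = [v for v, g in zip(y, groups) if g == levels[0]]
    second = [v for v, g in zip(y, groups) if g == levels[1]]
    return first, second


# Single variate plus factor, equivalent to giving two separate variates.
y = [1.2, 3.4, 2.2, 5.1, 4.0]
groups = ["ctl", "trt", "ctl", "trt", "trt"]
sample1, sample2 = split_by_groups(y, groups)
print(sample1, sample2)
```

Either form leads to the same pair of samples, so the choice depends only on how the data are stored.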

   Output from the procedure is controlled by the PRINT option: test prints the relevant test statistic, differences prints the signed differences, and ranks prints a vector of ranks for each of the samples.

   The test statistic and its chi-square approximation can be saved using the parameters STATISTIC and CHISQUARE respectively. The parameter DIFFERENCES can be used to save the signed differences between the cumulative distribution functions. The R1 and R2 parameters allow the ranks of the samples to be saved.


Options: PRINT, GROUPS.

Parameters: Y1, Y2, R1, R2, STATISTIC, CHISQUARE, DIFFERENCES.


Method

The Kolmogorov-Smirnov two-sample test is a test of the null hypothesis that the two samples arise from the same distribution, against the alternative that the underlying distributions are different. The test compares the two empirical cumulative distribution functions to detect differences in the shape of the underlying distributions. The cumulative distribution functions S1 and S2 are formed by

Sk(X) = ( number of scores in sample k ≤ X ) / ( size of sample k )

for k = 1, 2, evaluated at a suitable set of points X. The procedure uses the set of values taken by one or other of the samples, i.e. {X: X is in DATA}, where DATA denotes the combined values of the two samples. The maximum absolute difference

MD = max( abs { S1(X) - S2(X) } )

is used as the basis for significance tests. The chi-square approximation (2 degrees of freedom) to this statistic is CH:

CH = 4 × MD × MD × (n1×n2 / (n1+n2) )

where n1, n2 are the sizes of the samples. (See for example Siegel 1956, pages 127-136.)
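The calculation above can be sketched in Python as follows. This is an illustrative reimplementation of the formulae, not the KOLMOG2 source: the function name kolmog2_sketch is hypothetical, and the signed differences are assumed to be taken as S1(X) - S2(X). The final line uses the fact that the upper-tail probability of a chi-square variable on 2 degrees of freedom is exp(-CH/2).

```python
import math
from bisect import bisect_right


def kolmog2_sketch(y1, y2):
    """Return (MD, CH, signed differences) for two samples (hypothetical helper)."""
    s1, s2 = sorted(y1), sorted(y2)
    n1, n2 = len(s1), len(s2)
    # Evaluation points: every distinct value taken by either sample.
    points = sorted(set(s1) | set(s2))
    # Sk(X) is the proportion of scores in sample k that are <= X;
    # the signed difference is assumed to be S1(X) - S2(X).
    diffs = [bisect_right(s1, x) / n1 - bisect_right(s2, x) / n2 for x in points]
    md = max(abs(d) for d in diffs)
    ch = 4 * md * md * (n1 * n2 / (n1 + n2))
    return md, ch, diffs


md, ch, diffs = kolmog2_sketch([1, 2, 3, 4], [3, 4, 5, 6])
# Approximate significance from chi-square on 2 d.f.
p_approx = math.exp(-ch / 2)
print(md, ch, round(p_approx, 4))  # 0.5 2.0 0.3679
```

For these two small samples the largest gap between the step functions is 0.5, giving CH = 4 × 0.5 × 0.5 × (16/8) = 2.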


Action with RESTRICT

The variates Y1 and Y2 can be restricted, and the two restrictions need not be the same. KOLMOG2 uses only those units of each variate that are not excluded by their respective restrictions. When the samples are defined by the GROUPS option, restrictions on Y1 and GROUPS are also obeyed, so that RESTRICT can be used, for example, to limit the data to only two groups when the GROUPS factor has more than two levels.
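The effect of restricting a variate and a factor with more than two levels can be pictured with the Python sketch below. It only mimics the idea of a restriction as a set of retained units; the helper apply_restriction and the data are hypothetical and do not reproduce Genstat's RESTRICT directive.

```python
def apply_restriction(values, keep):
    """Return only the units whose flag in `keep` is true (hypothetical helper)."""
    return [v for v, k in zip(values, keep) if k]


y = [4.1, 5.3, 2.2, 6.0, 3.7, 5.9]
groups = ["a", "b", "c", "a", "b", "c"]

# Retain the units in groups "a" and "c", leaving a two-level comparison.
keep = [g in ("a", "c") for g in groups]
y_used = apply_restriction(y, keep)
groups_used = apply_restriction(groups, keep)
print(y_used, groups_used)
```

Only the retained units would then contribute to the two samples being compared.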


Reference

Siegel, S. (1956). Nonparametric Statistics for the Behavioral Sciences. McGraw-Hill, New York.