MANNWHITNEY procedure

Performs a Mann-Whitney U test (S.J. Welham, N.M. Maclaren & H.R. Simpson).


Options

PRINT = strings
Output required (test, ranks, confidence); default test

METHOD = string
Type of test required (twosided, greaterthan, lessthan); default twosided

GROUPS = factor
Defines the samples for a two-sample test if the Y2 parameter is not set

CIPROBABILITY = scalar
Probability for the confidence interval for the median difference between the samples; default 0.95


Parameters

Y1 = variates
Identifier of the variate holding the first sample if Y2 is set, or both samples if Y2 is unset (the GROUPS option must then also be set)

Y2 = variates
Identifier of the variate holding the second sample

R1 = variates
Saves the ranks of the first sample if Y2 is set, or both samples if Y2 is unset

R2 = variates
Saves the ranks of the second sample if Y2 is set

STATISTIC = scalars
Scalar to save the test statistic U

PROBABILITY = scalars
Scalar to save the probability value for the test statistic

SIGN = scalars
Scalar to save an indicator: 1 if the first sample has the higher ranks on average, 0 otherwise

LOWER = scalars
Saves the lower confidence value for the median difference between the samples

UPPER = scalars
Saves the upper confidence value for the median difference between the samples


Description

The Mann-Whitney U test is a test for differences in location between two samples. The data for the samples can be stored in two separate variates, and supplied by the parameters Y1 and Y2. Alternatively, they can be stored in a single variate, supplied by Y1, with the GROUPS option set to a factor to identify which unit belongs to each sample. The GROUPS option is ignored when the Y2 parameter is set.

   MANNWHITNEY calculates the test statistic U, along with its associated probability value. An exact probability is calculated (using procedure PRMANNWHITNEYU) if the size of either sample is less than 51 and the statistic U is less than 10000; otherwise a Normal approximation is used. The statistic and the probability can be saved using the STATISTIC and PROBABILITY parameters, respectively. Parameter SIGN holds an indicator that takes the value 1 if the ranks in the first sample are higher on average than those in the second sample, and 0 otherwise. The ranks (with respect to the combined data set) for each sample can be saved using the R1 and R2 parameters.
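The rank-based quantities described above can be sketched in Python as follows. This is an illustration only, not the code used by MANNWHITNEY: the function names `midranks` and `mann_whitney_u` are hypothetical, tied values are assumed to receive the average of the ranks they span (midranks), and U1 and U2 are formed with the formula given under Method.

```python
def midranks(values):
    """Rank values 1..N in the combined data; ties get the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at sorted position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for p in range(i, j + 1):
            ranks[order[p]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(y1, y2):
    """Return (U, SIGN, R1, R2) for two samples (hypothetical helper)."""
    n1, n2 = len(y1), len(y2)
    ranks = midranks(list(y1) + list(y2))  # ranks w.r.t. the combined data set
    r1, r2 = ranks[:n1], ranks[n1:]
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - sum(r1)
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - sum(r2)
    # SIGN: 1 if the first sample has the higher ranks on average, else 0
    sign = 1 if sum(r1) / n1 > sum(r2) / n2 else 0
    return min(u1, u2), sign, r1, r2


print(mann_whitney_u([1.2, 3.4, 5.6], [2.0, 3.4, 7.1]))
# (3.5, 0, [1.0, 3.5, 5.0], [2.0, 3.5, 6.0])
```

Here the two tied values of 3.4 share rank 3.5, and U1 + U2 = n1 × n2 = 9 as expected.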

   Printed output is controlled by the PRINT option, with settings

    test
test statistic and probability,

    ranks
ranks (with respect to the whole data set) for each sample, and

    confidence
median difference between the samples, with confidence limits.

   The probability for the confidence limits is specified by the CIPROBABILITY option; the default, 0.95, gives a 95% interval. The lower and upper confidence values can be saved by the LOWER and UPPER parameters, respectively. The calculation of the interval may be slow when there are ties amongst the values, because MANNWHITNEY must then, in effect, invert the probability function.
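A common distribution-free way to obtain a median difference with confidence limits, when there are no ties, uses the ordered pairwise differences y1[i] - y2[j] (a Hodges-Lehmann type estimate). The sketch below assumes this construction, assumes the difference is taken as first sample minus second, and obtains the cut-off from the exact null distribution of U; it is not confirmed to be MANNWHITNEY's own algorithm, and `count_u` and `median_difference_ci` are hypothetical names.

```python
from functools import lru_cache
from math import comb
from statistics import median


@lru_cache(maxsize=None)
def count_u(n1, n2, u):
    """Number of orderings of n1 + n2 distinct values giving U = u under H0."""
    if u < 0:
        return 0
    if n1 == 0 or n2 == 0:
        return 1 if u == 0 else 0
    # The largest value is from sample 1 (adds n2 pairs) or from sample 2 (adds 0)
    return count_u(n1 - 1, n2, u - n2) + count_u(n1, n2 - 1, u)


def median_difference_ci(y1, y2, prob=0.95):
    """Median of pairwise differences y1[i] - y2[j], with order-statistic limits.

    Assumes no ties; returns (estimate, None, None) if the samples are too
    small to achieve the requested coverage.
    """
    n1, n2 = len(y1), len(y2)
    diffs = sorted(a - b for a in y1 for b in y2)
    estimate = median(diffs)
    total = comb(n1 + n2, n1)
    half_alpha = (1 - prob) / 2
    # c = largest u with P(U <= u) <= half_alpha, or -1 if there is none
    cumulative, c = 0, -1
    for u in range(n1 * n2 + 1):
        cumulative += count_u(n1, n2, u)
        if cumulative / total > half_alpha:
            break
        c = u
    if c < 0:
        return estimate, None, None
    # 1-based order statistics D(c+1) and D(n1*n2 - c)
    return estimate, diffs[c], diffs[n1 * n2 - 1 - c]


print(median_difference_ci([11, 12, 13, 14, 15, 16], [1, 2, 3, 4, 5, 6]))
# (10.0, 7, 13)
```

With six values in each sample, P(U ≤ 5) = 19/924 ≈ 0.021, so the interval runs from the 6th to the 31st ordered difference and its exact coverage is about 0.959, slightly above the nominal 0.95.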

   By default a two-sided test is done (to test whether the samples differ), but the METHOD option can be set to greaterthan to test whether the first sample is greater than the second, or to lessthan to test whether it is smaller.
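The choice of tail for each METHOD setting can be sketched with the Normal approximation given under Method. The mapping below is an assumption derived from the definition of Uk (U1 is small when the first sample tends to have the larger values), not a description of MANNWHITNEY's internals; `normal_p` is a hypothetical name and no continuity correction is applied.

```python
from math import erfc, sqrt


def normal_p(n1, n2, u1, method="twosided"):
    """Normal-approximation p-value for a METHOD-like setting (hypothetical).

    u1 is U1 = n1*n2 + n1*(n1+1)/2 - R1, which is small when the first
    sample tends to have the larger values.
    """
    u2 = n1 * n2 - u1  # U1 + U2 = n1*n2
    se = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)

    def prob_u_at_most(u):
        # P(U <= u) approximated by P(Z >= (n1*n2/2 - u) / se)
        z = (n1 * n2 / 2 - u) / se
        return 0.5 * erfc(z / sqrt(2))

    if method == "greaterthan":  # first sample larger: U1 small
        return prob_u_at_most(u1)
    if method == "lessthan":     # first sample smaller: U2 small
        return prob_u_at_most(u2)
    return min(1.0, 2 * prob_u_at_most(min(u1, u2)))  # twosided


print(normal_p(10, 10, 0, "greaterthan"))  # very small: sample 1 always larger
print(normal_p(10, 10, 0, "lessthan"))     # close to 1
```

The two one-sided probabilities for the same data sum to 1, and the two-sided value is twice the smaller of them (capped at 1).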


Options: PRINT, METHOD, GROUPS, CIPROBABILITY.

Parameters: Y1, Y2, R1, R2, STATISTIC, PROBABILITY, SIGN, LOWER, UPPER.


Method

The Mann-Whitney (or Wilcoxon rank-sum) U test is a two-sample test of a difference in location: i.e. a test of the null hypothesis that the two samples arise from distributions with the same location (for example, the same median) vs. the alternative that their locations differ.

   The test statistic U is formed using ranks found from the combined data set, and is taken to be the smaller of U1 and U2, where

Uk = n1 × n2 + nk × (nk+1) / 2 - Rk ; k=1,2

and nk is the size of sample k and Rk is the sum of the ranks for sample k. The score Uk can be interpreted as the number of times a score in sample k precedes a score in the other sample in the ranking (tied pairs counting one half). So the sample with the smaller value of Uk has, on average, the larger ranks.
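The pair-counting interpretation of Uk can be checked numerically. The sketch below compares the rank-sum formula for Uk with a direct count of pairs in which a value from sample k lies below a value from the other sample; it assumes there are no ties, and both function names are hypothetical.

```python
def u_from_rank_sum(yk, yother):
    """Uk = n1*n2 + nk*(nk+1)/2 - Rk, with ranks from the combined data (no ties)."""
    combined = sorted(list(yk) + list(yother))
    rank = {v: i + 1 for i, v in enumerate(combined)}  # valid only without ties
    nk, nother = len(yk), len(yother)
    rk = sum(rank[v] for v in yk)
    return nk * nother + nk * (nk + 1) // 2 - rk


def u_from_pairs(yk, yother):
    """Number of pairs in which a value of sample k precedes a value of the other sample."""
    return sum(1 for a in yk for b in yother if a < b)


y1, y2 = [9, 11, 15], [6, 8, 10, 13]
print(u_from_rank_sum(y1, y2), u_from_pairs(y1, y2))  # 3 3
print(u_from_rank_sum(y2, y1), u_from_pairs(y2, y1))  # 9 9
```

Here U1 = 3 and U2 = 9, so U = 3; the second sample has the larger Uk and the smaller mean rank (3.25 against 5).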

   The PRMANNWHITNEYU procedure is used to calculate exact values of the probability for the test statistic when the size of either sample is less than 51 and the statistic U is less than 10000; otherwise a Normal approximation is used:

Normal = ( n1 × n2 / 2 - U ) / √{ n1 × n2 × ( n1+n2+1 ) / 12 }
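PRMANNWHITNEYU's own algorithm is not described here. The sketch below shows one standard way to obtain exact probabilities for small samples without ties: a recurrence counts how many orderings of the two samples give each value of U, and the two-sided probability is assumed to be twice the lower tail (capped at 1). It also applies the Normal approximation above, without tie correction. All function names are hypothetical.

```python
from functools import lru_cache
from math import comb, erfc, sqrt


@lru_cache(maxsize=None)
def count_u(n1, n2, u):
    """Number of orderings of n1 + n2 distinct values for which the count of
    (sample-1, sample-2) pairs with the sample-1 value larger equals u.
    The null distribution is symmetric, so counting pairs in the other
    direction gives the same probabilities."""
    if u < 0:
        return 0
    if n1 == 0 or n2 == 0:
        return 1 if u == 0 else 0
    # The largest value belongs either to sample 1 (it exceeds all n2 values
    # of sample 2) or to sample 2 (it exceeds no value of sample 1).
    return count_u(n1 - 1, n2, u - n2) + count_u(n1, n2 - 1, u)


def exact_two_sided_p(n1, n2, u):
    """Exact two-sided p-value for U = min(U1, U2), assuming no ties."""
    total = comb(n1 + n2, n1)
    lower_tail = sum(count_u(n1, n2, k) for k in range(int(u) + 1)) / total
    return min(1.0, 2 * lower_tail)


def normal_two_sided_p(n1, n2, u):
    """Two-sided p-value from the Normal approximation (no tie correction)."""
    z = (n1 * n2 / 2 - u) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return erfc(z / sqrt(2))  # equals 2 * P(Z >= z) for a standard Normal Z


print(exact_two_sided_p(3, 3, 0))  # 0.1
print(normal_two_sided_p(3, 3, 0))
```

With three values in each sample there are 20 equally likely orderings and only one gives U = 0, so the exact two-sided probability is 2/20 = 0.1. The recurrence is intended for small samples; its table grows with n1 × n2 × U.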

If ties are present, the standard error in the Normal approximation (i.e. its denominator) is instead calculated as:

√{ n1 × n2 / (N × (N-1)) × ( (N³-N) / 12 - ∑k Tk ) }

where N = n1 + n2, the sum is over the groups of tied observations, Tk = ( tk³-tk ) / 12, and tk is the number of observations tied in group k. (See for example Siegel 1956, pages 116-127.)
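The tie-corrected standard error can be computed directly from the formula above. In this sketch (function names hypothetical) the tie groups are found by counting repeated values in the combined data; untied observations form groups of size 1 and contribute nothing to the sum, so with no ties the result reduces to √{ n1 × n2 × (N+1) / 12 }.

```python
from collections import Counter
from math import sqrt


def tie_corrected_se(y1, y2):
    """Standard error of U under H0, corrected for ties."""
    n1, n2 = len(y1), len(y2)
    n = n1 + n2
    # t_k: number of observations in each group of tied values (combined data)
    tie_sizes = Counter(list(y1) + list(y2)).values()
    sum_t = sum((t ** 3 - t) / 12 for t in tie_sizes)
    return sqrt(n1 * n2 / (n * (n - 1)) * ((n ** 3 - n) / 12 - sum_t))


def untied_se(n1, n2):
    """Standard error of U under H0 when there are no ties."""
    return sqrt(n1 * n2 * (n1 + n2 + 1) / 12)


print(tie_corrected_se([1, 2, 2], [2, 3, 3]) ** 2)  # approximately 4.5
print(untied_se(3, 3) ** 2)                         # 5.25
```

For these data the tie groups have sizes 1, 3 and 2, so ∑k Tk = 0 + 2 + 0.5 = 2.5, and the tie correction reduces the variance from 5.25 to 4.5.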


Action with RESTRICT

The variates Y1 and Y2 can be restricted, and in different ways. MANNWHITNEY uses only those units of each variate that are not excluded by their respective restrictions. Restrictions on Y1 and GROUPS are also obeyed, so RESTRICT can be used, for example, to limit the data to only two groups when the GROUPS factor has more than two levels.


Reference

Siegel, S. (1956). Nonparametric Statistics for the Behavioral Sciences. McGraw-Hill, New York.