MANNWHITNEY procedure
Performs a Mann-Whitney U test (S.J. Welham, N.M. Maclaren & H.R. Simpson).
Options
Parameters
Description
The Mann-Whitney U test is a test for differences in location between two samples. The data for the samples can be stored in two separate variates, and supplied by the parameters Y1 and Y2. Alternatively, they can be stored in a single variate, supplied by Y1, with the GROUPS option set to a factor to identify which unit belongs to each sample. The GROUPS option is ignored when the Y2 parameter is set.
MANNWHITNEY calculates the test statistic U, along with its associated probability value. An exact probability is calculated (using procedure PRMANNWHITNEYU) if the size of either sample is less than 51 and the statistic U is less than 10000; otherwise a Normal approximation is used. The statistic and the probability can be saved using the STATISTIC and PROBABILITY parameters respectively. Parameter SIGN holds an indicator which takes the value 1 if the ranks in the first sample are higher on average than those in the second sample, and takes the value 0 otherwise. The ranks (with respect to the combined data set) for each sample can be saved using the R1 and R2 parameters.
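As an illustration of what an exact probability for U involves, here is a minimal Python sketch of the textbook counting recurrence for the null distribution of U when there are no ties. It is not the PRMANNWHITNEYU code, whose internals are not described here; the function names count_arrangements and exact_lower_tail are hypothetical.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_arrangements(u, n1, n2):
    """Number of orderings of n1 + n2 untied values that give the statistic u."""
    if u < 0:
        return 0
    if n1 == 0 or n2 == 0:
        return 1 if u == 0 else 0
    # Condition on which sample supplies the largest value in the ordering.
    return (count_arrangements(u - n2, n1 - 1, n2)
            + count_arrangements(u, n1, n2 - 1))


def exact_lower_tail(u, n1, n2):
    """P(U <= u) under the null hypothesis, for integer u and no ties."""
    total = comb(n1 + n2, n1)  # all orderings are equally likely under the null
    favourable = sum(count_arrangements(v, n1, n2) for v in range(u + 1))
    return favourable / total


print(exact_lower_tail(3, 3, 4))  # 0.2
```

The value returned is a one-tailed probability. The work grows with the sample sizes and with U, which is one plausible reason for switching to an approximation for large samples.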
Printed output is controlled by the PRINT option, with settings
The probability for the confidence limits is specified by the CIPROBABILITY option; the default, of 0.95, gives a 95% interval. The lower and upper confidence values can be saved by the LOWER and UPPER parameters, respectively. The calculation of the interval may be slow when there are ties amongst the values, as essentially MANNWHITNEY then has to invert the probability function.
By default a two-sided test is done (to assess whether the samples differ), but the METHOD option can be set to greaterthan to test that the first sample is greater than the second, or lessthan to test that it is smaller.
Options: PRINT, METHOD, GROUPS, CIPROBABILITY.
Parameters: Y1, Y2, R1, R2, STATISTIC, PROBABILITY, SIGN, LOWER, UPPER.
Method
The Mann-Whitney (or Wilcoxon) U-test is a two-sample test of location difference: i.e. a test of the null hypothesis that the two samples arise from distributions with the same mean vs. the alternative that the distribution means differ.
The test statistic U is formed using ranks found from the combined data set, and is taken to be the smaller of U1 and U2, where
Uk = n1 × n2 + nk × (nk+1) / 2 - Rk ; k=1,2
and nk is the size of sample k and Rk is the sum of ranks for sample k. With this definition, and in the absence of ties, Uk is the number of times a score in sample k precedes a score in the other sample in the ranking. So the sample with the lower score has, on average, the larger ranks.
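The calculation of U1, U2 and U from the combined ranking can be sketched in Python as follows. This is an illustrative sketch rather than the procedure's own code: the helper names midranks and mann_whitney_u are hypothetical, the data are made up, and tied values are assumed to receive the mean of the ranks they span.

```python
def midranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(y1, y2):
    """Return (U, U1, U2) using Uk = n1*n2 + nk*(nk+1)/2 - Rk."""
    n1, n2 = len(y1), len(y2)
    ranks = midranks(list(y1) + list(y2))  # ranks in the combined data set
    r1 = sum(ranks[:n1])
    r2 = sum(ranks[n1:])
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return min(u1, u2), u1, u2


print(mann_whitney_u([9, 11, 15], [6, 8, 10, 13]))  # (3.0, 3.0, 9.0)
```

A useful check is that U1 + U2 always equals n1 × n2, as it does here (3 + 9 = 12).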
The PRMANNWHITNEYU procedure is used to calculate exact values of the probability for the test statistic when the size of either sample is less than 51 and the statistic U is less than 10000; otherwise a Normal approximation is used:
Normal = ( n1 × n2 / 2 - U ) / √{ n1 × n2 × ( n1+n2+1 ) / 12 }
If ties are present, the standard error of the Normal approximation (i.e. the denominator) must be calculated by:
√{ n1 × n2 / (N × (N-1)) × ( (N^3 - N) / 12 - ∑k Tk ) }
where N = n1 + n2 is the combined sample size, Tk = ( tk^3 - tk ) / 12 and tk is the number of observations tied at rank k. (See for example Siegel 1956, pages 116-127.)
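The Normal approximation with the tie-corrected standard error can be sketched in Python as below. The sketch makes some assumptions that are not stated above: N is the combined sample size n1 + n2, no continuity correction is applied, and the two-sided probability is taken from the standard Normal distribution. The function name normal_approximation is hypothetical.

```python
import math
from collections import Counter


def normal_approximation(u, y1, y2):
    """Return (z, two-sided p) for the statistic u, correcting for ties."""
    n1, n2 = len(y1), len(y2)
    n = n1 + n2  # assumed meaning of N: combined sample size
    # Size of each group of tied values; untied values contribute zero.
    tie_sizes = Counter(list(y1) + list(y2)).values()
    tie_term = sum((t ** 3 - t) / 12 for t in tie_sizes)
    variance = n1 * n2 / (n * (n - 1)) * ((n ** 3 - n) / 12 - tie_term)
    z = (n1 * n2 / 2 - u) / math.sqrt(variance)
    # Two-sided tail area of the standard Normal distribution.
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


z, p = normal_approximation(1, [1, 2, 2], [2, 3, 4])
print(round(z, 4), round(p, 4))
```

When there are no ties the tie term is zero and the variance reduces to n1 × n2 × (n1+n2+1) / 12, so the same function covers both cases.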
Action with RESTRICT
The variates Y1 and Y2 can be restricted, and in different ways. MANNWHITNEY uses only those units of each variate that are not excluded by their respective restrictions. Restrictions are also obeyed on Y1 and GROUPS, allowing RESTRICT to be used for example to limit the data to only two groups when the GROUPS factor has more than two levels.
Reference
Siegel, S. (1956). Nonparametric Statistics for the Behavioral Sciences. McGraw-Hill, New York.