SPEARMAN procedure

Calculates Spearman's Rank Correlation Coefficient (S.J. Welham, N.M. Maclaren & H.R. Simpson).


Options

PRINT = strings
Output required (test, correlations, ranks): test prints the correlation coefficient (or matrix) and the relevant test statistics, correlations prints just the correlation coefficient for each pair of variates, and ranks prints the vector of ranks for each sample; default test

GROUPS = factor
Defines the sample membership if only one variate is specified by DATA

CORRELATION = scalar or symmetric matrix
Scalar to save the rank correlation coefficient if there are two samples, or a symmetric matrix to save the coefficients between all pairs of samples if there are several

T = scalar or symmetric matrix
Scalar to save the Student's t statistic for the correlation coefficient if there are two samples, or a symmetric matrix to save the t statistics for all pairs of samples if there are several (calculated only if the sample size is 8 or more)

DF = scalars
Scalar to save the degrees of freedom for each t statistic


Parameters

DATA = variates
List of variates containing the data for each sample, or a single variate containing the data from all the samples (the GROUPS option must then be set to indicate the sample to which each unit belongs)

RANKS = variates
Saves the ranks


Description

SPEARMAN calculates Spearman's Rank Correlation Coefficient between pairs of samples. The samples can be stored in different variates and supplied as a list with the DATA parameter. Alternatively, they can all be placed in a single variate, with the GROUPS option set to a factor indicating the sample to which each unit belongs. If the sample size is 8 or more (i.e. large enough for the approximation to be valid), the Student's t approximation is calculated; otherwise SPEARMAN obtains significance levels from stored tables. The results can be displayed using the test setting of the PRINT option, and saved using the CORRELATION, T and DF options. If more than two samples are specified, the full correlation matrix between all pairs of samples is formed. The ranks setting of PRINT prints the vector of ranks for each sample, and the correlations setting displays only the correlations. The ranks from each sample can be saved using the RANKS parameter.
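The single-variate GROUPS layout and the all-pairs correlation matrix can be sketched in Python as below. This is not Genstat code: `midranks`, `spearman`, `split_by_groups` and `correlation_matrix` are hypothetical helpers. The sketch assumes that all groups have the same size and that units in different groups are paired by their order of appearance. The coefficient is computed as the Pearson correlation of the midranks, which agrees with the tie-corrected formula given under Method.

```python
def midranks(values):
    """Rank values from 1 to N; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's coefficient as the Pearson correlation of the midranks."""
    rx, ry = midranks(x), midranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


def split_by_groups(data, groups):
    """Hypothetical GROUPS handling: one variate plus a parallel factor -> samples."""
    samples = {}
    for value, level in zip(data, groups):
        samples.setdefault(level, []).append(value)  # keeps order of appearance
    return samples


def correlation_matrix(samples):
    """Symmetric matrix of coefficients between every pair of samples."""
    names = list(samples)
    if len({len(samples[name]) for name in names}) != 1:
        raise ValueError("all samples must contain the same number of units")
    matrix = [[spearman(samples[a], samples[b]) for b in names] for a in names]
    return names, matrix


data = [1, 2, 3, 4, 5, 2, 4, 6, 8, 10, 5, 4, 3, 2, 1]
groups = ["A"] * 5 + ["B"] * 5 + ["C"] * 5
names, matrix = correlation_matrix(split_by_groups(data, groups))
for name, row in zip(names, matrix):
    print(name, [round(v, 3) for v in row])
```

Here samples A and B increase together and C runs in reverse, so the off-diagonal entries are +1 and -1. A constant sample would make the denominator zero; the sketch does not guard against that.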


Options: PRINT, GROUPS, CORRELATION, T, DF. Parameters: DATA, RANKS.


Method

Spearman's Rank Correlation Coefficient is a measure of association between the rankings of two variables measured on $N$ individuals (i.e. two vectors of length $N$). The coefficient is calculated from the two vectors of ranks for the samples. Let $\{X_i;\ i=1,\dots,N\}$ and $\{Y_i;\ i=1,\dots,N\}$ be the vectors of ranks for sample 1 and sample 2 respectively. The coefficient $r$ is based on the vector of differences between ranks, $\{D_i = X_i - Y_i;\ i=1,\dots,N\}$, and is calculated by

$$ r = 1 - \frac{6 \sum_{i=1}^{N} D_i^2}{N(N^2 - 1)} $$
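A direct transcription of this formula, as a minimal Python sketch (the function name `spearman_no_ties` is hypothetical). It takes the two rank vectors rather than the raw data, and is only appropriate when neither sample contains ties.

```python
def spearman_no_ties(x_ranks, y_ranks):
    """r = 1 - 6 * sum(D_i^2) / (N (N^2 - 1)) for untied rank vectors X and Y."""
    n = len(x_ranks)
    if n != len(y_ranks) or n < 2:
        raise ValueError("need two rank vectors of equal length N >= 2")
    d2 = sum((xi - yi) ** 2 for xi, yi in zip(x_ranks, y_ranks))  # sum of D_i^2
    return 1 - 6 * d2 / (n * (n * n - 1))


X = [1, 2, 3, 4, 5]
Y = [2, 1, 4, 3, 5]
print(spearman_no_ties(X, Y))  # D = (-1, 1, -1, 1, 0), so r = 1 - 6*4/120 = 0.8
```

Identical rankings give $r = 1$ and exactly reversed rankings give $r = -1$.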

If ties are present, the simple formula for $r$ is biased, and $r$ must instead be calculated with a correction for ties:

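$$ r = \frac{\sum x_i^2 + \sum y_i^2 - \sum D_i^2}{2\sqrt{\sum x_i^2 \sum y_i^2}} $$

where $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$ are the deviations of the ranks from their means,

$$ \sum x_i^2 = \frac{N^3 - N}{12} - T_x, \qquad \sum y_i^2 = \frac{N^3 - N}{12} - T_y, \qquad T_k = \sum_j \frac{t_j^3 - t_j}{12} \quad (k = x, y) $$

and $t_j$ is the number of observations in the $j$th group of tied observations in the sample concerned.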
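The tie correction can be checked numerically with the Python sketch below. It assumes tied values receive the average of the ranks they span (midranks), which is the convention the $T$ correction presupposes; `midranks`, `tie_term` and `spearman_ties` are hypothetical names. For the example data, the uncorrected formula would give $1 - 6 \times 0.5/120 = 0.975$, while the corrected value is $19/(2\sqrt{95}) = \sqrt{95}/10 \approx 0.9747$.

```python
from collections import Counter


def midranks(values):
    """Rank values from 1 to N; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        i = j + 1
    return ranks


def tie_term(values):
    """T = sum over groups of tied values of (t^3 - t) / 12, t = group size."""
    return sum((t ** 3 - t) / 12 for t in Counter(values).values())


def spearman_ties(x, y):
    """Tie-corrected coefficient computed from the midranks of x and y."""
    n = len(x)
    rx, ry = midranks(x), midranks(y)
    base = (n ** 3 - n) / 12
    sx2 = base - tie_term(x)  # sum of squared deviations of the x ranks from their mean
    sy2 = base - tie_term(y)  # the same for the y ranks
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))  # sum of D_i^2 on the midranks
    return (sx2 + sy2 - d2) / (2 * (sx2 * sy2) ** 0.5)


x = [1, 2, 2, 3, 4]  # one pair of ties: midranks 1, 2.5, 2.5, 4, 5
y = [1, 3, 2, 4, 5]  # no ties
print(spearman_ties(x, y))
```

Because $\sum D_i^2 = \sum x_i^2 + \sum y_i^2 - 2\sum x_i y_i$, the corrected formula is algebraically the Pearson correlation of the midranks. Without ties it reduces to the simple formula.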

   The t-approximation for this statistic, T, is valid for samples of size 8 upwards, and is calculated by

$$ T = r \sqrt{\frac{N - 2}{1 - r^2}} $$

It has approximately a t-distribution on N-2 degrees of freedom, and can be used to test the null hypothesis of independence between the samples. (See, for example, Siegel 1956, pages 202-213.)
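A minimal Python sketch of the t approximation. `spearman_t` is a hypothetical name. Refusing $|r| = 1$ (where $T$ is undefined) and $N < 8$ are choices of this sketch rather than documented behaviour of SPEARMAN.

```python
import math


def spearman_t(r, n):
    """Return (T, df) with T = r * sqrt((N - 2) / (1 - r^2)) and df = N - 2."""
    if n < 8:
        raise ValueError("the t approximation is only used for N >= 8")
    if abs(r) >= 1:
        raise ValueError("T is undefined when |r| = 1")
    df = n - 2
    return r * math.sqrt(df / (1 - r * r)), df


t_stat, df = spearman_t(0.6, 10)
print(t_stat, df)  # T = 0.6 * sqrt(8 / 0.64) = sqrt(4.5), about 2.121, on 8 df
```

To obtain a significance level, T can be compared with the t-distribution on `df` degrees of freedom. For example, `scipy.stats.t.sf(abs(t_stat), df)` gives the one-sided upper-tail probability (SciPy is not used in the sketch itself).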

   Exact critical values for sample sizes of 4-50 are given by Siegel & Castellan (1988) Table Q.


Action with RESTRICT

If any of the variates in DATA is restricted, the statistic is calculated only for the set of units not excluded by the restriction.
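The effect of a restriction can be sketched in Python as follows. This assumes one common set of retained units for all samples, and reads the statement above as meaning that excluded units are dropped from every sample before ranking. Under that reading, the ranks and $N$ refer to the retained units only. `restrict` and `midranks` are hypothetical helpers, not Genstat directives.

```python
def midranks(values):
    """Rank values from 1 to N; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        i = j + 1
    return ranks


def restrict(samples, keep):
    """Drop unit i from every sample unless keep[i] is True (one shared restriction)."""
    if any(len(s) != len(keep) for s in samples):
        raise ValueError("each sample needs one keep flag per unit")
    return [[v for v, k in zip(s, keep) if k] for s in samples]


x = [10, 20, 30, 40, 50]
y = [5, 1, 4, 2, 3]
keep = [True, False, True, True, False]  # units 2 and 5 are excluded
x_sub, y_sub = restrict([x, y], keep)
print(midranks(x_sub), midranks(y_sub))  # ranks over the N = 3 retained units
```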


References

Siegel, S. (1956). Nonparametric Statistics for the Behavioral Sciences. McGraw-Hill, New York.

Siegel, S. & Castellan, N.J. (1988). Nonparametric Statistics for the Behavioral Sciences (second edition). McGraw-Hill, New York.