KCROSSVALIDATION procedure
Computes cross-validation statistics for punctual kriging (D.A. Murray & R. Webster).
Options
Parameters
Description
In geostatistics one way of choosing between plausible models for variograms is to use them for kriging, and see how well the kriging predicts the true values. The observed value of z at each sampling point in the data is omitted in turn from the whole set and predicted from the others. The predictions are compared with the true values to give a mean deviation or error, and the kriging variances are compared with the squared deviations to give a mean squared deviation ratio. This process is known as "cross-validation". The procedure KCROSSVALIDATION uses this principle of leave-one-out cross-validation.
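The leave-one-out principle described above can be sketched in a few lines of Python. This is an illustration only, not the procedure's implementation: the function names are hypothetical, and the predictor used in the demonstration is simple inverse-distance weighting, a stand-in chosen to keep the sketch short. It is not kriging.

```python
import math

def leave_one_out(coords, values, predict):
    """Omit each point in turn and predict its value from all the others."""
    predictions = []
    for i in range(len(values)):
        other_coords = coords[:i] + coords[i + 1:]
        other_values = values[:i] + values[i + 1:]
        predictions.append(predict(other_coords, other_values, coords[i]))
    return predictions

def idw_predict(coords, values, target, power=2.0):
    """Stand-in predictor (inverse-distance weighting); NOT kriging."""
    num = 0.0
    den = 0.0
    for (x, y), v in zip(coords, values):
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return v  # coincident point: return its value directly
        w = d ** -power
        num += w * v
        den += w
    return num / den

# Hypothetical data: four points on the corners of a unit square.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
values = [1.0, 2.0, 2.0, 3.0]
predicted = leave_one_out(coords, values, idw_predict)
```

Passing the predictor as an argument keeps the loop independent of the interpolation method, so a kriging predictor could be substituted without changing `leave_one_out`.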
The data are supplied, by the DATA parameter, in either of the two forms accepted by the KRIGE directive: as a matrix for data on a regular grid, or as a variate for irregularly scattered data, with the X and Y options set to variates that supply the spatial coordinates.
By default all data are considered when forming the kriging system. However, you may select a subset of the data by limiting the area to a rectangle defined by XOUTER and YOUTER options. Each of these should be set to a variate with two values to define lower and upper limits in the x (East-West) and y (North-South) directions respectively.
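As a rough illustration of the rectangular subset, the following Python sketch filters points by a pair of two-value limits. The function name is hypothetical, and treating the limits as inclusive is an assumption; the text above does not say how points lying exactly on the boundary are handled.

```python
def within_rectangle(xs, ys, xouter=None, youter=None):
    """Return indices of points inside the rectangle.

    xouter and youter are (lower, upper) pairs; None means no limit in
    that direction. Inclusive bounds are an assumption of this sketch.
    """
    keep = []
    for i, (x, y) in enumerate(zip(xs, ys)):
        in_x = xouter is None or xouter[0] <= x <= xouter[1]
        in_y = youter is None or youter[0] <= y <= youter[1]
        if in_x and in_y:
            keep.append(i)
    return keep

# Hypothetical coordinates along a diagonal.
xs = [0.0, 5.0, 10.0, 15.0]
ys = [0.0, 5.0, 10.0, 15.0]
selected = within_rectangle(xs, ys, xouter=(1.0, 12.0), youter=(0.0, 10.0))
```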
The minimum and maximum number of points for the kriging system are set by the MINPOINTS and MAXPOINTS options. There is a minimum limit of 3 for MINPOINTS and a maximum of 40 for MAXPOINTS, and MINPOINTS must be less than or equal to MAXPOINTS. The defaults are 7 and 20 respectively. You may select data points around the point to be kriged by setting the RADIUS option to the radius within which they must lie. If the variogram is anisotropic, the search may be requested to be anisotropic by setting option SEARCH to anisotropic; by default SEARCH=isotropic.
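The limits on MINPOINTS and MAXPOINTS, and an isotropic search within a radius, might be sketched in Python as follows. The function names are hypothetical. Keeping the nearest points when more than the maximum lie within the radius is an assumption, as the text above does not state the selection rule, and the sketch does not attempt to model what happens when fewer than the minimum number of points are found.

```python
import math

def check_point_limits(minpoints=7, maxpoints=20):
    """Validate the limits stated in the text: 3 <= MINPOINTS <= MAXPOINTS <= 40."""
    if minpoints < 3:
        raise ValueError("MINPOINTS must be at least 3")
    if maxpoints > 40:
        raise ValueError("MAXPOINTS must be at most 40")
    if minpoints > maxpoints:
        raise ValueError("MINPOINTS must not exceed MAXPOINTS")

def select_neighbours(coords, target, radius=None, minpoints=7, maxpoints=20):
    """Return indices of up to maxpoints nearest points (isotropic search).

    coords is assumed to exclude the target point itself, as it would in
    leave-one-out cross-validation.
    """
    check_point_limits(minpoints, maxpoints)
    by_distance = sorted(
        (math.hypot(x - target[0], y - target[1]), i)
        for i, (x, y) in enumerate(coords)
    )
    if radius is not None:
        by_distance = [(d, i) for d, i in by_distance if d <= radius]
    return [i for _, i in by_distance[:maxpoints]]

# Hypothetical transect of ten points at unit spacing.
coords = [(float(i), 0.0) for i in range(10)]
nearest = select_neighbours(coords, (0.0, 0.0), radius=4.5, minpoints=3, maxpoints=3)
```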
Further options are available for regular data. You can invoke universal kriging by setting the DRIFT option to linear or to quadratic, i.e. to be of order 1 or 2 respectively. The default, DRIFT=constant, gives ordinary kriging. If the grid is not square, the ratio of the spacing in the y direction to that in the x direction is given by the YXRATIO option. The default is 1.0, for a square grid.
The variogram is specified by its type and parameters, as follows. The MODEL parameter may be set to one of power, boundedlinear (one dimension only), circular, spherical, doublespherical, pentaspherical, exponential, besselk1 (Whittle's function), gaussian, cubic, stable (i.e. powered exponential; see Webster & Oliver 2001) or cardinalsine. All models may have a nugget variance, supplied using the NUGGET parameter; this is the constant estimated by MVARIOGRAM. You can specify the variance of any measurement error using the MEASUREMENTERROR parameter. The parameters of the power function (the only unbounded model) are defined by the GRADIENT and EXPONENT parameters. The power of the stable model is also supplied using the EXPONENT parameter. The simple bounded models (i.e. all other settings of MODEL except doublespherical) require the SILLVARIANCES (the sill of the correlated variance) and RANGES parameters. The latter is strictly the correlation range for the boundedlinear, circular, spherical and pentaspherical models, while for the asymptotic models it is the distance parameter of the model. The doublespherical model requires SILLVARIANCES and RANGES to be set to variates of length two, corresponding to the two components of the model.
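To make the roles of the nugget, sill, range and distance parameters concrete, here is a Python sketch of three of the models in their usual textbook forms. The function and argument names are hypothetical, and it is an assumption that these parameterisations match those used by the procedure.

```python
import math

def spherical(h, nugget, sill, a):
    """Spherical model: reaches nugget + sill exactly at the range a."""
    if h == 0.0:
        return 0.0  # the variogram is zero at lag zero
    if h >= a:
        return nugget + sill
    r = h / a
    return nugget + sill * (1.5 * r - 0.5 * r ** 3)

def exponential(h, nugget, sill, r):
    """Exponential model: approaches nugget + sill asymptotically;
    r is the distance parameter, not a range that is actually reached."""
    if h == 0.0:
        return 0.0
    return nugget + sill * (1.0 - math.exp(-h / r))

def power(h, nugget, gradient, exponent):
    """Power model: unbounded, so it has a gradient and exponent, no sill."""
    if h == 0.0:
        return 0.0
    return nugget + gradient * h ** exponent

# Hypothetical parameter values, for illustration only.
g_sph = spherical(5.0, nugget=1.0, sill=4.0, a=10.0)
g_exp = exponential(2.0, nugget=0.0, sill=1.0, r=2.0)
g_pow = power(4.0, nugget=0.5, gradient=2.0, exponent=0.5)
```

The contrast between `spherical` and `exponential` shows why the text distinguishes a correlation range from a distance parameter: the first attains its sill at `a`, the second only approaches it.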
The ISOTROPY parameter defines the variation as either isotropic or anisotropic, with two forms of anisotropy available: Burgess anisotropy (Burgess & Webster 1980) or geometric anisotropy (Webster & Oliver 1990). The anisotropy is specified by three parameters: PHI, the angle in radians of the direction of maximum variation; RMAX, the maximum gradient of the model; and RMIN, the minimum gradient. In the current release only the power function may be anisotropic.
The predictions (or estimates) and variances can be saved using the PREDICTIONS and VARIANCES parameters. The cross-validation statistics can be saved using the STATISTICS parameter.
The PRINT option can be set to statistics, to print the cross-validation statistics, or to correlation, to print the correlation between the predicted and true values. The PLOT option can be used to produce a plot of the predicted values against the true values.
Options: PRINT, PLOT, Y, X, YOUTER, XOUTER, RADIUS, SEARCH, MINPOINTS, MAXPOINTS, DRIFT, YXRATIO, SAVE.
Parameters: DATA, ISOTROPY, MODEL, NUGGET, SILLVARIANCES, RANGES, GRADIENT, EXPONENT, PHI, RMAX, RMIN, MEASUREMENTERROR, PREDICTIONS, VARIANCES, STATISTICS.
Method
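The mean error is given by
\[ \frac{1}{n} \sum_{i=1}^{n} \{ z(x_i) - \hat{z}(x_i) \} , \]
the mean squared error is
\[ \frac{1}{n} \sum_{i=1}^{n} \{ z(x_i) - \hat{z}(x_i) \}^2 , \]
and the mean squared deviation ratio is
\[ \frac{1}{n} \sum_{i=1}^{n} \frac{\{ z(x_i) - \hat{z}(x_i) \}^2}{\sigma^2(x_i)} , \]
where z(x_i) is the observed value at sampling point x_i, \hat{z}(x_i) is the value predicted there from the other data, \sigma^2(x_i) is the corresponding kriging variance, and n is the number of sampling points.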
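The three statistics can be computed directly from the observed values, the predictions and the kriging variances. The following Python sketch does so; the function name and the example numbers are hypothetical.

```python
def cross_validation_statistics(observed, predicted, variances):
    """Return (mean error, mean squared error, mean squared deviation ratio)."""
    n = len(observed)
    errors = [z - zhat for z, zhat in zip(observed, predicted)]
    mean_error = sum(errors) / n
    mean_squared_error = sum(e * e for e in errors) / n
    # Each squared error is scaled by the kriging variance at that point.
    msdr = sum(e * e / v for e, v in zip(errors, variances)) / n
    return mean_error, mean_squared_error, msdr

# Hypothetical values, for illustration only.
observed = [1.0, 2.0, 3.0]
predicted = [1.5, 2.0, 2.0]
variances = [0.25, 1.0, 2.0]
me, mse, msdr = cross_validation_statistics(observed, predicted, variances)
```

If the variogram model describes the data well, the mean error should be close to zero and the mean squared deviation ratio close to 1, since the kriging variances should then match the squared errors on average.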
Action with RESTRICT
The vectors involved in the analysis may be restricted as for KRIGE.
References
Burgess, T.M. & Webster, R. (1980). Optimal interpolation and isarithmic mapping of soil properties. I. The semi-variogram and punctual kriging. Journal of Soil Science, 31, 315-331.
Webster, R. & Oliver, M.A. (1990). Statistical Methods in Soil and Land Resource Survey. Oxford University Press, Oxford.
Webster, R. & Oliver, M.A. (2001). Geostatistics for Environmental Scientists. Wiley, Chichester.