SVBOOT procedure
Bootstraps data from random surveys (S.D. Langton).
Options
Parameters
Description
SVBOOT forms a single bootstrap sample using data from a stratified one- or two-stage survey. It is designed to be used in a FOR loop, with a new sample being formed and analysed each time that the loop is executed. The DATA parameter supplies a list of structures to be bootstrapped, whilst BOOT contains the corresponding bootstrapped structures. Alternatively, the SAVEUNITS option can be used to save the units in the bootstrapped samples, allowing the bootstrapped structures to be formed by a CALCULATE statement. Options STRATUMFACTOR and SAMPLINGUNITS supply the stratification factor and the sampling units respectively, whilst survey weights are supplied by the WEIGHTS option.
When option METHOD=simple, sampling is with replacement within each stratum. This is the correct approach for an infinite population, but will give reasonable results as long as the sampling proportion is not very high. METHOD=sarndal uses the method described by Sarndal et al. (1992, page 442), as implemented by Grilli & Pratesi (2004), in which an artificial population is created, containing each element of the sample w times, where w is the survey weight (the inverse of the probability of inclusion), rounded to the nearest integer. Sampling is then carried out without replacement (not with replacement as Sarndal recommends). For two-stage sampling, WEIGHTS should be set to a list of two variates, the first giving the overall sampling weights and the second the weights at the first stage only (typically the inverse of the probability of selection of the primary sampling units).
The Sarndal approach works well as long as either the weights are integers, or they are large enough that the effect of rounding is negligible. For surveys with high sampling fractions, METHOD=random implements a variant on the Sarndal method in which the artificial population is formed by a random process, using resampling in proportion to the weights and ensuring that each observation is present at least once in the population. Care must be taken when using this method, as means, totals and other statistics will vary slightly between the different artificial populations. With this method it may sometimes be helpful to form repeated bootstrap samples from the same pseudo-population; this can be achieved by means of the POPULATION option.
Except in simple surveys with no restrictions, the number of units in each bootstrapped sample will not be the same as the original survey and so options BSTRATUMFACTOR and BSAMPLINGUNITS save new factors for use with the bootstrapped structures.
Options: PRINT, SEED, STRATUMFACTOR, SAMPLINGUNITS, WEIGHTS, METHOD, POPULATION, SAVEUNITS, BSTRATUMFACTOR, BSAMPLINGUNITS.
Parameters: DATA, BOOT.
Method
a) simple, one-stage
A new variate is formed for each stratum, containing the unit numbers associated with that stratum, indexed by a grouping factor. The new bootstrap sample is then formed by selecting from these at random with replacement. Any weights set are ignored. The new samples are in stratum order, rather than the order of the original dataset.
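The steps above can be sketched in Python as follows. This is an illustration only, not the procedure's own code: the function name simple_bootstrap and the variables boot_units and boot_strata are hypothetical, and it assumes that each stratum is resampled to its original size.

```python
import random
from collections import defaultdict

def simple_bootstrap(strata, rng):
    """Return the unit numbers of one bootstrap sample, in stratum order."""
    # Collect the unit numbers belonging to each stratum.
    units_by_stratum = defaultdict(list)
    for unit, stratum in enumerate(strata):
        units_by_stratum[stratum].append(unit)
    sample = []
    for stratum in sorted(units_by_stratum):
        units = units_by_stratum[stratum]
        # Assumption: draw as many units as were originally sampled,
        # at random with replacement.
        sample.extend(rng.choices(units, k=len(units)))
    return sample

strata = ["a", "b", "a", "b", "b"]
boot_units = simple_bootstrap(strata, random.Random(1))
# The stratum labels of the bootstrapped units come out in stratum order.
boot_strata = [strata[u] for u in boot_units]
```

The list of unit numbers plays a role similar to the units saved by SAVEUNITS: any data structure can be bootstrapped by indexing it with those numbers.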
b) simple, two-stage
The method described above is applied twice, once to select primary sampling units at random from those in the stratum, and once to select secondary sampling units from those in the appropriate psu.
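A minimal Python sketch of the two-stage version is given below. The name two_stage_bootstrap is hypothetical. The sketch assumes that the number of draws at each stage equals the original number of units at that stage, and that a primary sampling unit drawn more than once is treated as a separate unit in the bootstrap sample.

```python
import random
from collections import defaultdict

def two_stage_bootstrap(strata, psus, rng):
    """Return (stratum, bootstrap psu number, unit) triples for one sample."""
    # Nested grouping: stratum -> psu -> unit numbers.
    tree = defaultdict(lambda: defaultdict(list))
    for unit, (stratum, psu) in enumerate(zip(strata, psus)):
        tree[stratum][psu].append(unit)
    sample = []
    for stratum in sorted(tree):
        psu_ids = sorted(tree[stratum])
        # Stage 1: select psus with replacement within the stratum.
        drawn = rng.choices(psu_ids, k=len(psu_ids))
        for draw_no, psu in enumerate(drawn):
            units = tree[stratum][psu]
            # Stage 2: select secondary units with replacement within the psu.
            for unit in rng.choices(units, k=len(units)):
                sample.append((stratum, draw_no, unit))
    return sample

strata = ["a", "a", "a", "b", "b"]
psus = [1, 1, 2, 3, 3]
boot = two_stage_bootstrap(strata, psus, random.Random(2))
```

Because the psus differ in size, the total number of bootstrapped units can differ from the original, which is why new stratum and sampling-unit labels have to accompany the sample.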
c) Sarndal, one-stage
An artificial population is generated for each stratum, with each unit being replicated w times, where w is the appropriate weight, rounded to the nearest integer. Sampling is then carried out, without replacement, using the inverse of the weights as inclusion probabilities. For reasons of computational simplicity, the bootstrap sample sizes are not fixed, and will therefore differ slightly from those in the original sample.
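One way to read this description is sketched below in Python. It is an interpretation, not the procedure's own code: the name sarndal_bootstrap is hypothetical, weights are assumed to be positive, and each copy in the artificial population is assumed to be kept independently with probability 1/w, which is what makes the sample size vary.

```python
import random

def sarndal_bootstrap(strata, weights, rng):
    """Return the unit numbers of one bootstrap sample, in stratum order."""
    sample = []
    for s in sorted(set(strata)):
        for unit, (stratum, w) in enumerate(zip(strata, weights)):
            if stratum != s:
                continue
            # The unit appears round(w) times in the artificial population
            # (int(w + 0.5) rounds positive weights to the nearest integer).
            copies = int(w + 0.5)
            for _ in range(copies):
                # Each copy is considered once (no replacement) and kept
                # with probability 1/w, so the sample size is not fixed.
                if rng.random() < 1.0 / w:
                    sample.append(unit)
    return sample

# With all weights equal to one, every unit is always selected exactly once.
same = sarndal_bootstrap(["a", "a", "b"], [1, 1, 1], random.Random(0))
varied = sarndal_bootstrap(["a", "a", "b"], [4, 4, 2.4], random.Random(0))
```

The first call shows the degenerate case of a very high sampling fraction: the bootstrap sample is identical to the original sample on every draw.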
d) Sarndal, two-stage
The method described above is applied twice, once to select primary sampling units at random from those in the stratum, and once to select secondary sampling units from those in the appropriate psu.
e) Random
This method is designed as an alternative to the Sarndal method when the sampling fraction is very high, so that the rounded weights are equal to one and the same sample is always generated. The pseudo-population is formed by including each of the sampled observations once and then resampling with replacement from the sampled observations to generate the remaining N-n units in the pseudo-population (where N is the population size, and n is the sample size in the stratum). This method is currently only implemented for one-stage sampling with equal weights in a stratum. The pseudo-population is then sampled without replacement, as in the Sarndal method.
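The construction of the pseudo-population for a single stratum can be sketched in Python as below. The function names and the pop_size argument are hypothetical, and the final sampling step assumes that each element is kept independently with probability n/N, which is one reading of "as in the Sarndal method".

```python
import random

def random_pseudo_population(units, pop_size, rng):
    """Build a pseudo-population of pop_size (N) elements for one stratum."""
    n = len(units)
    if pop_size < n:
        raise ValueError("population size must be at least the sample size")
    # Every sampled unit is present at least once.
    population = list(units)
    # The remaining N - n elements are drawn with replacement from the sample.
    population += rng.choices(units, k=pop_size - n)
    return population

def sample_from_population(population, n, rng):
    """Sample without replacement; each element is kept with probability n/N."""
    p = n / len(population)
    return [unit for unit in population if rng.random() < p]

rng = random.Random(3)
units = [0, 1, 2, 3]
population = random_pseudo_population(units, 6, rng)
# Repeated bootstrap samples can be drawn from the same pseudo-population.
boots = [sample_from_population(population, len(units), rng) for _ in range(5)]
```

Drawing several samples from one stored pseudo-population, as in the last line, corresponds to the use described for the POPULATION option.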
Action with RESTRICT
Restricted units are excluded from the bootstrapping process and do not occur in the resampled dataset. The restriction is defined by the first variate in the DATA list, if this is set.
References
Grilli, L. & Pratesi, M. (2004). Weighted estimation in multilevel ordinal and binary models in the presence of informative sampling designs. Survey Methodology, 30, 93-103.
Sarndal, C., Swensson, B. & Wretman, J. (1992). Model Assisted Survey Sampling. Springer-Verlag, New York.