SVTABULATE procedure
Tabulates data from random surveys, including multistage surveys and surveys with unequal probabilities of selection (S.D. Langton).
Options
Parameters
Description
The SVTABULATE procedure calculates estimates from surveys, together with the correct asymptotic standard errors, allowing for the design of the survey. In particular, information about the numbers of sampling units in the survey population is needed, and this can be supplied in one of three ways.
1. The WEIGHTS option can be used to supply weights, which will generally be the inverse of the probability of selection (pi expansion weights, Sarndal et al. 1992). This is simple, but cannot convey the full design information for multistage surveys.
2. The option NUNITS can be used to list the number of primary sampling units per stratum using a table or variate with one value for each stratum. Similarly, in a two-stage design, NSECONDARYUNITS indicates the number of secondary units in each primary sampling unit.
3. The dataset can contain the full survey population with unsampled (or non-responding) units indicated by missing values for the response variables. This allows GenStat to deduce the numbers of units without the need to supply any further information; it is thus simple to use, but is not feasible with large or complex surveys. The NUNITS (and NSECONDARYUNITS if appropriate) option should be set to a value of -1 to indicate that this is required.
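The relationship between the first two ways of supplying the design information can be illustrated outside GenStat. The following Python sketch is not part of the procedure; the data, the names sample and population_units, and the single-stage stratified design are all assumptions made for illustration. It derives pi expansion weights from per-stratum unit counts (the kind of information NUNITS supplies) and uses them to estimate a population total.

```python
from collections import Counter

# Hypothetical stratified sample: (stratum, response) pairs.
sample = [("A", 10.0), ("A", 14.0), ("B", 3.0), ("B", 5.0), ("B", 4.0)]

# Assumed numbers of units in the population, one value per stratum.
population_units = {"A": 20, "B": 60}

# Number of sampled units in each stratum.
sampled_units = Counter(stratum for stratum, _ in sample)

# pi expansion weight = 1 / (probability of selection) = N_h / n_h,
# assuming simple random sampling within each stratum.
weights = {h: population_units[h] / sampled_units[h] for h in population_units}

# Weighted estimate of the population total.
estimated_total = sum(weights[h] * y for h, y in sample)

print(weights)          # {'A': 10.0, 'B': 20.0}
print(estimated_total)  # 480.0
```

Under these assumptions, supplying the weights directly or supplying the unit counts leads to the same point estimate; the difference lies in how much design information is available for the standard errors.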
Other information on the survey design is provided using the STRATUMFACTOR and SAMPLINGUNITS options.
The response variable is specified using the Y parameter. Estimated counts of the number of observations can be produced by leaving the parameter unset (this is equivalent to analysing a vector of 1's). The Y parameter can also be left unset if the procedure is used to calculate survey weights. The X parameter can be set in order to produce estimates of the ratio Y/X. By default estimates of totals, means or ratios are for the whole population, but the CLASSIFICATION option can be set to one or more factors defining subsets of the data for which estimates are required.
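The quantities described above can be sketched as weighted sums. The Python below is illustrative only and is not the procedure's implementation; the records, the weights and the helper weighted_summaries are hypothetical. It shows the estimated count (the weighted total of a vector of 1's), the total, the mean and the ratio Y/X, first for the whole population and then for each level of a single classifying factor.

```python
from collections import defaultdict

# Hypothetical records: (classification level, weight, y, x).
records = [
    ("north", 10.0, 4.0, 2.0),
    ("north", 10.0, 6.0, 3.0),
    ("south", 20.0, 1.0, 2.0),
    ("south", 20.0, 3.0, 2.0),
]

def weighted_summaries(rows):
    """Return weighted count, total, mean and ratio Y/X for the given rows."""
    sum_w = sum(w for _, w, _, _ in rows)        # equivalent to analysing 1's
    total_y = sum(w * y for _, w, y, _ in rows)
    total_x = sum(w * x for _, w, _, x in rows)
    return {
        "count": sum_w,
        "total": total_y,
        "mean": total_y / sum_w,
        "ratio": total_y / total_x,
    }

# Default: estimates for the whole population.
whole_population = weighted_summaries(records)

# With a classifying factor: one set of estimates per level.
by_class = defaultdict(list)
for row in records:
    by_class[row[0]].append(row)
by_classification = {level: weighted_summaries(rows) for level, rows in by_class.items()}

print(whole_population)
print(by_classification)
```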
The FITTEDVALUES parameter is used when estimating population totals via a model-assisted approach. Variance estimates are then calculated using the residual deviation about the fitted values. This can be used in conjunction with the SVCALIBRATE procedure to provide estimates following calibration weighting.
Output is controlled by the PRINT and PLOT options. The latter produces various plots that are useful in identifying outliers and influential points which may require further investigation. The single setting of the PLOT option produces a scatterplot of Y against X, whilst separate produces a separate graph for each combination of levels of the CLASSIFICATION factors. When X is unset, both single and separate produce a scatterplot of Y against CLASSIFICATION. The weights and influence settings produce histograms of the weights and influence statistics respectively. The diagnostic setting produces a scatterplot of influence statistics against weights; with large datasets this plot tends to be more informative than the histograms.
The influence statistic for an observation is defined as the absolute percentage change in the total estimate when the observation is replaced by a missing value and the associated weight is redistributed to other units in the same stratum. When PRINT is set to influence, details are printed of the observations with the highest influence; the number printed can be controlled by the NINFLUENCE option. By default this output is labelled by the row number of the observation, but the LABELS parameter can be used to specify more meaningful identifiers in the form of a variate, text or factor.
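The definition of the influence statistic can be made concrete with a small sketch. The Python below is an illustration under stated assumptions, not the procedure's code: the text does not say how the weight is redistributed, so the function influence_statistics (a hypothetical name) assumes that the weight of the removed observation is shared among the remaining units of its stratum in proportion to their existing weights.

```python
def influence_statistics(strata, weights, y):
    """Absolute percentage change in the estimated total when each
    observation in turn is treated as missing.

    Assumption: the removed unit's weight is redistributed to the other
    units in the same stratum in proportion to their current weights.
    """
    old_total = sum(w * v for w, v in zip(weights, y))
    result = []
    for i in range(len(y)):
        others = [j for j in range(len(y)) if strata[j] == strata[i] and j != i]
        if not others:
            # No other unit in the stratum can take over the weight.
            result.append(float("nan"))
            continue
        stratum_weight = weights[i] + sum(weights[j] for j in others)
        # Factor by which the remaining weights in the stratum are inflated.
        scale = stratum_weight / (stratum_weight - weights[i])
        others_total = sum(weights[j] * y[j] for j in others)
        new_total = old_total - weights[i] * y[i] + (scale - 1.0) * others_total
        result.append(100.0 * abs(new_total - old_total) / abs(old_total))
    return result

# Hypothetical data with one unusually large response in stratum A.
strata = ["A", "A", "A", "B", "B"]
weights = [10.0, 10.0, 10.0, 20.0, 20.0]
y = [1.0, 2.0, 30.0, 4.0, 5.0]

influence = influence_statistics(strata, weights, y)
most_influential = max(range(len(y)), key=lambda i: influence[i])
print(influence)
print(most_influential)
```

In this example the third observation has the largest influence, which is the kind of point the influence output and the diagnostic plot are intended to highlight.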
The FPCOMIT option is provided so that the finite population correction (FPC; see e.g. Sarndal et al. 1992) can be omitted. This is usually done when a simplified variance estimate is produced for multistage samples by ignoring the within-cluster component of variation (the ultimate cluster approach). Since that approach is non-conservative, omitting the FPC is sometimes advocated to counteract it and to ensure that standard errors are appropriate. GenStat will produce the ultimate cluster results if it is provided only with the survey weights (i.e. with NUNITS and NSECONDARYUNITS left unset), but this approach is not recommended since the correct analysis can be produced with little extra effort.
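The effect of the finite population correction can be seen in the simplest case. The Python sketch below uses the textbook variance of an estimated total under single-stage stratified simple random sampling without replacement, namely the sum over strata of N_h^2 (1 - n_h/N_h) s_h^2 / n_h. It is an illustration only: the function se_of_total, its omit_fpc argument and the data are hypothetical, and the procedure's own calculations for multistage or unequal-probability designs are more general.

```python
import math
from statistics import variance

def se_of_total(samples, population_units, omit_fpc=False):
    """Standard error of the estimated total for a stratified simple
    random sample, with or without the finite population correction."""
    var_total = 0.0
    for h, values in samples.items():
        n_h = len(values)
        N_h = population_units[h]
        # FPC = 1 - sampling fraction; set to 1 when the correction is omitted.
        fpc = 1.0 if omit_fpc else 1.0 - n_h / N_h
        var_total += N_h ** 2 * fpc * variance(values) / n_h
    return math.sqrt(var_total)

# Hypothetical sample values and population sizes per stratum.
samples = {"A": [10.0, 14.0], "B": [3.0, 5.0, 4.0]}
population_units = {"A": 20, "B": 60}

se_with_fpc = se_of_total(samples, population_units)
se_without_fpc = se_of_total(samples, population_units, omit_fpc=True)
print(se_with_fpc, se_without_fpc)
```

Omitting the correction always gives the larger standard error here, which is why it is sometimes used to offset a variance estimate that is known to be too small.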
Results of the analysis can be saved using the parameters TOTALS, MEANS and RATIOS, with the corresponding standard errors using SETOTALS, SEMEANS and SERATIOS. When the Y parameter is unset, TOTALS and SETOTALS contain estimated counts of observations. Numbers of (non-missing) observations and the sum of the weights can be saved using the NOBSERVATIONS and SUMWEIGHTS parameters. These are set to tables classified by the CLASSIFICATION factors; if CLASSIFICATION is unset, they are set to a table with a single cell labelled 'All data'. The OUTWEIGHTS and INFLUENCE parameters allow you to save variates containing the weights and influences, respectively.
Options: PRINT, PLOT, STRATUMFACTOR, NUNITS, SAMPLINGUNITS, NSECONDARYUNITS, CLASSIFICATION, NINFLUENCE, WEIGHTS, FPCOMIT.
Parameters: Y, X, LABELS, OUTWEIGHTS, TOTALS, SETOTALS, MEANS, SEMEANS, RATIOS, SERATIOS, NOBSERVATIONS, SUMWEIGHTS, FITTEDVALUES, INFLUENCE.
Method
The procedure uses the methods for survey analysis described in most survey analysis textbooks; Sarndal et al. (1992) give the best account of these for the case where weights vary within a stratum or sampling unit. If the dataset contains the full population, as opposed to just sampled or responding units, the options NUNITS and/or NSECONDARYUNITS can be set to -1, in which case the procedure calculates the numbers using TABULATE.
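The counting step used when NUNITS is set to -1 can be pictured as a simple tabulation. The following Python sketch is illustrative and does not reproduce GenStat's TABULATE; the data and variable names are hypothetical, and None stands in for a missing value. It counts all units and the non-missing units in each stratum of a full-population dataset, and forms the implied weights.

```python
from collections import Counter

# Hypothetical full population: None marks an unsampled or
# non-responding unit.
population = [
    ("A", 10.0), ("A", None), ("A", 14.0), ("A", None),
    ("B", None), ("B", 3.0), ("B", None),
]

# Number of units in the population, per stratum.
units_per_stratum = Counter(h for h, _ in population)

# Number of units with an observed response, per stratum.
observed_per_stratum = Counter(h for h, y in population if y is not None)

# Implied expansion weights N_h / n_h.
deduced_weights = {
    h: units_per_stratum[h] / observed_per_stratum[h] for h in units_per_stratum
}
print(deduced_weights)  # {'A': 2.0, 'B': 3.0}
```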
Action with RESTRICT
Restrictions of the Y variate or any of the CLASSIFICATION factors are used to define a subpopulation, and the estimates produced relate to that subpopulation. Any restrictions on SAMPLINGUNITS, STRATUMFACTOR or WEIGHTS are ignored.
References
Lehtonen, R. & Pahkinen, E.J. (1994). Practical Methods for Design and Analysis of Complex Surveys. Wiley, Chichester.
Sarndal, C.-E., Swensson, B. & Wretman, J. (1992). Model Assisted Survey Sampling. Springer-Verlag, New York.