SVSTRATIFIED procedure

Analyses stratified random surveys by expansion or ratio raising (S.D. Langton).


Options

PRINT = string
Controls printed output (summary, totals, means, influence, ratios, extra); default summ, tota, infl

PLOT = string
Controls which high-resolution graphs are plotted (single, separate); default * i.e. none

XMISSING = string
Action if x-variable contains missing values (estimate, fault); default esti

RESTRICTED = string
Action with restricted (or filtered) observations (omit, add); default omit

STRATUMFACTOR = factor
Stratification factor; default * i.e. unstratified

NINFLUENCE = scalar
Number of influential points to print; default 10

METHOD = string
Method for ratio analysis (separate, combined, classicalcombined); default sepa

SAVESUMMARY = string
Whether to save just the overall summaries instead of those for each stratum (yes, no); default no

COMBINEDSTRATUM = scalar
Stratum for which the ratio should be set to the combined ratio estimate; default *

ROWS = scalars
Number of rows of plot-matrix; default * i.e. set automatically depending on number of levels of STRATUMFACTOR

COLUMNS = scalars
Number of columns of plot-matrix; default * i.e. set automatically depending on number of levels of STRATUMFACTOR

NBOOT = scalar
Number of bootstrap samples to use; default 0

SEED = scalar
Seed for random number generator for bootstrap; default 0

CIPROBABILITY = scalars
The probability level for the confidence intervals; default 0.95

COMPACT = string
Whether to produce output in a compact (plaintext) format (yes, no); default no


Parameters

Y = variates
Response data

X = variates
Base data; if unset expansion raising is used

LABELS = variates, factors or texts
Structure for labelling influential points

NUNITS = tables, scalars or variates
Numbers of units in each stratum in the population

XTOTALS = tables, scalars or variates
Population totals of the base data in each stratum

TOTALS = tables or scalars
Saves total estimates

SETOTALS = tables or scalars
Saves standard errors of estimates

MEANS = tables or scalars
Saves mean estimates

SEMEANS = tables or scalars
Saves standard errors of mean estimates

RATIOS = tables
Saves estimates of ratios

FITTEDVALUES = variates
Saves fitted values for the observations

INFLUENCE = variates
Saves influence statistics

LTOTALS = tables or scalars
Saves lower confidence limit for total

UTOTALS = tables or scalars
Saves upper confidence limit for total

LMEANS = tables or scalars
Saves lower confidence limit for mean

UMEANS = tables or scalars
Saves upper confidence limit for mean

VARIANCES = tables or scalars
Saves residual variances in each stratum


Description

SVSTRATIFIED analyses the results from a stratified random survey, either by expansion or ratio raising, and allows detection of outliers. The sample data are supplied, in a variate, using the Y parameter. Similarly the base data are provided using the X parameter. The LABELS parameter can supply a variate, factor or text for labelling individual units in the output. If X is unset or missing, expansion raising is used (i.e. the usual stratified random sampling analysis) but within a stratum units must either all have base data or all lack it. (Note: stratum is used here in the survey sense, not as in the ANOVA directive: i.e. the units are assumed to be classified into groups, and each group is called a stratum.) If option XMISSING is set to fault, any missing base data will cause a fault.

   The vectors Y, X and LABELS should usually have one row for each unit in the survey population, with unsampled or non-responding units having a missing value in the Y variate. However, if parameter NUNITS is set, the Y variate may contain only the response data; NUNITS then supplies the information about the number of units in each stratum in the full population. Similarly, if ratio estimation is required, XTOTALS should contain the population totals of X in each stratum.
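
   As an illustration of the data layout described above, the arithmetic of expansion raising can be sketched outside GenStat. The Python below is only a sketch under stated assumptions: the function name expansion_totals is hypothetical, missing responses are represented by None, and the estimate in each stratum is taken to be the usual one (number of units in the stratum multiplied by the mean of the observed responses). It is not the procedure's own code.

```python
from collections import defaultdict

def expansion_totals(strata, y):
    """Expansion-raised total per stratum (hypothetical helper).

    strata and y have one entry per population unit; y is None for
    unsampled or non-responding units. Assumes every stratum has at
    least one observed response.
    """
    observed = defaultdict(list)   # observed responses in each stratum
    n_units = defaultdict(int)     # population size of each stratum
    for stratum, value in zip(strata, y):
        n_units[stratum] += 1
        if value is not None:
            observed[stratum].append(value)
    # total = N_h * mean of the observed y in stratum h
    return {s: n_units[s] * sum(observed[s]) / len(observed[s])
            for s in n_units}

strata = ["A", "A", "A", "A", "B", "B", "B"]
y = [4.0, None, 6.0, None, 10.0, 20.0, None]
totals = expansion_totals(strata, y)
print(totals)
```

   Here stratum A has 4 units with 2 responses, so its raising factor is 4/2 = 2; stratum B has 3 units with 2 responses, giving a raising factor of 1.5.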

   The METHOD option specifies which method of ratio estimation to use. The setting separate estimates a ratio for each stratum, whereas the settings combined and classicalcombined assume a common ratio in all strata. The classicalcombined method follows the approach shown in most textbooks, where the estimate for a stratum is given by ∑X × ratio, with the summation taken over all units in the stratum. This approach can produce illogical estimates in some situations (e.g. the estimate may be less than the sum of the responses). The combined method therefore estimates only for the unobserved units and adds this to the sum of the observed responses in the stratum, i.e. ∑Y + ∑X × ratio, where the summation of Y is over sampled (or responding) units and the summation of X is over unsampled units. Option COMBINEDSTRATUM is used with the separate ratio method and allows the ratio in a particular stratum to be reset to the combined ratio value; this can be a useful technique for dealing with the extreme ratios sometimes produced when the sampling fraction in a stratum is very low.
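
   The difference between the two common-ratio formulae can be seen in a small numerical sketch. The Python below is illustrative only: the function names and the data are invented, and the common ratio is simply supplied as a number (here 1.5, imagined as having been estimated across all strata) rather than computed as the procedure would compute it.

```python
def classical_combined_total(x_all, ratio):
    """Textbook form: ratio times the sum of X over every unit in the stratum."""
    return ratio * sum(x_all)

def combined_total(y_sampled, x_unsampled, ratio):
    """Observed responses plus ratio-raised X for the unobserved units only."""
    return sum(y_sampled) + ratio * sum(x_unsampled)

# One stratum with two sampled and two unsampled units (made-up values).
x_sampled = [10.0, 20.0]
y_sampled = [30.0, 50.0]
x_unsampled = [5.0, 5.0]
common_ratio = 1.5   # assumed: common ratio estimated from all strata

classical = classical_combined_total(x_sampled + x_unsampled, common_ratio)
combined = combined_total(y_sampled, x_unsampled, common_ratio)
print(classical, combined)
```

   In this stratum the observed responses already sum to 80, yet the textbook formula gives 1.5 × 40 = 60, which is the kind of illogical estimate mentioned above. The second formula gives 80 + 1.5 × 10 = 95.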

   Printing is controlled via the PRINT option. The default settings are summary, totals and influence; these print a summary of the data, estimated totals and influence statistics, respectively. The setting means produces a table showing the estimated means, whilst ratios produces a low-resolution plot of the confidence limits for the ratio estimates; this can be useful when deciding whether a combined ratio estimate is to be used. The setting extra displays extra information relating to the analysis, including sums and means of the response data and raising factors (weights).

   The CIPROBABILITY option sets the probability level used in calculation of confidence limits for means and totals. Option NINFLUENCE controls the number of points of high influence printed. Option COMPACT can be used to switch to a compact, plain-text style for the output, designed for printing concise summaries of an analysis. When COMPACT=yes, the information printed depends on the width of the first output channel, with more information being displayed when this can be done without splitting tables.

   By default all standard errors and confidence limits are calculated using the conventional approximations. Alternatively, bootstrap methods may be used by setting the NBOOT option to the required number of bootstrap samples. In the case of ratio estimation, the samples are used to form bootstrap estimates of the ratio, which are then applied to the known population totals for X. Bootstrapping is carried out independently in each stratum, using the method described by Särndal et al. (1992, page 442); this involves creating a "pseudopopulation" containing n replicates of each observation, where n is the nearest integer to the expansion raising factor (inverse of inclusion probability) for the stratum. Bootstrap samples of the same size as the original sample are then taken from the pseudopopulation and used to compute the estimates. The SEED option specifies the seed to use in the random number generator used to construct the bootstrap samples. The default value of zero continues an existing sequence of random numbers or, if the generator has not yet been used in this run of GenStat, it initializes the generator automatically.
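
   The pseudopopulation idea can be sketched for a single stratum under expansion raising. The Python below is a rough illustration, not the procedure's implementation: the function name is hypothetical, bootstrap samples are assumed to be drawn without replacement from the pseudopopulation (mimicking simple random sampling), and the standard error is taken to be the standard deviation of the bootstrap totals.

```python
import random
import statistics

def bootstrap_expansion_totals(y_sample, n_units, n_boot, seed):
    """Bootstrap totals for one stratum (hypothetical helper).

    y_sample: observed responses; n_units: stratum population size.
    """
    rng = random.Random(seed)
    n = len(y_sample)
    # Nearest integer to the expansion raising factor N/n.
    replicates = int(n_units / n + 0.5)
    # Pseudopopulation: each observation repeated 'replicates' times.
    pseudo = [value for value in y_sample for _ in range(replicates)]
    totals = []
    for _ in range(n_boot):
        # Assumption: resample without replacement, same size as the sample.
        resample = rng.sample(pseudo, n)
        totals.append(n_units * statistics.mean(resample))
    return totals

y = [12.0, 15.0, 9.0, 20.0]
totals = bootstrap_expansion_totals(y, n_units=20, n_boot=200, seed=1)
se_total = statistics.stdev(totals)
print(len(totals), se_total)
```

   With 4 responses from a stratum of 20 units the raising factor is 5, so the pseudopopulation holds 20 values. Supplying the same seed reproduces the same bootstrap samples, which is the purpose of a nonzero SEED setting.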

   Graphical output is available by setting the PLOT option. The setting single produces a single plot of the response data against X or against the stratum number if X is unset. A fitted line is shown if one of the combined ratio methods is used. The separate setting produces one graph for each stratum, with up to six graphs on each screen. All graphs are plotted on the log scale.

   Output can be saved using the parameters TOTALS, SETOTALS, MEANS, SEMEANS, LTOTALS, UTOTALS, LMEANS and UMEANS. These are generally set to a table classified by the stratification factor but, if option SAVESUMMARY=yes, then they save scalars containing only the grand total summed over all strata. Ratios can be saved in a table using the RATIOS parameter, whilst the residual variances in each stratum can be saved using VARIANCES; the latter are useful for working out optimal allocation strategies for future surveys. Fitted values and influence statistics may be saved using parameters FITTEDVALUES and INFLUENCE. The fitted values are the X value multiplied by the appropriate ratio for each unit or, where expansion raising is used, the mean Y value for the stratum.

 

Options: PRINT, PLOT, XMISSING, RESTRICTED, STRATUMFACTOR, NINFLUENCE, METHOD, SAVESUMMARY, COMBINEDSTRATUM, ROWS, COLUMNS, NBOOT, SEED, CIPROBABILITY, COMPACT.

Parameters: Y, X, LABELS, NUNITS, XTOTALS, TOTALS, SETOTALS, MEANS, SEMEANS, RATIOS, FITTEDVALUES, INFLUENCE, LTOTALS, UTOTALS, LMEANS, UMEANS, VARIANCES.


Method

The methods used are described in most survey analysis textbooks; see, for example, Sampford (1962) or Lehtonen & Pahkinen (1994). Most calculations are carried out using GenStat table structures.


Action with RESTRICT

The action with RESTRICT depends on the setting of the RESTRICTED option. By default restricted units are totally excluded from the analysis. If RESTRICTED is set to add, restricted observations are excluded from the ratio calculations but then added back into the total estimates; this is a technique for dealing with nonrepresentative outliers (see e.g. Lee, 1995), which are believed to be genuine observations but are not representative of the wider population.
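
One way to picture the add setting is sketched below in Python for a single stratum. This is an assumption-laden illustration rather than the procedure's algorithm: the function name is hypothetical, the ratio is computed from the unrestricted sampled units only, it is applied to the population X total with the restricted units' X removed, and the restricted responses are then added back unraised.

```python
def total_with_outliers_added(x, y, restricted, x_total):
    """Ratio-raised total treating restricted units as representing
    only themselves (hypothetical helper, single stratum)."""
    kept_x = sum(xi for xi, r in zip(x, restricted) if not r)
    kept_y = sum(yi for yi, r in zip(y, restricted) if not r)
    ratio = kept_y / kept_x          # ratio from unrestricted units only
    out_x = sum(xi for xi, r in zip(x, restricted) if r)
    out_y = sum(yi for yi, r in zip(y, restricted) if r)
    # Raise the rest of the population, then add the outliers back as observed.
    return ratio * (x_total - out_x) + out_y

# Made-up sample: the third unit is a genuine but unrepresentative outlier.
x = [10.0, 20.0, 30.0]
y = [20.0, 40.0, 600.0]
restricted = [False, False, True]
x_total = 100.0                      # known population total of X

adjusted = total_with_outliers_added(x, y, restricted, x_total)
naive = sum(y) / sum(x) * x_total    # outlier left in the ratio
print(adjusted, naive)
```

With the outlier left in, the ratio is 660/60 = 11 and the total is 1100. Excluding it from the ratio gives 60/30 = 2, so the total becomes 2 × 70 + 600 = 740.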


References

Lee, H. (1995). Outliers in Business Surveys. Chapter 26 of Business Survey Methods (ed. Cox, Binder, Chinnappa, Christianson, Colledge & Kott). Wiley, New York.

Lehtonen, R. & Pahkinen, E.J. (1994). Practical Methods for Design and Analysis of Complex Surveys. Wiley, New York.

Sampford, M.R. (1962). An Introduction to Sampling Theory. Oliver & Boyd, London.

Särndal, C.-E., Swensson, B. & Wretman, J. (1992). Model Assisted Survey Sampling. Springer-Verlag, New York.