WADLEY procedure

Fits models for Wadley's problem, allowing alternative links and errors (D.M. Smith).


Options

PRINT = strings
Controls printed output (deviance, estimates, correlations, monitoring); default devi, esti

DISTRIBUTION = string
Distribution of the response variate (poisson, negativebinomial, qlnegativebinomial, qlscaledpoisson); default pois

LINK = string
Link transformation (logit, probit, complementaryloglog, cauchit); default logi

TERMS = formula
Model to be fitted

CONTROL = factor
Factor to distinguish the control, or zero, dose (level 1) from the other treatments (level 2)

MAXIMAL = factor
Factor to define the maximal model i.e. with a level for every combination of values of the variates and factors in TERMS

RMETHOD = string
Type of residuals to be formed (deviance, Pearson); default devi


Parameters

Y = variates
Response variate for each fit

RESIDUALS = variates
Variate to save the residuals from each fit

FITTEDVALUES = variates
Variate to save the fitted values from each fit


Description

WADLEY uses the generalized linear models methodology of composite link functions to fit a range of models for the situation known as Wadley's problem. This arises in bioassay where it is possible to count only the number of subjects that have not responded to a particular dose of a drug or stimulus. For example, with eggs of insects fumigated in grain, it is generally possible to count only those that survive and hatch.

   By default, the analysis assumes that the numbers of subjects that are treated in each observation follow a Poisson distribution with a common mean parameter; other distributions can be specified using the DISTRIBUTION option or, for user-defined distributions, by providing subsidiary procedure WADDISTRIBUTION (see details of the procedures called by WADLEY).

   The analysis estimates the mean of the distribution, and then fits the dose-response curve as in an ordinary probit analysis. The LINK option defines the transformation (logit, probit, cauchit, or complementary log-log) required to make the model additive. User-defined transformations can also be specified by leaving LINK unset and providing subsidiary procedure WADLINK, to calculate the necessary fitted values and derivatives, and WADINITIAL, to calculate initial values for the linear predictor (see details of the procedures called by WADLEY). The model to be fitted is defined by the TERMS option.

   To assist the estimation of the expected total number of subjects, there must be some control observations - for example with zero doses of fumigant. These must be identified by a factor, specified by the CONTROL option, with level 1 for untreated and level 2 for treated. The comparison between the treated and untreated levels of CONTROL must not be aliased with any of the variates and factors in TERMS. (Thus if, for example, TERMS contained a factor representing different types of drug, this must not have a separate level for the untreated observations.)

   Often with this sort of data, the variability exceeds that expected from the distribution assumed for the data. To estimate the amount of overdispersion, the MAXIMAL option must be set to a factor with a different level for every combination of values of the factors and variates in the TERMS model.
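
   As a minimal sketch of a call, suppose that the numbers of eggs hatching are stored in a variate survivors, the doses in a variate dose, the control indicator in a factor treated (level 1 for the untreated units, level 2 for the fumigated units) and the grouping for the maximal model in a factor group. These identifiers, and the names res and fit used to save the results, are hypothetical; only the option and parameter names are taken from the lists above.

  WADLEY [PRINT=deviance,estimates; DISTRIBUTION=poisson; LINK=probit;\
         TERMS=dose; CONTROL=treated; MAXIMAL=group]\
         Y=survivors; RESIDUALS=res; FITTEDVALUES=fit

   The PRINT and DISTRIBUTION settings shown here merely restate their defaults; LINK=probit replaces the default logit.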


Options: PRINT, DISTRIBUTION, LINK, TERMS, CONTROL, MAXIMAL, RMETHOD.

Parameters: Y, RESIDUALS, FITTEDVALUES.


Method

In essence WADLEY is a specific application of the use of composite link functions in generalized linear models. The actual methods used are those in the GenStat procedure GLM (Lane 1989) and the GLIM macros of Smith & Morgan (1989). The procedure is very similar in spirit to these GLIM macros, and it is recommended that this reference be consulted for further information. However, there are some extensions. The capability to handle user-defined links and distributions has been added. Also, the range of distributions has been extended to include two forms of quasi-likelihood, namely that where the weighting is of negative binomial form (weight=1/(1+hf×fittedvalues)), and that where the weighting is of scaled Poisson form (weight=1/hf), where hf is the heterogeneity factor. If the estimated heterogeneity factor is less than zero in the negative binomial cases, or if it is less than one in the scaled Poisson case, it is set to zero or one respectively.
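
   The two quasi-likelihood weightings, and the truncation of the heterogeneity factor, can be written as ordinary calculations. The following is only an illustrative sketch, not the code used inside WADLEY. All of the identifiers are hypothetical: hf is assumed to be a scalar holding the estimated heterogeneity factor and fitted a variate of fitted values. Only one of the two forms would apply in any given fit.

  "negative binomial form: a heterogeneity factor below zero is set to zero"
  CALCULATE hfnb = hf*(hf.GT.0)
  & wnb = 1/(1+hfnb*fitted)
  "scaled Poisson form: a heterogeneity factor below one is set to one"
  CALCULATE hfsp = 1+(hf-1)*(hf.GT.1)
  & wsp = 1/hfsp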

   WADLEY has two subsidiary procedures, WADCODI and WADFIT, to assist with the analysis; neither of these need be modified by the user:

WADCODI prints the results of the iterative processes;

WADFIT performs the iterative model fits.

   There are also three other procedures, which can be rewritten or replaced, to cater for further user-defined distributions and links:

WADDISTRIBUTION calculates the variance function and deviance for a user-defined distribution;

WADINITIAL calculates initial estimates of the linear predictor for a user-defined link;

WADLINK calculates the fitted values and derivatives for a user-defined link.

   If the DISTRIBUTION option is unset, the procedure will call WADDISTRIBUTION instead of using one of the standard distributions. For a Poisson error distribution, WADDISTRIBUTION should be defined as follows.

  PROCEDURE 'WADDISTRIBUTION'
            "Calculation of variance function and deviance"
  PARAMETER 'Y',       "Input: variate; response variate"\
            'FITTED',  "Input: variate; fitted values"\
            'VARIANCE',"Output: variate; variance"\
            'LL',      "Output: variate; log likelihood variate"\
            'DEVIANCE';"Output: scalar; total deviance"\
            MODE=p
            SCALAR two; VALUE=2
            CALCULATE VARIANCE = FITTED
            & LL = Y*LOG(Y/FITTED)-Y+FITTED
            & DEVIANCE = two*SUM(LL)
  ENDPROC

For other error distributions only the three CALCULATE statements need to be changed.
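
   The three calculations can be tried outside the procedure on a few made-up numbers. In this sketch the identifiers y, mu, vfun, ll and dev, and the data values, are purely illustrative assumptions; the statements simply repeat the Poisson formulae shown above.

  VARIATE [VALUES=12,7,3] y
  VARIATE [VALUES=10,8,4] mu
  CALCULATE vfun = mu
  & ll = y*LOG(y/mu)-y+mu
  & dev = 2*SUM(ll)
  PRINT dev

   Each element of ll is non-negative, and is zero only where y equals mu, which gives a simple check when the calculations are rewritten for another distribution.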

   Similarly, if the LINK option is unset, WADINITIAL and WADLINK will be called. For a logit link, WADINITIAL would be defined as follows.

  PROCEDURE 'WADINITIAL'
            "Calculation of initial estimates of linear predictor"
  PARAMETER 'Y',    "Input: variate; response variate"\
            'LP',   "Output: variate; linear predictor"\
            'IND',  "Input: variate; marker variate with value 0
                     for a control observation, 1 otherwise"\
            'MAXY'; "Inout: scalar; estimate of asymptote"\
            MODE=p
            SCALAR half,one; VALUE=0.5,1
            CALCULATE LP = IND*LOG(MAXY/(Y+half)-one)
  ENDPROC

For other links only the CALCULATE statement need be changed; for example, a probit link would require the statement

CALCULATE LP = IND*NED(one-(Y+one)/MAXY)
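
   As a further illustration, a complementary log-log link can be handled in the same way. The statement below is a sketch under the assumption that the expected proportion not responding is modelled as EXP(-EXP(LP)); whether this matches the parameterization of the built-in complementaryloglog setting is not stated here. Setting (Y+half)/MAXY equal to EXP(-EXP(LP)) and solving for LP gives

CALCULATE LP = IND*LOG(-LOG((Y+half)/MAXY))

   Like the logit statement, this is defined only where Y+half is less than MAXY.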

For a logit link WADLINK would be

  PROCEDURE 'WADLINK'
            "Calculation of fitted values and derivatives
            of the link function given the linear predictor"
  PARAMETER 'LP',   "Input: variate; linear predictor"\
            'IND',  "Input: variate; marker variate with value 0
                     for a control observation, 1 otherwise"\
            'TA',   "Output: variate; estimate of fitted values"\
            'TB',   "Output: variate; estimate of derivatives"\
            'MAXY'; "Input: scalar; estimate of asymptote"\
            MODE=p
            SCALAR half,one; VALUE=0.5,1
            CALCULATE TA = (.NOT.IND)+IND/(one+EXP(LP))
            & TB = MAXY*EXP(LP)*TA*TA
  ENDPROC

For other links only the CALCULATE statements need to be changed; for example, a probit link would require

CALCULATE TA = (.NOT.IND)+IND*(one-NORMAL(LP))
& TB = MAXY*EXP(-half*LP*LP)/ROOT2PI

where ROOT2PI is a scalar holding the square root of 2π. The marker variate IND distinguishes the control from the non-control observations, so TA should always be of the form

TA = (.NOT.IND)+IND*function

where function is the link function for the non-control part of the data. The variate TB should always be of the form

TB = MAXY*deriv_fn

where deriv_fn is the derivative of the link function with respect to the linear predictor (LP).
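
   For instance, under the assumption that the expected proportion not responding is EXP(-EXP(LP)), a complementary log-log form, the two statements would follow the same pattern. This is an illustrative sketch rather than the code behind the built-in complementaryloglog setting. TB is given as a magnitude, without the minus sign of the derivative, to match the logit and probit statements above.

CALCULATE TA = (.NOT.IND)+IND*EXP(-EXP(LP))
& TB = MAXY*EXP(LP-EXP(LP))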

   If LINK or DISTRIBUTION is unset, but no user-defined versions of WADINITIAL, WADLINK and WADDISTRIBUTION are supplied, then those given here (for a logit link and a Poisson error distribution) will be used.

   A debt is owed to Dr J. Parrott of Pfizer Central Research, Sandwich, UK for his support and encouragement of this work.


Action with RESTRICT

If the Y-variate is restricted, only the specified subset of the units will be included in the analysis.


References

Lane, P.W. (1989). Procedure GLM. In: Genstat Procedure Library Release 1.3[2] (ed. R.W.Payne & G.M.Arnold), 80-82.

Smith, D.M. & Morgan, B.J.T. (1989). Extended models for Wadley's Problem. GLIM Newsletter, 18, 21-28.