PROBITANALYSIS procedure

Fits probit models allowing for natural mortality and immunity (R.W. Payne).


Options

PRINT = strings
Printed output required (model, summary, estimates, correlations, fittedvalues, monitoring, effectivedoses); default mode, summ, esti, fitt

TRANSFORMATION = string
Transformation to be used (probit, logit, complementaryloglog); default prob

MORTALITY = string
Whether to estimate natural mortality (omit, estimate); default omit

IMMUNITY = string
Whether to estimate natural immunity (omit, estimate); default omit

GROUPS = factor
Defines groups for an analysis of parallelism; default * i.e. no groups

SEPARATE = strings
Which parameters (apart from intercept) should be estimated separately for different groups (slope, mortality, immunity); default * i.e. none

LD = scalar or variate
Effective (or lethal) doses to be estimated, other than 50

LOGBASE = string
Base of antilog transformation to be applied to LD's (ten, e); default * i.e. none

DISPERSION = scalar
Controls the use of a heterogeneity factor in the calculation of s.e.s etc; with the default of 1 no factor is used, a missing value * estimates the heterogeneity from the residual deviance

FITMETHOD = string
Method to use to fit the model (generalizednonlinear, nonlinear); default nonl for Wadley's problem, otherwise gene

MAXCYCLE = scalar
Maximum number of iterations for fitting the model; default 30


Parameters

Y = variates
Number of subjects responding in each batch

DOSE = variates
Dose received by each batch of subjects

NBINOMIAL = variates or factors
Variate specifying the number of subjects in each batch, or factor specifying groupings of the observations assumed to have equal expected total numbers of subjects in Wadley's problem; if omitted, assumes Wadley's problem with all observations having the same expected total number of subjects

INITIAL = variates
Initial values for parameters

STEPLENGTHS = variates
Step lengths for parameters


Description

Probit analysis is a way of modelling the relationship between a stimulus, like a drug, and a quantal response (success/failure). It is assumed that for each subject, there is a certain level of dose of the stimulus below which it will be unaffected, but above which it will respond. This level of dose, known as its tolerance, will vary from subject to subject within the population.

   For example, it is often assumed that the tolerance of houseflies to the logarithm of the dose of an insecticide will follow a Normal distribution; so, if we were to plot the proportion of the population with each tolerance against log dose, we would obtain the familiar bell-shaped curve. Likewise, if we plot the probability that a randomly-selected individual will respond, against the logarithm of dose, we would obtain a sigmoid (S-shaped) curve limited below by zero and above by one. To make the relationship linear, it is usual to transform the y-axis either to probits or to Normal equivalent deviates. In GenStat

Probit(P%) = NED(P%/100)

The Normal equivalent deviate may be familiar as the transformation that is used to produce "probability" graph paper.
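As an illustration only (Python, not GenStat code), the transformation above can be sketched with the standard Normal quantile function. The function names ned and probit are chosen here for the example and are not part of the procedure.

```python
from statistics import NormalDist


def ned(proportion):
    """Normal equivalent deviate: standard Normal quantile of a proportion in (0, 1)."""
    return NormalDist().inv_cdf(proportion)


def probit(percent):
    """Probit of a percentage, using the definition Probit(P%) = NED(P%/100)."""
    return ned(percent / 100.0)


# The transformed scale is symmetric about 50%, where the probit is zero.
for pct in (2.5, 16.0, 50.0, 84.0, 97.5):
    print(f"{pct:5.1f}%  probit = {probit(pct):7.3f}")
```

Plotting these values against log dose turns the sigmoid response curve into a straight line when the tolerances are Normally distributed.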

   In probit analysis, we are interested in estimating the equation of that line. This can be done by performing an experiment in which there are several batches of subjects, each of which is given a different dose of the stimulus. The data then consist of a variate indicating the number of subjects that responded out of each batch, a variate to show the dose given to each batch, and a final variate for the total numbers of subjects in the batches; these are specified by parameters Y, DOSE and NBINOMIAL, respectively.

   The NBINOMIAL parameter can be omitted if the total numbers cannot be measured, as in some fumigation experiments ("Wadley's problem"; see for example Finney 1971, pages 202-8). The assumption is that the total numbers receiving the doses will come from the same Poisson distribution, and the mean of this distribution is then estimated in the analysis. Alternatively, NBINOMIAL can specify a factor to indicate groupings of the doses whose total numbers are expected to come from the same distributions.
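The idea behind Wadley's problem can be sketched as follows (illustrative Python, not the procedure's own code). It assumes a probit line on log dose and uses the standard result that, if the batch totals are Poisson with mean N and each subject responds independently with probability p, the number responding is Poisson with mean N*p. The names total_mean, intercept and slope are hypothetical.

```python
import math
from statistics import NormalDist


def expected_responders(total_mean, intercept, slope, log_dose):
    """Expected number responding: Poisson mean of the totals times the response probability."""
    p = NormalDist().cdf(intercept + slope * log_dose)
    return total_mean * p


def poisson_loglik(counts, means):
    """Poisson log-likelihood of the observed counts given their expected values."""
    return sum(y * math.log(m) - m - math.lgamma(y + 1)
               for y, m in zip(counts, means))


# Hypothetical parameter values: the unknown mean total is a parameter like any other.
doses = [-1.0, 0.0, 1.0]
means = [expected_responders(100.0, 0.0, 1.0, x) for x in doses]
print([round(m, 1) for m in means])
```

Maximizing this log-likelihood over the mean total, intercept and slope together is one way to see why the total can be estimated even though it is never observed.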

   The PRINT option controls printed output:

    model
details of the model that has been fitted,

    summary
summary analysis-of-variance table,

    estimates
parameter estimates and standard errors,

    correlations
correlations between parameter estimates,

    fittedvalues
fitted values and residuals,

    monitoring
information about the fitting process, and

    effectivedoses
effective, or lethal, doses (see parameter LD below).

By default, PRINT=mode,summ,esti,fitt.

   The TRANSFORMATION option allows other transformations to be selected. Putting TRANSFORMATION=logit requests a logit transformation:

logit(P%) = log( P% / (100 - P%) )

This is very like the probit but approaches zero (to the left) and one (to the right) rather more slowly. The other possibility is the complementary log-log, log( -log(1 - P%/100) ), which is relevant to the "one-hit" model (that is, infection processes where just one infected particle is sufficient to cause the response).
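The three transformations can be compared numerically with a short sketch (illustrative Python, not GenStat code). The complementary log-log is taken here in its standard form, log(-log(1 - p)) with p = P%/100; the function names are chosen for the example.

```python
import math
from statistics import NormalDist


def probit(percent):
    """Normal equivalent deviate of P%/100."""
    return NormalDist().inv_cdf(percent / 100.0)


def logit(percent):
    """Log odds: log(P% / (100 - P%))."""
    return math.log(percent / (100.0 - percent))


def cloglog(percent):
    """Complementary log-log: log(-log(1 - P%/100))."""
    return math.log(-math.log(1.0 - percent / 100.0))


# Probit and logit are symmetric about 50%; the complementary log-log is not.
for pct in (1.0, 10.0, 50.0, 90.0, 99.0):
    print(f"{pct:5.1f}%  probit={probit(pct):7.3f}  "
          f"logit={logit(pct):7.3f}  cloglog={cloglog(pct):7.3f}")
```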

   Sometimes, subjects may respond even in the absence of any dose. For example, with some short-lived insects, some would have died simply from natural causes during the period of the experiment. By setting option MORTALITY=estimate this natural mortality can be included in the model and estimated. Similarly, there may be subjects that will not respond, no matter how high the dose. Setting option IMMUNITY=estimate will include and estimate a parameter for natural immunity.
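A common way to write such a model, shown here as a hedged sketch in Python, is to let the response probability rise from the natural mortality at very low doses to one minus the natural immunity at very high doses. This parametrization is an assumption made for illustration; the parametrization used inside the procedure may differ.

```python
from statistics import NormalDist


def response_probability(log_dose, intercept, slope, mortality=0.0, immunity=0.0):
    """Probability of response with natural mortality and immunity (assumed form).

    mortality: proportion responding with no dose at all.
    immunity:  proportion that never responds, however high the dose.
    """
    tolerance_cdf = NormalDist().cdf(intercept + slope * log_dose)
    return mortality + (1.0 - mortality - immunity) * tolerance_cdf


# Hypothetical values: 5% natural mortality and 10% natural immunity.
for x in (-10.0, 0.0, 10.0):
    print(x, round(response_probability(x, 0.0, 1.0, 0.05, 0.10), 4))
```

With both parameters at zero the expression reduces to the ordinary probit model.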

   It is also often of interest to study the way in which the model varies for different groups of subjects. For example, there may be groups of batches of subjects, each of which is given a different drug. The GROUPS option should then specify the group to which each batch of subjects belongs, and option SEPARATE indicates which parameters of the model (slope, mortality, and/or immunity) should have separate estimates. If SEPARATE is left at its default value, parallel lines will be fitted with identical values for any estimates of mortality and immunity.

   The LD option can request the estimation of one or more effective (or lethal) doses, specifying a scalar if there is just one, or a variate if there are several. The LOGBASE option is useful if the doses have been transformed to logarithms before calling PROBITANALYSIS. If you use LOGBASE to specify the base of the logarithms (ten or e), the backtransformed lethal doses will be printed as well.
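For a simple probit line with no mortality or immunity, the calculation behind an effective dose can be sketched as below (illustrative Python). The intercept and slope values are hypothetical, and the base argument merely mimics the two LOGBASE settings.

```python
import math
from statistics import NormalDist


def effective_dose(percent, intercept, slope):
    """Dose (on the fitted scale) at which percent% of subjects are expected to respond."""
    target = NormalDist().inv_cdf(percent / 100.0)
    return (target - intercept) / slope


def backtransform(log_dose, base="ten"):
    """Antilog of a dose that was transformed to logarithms before fitting."""
    return 10.0 ** log_dose if base == "ten" else math.exp(log_dose)


# Hypothetical fitted line on log10(dose): probit = -2 + 4 * log10(dose).
for pct in (10.0, 50.0, 90.0):
    ld = effective_dose(pct, -2.0, 4.0)
    print(f"LD{pct:.0f}: log10 dose = {ld:.3f}, dose = {backtransform(ld):.3f}")
```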

   The DISPERSION option can be used to request use of a heterogeneity factor in the calculation of the standard errors of the slopes and lethal doses (see Finney 1971, pages 70-74). The standard assumptions for probit analysis are that the observations have binomial distributions in probit lines and planes, or Poisson distributions in Wadley's problem. Under these circumstances, the residual deviance will follow a Chi-square distribution. The residual deviance should on average be equal to its number of degrees of freedom. A significantly large value may indicate that there are other (possibly unknown) factors affecting the subjects, for example that the conditions were not uniform during the experiment. Alternatively it may occur because the subjects did not react independently, for example because there were sub-populations of genetically related individuals. If the large Chi-square seems to arise because the residuals are larger in general than expected (overdispersion) and not because of systematic deviations from the fitted relationship, it is sensible to increase the standard errors by a heterogeneity factor equal to the residual mean deviance. This can be requested by setting option DISPERSION=*. Alternatively DISPERSION can be set to a known value if one is available.
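The arithmetic can be sketched as follows (illustrative Python, for the binomial case). The sketch assumes that the heterogeneity factor multiplies the variances of the estimates, so that standard errors are multiplied by its square root; the data and fitted probabilities are hypothetical.

```python
import math


def binomial_deviance(responding, totals, fitted_probs):
    """Residual deviance for binomial counts given fitted response probabilities."""
    dev = 0.0
    for y, n, p in zip(responding, totals, fitted_probs):
        mu = n * p
        if y > 0:
            dev += y * math.log(y / mu)
        if y < n:
            dev += (n - y) * math.log((n - y) / (n - mu))
    return 2.0 * dev


def heterogeneity_factor(deviance, df):
    """Residual mean deviance, used as the heterogeneity factor."""
    return deviance / df


def inflate_se(se, factor):
    """Standard error adjusted for heterogeneity (variance scaled by the factor)."""
    return se * math.sqrt(factor)


# Hypothetical data: 4 batches and 2 fitted parameters, so 2 residual d.f.
dev = binomial_deviance([2, 9, 12, 19], [20, 20, 20, 20], [0.15, 0.35, 0.65, 0.88])
factor = heterogeneity_factor(dev, 2)
print(round(dev, 3), round(factor, 3), round(inflate_se(0.25, factor), 3))
```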

   When the FITMETHOD option is set to generalizednonlinear, the model is fitted as a generalized nonlinear model, using the FIT directive. The alternative setting, nonlinear, fits it as a nonlinear model using FITNONLINEAR. Apart from minor numerical differences, the two methods should generate the same results. Generalized nonlinear models allow a confidence region to be generated for lethal doses, and so this method is the default for all situations except Wadley's problem. The nonlinear method is more accurate, and is thus used as the default for the more difficult situation presented by Wadley's problem.

   The final two parameters, INITIAL and STEPLENGTHS, allow initial values and steplengths to be specified for the optimization. For a generalized nonlinear model, the order of parameters is: total(s) for Wadley's problem (if appropriate), mortality parameters (if any) and immunity parameters (if any); the slopes and intercepts are fitted as regression parameters. For a nonlinear model, the order of parameters is: LD50(s), slope(s), mortality parameters (if any) and immunity parameters (if any); the totals for Wadley's problem, if required, are fitted as linear parameters. The MAXCYCLE option sets a limit on the number of iterations used during fitting (default 30). Parameter estimates, fitted values, residuals, and so on, can be saved after running the procedure, by using the RKEEP directive in the usual way.

 

Options: PRINT, TRANSFORMATION, MORTALITY, IMMUNITY, GROUPS, SEPARATE, LD, LOGBASE, DISPERSION, FITMETHOD, MAXCYCLE.

Parameters: Y, DOSE, NBINOMIAL, INITIAL, STEPLENGTHS.


Method

For FITMETHOD=generalizednonlinear a calculated link is used to take account of any mortality or immunity parameters, and a calculated distribution to allow estimation of totals for Wadley's problem. The fitting is carried out by FIT (with the CALCULATION option set if any totals, mortality or immunity parameters are to be estimated), and procedure FIELLER is used to obtain LD values.
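The kind of calculation that Fieller's theorem provides can be sketched in Python as below. This is a generic illustration for a single probit line, not the code of the FIELLER procedure: it assumes a Normal deviate for the critical value, and all argument names and numerical values are hypothetical. The limits are the roots of the quadratic obtained by setting (target - intercept - dose*slope)^2 equal to t^2 times its variance.

```python
import math
from statistics import NormalDist


def fieller_limits(target, intercept, slope, var_int, var_slope, cov, level=0.95):
    """Fieller confidence limits for the dose (target - intercept) / slope.

    target is the value of the linear predictor at the required percentage
    (zero for the LD50 under a probit line). Returns None when the slope is
    not significantly different from zero, since no finite interval exists.
    """
    t = NormalDist().inv_cdf(0.5 + level / 2.0)
    num = target - intercept
    a = slope ** 2 - t ** 2 * var_slope
    b = slope * num + t ** 2 * cov
    c = num ** 2 - t ** 2 * var_int
    if a <= 0.0:
        return None
    disc = b * b - a * c
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    return ((b - root) / a, (b + root) / a)


# Hypothetical estimates: intercept -2 (variance 0.04), slope 4 (variance 0.09).
print(fieller_limits(0.0, -2.0, 4.0, 0.04, 0.09, -0.05))
```

When the slope is estimated very precisely, the limits approach the usual symmetric interval around the point estimate.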

   For FITMETHOD=nonlinear initial values are obtained, if necessary, using the GenStat facilities for generalized linear models, ignoring any mortality or immunity. Expressions specifying the model are defined in sets of nested IF-blocks, taking account of the settings of, for example, TRANSFORMATION and GROUPS. The fitting is carried out by the FITNONLINEAR directive, and any extra LD values are estimated using RFUNCTION.


Action with RESTRICT

The Y variate, the DOSE variate, or the GROUPS factor can be restricted to indicate that the model is to be fitted only to a subset of the units.


Reference

Finney, D.J. (1971). Probit Analysis (third edition). Cambridge University Press, Cambridge.