RNEGBINOMIAL procedure
Fits a negative binomial generalized linear model estimating the aggregation parameter (R.M. Harbord & R.W. Payne).
Options
Parameter
Description
The negative binomial distribution can be fitted as a generalized linear model using FIT only for a given value of the aggregation parameter k. RNEGBINOMIAL extends the fitting to include estimation of k from the data.
The negative binomial distribution is a discrete distribution with the relationship between mean and variance given by
variance = mean + mean**2/k,
where k is a positive constant known as the aggregation parameter. It provides a possible model for count data that show apparent overdispersion when a Poisson model is fitted. (Another model is the simpler constant overdispersion model, obtained by setting option DISPERSION=* in a MODEL statement with option DISTRIBUTION=poisson; see McCullagh & Nelder 1989 and Hinde & Demetrio 1998.)
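The mean-variance relationship can be checked numerically. The following Python sketch is an illustration only and is not part of the procedure: it assumes the usual parameterization of the negative binomial probability function in terms of the mean and k, and the names nb_pmf and nb_variance are hypothetical. It sums the probabilities over a long range of counts and compares the resulting variance with mean + mean**2/k.

```python
import math

def nb_pmf(y, mean, k):
    """Negative binomial probability of count y, parameterized by mean and k."""
    log_p = (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
             + k * math.log(k / (k + mean))
             + y * math.log(mean / (k + mean)))
    return math.exp(log_p)

def nb_variance(mean, k):
    """Variance implied by the relationship variance = mean + mean**2/k."""
    return mean + mean ** 2 / k

# Illustrative values (assumptions, not taken from any data set).
mean, k = 3.0, 2.0
counts = range(400)  # the tail beyond 400 is negligible for these values

total = sum(nb_pmf(y, mean, k) for y in counts)
first_moment = sum(y * nb_pmf(y, mean, k) for y in counts)
variance = sum((y - first_moment) ** 2 * nb_pmf(y, mean, k) for y in counts)

print(total, first_moment, variance, nb_variance(mean, k))
# As k becomes large the variance approaches the mean, the Poisson case.
print(nb_variance(mean, 1e9))
```

Small values of k correspond to strong overdispersion, while very large values give a variance close to the Poisson value.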
The call to RNEGBINOMIAL must be preceded by a MODEL statement with option DISTRIBUTION=negativebinomial (otherwise an error message is printed). It is also necessary to specify the link function (e.g. by setting option LINK=logarithm for a log-link), as the default is the canonical log-ratio link, which is unlikely to be useful in practice (for example it requires the linear predictor to be negative).
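The restriction imposed by the canonical link can be seen from a short Python sketch. It assumes that the log-ratio link has the form log(mean/(mean+k)), which is the canonical form for the negative binomial distribution with fixed k; the function names are hypothetical. Because mean/(mean+k) always lies between 0 and 1, the linear predictor must be negative, whereas the log link accepts any value.

```python
import math

def logratio_link(mean, k):
    """Assumed form of the canonical log-ratio link: log(mean / (mean + k))."""
    return math.log(mean / (mean + k))

def logratio_inverse(eta, k):
    """Mean implied by linear predictor eta under the log-ratio link."""
    if eta >= 0:
        raise ValueError("the log-ratio link requires a negative linear predictor")
    return k * math.exp(eta) / (1.0 - math.exp(eta))

def log_inverse(eta):
    """Mean implied by linear predictor eta under the log link."""
    return math.exp(eta)

k = 2.0  # illustrative value
for eta in (-2.0, -0.5, 0.5):
    try:
        print(eta, logratio_inverse(eta, k), log_inverse(eta))
    except ValueError as err:
        print(eta, "log-ratio link fails:", err, "| log link gives", log_inverse(eta))
```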
The AGGREGATION option allows the estimate of k to be saved. The _2LOGLIKELIHOOD option allows minus twice the maximized log-likelihood to be saved. This may be useful for comparing a sequence of nested models fitted by RNEGBINOMIAL using likelihood ratio testing. (The deviance cannot be used to compare models unless the value of k is the same for all the models, as it is the difference between the log-likelihood of a given model and a saturated model with the same value of k.) Printed output is controlled by the PRINT option, which has the same settings as for the FIT directive but with the addition of aggregation to control the printing of the estimate of k and its standard error (based on observed rather than expected information; see Method), and loglikelihood to print minus two times the log-likelihood.
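As an illustration of the likelihood ratio comparison, the Python sketch below takes two saved values of minus twice the maximized log-likelihood from nested models and refers their difference to a chi-square distribution. The numerical values, the number of extra parameters and the function name chi2_sf are hypothetical, and the chi-square reference distribution is the usual asymptotic approximation rather than anything specific to RNEGBINOMIAL.

```python
import math

def chi2_sf(x, df):
    """Upper tail probability of a chi-square variable with integer df.

    Uses the closed forms for df = 1 and df = 2 and the recurrence
    Q(a + 1, t) = Q(a, t) + t**a * exp(-t) / gamma(a + 1) with t = x / 2.
    """
    t = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-t)
    else:
        a, q = 0.5, math.erfc(math.sqrt(t))
    while a < df / 2:
        q += t ** a * math.exp(-t) / math.gamma(a + 1)
        a += 1.0
    return q

# Hypothetical saved values from two nested fits, each with its own estimate of k.
minus2loglik_reduced = 412.6
minus2loglik_full = 405.1
extra_parameters = 2

statistic = minus2loglik_reduced - minus2loglik_full
p_value = chi2_sf(statistic, extra_parameters)
print(statistic, p_value)
```

The difference in deviances would not serve the same purpose here, because each deviance is measured against a saturated model with a different value of k.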
The CONSTANT, FACTORIAL, NOMESSAGE, FPROBABILITY, TPROBABILITY, and SELECTION options operate in the usual way (as for example in the FIT directive). The final two options, MAXCYCLE and TOLERANCE, can supply variates of length 2 that can be used to control the iterative process if required. The first element of MAXCYCLE sets the maximum number of times that the model is fitted as a generalized linear model for fixed k, while the second element sets the maximum number of Newton-Raphson iterations used to maximize the likelihood with respect to k for fixed fitted values. The alternating cycle stops when successive values of the deviance are within a tolerance set by the first element of the TOLERANCE option and successive estimates of k are within a tolerance set by the second element.
Options: PRINT, AGGREGATION, _2LOGLIKELIHOOD, CONSTANT, FACTORIAL, NOMESSAGE, FPROBABILITY, TPROBABILITY, SELECTION, MAXCYCLE, TOLERANCE.
Parameter: TERMS.
Method
For fixed k, the negative binomial distribution is in the exponential family and the regression parameters determining the fitted values can be fitted as a generalized linear model using the FIT directive. For a fixed set of fitted values, k can be estimated by using the Newton-Raphson method to solve the score equation for k. Alternating between the two processes until convergence yields joint maximum likelihood estimates of k and the regression parameters. As the estimate of k is asymptotically independent of the estimates of the regression parameters (Lawless 1987), their standard errors can be obtained separately from the two processes. The standard error for k uses observed rather than expected information because k is estimated by Newton-Raphson rather than by Fisher scoring.
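The step that estimates k for fixed fitted values can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code of the procedure: the counts are invented, the model contains only a constant (so that the fitted values from the generalized linear model step are simply the sample mean, whatever the value of k), the function names are hypothetical, and the step-halving safeguard that keeps k positive is an addition made for the sketch. The standard error is taken from the observed information, that is, from minus the second derivative of the log-likelihood at the estimate.

```python
import math

def k_score_and_hessian(y, mu, k):
    """First and second derivatives of the log-likelihood with respect to k."""
    score = 0.0
    hessian = 0.0
    for yi, mi in zip(y, mu):
        # For integer counts, digamma(yi + k) - digamma(k) = sum of 1/(k + j), j < yi.
        score += sum(1.0 / (k + j) for j in range(yi))
        score += math.log(k / (k + mi)) + (mi - yi) / (k + mi)
        hessian -= sum(1.0 / (k + j) ** 2 for j in range(yi))
        hessian += 1.0 / k - 1.0 / (k + mi) - (mi - yi) / (k + mi) ** 2
    return score, hessian

def estimate_k(y, mu, k=1.0, maxcycle=50, tolerance=1e-10):
    """Newton-Raphson solution of the score equation for k, with mu held fixed."""
    for _ in range(maxcycle):
        score, hessian = k_score_and_hessian(y, mu, k)
        step = score / hessian
        k_new = k - step
        while k_new <= 0:  # safeguard (an assumption of this sketch): keep k positive
            step /= 2.0
            k_new = k - step
        converged = abs(k_new - k) < tolerance
        k = k_new
        if converged:
            break
    _, hessian = k_score_and_hessian(y, mu, k)
    standard_error = math.sqrt(-1.0 / hessian)  # from the observed information
    return k, standard_error

# Invented overdispersed counts; fitted values for a constant-only model.
counts = [0, 1, 0, 3, 7, 2, 0, 12, 1, 4, 0, 5, 9, 0, 2]
fitted = [sum(counts) / len(counts)] * len(counts)

k_hat, se_k = estimate_k(counts, fitted)
print(k_hat, se_k)
```

In a model with explanatory terms the fitted values change when k changes, so this step would alternate with a refit of the generalized linear model until both the deviance and k settle.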
The starting value of k is taken from the AGGREGATION option of the MODEL statement, which defaults to 1. This default appears to be a satisfactory initial value in practice, but the user may wish to specify a different value if convergence problems are encountered, or if speed is an issue and an approximate value of k is known.
Action with RESTRICT
Any restriction applied to vectors used in the regression model applies also to the results from RNEGBINOMIAL.
References
McCullagh, P. & Nelder, J.A. (1989). Generalized Linear Models (second edition). Chapman & Hall, London.
Hinde, J. & Demetrio, C.G.B. (1998). Overdispersion: models and estimation. Computational Statistics & Data Analysis, 27, 151-170.
Lawless, J.F. (1987). Negative binomial and mixed Poisson regression. Canadian Journal of Statistics, 15, 209-225.