R0INFLATED procedure
Fits zero-inflated regression models to count data with excess zeros (D.A. Murray).
Options
Parameters
Description
R0INFLATED can be used to fit zero-inflated regression models to count data with excess zeros. The procedure allows the data to be modelled using two different approaches. The first possibility is to fit a zero-inflated Poisson regression model (ZIP) or a zero-inflated negative binomial regression model (ZINB) using an EM algorithm (Lambert 1992). In this analysis, the response variable of counts is assumed to be distributed as a mixture of a count distribution (such as Poisson) and a degenerate distribution at zero. In these models, a generalized linear model (Poisson or negative binomial) with a log link is used for the count model, and a binomial model with a logit link is used for the zero-inflation model. The alternative is to fit the conditional model of Welsh et al. (1996), which assumes that the data are in one of two states: a state where zeros are observed, or a state where counts are recorded. A binomial model with a logit link is used for the zero state, and a truncated Poisson or truncated negative binomial model is used for the count state.
The response variable is supplied, in a variate, using the Y parameter. The XTERMS and ZTERMS options each specify a formula, describing the count model and the zero-inflation model respectively. The CONSTANT and ZCONSTANT options control whether a constant parameter is included in the count and zero-inflation models.
The METHOD option specifies the type of model to fit: the em setting fits the ZIP and ZINB mixture models, and the conditional setting fits the conditional model. The DISTRIBUTION option specifies the distribution for the count model. Note that a log link is always used for the count model.
The ESTIMATES and SE parameters save the parameter estimates and their standard errors. R0INFLATED puts them into variates, using the same order as in the display produced by the PRINT option. The standardized residuals and fitted values can be saved using the RESIDUALS and FITTEDVALUES parameters.
The RSAVE and ZSAVE parameters allow you to specify identifiers for the regression save structures for the count and zero-inflation states of the model. These structures store the final state of the regression models fitted. Note that the standard errors for the parameter estimates in the regression save structures will not be correct and should instead be obtained using the SE parameter or by the R0KEEP procedure.
For the Lambert models, the WEIGHTS option can specify a variate holding weights for each unit, and the OFFSET option allows you to include an offset (i.e. a variable in the regression model with a regression coefficient fixed at one).
The PRINT option controls printed output, with settings:
The iterative process for the EM algorithm is controlled by the MAXCYCLE option which defines the maximum number of cycles, and the TOLERANCE option which sets convergence criteria. The EM algorithm cycle stops when successive values of the log-likelihood are within a tolerance set by the first element of the TOLERANCE option. The second and third elements of TOLERANCE control the convergence criterion for the aggregation parameter (k) for the negative binomial model and for the generalized linear model, respectively.
Options: PRINT, DISTRIBUTION, METHOD, CONSTANT, ZCONSTANT, XTERMS, ZTERMS, WEIGHTS, OFFSET, MAXCYCLE, TOLERANCE.
Parameters: Y, RESIDUALS, FITTEDVALUES, ESTIMATES, SE, RSAVE, ZSAVE.
Method
The zero-inflated Poisson regression model has the distribution
Pr(Y=y) = { w + (1 - w) × exp(-lam)                for y=0
        = { (1 - w) × exp(-lam) × lam^y / y!       for y>0
where lam and w depend on covariates.
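As a quick numerical check of the zero-inflated Poisson distribution above, the following Python sketch evaluates Pr(Y=y) for fixed lam and w and confirms that the probabilities sum to one. It is an illustration only and not part of R0INFLATED; the function name zip_pmf and the values lam = 2.5 and w = 0.3 are assumptions chosen for the example.

```python
import math

def zip_pmf(y, lam, w):
    """Zero-inflated Poisson probability Pr(Y = y) for Poisson mean lam
    and zero-inflation probability w (both treated as fixed numbers)."""
    # Poisson probability, computed on the log scale for stability
    poisson = math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))
    if y == 0:
        return w + (1.0 - w) * poisson
    return (1.0 - w) * poisson

# Hypothetical values, not taken from the procedure
lam, w = 2.5, 0.3
probs = [zip_pmf(y, lam, w) for y in range(60)]
total = sum(probs)                                    # should be very close to 1
mean = sum(y * p for y, p in enumerate(probs))        # equals (1 - w) * lam
print(round(total, 10), round(mean, 10))
```

The mean of the mixture is (1 - w) × lam, which is why the fitted values from a zero-inflated model are smaller than the Poisson mean alone.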
Similarly, the zero-inflated negative binomial regression model has the distribution
Pr(Y=y) = { w + (1 - w) × (1 + lam / k)^(-k)                   for y=0
        = { (1 - w) × Gamma(y + k) / (y! × Gamma(k))
              × (1 + lam / k)^(-k) × (1 + k / lam)^(-y)        for y>0
where lam and w depend on covariates, and k≥0 is a scalar.
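The zero-inflated negative binomial probabilities can be checked numerically in the same way. The Python sketch below is an illustration under assumed values (lam = 2.5, k = 1.5, w = 0.3) and a hypothetical function name zinb_pmf; it is not the procedure's own code. It uses the parameterization in which the negative binomial has mean lam and aggregation parameter k.

```python
import math

def zinb_pmf(y, lam, k, w):
    """Zero-inflated negative binomial probability Pr(Y = y) with
    negative binomial mean lam, aggregation parameter k > 0 and
    zero-inflation probability w."""
    # log of Gamma(y + k) / (y! Gamma(k)) * (1 + lam/k)^(-k) * (1 + k/lam)^(-y)
    log_nb = (math.lgamma(y + k) - math.lgamma(y + 1) - math.lgamma(k)
              - k * math.log1p(lam / k) - y * math.log1p(k / lam))
    nb = math.exp(log_nb)
    if y == 0:
        return w + (1.0 - w) * nb
    return (1.0 - w) * nb

# Hypothetical values, not taken from the procedure
lam, k, w = 2.5, 1.5, 0.3
probs = [zinb_pmf(y, lam, k, w) for y in range(400)]
total = sum(probs)                                    # should be very close to 1
mean = sum(y * p for y, p in enumerate(probs))        # equals (1 - w) * lam
print(round(total, 10), round(mean, 8))
```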
For both the Poisson and negative binomial the following models are assumed:
log(lam) = X b
and log(w/(1-w)) = G z
where X and G are covariate matrices and b and z are vectors of unknown parameters. The maximum likelihood estimates for b, z and k are then obtained using an EM algorithm (Lambert 1992).
The standard errors for the parameter estimates are derived using the incomplete data observed information matrix as proposed by Lambert (1992).
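To show the shape of the EM iteration, here is a minimal Python sketch for the simplest possible case: a zero-inflated Poisson model with only a constant in both the count and zero-inflation parts, so that both M-steps have closed forms. This is a simplification for illustration, not the code used by R0INFLATED. With covariates, the M-steps would instead be a weighted Poisson regression and a logistic regression. The names zip_em, max_cycle and tol and the data are hypothetical.

```python
import math

def zip_loglik(y, lam, w):
    """Observed-data log-likelihood of a constant-only ZIP model."""
    ll = 0.0
    for yi in y:
        if yi == 0:
            ll += math.log(w + (1.0 - w) * math.exp(-lam))
        else:
            ll += (math.log(1.0 - w) - lam + yi * math.log(lam)
                   - math.lgamma(yi + 1))
    return ll

def zip_em(y, max_cycle=500, tol=1e-10):
    """EM for a constant-only ZIP model; returns (w, lam, loglik history)."""
    n = len(y)
    positives = [yi for yi in y if yi > 0]
    lam = sum(positives) / len(positives)   # crude starting value
    w = 0.5                                 # crude starting value
    history = [zip_loglik(y, lam, w)]
    for _ in range(max_cycle):
        # E-step: posterior probability that each zero comes from the zero state
        p_zero_state = w / (w + (1.0 - w) * math.exp(-lam))
        z = [p_zero_state if yi == 0 else 0.0 for yi in y]
        # M-step: closed forms because there are no covariates
        w = sum(z) / n
        lam = (sum((1.0 - zi) * yi for zi, yi in zip(z, y))
               / sum(1.0 - zi for zi in z))
        history.append(zip_loglik(y, lam, w))
        # Stop when successive log-likelihoods agree to within tol
        if abs(history[-1] - history[-2]) < tol:
            break
    return w, lam, history

# Made-up counts with many zeros, for illustration only
counts = [0] * 50 + [1] * 10 + [2] * 15 + [3] * 12 + [4] * 8 + [5] * 5
w_hat, lam_hat, loglik_history = zip_em(counts)
print(w_hat, lam_hat, len(loglik_history) - 1)
```

The stopping rule mirrors the description of the first element of TOLERANCE: iteration ends when successive log-likelihood values differ by less than the tolerance, or when the maximum number of cycles is reached.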
In the Poisson case of the conditional model, y_i has a truncated Poisson distribution (lam(z)) with probability p(0)=0. So the probability model is
Pr(Y=0|x) = 1 - p(x)
Pr(Y=r|x,z) = p(x) × exp(-lam(z)) × lam(z)^r / r!
              × (1 - exp(-lam(z)))^(-1), for r=1, 2, ...
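The conditional Poisson model can be checked numerically with a short Python sketch. Here p stands for p(x) and lam for lam(z), both treated as fixed numbers; the function name conditional_poisson_pmf and the values used are assumptions for illustration, not part of the procedure.

```python
import math

def conditional_poisson_pmf(r, p, lam):
    """Conditional (hurdle-type) model: a zero occurs with probability
    1 - p, and positive counts follow a zero-truncated Poisson(lam)."""
    if r == 0:
        return 1.0 - p
    log_pois = -lam + r * math.log(lam) - math.lgamma(r + 1)
    # Divide by 1 - exp(-lam) to renormalize the Poisson over r = 1, 2, ...
    return p * math.exp(log_pois) / (1.0 - math.exp(-lam))

# Hypothetical values, not taken from the procedure
p, lam = 0.6, 2.0
probs = [conditional_poisson_pmf(r, p, lam) for r in range(60)]
total = sum(probs)   # should be very close to 1
print(round(total, 10), probs[0])
```

Unlike the mixture models, every zero here comes from the zero state, so Pr(Y=0) depends only on p(x).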
For the negative binomial case, y_i has a truncated negative binomial (lam(z),k) with probability p(0)=0. So the probability model is
Pr(Y=0|x) = 1 - p(x)
Pr(Y=r|x,z) = p(x) × Gamma(r + 1/k) / (r! × Gamma(1/k))
              × (k × lam(z))^r × (1 + k × lam(z))^(-(r + 1/k))
              × (1 - (1 + k × lam(z))^(-1/k))^(-1), for r=1, 2, ...
where k is the extra-variation parameter in the untruncated negative binomial distribution.
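The truncated negative binomial form can be checked in the same way. In the Python sketch below, p stands for p(x), lam for lam(z) and k for the extra-variation parameter; the function name conditional_negbin_pmf and the numerical values are assumptions for illustration only.

```python
import math

def conditional_negbin_pmf(r, p, lam, k):
    """Conditional model with a zero-truncated negative binomial count
    state: mean parameter lam and extra-variation parameter k > 0."""
    if r == 0:
        return 1.0 - p
    kl = k * lam
    # log of Gamma(r + 1/k) / (r! Gamma(1/k)) * (k lam)^r * (1 + k lam)^-(r + 1/k)
    log_nb = (math.lgamma(r + 1.0 / k) - math.lgamma(r + 1)
              - math.lgamma(1.0 / k)
              + r * math.log(kl) - (r + 1.0 / k) * math.log1p(kl))
    # Renormalize by one minus the untruncated probability of zero
    return p * math.exp(log_nb) / (1.0 - (1.0 + kl) ** (-1.0 / k))

# Hypothetical values, not taken from the procedure
p, lam, k = 0.6, 2.0, 0.5
probs = [conditional_negbin_pmf(r, p, lam, k) for r in range(300)]
total = sum(probs)   # should be very close to 1
print(round(total, 10), probs[0])
```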
For both conditional models the zero component is fitted using a logistic generalized linear model. The truncated Poisson model is fitted using an iteratively re-weighted least squares algorithm (see Welsh et al. 1996). The truncated negative binomial model is fitted using FITNONLINEAR.
Action with RESTRICT
If a parameter is restricted, the statistics will be calculated using only those units included in the restriction.
References
Lambert, D. (1992). Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics, 34, 1-14.
Ridout, M., Demetrio, C.G.B. & Hinde, J. (1998). Models for count data with many zeros. International Biometrics Conference, Cape Town.
Welsh, A.H., Cunningham, R.B., Donnelly, C.F. & Lindenmayer, D.B. (1996). Modelling the abundance of rare species: statistical models for counts with extra zeros. Ecological Modelling, 88, 297-308.