GEV Procedure

The Generalized Extreme Value distribution asymptotically models block maxima from any distribution with a stable maximum value distribution.

The modelling of maxima is used to answer questions about how often values larger than a certain value may occur in the future. If the model assumptions hold, a short data series may be extrapolated to an event that occurs more rarely than the length of the series. For example, if you want to estimate a 1 in 100 year event but only have 40 years of data, you must extrapolate. This approach also smooths the distribution, so that information is taken from all values and not just from the few most extreme events that have occurred in a short series. The value which is exceeded 1 in n years is known as the return level, and 1/n is the return probability.

Each maximum is assumed to be the maximum of a fixed number of observations that are independently and identically distributed, and the process generating the observations is assumed to be stationary. Departures from these assumptions, for example a trend or correlations over the series of observations, or maxima taken from blocks of different sizes, can affect the validity of the analysis.
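The fixed block size assumption can be illustrated with a minimal Python sketch that reduces a raw series to block maxima before fitting. The function name `block_maxima` is hypothetical and is not part of the GEV procedure; dropping a trailing incomplete block is an assumption made here so that every maximum comes from the same number of observations.

```python
def block_maxima(values, block_size):
    """Return the maximum of each complete block of `block_size` observations.

    A trailing partial block is dropped (an assumption of this sketch), so
    that all maxima are taken over the same number of observations.
    """
    if block_size <= 0:
        raise ValueError("block_size must be a positive integer")
    n_blocks = len(values) // block_size
    return [
        max(values[i * block_size:(i + 1) * block_size])
        for i in range(n_blocks)
    ]


# Seven observations in blocks of three: the last observation is dropped.
maxima = block_maxima([1, 5, 2, 7, 3, 4, 9], 3)
```

To model minima instead, the same function could be applied to the negated series, in line with changing the sign of the variate.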

A max-stable distribution must take one of three forms: the Gumbel (Extreme Value Type 1), Frechet (Extreme Value Type 2) or Weibull (Extreme Value Type 3) distribution. The main difference between them is the behaviour of the upper tail. The Weibull distribution has a finite upper end point, whereas the other two distributions are unbounded above. The upper tail of the Gumbel distribution decays exponentially, whereas that of the Frechet decays polynomially. These three distributions can be combined into a single distribution, the Generalized Extreme Value (GEV) distribution, whose shape parameter, eta, specifies which of the three families a particular GEV distribution belongs to. If eta = 0 the distribution is Gumbel, if eta > 0 it is Frechet, and if eta < 0 it is Weibull. If plotted on a log-log scale, the Gumbel distribution will give a straight line, the Frechet will curve up, and the Weibull will tend to a horizontal asymptote.

The GEV distribution has three parameters, a location parameter, mu, a scale parameter, sigma, and the shape parameter, eta. The GEV cumulative distribution function (CDF) is given as:

G(x) = exp{-[1 + eta*(x - mu)/sigma]**(-1/eta)}, for {x: 1 + eta*(x - mu)/sigma > 0}.

For eta = 0, the CDF above is undefined, but in the limit as eta tends to 0 this gives the usual Gumbel CDF of:

G(x) = exp{-exp[-(x - mu)/sigma]}, for any x.

Minima may also be modelled by changing the sign of the variate.
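As an illustration outside Genstat, the following Python sketch evaluates the GEV CDF, using the Gumbel limit exp{-exp[-(x - mu)/sigma]} when eta = 0. The function name `gev_cdf` is hypothetical and is not part of the GEV procedure; returning 0 or 1 outside the support is a convention assumed here.

```python
import math


def gev_cdf(x, mu, sigma, eta):
    """CDF of the GEV distribution with location mu, scale sigma, shape eta."""
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    if eta == 0.0:
        # Gumbel limit of the GEV as eta tends to 0.
        return math.exp(-math.exp(-z))
    t = 1.0 + eta * z
    if t <= 0.0:
        # Outside the support {x: 1 + eta*(x - mu)/sigma > 0}:
        # below the lower end point when eta > 0, above the upper one when eta < 0.
        return 0.0 if eta > 0.0 else 1.0
    return math.exp(-(t ** (-1.0 / eta)))
```

Whatever the value of eta, G(mu) = exp(-1), which gives a quick sanity check on any implementation.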

When GROUPS and/or TREND are set, the return probabilities and levels are estimated for the first level of the GROUPS factor and at the mean of the TREND variate. Any graphs are also displayed using these standardized values. You can standardize the estimates to a different value of TREND by setting the STANDARD option.

PRINT = *strings*

What to print (*model, estimates, tests, fittedvalues, monitoring, all*); default *model, estimates, tests*

PLOT = *strings*

What graphs to plot (*qq, pp, etaprofile, density, returns, rprofile, all*); default *qq, returns, density*

ENVELOPE = *string*

Include confidence envelopes on plots (*yes, no*); default *yes*

CIMETHOD = *string*

How to calculate confidence limits (*exact, quadratic*); default *quadratic*

CDIRECTION = *string*

Direction of censoring (*left, right*); default *right*, i.e. the true value is greater than or equal to the given value X. Left censoring is where the true value is less than the given value X

ALPHA = *scalar*

Confidence level for the confidence bands and limits; default 0.95

ETA = *scalar*

Value at which to fix the shape parameter eta; default * estimates eta by maximum likelihood

STANDARD = *scalar*

Value of the TREND to standardize return levels and probabilities to; default * - use the mean value of TREND

TITLE = *text*

Text containing the title for each plot (each row is a title)

WINDOW = *scalar*

Window in which to plot the graphs; default 3

DATA = *variate*

Variate containing Data to fit the GEV distribution to

GROUPS = *factor*

Factor giving groups with different means

TREND = *variate*

Variate giving explanatory term to fit a linear regression for the location parameter

CENSOR = *variate*

Variate giving 1 for censored values, 0 otherwise

ESTIMATES = *variate*

Estimated values for the MU, SIGMA and ETA parameters, and for any TREND and GROUPS parameters

SE = *variate*

Standard errors for the MU, SIGMA and ETA parameters, and for any TREND and GROUPS parameters

RETURNS = *variate*

If set on input, the values for which to estimate the return probabilities; otherwise saves the estimated return levels for the values in PROBABILITY

PROBABILITY = *variate*

If set on input, the probabilities for which to estimate the return levels; otherwise saves the estimated return probabilities for the values in RETURNS

LOWER = *variate*

Lower limit of the ALPHA confidence interval for the estimated values of either RETURNS or PROBABILITY, depending on which values have been input

UPPER = *variate*

Upper limit of the ALPHA confidence interval for the estimated values of either RETURNS or PROBABILITY, depending on which values have been input

LOGLIKELIHOOD = *loglikelihood*

Log-likelihood of fitted distribution

EXIT = *scalar*

The exit code from FITNONLINEAR, which is used to obtain the maximum likelihood estimates

Options: PRINT, PLOT, ENVELOPE, CIMETHOD, CDIRECTION, ALPHA, ETA, STANDARD, TITLE, WINDOW

Parameters: DATA, GROUPS, TREND, CENSOR, ESTIMATES, SE, RETURNS, PROBABILITY, LOWER, UPPER, LOGLIKELIHOOD, EXIT

The PRINT option controls which results are displayed from the analysis (model, estimates, tests, fittedvalues, monitoring). Setting PRINT=all displays all available results. The model setting gives the function for the CDF and details of the model being fitted. The estimates setting gives the estimated values of mu, sigma and eta. The tests setting gives a likelihood ratio test of eta being zero and a goodness-of-fit test for the data following the GEV distribution. The fittedvalues setting gives a table of the observed return levels and the fitted values for each of these. The monitoring setting gives a trace of the parameters and likelihood from any profile likelihoods being estimated (which can be quite slow).

The PLOT option controls what graphs are displayed from the analysis (qq, pp, etaprofile, density, returns, rprofile). Setting PLOT=all will generate all available graphs. The quantile-quantile (qq) and probability-probability (pp) graphs are produced using the DPROBABILITY directive. The etaprofile and rprofile settings give profile likelihood plots for eta and the return levels respectively. The density setting gives a combined histogram, dot plot and probability density function on the same graph. The returns setting gives a plot of the data and estimated return levels against the return period. The return period is the reciprocal of the return probability (an n year return period = a 1 in n year return probability). The TITLE option provides titles for the selected plots. Each line in the specified text structure provides a title for the plots. The WINDOW option specifies the FRAME to plot the graphs within.

The ENVELOPE option controls whether confidence bands are included on the graphs. The ALPHA option sets the confidence level (default 95%) used for the bands and for the limits on the estimated return levels, probabilities and profile likelihoods. The CIMETHOD option controls whether approximate (CIMETHOD=quadratic) or profile likelihood (CIMETHOD=exact) confidence limits are estimated for the return levels. If profile likelihood limits are chosen, a profile likelihood curve for the return levels is calculated. Profile likelihood confidence limits can be very slow to calculate, but are more accurate. The PLOT=rprofile setting is only active if CIMETHOD=exact. Profile likelihood confidence limits are never available for estimated return probabilities.

The ETA option can be used to force a given value of eta to be used for the distribution. A common setting for this would be ETA=0 which forces the Gumbel distribution to be used.

Stationarity of the values across different groups, or over a trend in an associated variate, can be checked with the GROUPS and TREND parameters respectively. If -2 times the change in the log-likelihood is greater than the Chi-square critical value for the appropriate degrees of freedom (number of groups - 1, or 1, respectively), then the GROUPS or TREND term is significant. This calculation can be performed by saving the log-likelihoods for the respective fits with and without the term using the LOGLIKELIHOOD parameter, and then calculating the likelihood ratio test with the statement:

CALC ChiPr = CUCHISQUARE(-2*(loglik1 - loglik0);df)
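The same likelihood ratio calculation can be sketched outside Genstat. The Python below is a minimal illustration for the 1 degree of freedom case only (a TREND term, a two-level GROUPS factor, or the test of eta = 0), where the upper-tail Chi-square probability reduces to a complementary error function. The function names are hypothetical; a GROUPS factor with more than two levels needs a general Chi-square routine instead.

```python
import math


def chi2_upper_1df(x):
    """Upper-tail probability P(X > x) for a Chi-square variable with 1 d.f."""
    if x <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))


def likelihood_ratio_test_1df(loglik_with, loglik_without):
    """Return (statistic, probability) for adding one parameter to the model.

    loglik_with is the maximized log-likelihood of the model including the
    extra term, loglik_without that of the model omitting it.
    """
    statistic = 2.0 * (loglik_with - loglik_without)
    return statistic, chi2_upper_1df(statistic)
```

For instance, a likelihood ratio statistic of 0.411 on 1 degree of freedom gives a probability of about 0.52, so the extra parameter would not be judged significant.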

The CENSOR parameter allows for data that have not been completely observed, where the recorded value is one that the true value must be less than (left censoring) or greater than (right censoring). The CDIRECTION option specifies the direction of censoring (left or right). Left or right censored values are added to the likelihood using the CDF or 1-CDF respectively, rather than using the usual probability density function. If censoring is used, then several of the plots produced are adjusted to allow for the censored values.

The EXIT parameter allows the exit code from FITNONLINEAR to be returned in a scalar so that you can check whether the estimation has converged. The estimated parameters and their standard errors can be saved in the structures set for ESTIMATES and SE respectively.
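The way censored values enter the likelihood under CENSOR and CDIRECTION can be sketched in Python as follows. This is an illustration of the general technique (density for observed values, CDF for left censored values, 1-CDF for right censored values), not the code used by the procedure; all function names are hypothetical.

```python
import math


def gev_log_cdf(x, mu, sigma, eta):
    """log G(x) for the GEV (eta != 0) or Gumbel (eta == 0) distribution."""
    z = (x - mu) / sigma
    if eta == 0.0:
        return -math.exp(-z)
    t = 1.0 + eta * z
    if t <= 0.0:
        # Outside the support: G = 0 below the lower end point (eta > 0),
        # G = 1 above the upper end point (eta < 0).
        return -math.inf if eta > 0.0 else 0.0
    return -(t ** (-1.0 / eta))


def gev_log_pdf(x, mu, sigma, eta):
    """log of the GEV probability density function at x."""
    z = (x - mu) / sigma
    if eta == 0.0:
        return -math.log(sigma) - z - math.exp(-z)
    t = 1.0 + eta * z
    if t <= 0.0:
        return -math.inf
    return -math.log(sigma) - (1.0 / eta + 1.0) * math.log(t) - t ** (-1.0 / eta)


def censored_loglik(data, censor, mu, sigma, eta=0.0, direction="right"):
    """Log-likelihood where censor[i] == 1 marks a censored value in data[i]."""
    total = 0.0
    for x, c in zip(data, censor):
        if not c:
            total += gev_log_pdf(x, mu, sigma, eta)   # fully observed value
        elif direction == "left":
            total += gev_log_cdf(x, mu, sigma, eta)   # true value < x
        else:
            # Right censoring: true value >= x, so use log(1 - G(x)).
            survival = -math.expm1(gev_log_cdf(x, mu, sigma, eta))
            total += math.log(survival) if survival > 0.0 else -math.inf
    return total
```

A function like this would be maximized over mu, sigma and eta by a general-purpose optimizer to obtain the estimates.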

The RETURNS and PROBABILITY parameters allow you to specify values for which you want the corresponding return probabilities or return levels, respectively, to be estimated. If the values in the variate for RETURNS are set, then the estimated return probabilities are given in PROBABILITY, and the lower and upper confidence limits for the return probabilities are given in LOWER and UPPER respectively. If the values in the variate for PROBABILITY are set, then the estimated return levels are given in RETURNS, and the lower and upper confidence limits for the return levels are given in LOWER and UPPER respectively.
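The two directions of this calculation follow from inverting the GEV CDF. The Python sketch below is an illustration under that assumption, with hypothetical function names: a return probability p corresponds to the level x with G(x) = 1 - p, and a level x corresponds to the probability 1 - G(x). It does not reproduce the confidence limits.

```python
import math


def return_level(p, mu, sigma, eta):
    """Level exceeded with probability p in any one block (e.g. one year)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    y = -math.log(1.0 - p)
    if eta == 0.0:
        return mu - sigma * math.log(y)          # Gumbel case
    return mu - (sigma / eta) * (1.0 - y ** (-eta))


def return_probability(level, mu, sigma, eta):
    """Probability 1 - G(level) that a block maximum exceeds the level."""
    z = (level - mu) / sigma
    if eta == 0.0:
        return -math.expm1(-math.exp(-z))
    t = 1.0 + eta * z
    if t <= 0.0:
        # Outside the support of the distribution.
        return 1.0 if eta > 0.0 else 0.0
    return -math.expm1(-(t ** (-1.0 / eta)))


# A return probability of 0.05 corresponds to a return period of 1/0.05 = 20.
level_20 = return_level(0.05, mu=43.38, sigma=6.962, eta=0.09417)
```

With mu = 43.38, sigma = 6.962 and eta = 0.09417, the 1 in 20 level evaluates to about 67.24.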

- Coles, Stuart (2001). An introduction to statistical modelling of extreme values. Springer-Verlag: London.
- Reiss R and Thomas M (2001). Statistical Analysis of Extreme Values, 2nd edition. Birkhauser: Basel.

Treating all these wind speeds as a single group gives the following analysis and graphs.

    SPLOAD '%GENDIR%\\EXAMPLES\\WindSpeed.GSH'
    "Fit a Generalized Extreme Value Distribution to Maxima"
    GEV [PRINT=model,estimates,tests; PLOT=qq,etaprofile,density,returns; \
         CIMETHOD=quadratic; ENVELOPE=yes; ALPHA=0.95; WINDOW=3] \
         DATA=mph; PROBABILITY=0.05
    " Check for trend over time in maximum wind speeds "
    GEV [PRINT=model,estimates; PLOT=*;ETA=0] DATA=mph; TREND=Year
    " Check for difference in storm type "
    GEV [PRINT=model,estimates; PLOT=*] DATA=mph; GROUPS=Type
    " Estimate Non-Tropical storm mean, allowing for censoring due to tropical storms "
    GEV [PRINT=model,estimates; PLOT=*;CDIRECTION=Left] \
         DATA=mph; CENSOR=Censor; PROBABILITY=0.05

    Generalized Extreme Value Distribution:
      CDF(x) = EXP(-[1 + Eta*(x - Mu)/Sigma]**(-1/Eta)), Eta<>0
             = EXP(-EXP(-(x - Mu)/Sigma), Eta==0

    *** Estimates of GEV parameters ***

               estimate    "s.e."
    Mu            43.38     2.056
    Sigma         6.962     1.556
    Eta         0.09417    0.2181

    Maximum Log-Likelihood = -107.25

    Maximum value of GEV Distribution is Infinite (Eta >= 0)

    Significance Test that Eta = 0 (ie mph follows a Gumbel distribution)

    Likelihood Ratio test statistic: 0.411
    Chi-Square Probability of test:  0.5215

    Goodness of Fit Test for mph following a GEV distribution

    ---------------------------------------------------------
    Critical values of test statistics (MARGINAL tests)
    ---------------------------------------------------------
                              Significance level
                      -------------------------------------
    Test statistic      15%     10%      5%    2.5%      1%
    ---------------------------------------------------------
    Anderson-Darling  0.576   0.656   0.787   0.918   1.092
    Cramer-von Mises  0.091   0.104   0.126   0.148   0.178
    Watson            0.085   0.096   0.116   0.136   0.163
    ---------------------------------------------------------

    ---------------------------------------------------------------
                                     Test statistic
                            ---------------------------------
    Type of                 Anderson-   Cramer-
    test       Variate(s)   Darling     von Mises   Watson
    ---------------------------------------------------------------
    Marginal       1          0.232       0.029      0.029
    ---------------------------------------------------------------
    ?, *, ** indicate significance at 10%, 5% and 1% levels respectively

    95.0 % Profile Likelihood Interval for Eta: ( -0.1739 0.4562 )

    95.0 % Approximate Intervals for Return Periods
    ---------------------------------------------------------
    Probability   Return Period   Level   Lower   Upper
        0.05000           20.00   67.24   50.08   84.40

From this analysis, there is no evidence of lack of fit to the GEV distribution (the Anderson-Darling, Cramer-von Mises and Watson test statistics are all non-significant), and a shape parameter of 0 (i.e. a Gumbel distribution) could be used (as the likelihood ratio test for eta = 0 is non-significant).

This graph plots the observed values against the expected values. There is a slight indication of some of the higher wind speeds (3 and 4 largest) lying above the expected line, but as these are all contained within the 95% confidence bands, the departure is minor.

The value of eta lies within the confidence limits (-0.17, 0.46).

The fitted density seems a reasonable fit to the histogram of wind speeds.

This plot can be used to read return levels off the y-axis for any given return period (1 in n years, with n given along the x-axis).

    Gumbel Extreme Value Distribution (GEV with Eta = 0):
      CDF(x) = EXP(-EXP(-(x - Mu)/Sigma)

    Fitting Trend term: Year

    *** Estimates of Gumbel parameters ***

                     estimate    "s.e."
    Mu(Intercept)       16.71         *
    Sigma               7.305         *
    Eta                     0     FIXED
    Slope(Year)       0.01377         *

    Maximum Log-Likelihood = -107.45

    Gumbel Extreme Value Distribution (GEV with Eta = 0):
      CDF(x) = EXP(-EXP(-(x - Mu)/Sigma)

    Fitting Groups term: Type

    *** Estimates of Gumbel parameters ***

                     estimate    "s.e."
    Mu(Tropical)        43.39     3.677
    Sigma               7.221     1.499
    Eta                     0     FIXED
    Non-tropical       0.4697     4.218

    Maximum Log-Likelihood = -107.44

    Gumbel Extreme Value Distribution (GEV with Eta = 0):
      CDF(x) = EXP(-EXP(-(x - Mu)/Sigma)

    8 data points are right censored

    *** Estimates of Gumbel parameters ***

               estimate    "s.e."
    Mu            45.92     2.447
    Sigma         8.555     2.039
    Eta               0     FIXED

    Maximum Log-Likelihood = -85.65

    95.0 % Approximate Intervals for Return Periods
    ---------------------------------------------------------
    Probability   Return Period   Level   Lower   Upper
        0.05000           20.00   71.33   56.91   85.74

It can be seen that there is no significant effect of Year or Type (comparing -2 times the change in log-likelihood with a Chi-square distribution with 1 degree of freedom (95% limit = 3.84)). The estimated 1 in 20 year return level for a non-tropical storm is 71.3 compared with 67.2 for all storms.
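The comparison for Year and Type can be retraced from the printed log-likelihoods, as in the Python sketch below. It rests on an assumption: the log-likelihood of the plain Gumbel fit (no Year or Type term) is not printed, so it is recovered here from the GEV log-likelihood and the likelihood ratio statistic for eta = 0. Because the printed values are rounded to two decimals, the resulting statistics are only approximate.

```python
# Values taken from the printed output (rounded there to two decimals).
loglik_gev = -107.25      # GEV fit with eta estimated
lr_eta_zero = 0.411       # likelihood ratio statistic for eta = 0
loglik_year = -107.45     # Gumbel fit with TREND=Year
loglik_type = -107.44     # Gumbel fit with GROUPS=Type

# Assumption: the plain Gumbel log-likelihood is implied by the eta = 0 test,
# since lr_eta_zero = 2 * (loglik_gev - loglik_gumbel).
loglik_gumbel = loglik_gev - lr_eta_zero / 2.0

# Likelihood ratio statistics for adding each term to the Gumbel model.
lr_year = 2.0 * (loglik_year - loglik_gumbel)
lr_type = 2.0 * (loglik_type - loglik_gumbel)

critical_95_1df = 3.84    # 95% point of Chi-square with 1 degree of freedom
year_significant = lr_year > critical_95_1df
type_significant = lr_type > critical_95_1df
```

Both statistics come out far below 3.84, in line with the conclusion that neither term is significant.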

- Fit a Generalized Extreme Value Distribution
- GRGEV for generating random GEV deviates
- Fit a Generalized Pareto Distribution
- GPARETO