To assess the how well empirical data approximates a particular theoretical
distribution, the sorted values (order statistics, X(i)) are plotted against
the expected values of the order statistics E(i) from the given distribution.
However, usually the particular parameters of the distribution are not known
and these have to be estimated first to obtain the expected values.
If the distribution has a cumulative density function of F(x), and the
inverse of this function is G(x) (i.e. G(F(x)) = x), then the expected values
of the order statistics, are approximately G((i-0.5)/n), where i = 1...n, and
n is the number of values in the sample. A plot of X(i) vs E(i) is known as
a Quantile-Quantile (or Q-Q) plot. The data can also be plotted on the
probability scale by plotting the cumulative probabilities of the data under
the assumed distribution against their expected probabilities, i.e. F(X(i)) vs
(i-0.5)/n. This is known as a Probability-Probability (or P-P) plot.
A third plot called the stabilized probability (SP) plot (Michael,
1983), was introduced, which rescales the probabilities using the
transformation sp = (2/pi)*arcsin(sqrt(p)), so that the variance of the
plotted points are approximately equal over the range of probability values.
In the SP plot the scaled values sp are plotted rather than the unscaled p
values.
The following graph shows a Normal Q-Q plot with 95% simultaneous
confidence bands and a 1-1 reference line.
Available Data
This lists variates that are available for analysis. Double-click on a name to
copy it to the Data values field; alternatively, you can type in the name
directly.
Data Values
This specifies the name of the variate that will be used in the probability
distribution plot.
Distribution
This provides a drop down list of the range of continues distributions that
the observed data can be plotted against.
Degrees of Freedom
Some of the distributions (Chi-square, t and F) cannot have the parameters
estimated by the usual distribution fitting facilities,
so these fields provide the degrees to specify the parameters of these distributions.
Box Cox Transform
If this is selected, a Box Cox transform will be performed on the data before
plotting it. The Box Cox transform for a variate X is defined as:
Y = (X**lambda - 1)/lambda if lambda is not equal to 0, and
Y = LOG(X) if lambda = 0.
The power - lambda is specified in the field provided.
If X does not have a normal distribution, a value of lambda can often be found such that Y is normally distributed.
For a Normal distribution, the Estimate button will use the
YTRANSFORM command to calculate the optimal value of lambda (to the nearest
0.1 between -4 and 4) to transform the X values to a Normal distribution. The
optimal value of lambda should be placed in the field above, unless the
server is busy with other calculations, in which case you will need to cut and
paste the value of lambda from the Output window when the server has completed
the calculation.
Plotting Scale
The graph can be plotted on three scales:
- Quantile - this plots a Q-Q plot of the observed data values plotted
against their expected quantiles
- Probability - this plots a P-P plot of the observed data values
transformed to a probability via the cumulative distribution function of the
theoretical distribution the plotted against their expected probabilities
- Stabilized Probability - this plots the stabilized probability plot of
Michael (1983) described above
Confidence Bands
This drop down list allows two forms of confidence intervals to be displayed
in the graph.
- Pointwise simulates distributions of the same size as the data,
from the theoretical distribution and plots the range of values at each value
of the order statistics that contain the proportion alpha (specified as a % in
the edit box Conf. Level) of simulated values. Thus a sample drawn from
the assumed distribution has approximately a probability alpha of lying within
the limits at each point. However, overall there will be a probability of less
than alpha that a sample will completely lie within the confidence bands.
- Simultaneous uses a statistic given by Michael, 1983, for which the
overall probability of plotted data lying completely within the confidence bands
is approximately the specified value of alpha, under the null hypothesis that
the data is a random iid sample from the specified distribution. This form of
confidence limits has the advantage that it is much faster to calculate and that
probability of the data points falling outside the limits is approximately
constant over the range of the data.
- None specifies no confidence bands to be drawn on the graph.
Action Buttons
| Run | Run the analysis. |
| Cancel | Close the menu without further changes. |
| Options | Opens a dialog where additional options and settings can be
specified for the analysis. |
| Defaults | Set the menu settings back to the default settings.
Clicking the right mouse on this button produces a shortcut menu where you can choose to set
the menu using the currently stored defaults or the GenStat default settings. |
| Store | Opens a dialog to specify names of structures to store the results from the analysis.
The names to save the structures should be supplied before running the analysis. |
See Also