BOXPLOT procedure

Draws box-and-whisker diagrams or schematic plots (P.W. Lane & S.D. Langton).


Options

GRAPHICS = string
What type of graphics to use (highresolution, lineprinter); default high

TITLE = text
Title for diagram; default *

AXISTITLE = text
Title for axis representing data values; default *

WINDOW = scalar
Window in which to draw a high-resolution plot; default 4

SCREEN = string
Whether to clear screen before a high-resolution plot (clear, keep); default clea

ORIENTATION = string
Orientation of plots (down, across); default down

METHOD = string
Type of representation of data in a high-resolution plot (boxandwhisker, schematic); default boxa

BOXTITLE = text
Title for axis representing different variates or groups; default *

BOXWIDTH = string
Whether to relate box width to size of sample in high-resolution plot (fixed, variable); default fixe

WHISKER = scalar
Linestyle for whiskers (0...10); default 1

BAR% = scalar
Size of bar at the end of the whiskers, as a percentage of the box-width; default 0 (i.e. no bar)


Parameters

DATA = variates
Data to be summarized; no default

GROUPS = factor
Factor to divide values of a single variate into groups; default *

BOXLABELS = texts
Labels for individual boxes; default *, i.e. identifiers of variates or labels or levels of factor

UNITLABELS = texts
Labels for extreme points in schematic plot; default is to use unit labels


Description

BOXPLOT draws pictures to display the distribution of one or more sets of data. In the simplest case, with the DATA parameter set to a single variate, BOXPLOT will draw a box-and-whisker diagram, as defined by Tukey (1977). The box spans the interquartile range of the values in the variate, so that the middle 50% of the data lie within the box, with a line indicating the median. Whiskers extend beyond the ends of the box as far as the minimum and maximum values. If several variates are supplied, a box is drawn for each of them using the same scale. Alternatively, if a single variate is supplied by the DATA parameter, a factor with the same number of values as the variate may be provided by the GROUPS parameter, and a box will be drawn for each level of the factor.
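
   As a rough illustration of the summary behind a box-and-whisker diagram, the following Python sketch (not Genstat code, and not part of the procedure) computes the five values drawn for one variate. The function name box_summary and the sample data are hypothetical, and the "inclusive" quartile rule of Python's statistics.quantiles is an assumption that may differ slightly from the percentiles Genstat computes.

```python
import statistics

def box_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    data = sorted(values)
    # Assumption: "inclusive" quartiles; other quartile rules can move the box ends slightly.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    # The box spans q1..q3, the line inside it is the median,
    # and the whiskers run out to the minimum and maximum.
    return (data[0], q1, statistics.median(data), q3, data[-1])

# Hypothetical sample of nine values.
print(box_summary([2, 4, 4, 5, 7, 8, 9, 11, 12]))
```

   For several variates, or for the groups defined by a factor, the same summary would be computed once per box and all boxes drawn against a common scale.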

   The GRAPHICS option indicates whether high-resolution or line-printer plots are required. The TITLE, AXISTITLE and BOXTITLE options can be set to specify the titles displayed at the top of the plot, along the axis representing the data values, and along the axis representing separate boxes when there are several variates or groups, for either graphics mode. For high-resolution plots, the WINDOW and SCREEN options control the placement of the picture in the graphical frame.

   The ORIENTATION option allows the boxes to be drawn down the page or across the page, though ORIENTATION=down cannot be selected for line-printer plots with more than 14 boxes. If the page size is small, as in interactive mode, line-printer plots with ORIENTATION=down are very cramped: the PAGE option of the OUTPUT directive can be used to increase the depth of the graphs.

   Schematic plots can be drawn (high-resolution only) by setting option METHOD=schematic. These diagrams (also defined by Tukey, 1977) are modifications of box-and-whisker diagrams which display individual outlying points as well as the box. The whiskers extend only to the most extreme data values within the inner "fences", which are at a distance of 1.5 times the interquartile range beyond the quartiles; if no value lies beyond a fence, the whisker therefore ends at the minimum or maximum value. Individual outliers are plotted with a cross by default, and labelled under control of the UNITLABELS parameter. "Far" outliers, beyond the outer "fences" which are at a distance of three times the interquartile range beyond the quartiles, are plotted with a different pen.
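
   The fence rules described above can be sketched in Python as follows. This is an illustration only, not the Genstat implementation: the function name schematic_summary, the sample data and the "inclusive" quartile rule are all assumptions, and values lying exactly on a fence are treated here as being inside it.

```python
import statistics

def schematic_summary(values):
    """Classify values for a schematic plot using inner and outer fences."""
    data = sorted(values)
    # Assumption: "inclusive" quartiles; Genstat's percentiles may differ slightly.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_low, outer_high = q1 - 3.0 * iqr, q3 + 3.0 * iqr
    # Whiskers stop at the most extreme values that lie within the inner fences.
    inside = [v for v in data if inner_low <= v <= inner_high]
    # Outliers lie beyond an inner fence but not beyond the outer fence.
    outliers = [v for v in data
                if outer_low <= v < inner_low or inner_high < v <= outer_high]
    # Far outliers lie beyond an outer fence.
    far_outliers = [v for v in data if v < outer_low or v > outer_high]
    return {"whiskers": (inside[0], inside[-1]),
            "outliers": outliers,
            "far_outliers": far_outliers}

# Hypothetical sample of thirteen values.
print(schematic_summary([0, 8, 9, 10, 11, 12, 13, 14, 15, 16, 19, 26, 40]))
```

   In this sample the quartiles are 10 and 16, so the inner fences are at 1 and 25 and the outer fences at -8 and 34: the whiskers end at 8 and 19, the values 0 and 26 are outliers, and 40 is a far outlier.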

   By default, all boxes have equal width. High-resolution diagrams can be modified to indicate the number of values being represented by each box. The option BOXWIDTH=variable will scale the box widths by the square root of the number of values represented.
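
   The square-root scaling can be illustrated by a short Python sketch (again outside Genstat). The function name relative_box_widths is hypothetical, and normalizing so that the largest sample has width 1.0 is an assumption made for the example; the source states only that widths are scaled by the square root of the number of values.

```python
import math

def relative_box_widths(counts):
    """Return box widths proportional to the square root of each sample size."""
    roots = [math.sqrt(n) for n in counts]
    widest = max(roots)
    # Assumption: the largest sample is given width 1.0 and the rest are scaled to it.
    return [r / widest for r in roots]

# Hypothetical group sizes: a group with 25 values gets a box 2.5 times
# as wide as a group with 4 values, not 6.25 times.
print(relative_box_widths([4, 16, 25]))
```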

   The style of the whiskers can be controlled by setting the WHISKER option to a graphical linestyle in the range 0 to 10. These styles are device dependent, but 0 and 1 always give a solid line (the default) and 2 usually gives a dashed line. The BAR% option allows you to add bars at the end of the whiskers. For example, the setting 100 gives a bar as wide as the box, and 25 would give one a quarter the width. The default is 0, giving no bars.

   Four pens are used to draw the high-resolution displays, apart from the axes: Pen 1 for the boxes and median line (default colour black), Pen 2 for far outliers (red crosses), Pen 3 for outliers (green crosses) and Pen 4 for the whiskers (set to match the colour of Pen 1). You can customize the pictures by setting some aspects of these pens with the PEN directive before calling the procedure: in particular, the colours, symbols and line-thicknesses.

   The BOXLABELS parameter allows you to specify labels that will identify each box.

   The UNITLABELS parameter allows you to specify labels that will be used to identify outlying observations in schematic plots (but this is not available if you give a list of variates in the DATA parameter).

 

Options: GRAPHICS, TITLE, AXISTITLE, WINDOW, SCREEN, ORIENTATION, METHOD, BOXWIDTH, BOXTITLE, WHISKER, BAR%.

Parameters: DATA, GROUPS, BOXLABELS, UNITLABELS.


Method

The medians and extremes are calculated by functions MEDIAN, MINIMUM and MAXIMUM, whereas the quartiles are calculated using the PERCENT option of TABULATE.


Action with RESTRICT

Restrictions on the supplied variates are taken into account. The grouping factor and texts holding boxlabels or unitlabels, if specified, should not be restricted.


Reference

Tukey, J.W. (1977). Exploratory Data Analysis. Addison-Wesley.