BOXPLOT procedure
Draws box-and-whisker diagrams or schematic plots (P.W. Lane & S.D. Langton).
Description
BOXPLOT draws pictures to display the distribution of one or more sets of data. In the simplest case, with the DATA parameter set to a single variate, BOXPLOT will draw a box-and-whisker diagram, as defined by Tukey (1977). The box spans the interquartile range of the values in the variate, so that the middle 50% of the data lie within the box, with a line indicating the median. Whiskers extend beyond the ends of the box as far as the minimum and maximum values. If several variates are supplied, a box is drawn for each of them using the same scale. Alternatively, if a single variate is supplied by the DATA parameter, a factor with the same number of values as the variate may be provided by the GROUPS parameter, and a box will be drawn for each level of the factor.
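The quantities that define each box can be sketched in Python as below. This is an illustration only, not the procedure's own code. The helper names `box_summary` and `boxes_by_group` are hypothetical. The quartiles use the "inclusive" interpolation of Python's `statistics.quantiles`, which may differ slightly from the percentile definition used by TABULATE. Passing a list of group labels mimics setting the GROUPS parameter to a factor.

```python
from statistics import median, quantiles


def box_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    # Assumption: quartiles by "inclusive" linear interpolation; the
    # percentiles produced by TABULATE in GenStat may be defined differently.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median(values), q3, max(values)


def boxes_by_group(data, groups=None):
    """One summary for the whole variate, or one per level of a grouping factor."""
    if groups is None:
        return {"all": box_summary(data)}
    if len(groups) != len(data):
        raise ValueError("groups must have the same number of values as data")
    by_level = {}
    for value, level in zip(data, groups):
        by_level.setdefault(level, []).append(value)
    return {level: box_summary(values) for level, values in by_level.items()}
```

For example, `boxes_by_group([1, 2, 3, 10, 20, 30], ["a", "a", "a", "b", "b", "b"])` gives one summary per level: `(1, 1.5, 2, 2.5, 3)` for level a and `(10, 15.0, 20, 25.0, 30)` for level b.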
The GRAPHICS option indicates whether high-resolution or line-printer plots are required. In either graphics mode, the TITLE, AXISTITLE and BOXTITLE options specify, respectively, the title displayed at the top of the plot, the title along the axis representing the data values, and the title along the axis that identifies the separate boxes when there are several variates or groups. For high-resolution plots, the WINDOW and SCREEN options control the placement of the picture in the graphical frame.
The ORIENTATION option allows the boxes to be drawn down the page or across the page, although ORIENTATION=down cannot be selected for line-printer plots with more than 14 boxes. If the page size is small, as in interactive mode, line-printer plots with ORIENTATION=down are very cramped; the PAGE option of the OUTPUT directive can then be used to increase the depth of the graphs.
Schematic plots can be drawn (high-resolution only) by setting the option METHOD=schematic. These diagrams (also defined by Tukey 1977) are modifications of box-and-whisker diagrams that display individual outlying points as well as the box. The whiskers extend only as far as the most extreme data values lying within the inner "fences", which are at a distance of 1.5 times the interquartile range beyond the quartiles; if the maximum (or minimum) value lies inside its fence, the whisker simply extends to it. Individual outliers are plotted with a cross by default, and are labelled under the control of the UNITLABELS parameter. "Far" outliers, lying beyond the outer "fences" at a distance of three times the interquartile range beyond the quartiles, are plotted with a different pen.
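The fence rules for a schematic plot can be sketched in Python as follows. The function name `schematic` is hypothetical, and the quartiles are computed with the "inclusive" interpolation of Python's `statistics.quantiles`, which is an assumption and may not match GenStat's percentile definition exactly. The function returns the whisker ends, the ordinary outliers (between the inner and outer fences) and the far outliers (beyond the outer fences).

```python
from statistics import quantiles


def schematic(values):
    """Return (whisker ends, outliers, far outliers) for one schematic box."""
    # Assumption: quartiles by "inclusive" linear interpolation; GenStat's may differ.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    inner_lo, inner_hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # inner fences
    outer_lo, outer_hi = q1 - 3.0 * iqr, q3 + 3.0 * iqr  # outer fences

    # Whiskers stop at the most extreme values lying within the inner fences.
    inside = [v for v in values if inner_lo <= v <= inner_hi]
    whiskers = (min(inside), max(inside))

    # Outliers lie between inner and outer fences; far outliers lie beyond the outer ones.
    outliers = sorted(
        v for v in values
        if outer_lo <= v < inner_lo or inner_hi < v <= outer_hi
    )
    far = sorted(v for v in values if v < outer_lo or v > outer_hi)
    return whiskers, outliers, far
```

With the values 1 to 9 plus 20 and 40, the quartiles are 3.5 and 8.5, so the inner fences are at -4 and 16 and the outer fences at -11.5 and 23.5. The whiskers therefore run from 1 to 9, 20 is an outlier and 40 is a far outlier.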
By default, all boxes have equal width. High-resolution diagrams can instead indicate the number of values represented by each box: the option BOXWIDTH=variable makes the box widths proportional to the square root of the number of values represented.
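The effect of BOXWIDTH=variable on relative widths can be sketched as below. The function `relative_box_widths` and its `variable` flag are hypothetical. Normalising so that the box with the most values has width 1 is also an assumption; only the proportionality to the square root of the count comes from the description above.

```python
from math import sqrt


def relative_box_widths(counts, variable=True):
    """Relative widths: proportional to sqrt(count) if variable, else all equal."""
    if not variable:
        return [1.0] * len(counts)  # default: every box has the same width
    # Assumption: widths are scaled so that the box with most values has width 1.
    largest = sqrt(max(counts))
    return [sqrt(n) / largest for n in counts]
```

For instance, counts of 25 and 100 give relative widths 0.5 and 1.0, so quadrupling the number of values only doubles the width.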
The style of the whiskers can be controlled by setting the WHISKER option to a graphical linestyle in the range 0 to 10. These styles are device dependent, but 0 and 1 always give a solid line (the default) and 2 usually gives a dashed line. The BAR% option adds bars at the ends of the whiskers, with the bar width given as a percentage of the box width: for example, the setting 100 gives a bar as wide as the box, and 25 gives one a quarter of that width. The default is 0, giving no bars.
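The BAR% arithmetic amounts to taking a percentage of the box width. In this hypothetical Python helper, `whisker_bar`, the bar is assumed to be centred on the whisker, and the whisker is assumed to run through the centre of the box. The returned coordinates are in whatever units the box width is measured in.

```python
def whisker_bar(centre, box_width, bar_percent=0.0):
    """Return the (left, right) ends of a whisker-end bar, or None if there is no bar."""
    if bar_percent <= 0:
        return None  # BAR% = 0, the default, draws no bar
    # Assumption: the bar is centred on the whisker; its width is a percentage of the box width.
    half = box_width * bar_percent / 100 / 2
    return (centre - half, centre + half)
```

For a box of width 1 centred at 3, a setting of 100 gives a bar from 2.5 to 3.5, and 25 gives a bar from 2.875 to 3.125.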
Apart from the axes, four pens are used to draw the high-resolution displays: Pen 1 for the boxes and median line (default colour black), Pen 2 for far outliers (red crosses), Pen 3 for outliers (green crosses) and Pen 4 for the whiskers (set to match the colour of Pen 1). You can customize the pictures by using the PEN directive, before calling the procedure, to change some attributes of these pens: in particular, their colours, symbols and line thicknesses.
The BOXLABELS parameter allows you to specify labels that will identify each box.
The UNITLABELS parameter allows you to specify labels that will be used to identify outlying observations in schematic plots (this is not available if you set the DATA parameter to a list of variates).
Options: GRAPHICS, TITLE, AXISTITLE, WINDOW, SCREEN, ORIENTATION, METHOD, BOXWIDTH, BOXTITLE, WHISKER, BAR%.
Parameters: DATA, GROUPS, BOXLABELS, UNITLABELS.
Method
The medians and extremes are calculated by the functions MEDIAN, MINIMUM and MAXIMUM, whereas the quartiles are calculated using the PERCENT option of TABULATE.
Action with RESTRICT
Restrictions on the supplied variates are taken into account. The grouping factor and texts holding boxlabels or unitlabels, if specified, should not be restricted.
Reference
Tukey, J.W. (1977). Exploratory Data Analysis. Addison-Wesley.