MNORMALIZE procedure

Normalizes two-colour microarray data (D.B. Baird).


Options

PRINT = strings
What to print (summary, slidesummary, monitoring); default summ, slid, moni

PLOT = strings
What plots to produce (pineffects, roweffects, columneffects, intensityeffects, rowxcoleffects, ma, standardizedma, spatialresiduals); default * i.e. none

METHOD = string
What type of model components to fit (spline, loess); default spli

MODELTERMS = strings
What model components to fit (pins, rows, columns, intensity, pinxintensity, ar1, rowxcolumn, pinxrow, pinxcolumn); default pins, rows, colu, inte

DFINTENSITY = scalar
Degrees of freedom for intensity cubic spline; default 24

DFROWXCOLUMN = scalar
Degrees of freedom for row × column thin-plate spline; default 49

POORFLAGS = text or variate
Levels of FLAGS that are poor quality spots

BADFLAGS = text or variate
Levels of FLAGS that are bad spots

ARRANGEMENT = string
Whether to use trellis or single plots (single, trellis); default trel

WINDOW = scalar
Window number for the graphs; default 3

DEVICE = scalar
Device number on which to plot the graphs

GRAPHICSFILE = text
What graphics filename template to use to save the graphs; default *


Parameters

LOGRATIOS = variates
Log-ratios

INTENSITIES = variates
Spot intensities

SLIDES = factors
Slides

PINS = factors
Pins

SROWS = factors
Rows across whole slide

SCOLUMNS = factors
Columns across whole slide

PROWS = factors
Rows within pins

PCOLUMNS = factors
Columns within pins

FLAGS = factors
Quality flags

CLOGRATIOS = variates
Save corrected log-ratios

SLOGRATIOS = variates
Save standardized log-ratios

SDSMOOTH = variates
Save smoothed deviations

PINEFFECTS = tables
Save estimated pin effects

ROWEFFECTS = tables
Save estimated row effects

COLEFFECTS = tables
Save estimated column effects

INTEFFECTS = variates
Save estimated intensity effects

CLRED = variates
Save corrected log2 red values

CLGREEN = variates
Save corrected log2 green values

VAREXPLAINED = variates
Save the variance explained for each slide


Description

With large microarrays it is essential to identify sources of variation and correct for them, so that the technology can be used robustly. Normalization procedures identify and remove this variation to give data suitable for subsequent analysis. The analysis of microarrays is thus a two-step process: a within-slide analysis aimed at normalization and, if required, standardization; then a between-slide analysis to estimate the differences between targets (or treatments) and evaluate their consistency.

   Various techniques have been suggested for normalization, including linear regression, ratio statistics, local smoothing and analysis of variance. The approach in MNORMALIZE is to model the variation associated with spatial and structural components, and to remove it as noise. An example of a spatial component is the grid layout on the slide (rows × columns); examples of structural components are the pins, the print order and the differential dye responses to binding and scanning. The model can be specified to suit the type of variation found in the particular series of slides. The usual statistical modelling approach is taken, in which all possible sources of noise are fitted jointly in one model, and the need for each term is assessed from the statistical significance of the reduction in the remaining unexplained variation. Model terms can be added or removed as required. The fitted model then indicates where modifications to protocols and equipment would help to minimize variation in future experiments.

   The type of model to use is selected using the METHOD option, with settings:

    spline
a mixed model including cubic smoothing splines, fitted with the REML directive; or

    loess
regression with the LOESS smoothing function, fitted with the FIT directive.

   The terms to include in the models are selected by the MODELTERMS option, with settings:

    pins
an effect for each pin on the slide;

    rows
an effect for each row on the slide;

    columns
an effect for each column on the slide;

    intensity
a cubic smoothing spline or Loess curve for spot intensity, with degrees of freedom defined by the DFINTENSITY option (default 24);

    pinxintensity
a different linear effect of intensity for each pin;

    ar1
an autoregressive model of order 1, fitted separately in the row and column directions (REML only);

    rowxcolumn
a thin-plate spline (REML only) which fits a smooth surface with row and column interaction, with degrees of freedom defined by the DFROWXCOLUMN option (default 49);

    pinxrow
pin-by-row interaction; and

    pinxcolumn
pin-by-column interaction.
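
   As an illustration of the METHOD and MODELTERMS settings above, the following sketch fits the same data with each method. The identifiers LogRatio, Intensity, Slide, Pin, SRow, SCol, PRow and PCol are hypothetical names for data structures assumed to exist already, and the degrees of freedom are arbitrary illustrative values. This description does not say which parameters each model term requires, so all the layout factors are supplied.

```genstat
" Hypothetical data structures: LogRatio, Intensity (variates);
  Slide, Pin, SRow, SCol, PRow, PCol (factors). "

" Spline method (fitted by REML): default terms plus a thin-plate
  spline surface, with illustrative (non-default) degrees of freedom. "
MNORMALIZE [METHOD=spline; \
    MODELTERMS=pins,rows,columns,intensity,rowxcolumn; \
    DFINTENSITY=12; DFROWXCOLUMN=25] \
    LOGRATIOS=LogRatio; INTENSITIES=Intensity; SLIDES=Slide; PINS=Pin; \
    SROWS=SRow; SCOLUMNS=SCol; PROWS=PRow; PCOLUMNS=PCol

" Loess method (fitted by FIT): the REML-only terms ar1 and
  rowxcolumn are left out. "
MNORMALIZE [METHOD=loess; MODELTERMS=pins,rows,columns,intensity] \
    LOGRATIOS=LogRatio; INTENSITIES=Intensity; SLIDES=Slide; PINS=Pin; \
    SROWS=SRow; SCOLUMNS=SCol; PROWS=PRow; PCOLUMNS=PCol
```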

   The log-ratios and spot intensities are supplied by the LOGRATIOS and INTENSITIES parameters. The SLIDES parameter supplies a factor to index the slides, and PINS provides a factor to index the pins. The SROWS and SCOLUMNS parameters provide factors to index the rows and columns within the whole slide, while the PROWS and PCOLUMNS parameters provide factors to index the rows and columns within the pins. The FLAGS parameter supplies a factor giving a quality flag for each spot. The POORFLAGS and BADFLAGS options can then each supply a text or variate defining levels of FLAGS that are poor or bad quality spots. The poor spots are still used for model fitting, but are excluded from the output variates. The bad quality spots are excluded from any analysis.
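
   The use of quality flags might be sketched as follows. This assumes a hypothetical factor Flag with levels 0 (good), 1 (poor) and 2 (bad); the coding is invented for illustration, and the other identifiers are likewise hypothetical names for existing data structures.

```genstat
" Assumption: Flag is a factor with levels 0 (good), 1 (poor), 2 (bad).
  The unnamed variates !(1) and !(2) list the levels of Flag that
  mark poor and bad spots, respectively. "
MNORMALIZE [POORFLAGS=!(1); BADFLAGS=!(2)] \
    LOGRATIOS=LogRatio; INTENSITIES=Intensity; SLIDES=Slide; PINS=Pin; \
    SROWS=SRow; SCOLUMNS=SCol; PROWS=PRow; PCOLUMNS=PCol; FLAGS=Flag
```

   POORFLAGS and BADFLAGS can alternatively be set to texts, which may be more convenient if the flags are stored as labels rather than as numbers.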

   The CLOGRATIOS parameter can supply a variate to save the corrected log-ratios. Similarly, the SLOGRATIOS parameter can save the standardized log-ratios, and SDSMOOTH can save the smoothed deviations. The PINEFFECTS, ROWEFFECTS and COLEFFECTS parameters can save tables containing estimated pin, row and column effects, respectively. The INTEFFECTS parameter can save a variate containing estimated intensity effects. The CLRED and CLGREEN parameters can save variates containing corrected log2 red and green values, respectively. Finally, the VAREXPLAINED parameter can save a variate with the variance explained, by slide.
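
   A minimal sketch of saving results for the between-slide analysis is given below. The output identifiers (CLogRatio, SLogRatio, PinEff and VarExpl) are hypothetical names chosen for illustration, as are the input identifiers.

```genstat
" Save corrected and standardized log-ratios, the estimated pin
  effects and the variance explained for each slide. "
MNORMALIZE [PRINT=summary] \
    LOGRATIOS=LogRatio; INTENSITIES=Intensity; SLIDES=Slide; PINS=Pin; \
    SROWS=SRow; SCOLUMNS=SCol; PROWS=PRow; PCOLUMNS=PCol; \
    CLOGRATIOS=CLogRatio; SLOGRATIOS=SLogRatio; \
    PINEFFECTS=PinEff; VAREXPLAINED=VarExpl

" Inspect the saved variance explained. "
PRINT VarExpl
```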

   The PRINT option controls printed output, and the PLOT option controls what graphs are produced. By default the plots for the slides are displayed in a trellis arrangement, but you can set option ARRANGEMENT=single to display them separately, in single plots. The WINDOW option specifies the window to use for the graphs (by default 3). You can use the DEVICE option to plot to a device other than the screen. The GRAPHICSFILE option then supplies a template for the file names.
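
   For example, the following sketch requests MA plots and spatial residual plots, drawn as a separate plot for each slide in window 1. The identifiers are hypothetical names for existing data structures; DEVICE and GRAPHICSFILE are left at their defaults, as the form of the file name template is not described here.

```genstat
" One plot per slide (rather than a trellis), drawn in window 1. "
MNORMALIZE [PRINT=summary; PLOT=ma,spatialresiduals; \
    ARRANGEMENT=single; WINDOW=1] \
    LOGRATIOS=LogRatio; INTENSITIES=Intensity; SLIDES=Slide; PINS=Pin; \
    SROWS=SRow; SCOLUMNS=SCol; PROWS=PRow; PCOLUMNS=PCol
```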

 

Options: PRINT, PLOT, METHOD, MODELTERMS, DFINTENSITY, DFROWXCOLUMN, POORFLAGS, BADFLAGS, ARRANGEMENT, WINDOW, DEVICE, GRAPHICSFILE.

Parameters: LOGRATIOS, INTENSITIES, SLIDES, PINS, SROWS, SCOLUMNS, PROWS, PCOLUMNS, FLAGS, CLOGRATIOS, SLOGRATIOS, SDSMOOTH, PINEFFECTS, ROWEFFECTS, COLEFFECTS, INTEFFECTS, CLRED, CLGREEN, VAREXPLAINED.


Action with RESTRICT

Any restrictions on LOGRATIOS, INTENSITIES, SLIDES, PINS, SROWS, SCOLUMNS, PROWS, PCOLUMNS or FLAGS are removed (and a warning is given).