For large arrays it is essential to identify sources of variation and correct
for them to allow for robust use of this technology. Through normalization procedures,
such variations can be identified and removed to obtain data for follow-on research.
The analysis of microarrays is a two-step process: a within-slide
analysis aimed at normalization and, if required, standardisation, followed by a
between-slide analysis to estimate the differences between targets and their
consistency. Various techniques for normalisation have been suggested, including
linear regression, ratio statistics, local smoothing and analysis of variance.
The approach used in this menu is to model the variation associated with
spatial and structural components and remove this as noise. An example of a
spatial component is the grid layout on the slide (rows x columns); examples of
structural components are the pins, print order and differential dye responses
to binding and scanning. The model can be specified to fit the type of variation
found in the particular series of slides. The usual statistical modelling approach
is taken where all possible sources of noise are jointly fitted in one model,
with the need for each term being assessed using statistical significance of the
reduction in remaining unexplained variation. Model terms can be added or removed
as required. The fitted model then indicates where useful modification of protocols
and equipment would help minimise variation in future experiments.
The sorts of spatial artifacts removed from a slide can be seen in
a spatial plot.
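The idea of fitting structural terms jointly and removing them as noise can be illustrated with a minimal Python sketch. It is not the algorithm used by the menu (which fits the model with REML or Loess); it simply backfits additive group means for factors such as pins and rows and returns the residuals as corrected log-ratios. The function names and the choice of backfitting are assumptions made for illustration.

```python
from collections import defaultdict


def group_means(values, labels):
    """Return the mean of `values` within each distinct label."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, labels):
        sums[g] += v
        counts[g] += 1
    return {g: sums[g] / counts[g] for g in sums}


def remove_additive_effects(log_ratios, factors, n_iter=50):
    """Backfit additive factor effects; return (residuals, effects).

    log_ratios: list of floats, one per spot (no missing values).
    factors: dict mapping a term name (e.g. "pins") to a list of
             labels, one per spot. Hypothetical layout for this sketch.
    """
    grand_mean = sum(log_ratios) / len(log_ratios)
    effects = {name: defaultdict(float) for name in factors}
    resid = [y - grand_mean for y in log_ratios]
    for _ in range(n_iter):
        for name, labels in factors.items():
            # Add this term's current estimate back, re-estimate it
            # from the partial residuals, then subtract it again.
            partial = [r + effects[name][g] for r, g in zip(resid, labels)]
            means = group_means(partial, labels)
            effects[name] = defaultdict(float, means)
            resid = [p - means[g] for p, g in zip(partial, labels)]
    return resid, {name: dict(e) for name, e in effects.items()}


# Four spots printed by two pins across two rows (made-up values).
corrected, estimated = remove_additive_effects(
    [0.7, 0.3, -0.3, -0.7],
    {"pins": [1, 1, 2, 2], "rows": [1, 2, 1, 2]},
)
```

In this toy data the log-ratios are exactly a pin effect plus a row effect, so the corrected values are zero apart from rounding. With real slides the residuals retain the gene-specific signal that the between-slide analysis then uses.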
Available Data
This lists data structures appropriate for the edit box which
currently has focus. You can double-click a name to enter it in the
edit box.
Model
Two types of model can be used to normalize the data:
- Spline using REML - Mixed Model using cubic smoothing splines fitted with the REML directive.
- Loess using Fit - Regression using the LOESS smoothing function.
Terms
The model terms to fit to the log-ratios. Select the model that you want to fit from the drop-down list.
The terms are made up of the following components:
- Pins - A separate mean for each pin on the slide.
- Rows - A separate mean for each row on the slide.
- Columns - A separate mean for each column on the slide.
- Intensity - A cubic smoothing spline or Loess curve (maximum degrees of freedom set in the options menu) for spot intensity.
- AR1 - An autoregressive model of order 1, fitted separately along rows and columns (REML only).
- Spline(Row.Column) - a thin-plate spline which fits a smooth surface with row and column interaction (REML only)
The selection of terms will enable or disable the fields below required to fit the model.
For the AR1 term, the Within Pin Rows and Within Pin Columns factors are required. For the Rows and Columns terms,
the Slide Rows and Slide Columns factors are required respectively.
Log-ratios
The log-ratios (generally calculated using the
calculate microarray log-ratios
menu) to normalize.
Intensity
The brightness of the spot on the log scale, usually calculated with the
calculate microarray log-ratios
menu. If the Intensity term is not being fitted, this does not need to be provided.
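As an illustration of the two quantities involved, the sketch below uses the conventional definitions for two-colour arrays: the log-ratio is log2(red/green) and the intensity is the mean of the two log2 signals. This is an assumption for illustration; the log-ratios menu may use a different log base or apply background correction first, and the function name is hypothetical.

```python
import math


def log_ratio_and_intensity(red, green):
    """Return (log-ratio, intensity) for one spot.

    Assumes the conventional definitions:
      log-ratio = log2(red / green)
      intensity = 0.5 * log2(red * green)
    Non-positive signals give (None, None), i.e. missing values.
    """
    if red <= 0 or green <= 0:
        return None, None
    log_ratio = math.log2(red / green)
    intensity = 0.5 * math.log2(red * green)
    return log_ratio, intensity


# A spot four times brighter on red than on green.
m, a = log_ratio_and_intensity(8.0, 2.0)
```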
Slides
The factor that identifies the slides. If just a single slide is being normalised, this
does not need to be provided.
Pins
A factor that indexes the print groups or pins that printed the spot within each slide.
Pins may deliver different aliquots of DNA when printing the spots, or may become
blocked and so change the level of the log-ratio for that group of spots on the slide.
As pins generally print adjacent spots, pin effects will also be confounded with other
general spatial effects. If the Pins term is not being fitted, this does not need to be provided.
Slide Rows
A factor that indexes the rows across the whole slide. Spatial effects may cause variations
along the rows. If the Rows term is not being fitted, this does not need to be provided.
Slide Columns
A factor that indexes the columns across the whole slide. Spatial effects may cause variations
along the columns. If the Columns term is not being fitted, this does not need to be provided.
Within Pin Rows
A factor that indexes the rows within the pins. The rows within each pin should
be numbered from 1 to n. This is required to efficiently fit the AR1 autocorrelation
along rows. If the AR1 term is not being fitted, this does not need to be provided.
Within Pin Columns
A factor that indexes the columns within the pins. The columns within each pin should
be numbered from 1 to n. This is required to efficiently fit the AR1 autocorrelation
along columns. If the AR1 term is not being fitted, this does not need to be provided.
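If only whole-slide row and column numbers are available, the within-pin numbering can often be derived from them. The sketch below assumes that each pin prints a contiguous rectangular block of spots and that all blocks have the same number of rows (or columns); neither assumption comes from this page, and the function names are hypothetical.

```python
def within_pin_index(slide_index, block_size):
    """Map a 1-based whole-slide row (or column) number to a
    1-based row (or column) number within its pin block."""
    if slide_index < 1 or block_size < 1:
        raise ValueError("indices and block size must be positive")
    return (slide_index - 1) % block_size + 1


def pin_block_number(slide_index, block_size):
    """Return the 1-based block (pin row or pin column) containing
    the given whole-slide row or column number."""
    if slide_index < 1 or block_size < 1:
        raise ValueError("indices and block size must be positive")
    return (slide_index - 1) // block_size + 1


# With 19 rows per pin block, slide row 20 is row 1 of the second block.
row_in_pin = within_pin_index(20, 19)
block = pin_block_number(20, 19)
```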
Quality Flags
The name of the variate or factor that specifies spot quality. Many image analysis systems
create a code for the quality of the spots. For example, GenePix creates a variate named
Flags that has values of -25 and -50 for low intensity spots, which it regards as poor quality,
and values of -75 and -100 for spots that have bad quality due to scratches or other image artifacts
or intervention by the user to mark them as bad.
Poor Flags
The codes in the Quality Flags structure that indicate poor quality spots. Poor
quality spot information is used for the normalization process, but the corrected log-ratios
are returned as missing. For example, GenePix uses values -25 and -50 to mark poor quality spots.
Bad Flags
The codes in the Quality Flags structure that indicate bad quality spots. Bad
quality spot information is not used for the normalization process and the corrected log-ratios
are returned as missing. For example, GenePix uses values -75 and -100 to mark bad quality spots.
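The different treatment of poor and bad spots can be summarised in a short Python sketch. It only illustrates the rule described above (poor spots contribute to the fit but are returned as missing; bad spots are excluded from the fit and returned as missing). The function name is hypothetical and the flag codes are the GenePix examples quoted on this page.

```python
def classify_spots(flags, poor_flags, bad_flags):
    """Return two lists of booleans, one entry per spot:

    use_in_fit   - True if the spot contributes to the normalization model
    report_value - True if its corrected log-ratio is returned
                   (False means it is returned as missing)
    """
    poor, bad = set(poor_flags), set(bad_flags)
    use_in_fit = [f not in bad for f in flags]
    report_value = [f not in bad and f not in poor for f in flags]
    return use_in_fit, report_value


# Example with the GenePix codes mentioned above; 0 is taken to mean unflagged.
flags = [0, -25, -50, -75, -100]
use_in_fit, report_value = classify_spots(
    flags, poor_flags=[-25, -50], bad_flags=[-75, -100]
)
```

Here the first three spots are used in the fit, but only the first has its corrected log-ratio returned.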
Action Buttons
| Run | Run the analysis. |
| Cancel | Close the menu without further changes. |
| Options | Opens a dialog where additional options and settings can be specified for the analysis. |
| Defaults | Set the menu settings back to the default settings. Clicking the right mouse button on this button produces a shortcut menu where you can choose to set the menu using the currently stored defaults or the GenStat default settings. |
| Store | Opens a dialog to specify names of structures to store the results from the analysis. The names of the structures to be saved should be supplied before running the analysis. |
Example
The following example shows the normalization of a mouse knock out experiment
with 6384 genes per slide. There were 16 slides in this experiment, 8 control mice
and 8 knock out mice all on the red dye compared to a standard reference on the green dye.
The normalization fits pin, row and column effects and dye intensity effects.
The estimated effects for these can be seen in the following plots produced by
the procedure. The menu settings used to normalize this data are shown here:
The options to set the spline degrees of freedom and the plots/display to
produce were set in the Options dialog. The effects graphs were consolidated
onto a single page by using a trellis plot.
Just the corrected log-ratios were saved and added back to the originating spreadsheet.
The estimated effects of the various model components are displayed for all slides
in the following trellis plots. It can be seen that there are large effects for most
of the components. The normalization model explains over 50% of the variation on the slides.
The column effects on most slides are small, with the exception of the first control
and knock out slides c1 and k1 where there is a strong trend on one side of the slides.
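A figure such as the percentage of variation explained can be understood as one minus the ratio of the residual to the total sum of squares of the log-ratios. The sketch below shows that calculation under this assumption; how the procedure itself derives its figure is not stated here, and the function name is hypothetical. Spots with a missing value (None) in either list are skipped.

```python
def percent_variation_explained(log_ratios, corrected):
    """Return 100 * (1 - SS_residual / SS_total) for paired values.

    log_ratios: original log-ratios, one per spot.
    corrected:  corrected log-ratios (model residuals), one per spot.
    Pairs where either value is None are ignored.
    """
    pairs = [(y, r) for y, r in zip(log_ratios, corrected)
             if y is not None and r is not None]
    if not pairs:
        raise ValueError("no complete pairs of values")
    y_mean = sum(y for y, _ in pairs) / len(pairs)
    r_mean = sum(r for _, r in pairs) / len(pairs)
    ss_total = sum((y - y_mean) ** 2 for y, _ in pairs)
    ss_resid = sum((r - r_mean) ** 2 for _, r in pairs)
    if ss_total == 0:
        raise ValueError("log-ratios have no variation")
    return 100.0 * (1.0 - ss_resid / ss_total)


# Made-up values: correction halves the spread of the log-ratios.
pct = percent_variation_explained([1.0, -1.0, 1.0, -1.0],
                                  [0.5, -0.5, 0.5, -0.5])
```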
See Also