Microarray Calculate Affymetrix Expression Values
This menu estimates expression values by summarizing over the perfect match/mismatch pairs for each probe on the slides (chips). On Affymetrix chips, each probe has 8-20 pairs of DNA sequences, with a central base changed between the perfect match (PM) and mismatch (MM) sequences. The expression value for a probe is taken as an average over its pairs of PM and MM spots. The intensity values are obtained by reading in a series of Affymetrix CEL files, and the chip information is read from a CDF file. CEL files can contain very large amounts of data, so it can be advantageous to use the batch processing options when opening the files.

Method

The statistical method used to summarize over the PM/MM pairs. The methods available are:
RMA - Robust Means Analysis model: The probe-level model introduced by Irizarry et al. (2003), which uses only PM information and transforms the values based on a kernel density estimate of the PM distribution.
RMA2 - Robust Means Analysis 2: An adaptation of the RMA algorithm which fits the kernel density to a truncated distribution of the PM values, with the truncation point based on an initial kernel density estimate.
MAS4 - Affymetrix Version 4: The AvDiff algorithm introduced in the Affymetrix version 4 software.
MAS5 - Affymetrix Version 5: The Tukey biweight algorithm introduced in the Affymetrix version 5 software.

In the Affymetrix MAS 4 and MAS 5 methods, the differences between the signals, PM - MM, are combined using a robust average. The MAS 4 method uses the AvDiff algorithm, which discards the minimum and maximum differences and any differences more than 3 standard deviations from the mean. The MAS 5 method uses the Tukey biweight algorithm, which reweights the differences according to how far they are from the median, and discards any that are more than 5 times the median absolute deviation from the median. The MAS 5 method also replaces the MM value with a value that is always less than the PM value, known as the Ideal Mismatch (IM).
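
The two robust averages described above can be sketched in a few lines of Python. This is an illustration only, not the code used by GenStat or by the Affymetrix software: the function names, the tuning constant c = 5, the small constant eps, and the choice to compute the mean and standard deviation after dropping the extremes are all assumptions made for this sketch.

```python
from statistics import mean, median, stdev


def avdiff(diffs, n_sd=3.0):
    """AvDiff-style average of PM - MM differences (illustrative sketch).

    Drops the minimum and maximum difference, then drops any remaining
    difference more than n_sd standard deviations from the mean.
    """
    if len(diffs) < 4:
        return mean(diffs)  # too few pairs to trim sensibly
    trimmed = sorted(diffs)[1:-1]  # discard the minimum and the maximum
    centre, spread = mean(trimmed), stdev(trimmed)
    kept = [d for d in trimmed if abs(d - centre) <= n_sd * spread]
    return mean(kept)


def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight average (illustrative sketch).

    Values further than c median absolute deviations from the median
    get weight zero; closer values are down-weighted smoothly.
    """
    m = median(values)
    mad = median([abs(v - m) for v in values])
    weights = []
    for v in values:
        u = (v - m) / (c * mad + eps)  # scaled distance from the median
        weights.append((1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical differences for one probe; 1000 is an obvious outlier.
diffs = [100, 110, 90, 105, 95, 1000]
print(avdiff(diffs), tukey_biweight(diffs))
```

In both functions the outlying difference of 1000 has no influence on the result, whereas it would dominate an ordinary mean.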

The standard RMA algorithm normally uses the log 2 transformed PM values with no background correction, and applies a quantile normalization to them. A normal function transformation is then applied to the adjusted PM values, with the parameters of the transformation calculated from a kernel density estimate of the adjusted PM values. Finally, the transformed PM values for each probe are summarised with a median polish of the slides by atoms table of values.
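
As a rough illustration of two of the steps named above, the following Python sketch shows a simple quantile normalization across slides and a Tukey median polish of an atoms by slides table for a single probe. It is a sketch under stated assumptions, not GenStat's implementation: the function names, the fixed number of polish iterations, and the handling of ties (ranked in order of appearance rather than averaged) are choices made here for brevity.

```python
from statistics import median


def quantile_normalize(columns):
    """Give every slide the same distribution of values (sketch).

    columns: list of equal-length lists, one list of values per slide.
    The k-th smallest value on each slide is replaced by the mean of the
    k-th smallest values over all slides. Ties are not averaged.
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    reference = [sum(col[k] for col in sorted_cols) / len(columns)
                 for k in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])  # indices by rank
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized


def median_polish(table, n_iter=10):
    """Median polish of one probe's table (sketch).

    table[i][j] is the value for atom i on slide j. Returns one
    expression value per slide: overall effect + slide (column) effect.
    """
    resid = [list(row) for row in table]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(n_iter):
        for i in range(n_rows):  # sweep out atom (row) medians
            m = median(resid[i])
            row_eff[i] += m
            resid[i] = [r - m for r in resid[i]]
        m = median(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]
        for j in range(n_cols):  # sweep out slide (column) medians
            m = median([resid[i][j] for i in range(n_rows)])
            col_eff[j] += m
            for i in range(n_rows):
                resid[i][j] -= m
        m = median(row_eff)
        overall += m
        row_eff = [r - m for r in row_eff]
    return [overall + c for c in col_eff]
```

Because the polish uses medians rather than means, a single aberrant atom on one slide has little effect on that slide's expression value.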

The RMA model performs a background correction by fitting a two-component model to the PM intensities, where the model is:

Observed Intensity = Signal + Noise

where Signal has an Exponential distribution with parameter alpha (the reciprocal of the mean), and Noise has a Normal distribution with parameters mu (the mean) and sigma (the standard deviation). Alpha, mu and sigma are estimated from the data, and the background-corrected value is then the expected value of the signal given the observed intensity.
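
To make the last step concrete, the Python sketch below computes the expected signal given an observed intensity when alpha, mu and sigma are already known. Under the model as stated (Exponential signal, Normal noise, signal greater than zero), the signal given the observation follows a Normal distribution with mean a = observed - mu - alpha * sigma^2 and standard deviation sigma, truncated at zero, so its mean is a + sigma * phi(a/sigma) / Phi(a/sigma). The function names are hypothetical, and the estimator used by the software may differ in detail, for example in how the noise is truncated or how the parameters are estimated.

```python
import math


def normal_pdf(z):
    """Standard Normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard Normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_signal(observed, alpha, mu, sigma):
    """E[Signal | Observed] for Exponential signal plus Normal noise.

    Sketch only: assumes alpha, mu and sigma have already been estimated,
    and that a/sigma is not so negative that the Normal CDF underflows
    to zero.
    """
    a = observed - mu - alpha * sigma ** 2
    z = a / sigma
    return a + sigma * normal_pdf(z) / normal_cdf(z)


# Hypothetical parameter values, for illustration only.
for intensity in (50, 200, 10000):
    print(intensity, expected_signal(intensity, alpha=0.01, mu=100, sigma=20))
```

The corrected value is always positive, even when the observed intensity is below the mean noise level mu, and for large intensities it approaches observed - mu - alpha * sigma^2.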

Available Data

This lists data structures appropriate for the edit box which currently has focus. You can double-click a name to enter it in the edit box.

Data Format

The data can be supplied in either of the following formats:
The spreadsheet stack and unstack menus can be used to reorganise the data between these two formats.

Intensity

A variate containing the intensities to be analysed.

Use Log (base 2) Transformation of intensities

The Intensity variate is log 2 transformed before the analysis. The calculation for the transformation using GenStat command language is:

CALC LogY = LOG(Y)/LOG(2)

Slides

The factor that identifies the slides or chips.

Probes

The factor that identifies the probes or genes within each chip.

Atoms

A factor which indexes the PM/MM pairs within each probe.

Type

A factor specifying the probe types. The Affymetrix chips use quality control probes, but these are not summarized and are discarded from the analysis. The types of probes that can occur on Affymetrix chips are:

Slide Rows

A factor specifying the row on the slide of each intensity. This is only required if the Background Correction option is selected in the Calculate Affymetrix Expression Values Options dialog.

Slide Columns

A factor specifying the column on the slide of each intensity. This is only required if the Background Correction option is selected in the Calculate Affymetrix Expression Values Options dialog.

Save

This section allows you to set the structures that will contain the results. The Slide IDs, Probe IDs and Expression fields must be set, whilst saving the Approx Standard Error results is optional.
Slide IDs (factor): Indexes the slides in the resulting Expression variate.
Probe IDs (factor): Indexes the probes in the resulting Expression variate.
Expression (variate): Stores the average expression for each slide and probe combination.
Approx Standard Error (variate): Stores approximate standard errors of the expressions for each slide and probe combination. Saving these standard errors involves many calculations and can increase the time taken to run the analysis.

Display in Spreadsheet

When selected, the saved results will be displayed in a spreadsheet.

Action Buttons

Run: Run the analysis.
Cancel: Close the menu without further changes.
Options: Opens a dialog where additional options and settings can be specified for the analysis.
Defaults: Set the menu settings back to the default settings. Clicking the right mouse button on this button produces a pop-up menu where you can choose to set the menu using the currently stored defaults or the GenStat default settings.

References

Affymetrix (1999). Affymetrix Microarray Suite User Guide. Affymetrix, Santa Clara, CA, version 4 edition.
Affymetrix (2001). Affymetrix Microarray Suite User Guide. Affymetrix, Santa Clara, CA, version 5 edition.
Bolstad, B.M., Irizarry, R.A., Astrand, M. and Speed, T.P. (2003). A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics, 19(2): 185-193.
Irizarry, R.A., Gautier, L. and Cope, L.M. (2003). The Analysis of Gene Expression Data: Methods and Software, chapter 4. Springer Verlag.
Irizarry, R.A., Hobbs, B., Collin, F., Beazer-Barclay, Y.D., Antonellis, K.J., Scherf, U. and Speed, T.P. (2003). Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics, 4(2): 249-264.

Example

The following shows the menu set up to calculate the expression values for the 9 slides.

The options were set as follows:
