AFFYMETRIX procedure
Estimates expression values for Affymetrix slides (D.B. Baird).
Description
AFFYMETRIX estimates expression values from the perfect match (PM) and mismatch (MM) pairs for each probe on Affymetrix slides (or chips). On Affymetrix chips, each probe has 8-20 pairs of DNA sequences, in which the central base differs between the perfect match and mismatch sequences. The expression level of each probe is taken as an average over its PM and MM pairs. The intensity values are obtained by reading a series of Affymetrix CEL files, and the chip information is read from a CDF file.
The METHOD option selects the method used to summarize over the PM and MM pairs, with settings:
In the Affymetrix MAS 4 and MAS 5 methods, the differences between the signals (PM - MM) are averaged using a robust averaging method. The MAS 4 method uses the AvDiff algorithm, which discards the minimum and maximum differences, and any differences more than 3 standard deviations from the mean. The MAS 5 method uses the Tukey biweight algorithm, which reweights the values according to how far they are from the median, and discards any that are more than 5 times the median absolute deviation away. The MAS 5 method also replaces the MM value with a value known as the Ideal Mismatch (IM), which is always less than the PM value.
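The two robust averages can be sketched in Python as follows. This is a minimal illustration that follows the description above, not GenStat's own code: `avdiff` applies the AvDiff trimming to the PM - MM differences of one probe, and `tukey_biweight` computes a one-step biweight mean with tuning constant c = 5. The small constant eps = 0.0001, which avoids division by zero when the median absolute deviation is 0, is an assumption of this sketch, and the Ideal Mismatch construction used by MAS 5 is not shown.

```python
import statistics


def avdiff(pm, mm):
    """AvDiff-style robust average of the PM - MM differences for one probe."""
    diffs = sorted(p - m for p, m in zip(pm, mm))
    if len(diffs) < 4:
        raise ValueError("need at least 4 PM/MM pairs")
    # Discard the minimum and maximum difference.
    trimmed = diffs[1:-1]
    mean = statistics.mean(trimmed)
    sd = statistics.stdev(trimmed)
    # Discard any difference more than 3 standard deviations from the mean.
    kept = [d for d in trimmed if abs(d - mean) <= 3 * sd]
    return statistics.mean(kept)


def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight mean of a list of values (eps is an assumption)."""
    m = statistics.median(values)
    mad = statistics.median([abs(v - m) for v in values])
    weights = []
    for v in values:
        # Scaled distance from the median; |u| >= 1 means roughly c MADs or more away.
        u = (v - m) / (c * mad + eps)
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)
```

For example, `avdiff([20, 22, 21, 23, 19, 60, 5], [10] * 7)` returns 11: the differences are 10, 12, 11, 13, 9, 50 and -5, the extremes 50 and -5 are dropped, and the remaining five all lie within 3 standard deviations of their mean. In `tukey_biweight([1, 2, 3, 4, 1000])` the value 1000 receives zero weight.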
The standard RMA algorithm (Irizarry et al., 2003) normally uses the log2-transformed PM values with no background correction, and applies a quantile normalization to them. A Normal function transformation is then applied to the adjusted PM values, with the parameters of the transformation calculated from a kernel density estimate of the adjusted PM values. Finally, the transformed PM values are summarized by a median polish of the slides × atoms table of values for each probe. The log2 transformation can be suppressed by setting option TRANSFORMATION=none.
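The median polish summary can be sketched as a Tukey median polish of a table with one row per slide and one column per atom (in RMA these would be the transformed, normalized PM values for one probe). The function name and its `max_iter` and `tol` arguments are hypothetical, and the sweep order follows the usual textbook algorithm rather than any documented GenStat implementation. The estimate for each slide is the overall effect plus that slide's row effect.

```python
import statistics


def median_polish(table, max_iter=10, tol=0.01):
    """Tukey median polish of a slides x atoms table; returns one value per slide."""
    resid = [[float(x) for x in row] for row in table]
    n_cols = len(resid[0])
    overall = 0.0
    row_eff = [0.0] * len(resid)
    col_eff = [0.0] * n_cols
    old_sum = 0.0
    for _ in range(max_iter):
        # Sweep row (slide) medians out of the residuals into the row effects.
        row_med = [statistics.median(r) for r in resid]
        resid = [[x - m for x in r] for r, m in zip(resid, row_med)]
        row_eff = [e + m for e, m in zip(row_eff, row_med)]
        delta = statistics.median(col_eff)
        col_eff = [e - delta for e in col_eff]
        overall += delta
        # Sweep column (atom) medians out of the residuals into the column effects.
        col_med = [statistics.median([r[j] for r in resid]) for j in range(n_cols)]
        resid = [[x - m for x, m in zip(r, col_med)] for r in resid]
        col_eff = [e + m for e, m in zip(col_eff, col_med)]
        delta = statistics.median(row_eff)
        row_eff = [e - delta for e in row_eff]
        overall += delta
        # Stop when the sum of absolute residuals stops changing.
        new_sum = sum(abs(x) for r in resid for x in r)
        if new_sum == 0 or abs(new_sum - old_sum) < tol * new_sum:
            break
        old_sum = new_sum
    # Expression estimate for each slide: overall effect + slide effect.
    return [overall + e for e in row_eff]
```

With the table `[[1000, 21, 31], [12, 22, 32], [13, 23, 33]]`, which is additive apart from its first entry, the estimates are `[21.0, 22.0, 23.0]`: the outlying value is left in the residuals rather than being absorbed into the slide effects.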
The RMA model performs a background correction by fitting a two component model to the PM intensities:
Observed intensity = Signal + Noise
where Signal has an exponential distribution with parameter α (the reciprocal of the mean), and Noise has a Normal distribution with parameters μ (the mean) and σ (the standard deviation). α, μ and σ are estimated, and the expected value of the signal, given the observed intensity, is then calculated.
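Under this model, the conditional distribution of Signal given an observed intensity o is a Normal distribution with mean a = o - μ - σ²α and standard deviation σ, truncated to non-negative values, so the expected signal is a + σφ(a/σ)/Φ(a/σ), where φ and Φ are the standard Normal density and distribution functions. The sketch below evaluates this in Python. α, μ and σ are taken as given, since the estimation step is not shown, and the switch to the asymptotic ratio φ(z)/Φ(z) ≈ -z - 1/z for very negative z (chosen here at z < -30, to avoid underflow) is an addition of this example.

```python
import math


def expected_signal(o, alpha, mu, sigma):
    """E[Signal | Observed = o] for Signal ~ Exponential(alpha), Noise ~ Normal(mu, sigma)."""
    # Mean of the conditional Normal distribution of the signal before truncation at 0.
    a = o - mu - sigma ** 2 * alpha
    z = a / sigma
    if z > -30:
        pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
        # Standard Normal CDF via erfc, which is accurate in the lower tail.
        cdf = 0.5 * math.erfc(-z / math.sqrt(2))
        ratio = pdf / cdf
    else:
        # Far lower tail: asymptotic approximation to phi(z) / Phi(z).
        ratio = -z - 1 / z
    return a + sigma * ratio
```

For large intensities the correction is essentially a shift: `expected_signal(1000, 0.01, 100, 10)` is 899 to within rounding, while low intensities are mapped to small positive values instead of negative ones.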
For all algorithms, the lowest 2% of spots on each slide can be used to estimate a background correction for the intensities. The chip is divided into 16 zones in a 4 × 4 grid, and a weighted average of the 16 zone levels is subtracted from each spot. The levels used are controlled by the BMETHOD option, with settings:
The BWEIGHTING option controls how the background levels are combined before removing them from each spot:
where Squared-distance = (distance from the spot to the zone centroid)².
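A zone-based background correction along these lines can be sketched as follows. Several details are assumptions made for illustration, mirroring the commonly described MAS 5 defaults rather than GenStat's documented behaviour: the lowest 2% of spots are taken within each zone, each zone level is the mean of those spots, and each zone is weighted by 1/(Squared-distance + smooth) with a hypothetical smoothing constant smooth = 100, which is only one possible BWEIGHTING-style scheme. The intensities are a list of rows, indexed by the row and column positions of the spots, and negative corrected values are not truncated.

```python
def zone_background(intensity, n_zones=4, lowest=0.02, smooth=100.0):
    """Subtract a distance-weighted zone background from a grid of intensities."""
    n_rows, n_cols = len(intensity), len(intensity[0])
    assert n_rows >= n_zones and n_cols >= n_zones, "grid too small for zoning"
    levels, centroids = [], []
    for zr in range(n_zones):
        r0, r1 = zr * n_rows // n_zones, (zr + 1) * n_rows // n_zones
        for zc in range(n_zones):
            c0, c1 = zc * n_cols // n_zones, (zc + 1) * n_cols // n_zones
            cells = sorted(intensity[r][c] for r in range(r0, r1) for c in range(c0, c1))
            # Zone level: mean of the lowest `lowest` fraction (at least one spot).
            k = max(1, int(len(cells) * lowest))
            levels.append(sum(cells[:k]) / k)
            # Zone centroid in (row, column) coordinates.
            centroids.append(((r0 + r1 - 1) / 2, (c0 + c1 - 1) / 2))
    corrected = []
    for r in range(n_rows):
        new_row = []
        for c in range(n_cols):
            # Weight each zone by 1 / (squared distance to its centroid + smooth).
            w = [1.0 / ((r - cr) ** 2 + (c - cc) ** 2 + smooth) for cr, cc in centroids]
            background = sum(wi * li for wi, li in zip(w, levels)) / sum(w)
            new_row.append(intensity[r][c] - background)
        corrected.append(new_row)
    return corrected
```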
The quantile normalization of the PM/MM values on each slide is controlled by the NMETHOD option. Its settings select how the overall distribution is formed from the cumulative distribution functions of the individual slides:
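Quantile normalization itself can be sketched as below, with the slides supplied as equal-length lists. The `combine` argument is a hypothetical stand-in for the choice made by NMETHOD: the default takes the mean of the k-th smallest values across slides, which is the usual form of quantile normalization, and another function such as `statistics.median` could be passed instead. Tied values are assigned in order of position, which is a simplification.

```python
import statistics


def quantile_normalize(slides, combine=statistics.mean):
    """Give every slide the same distribution; slides is a list of equal-length lists."""
    n = len(slides[0])
    sorted_slides = [sorted(s) for s in slides]
    # Reference distribution: combine the k-th smallest values across slides.
    reference = [combine([s[k] for s in sorted_slides]) for k in range(n)]
    result = []
    for s in slides:
        # Replace each value by the reference value of the same rank.
        order = sorted(range(n), key=lambda i: s[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        result.append(out)
    return result
```

For slides `[5, 2, 3]` and `[4, 1, 6]`, the reference distribution is `[1.5, 3.5, 5.5]` and the normalized slides are `[5.5, 1.5, 3.5]` and `[3.5, 1.5, 5.5]`.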
The intensity values are specified by the DATA parameter. If these are in a single variate, the SLIDES parameter should supply a factor to index the slides, and the PROBES parameter should supply a factor to index the probes (or genes). Alternatively, you can supply a pointer containing a variate for each slide. The SLIDES factor is then not required; if it is given, it should have just one entry for each slide, in the order of the variates in the pointer. The PROBES factor then indexes the probes of a single slide, and all slides must have a common layout.
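The relationship between the two layouts for DATA can be illustrated with a small Python helper, `split_by_slide`, which is hypothetical and not part of GenStat. It takes the values of a single variate together with the slide and probe labels, returns one list of values per slide (the analogue of a pointer to one variate per slide) in order of first appearance, and checks that every slide has the same probe layout.

```python
from collections import defaultdict


def split_by_slide(values, slides, probes):
    """Split one long vector into per-slide vectors sharing a common probe layout."""
    per_slide = defaultdict(list)
    layout = defaultdict(list)
    order = []
    for v, s, p in zip(values, slides, probes):
        if s not in per_slide:
            order.append(s)
        per_slide[s].append(v)
        layout[s].append(p)
    common = layout[order[0]]
    # Every slide must list the same probes in the same order.
    for s in order[1:]:
        if layout[s] != common:
            raise ValueError(f"slide {s!r} does not share the common probe layout")
    return [per_slide[s] for s in order], common
```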
The ATOMS parameter supplies a factor to identify the PM/MM pairs within each probe, and the PMMM parameter supplies a factor, with levels labelled 'PM' and 'MM', to distinguish between PM and MM values. The TYPEPROBES parameter supplies a factor to specify the probe types. The types of probes that can occur on Affymetrix chips are: 'Expression', 'Genotyping', 'CustomSeq', 'Tag', 'Unknown', 'Checkerboard Negative', 'Checkerboard Positive', 'Hybridization Negative', 'Hybridization Positive', 'Text Negative', 'Text Positive', 'Central Negative', 'Central Positive', 'Gene Exp Negative', 'Gene Exp Positive', 'Cycle Fidelity Negative', 'Cycle Fidelity Positive', 'Central Cross Negative', 'Central Cross Positive', 'Cross Hyb Negative' and 'Cross Hyb Positive'.
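Pairing by ATOMS and PMMM can be illustrated in the same way for a single slide. The hypothetical helper below groups values by (probe, atom), looks up the entries labelled 'PM' and 'MM', and returns the PM - MM differences for each probe, ordered by atom, which is the input to a MAS 4-style average. It assumes that each (probe, atom) combination has exactly one PM and one MM value, and it does not filter on TYPEPROBES.

```python
def pm_mm_differences(values, probes, atoms, pmmm):
    """Pair PM and MM values by (probe, atom) and return PM - MM per probe."""
    pairs = {}
    for v, p, a, t in zip(values, probes, atoms, pmmm):
        pairs.setdefault((p, a), {})[t] = v
    diffs = {}
    # Sorting by (probe, atom) keeps the atoms of each probe in order.
    for (p, _a), pair in sorted(pairs.items()):
        diffs.setdefault(p, []).append(pair["PM"] - pair["MM"])
    return diffs
```

For example, values `[10, 3, 12, 4, 7, 7]` with probes `g1, g1, g1, g1, g2, g2`, atoms `1, 1, 2, 2, 1, 1` and labels `PM, MM, PM, MM, PM, MM` give `{'g1': [7, 8], 'g2': [0]}`.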
The ROWS and COLUMNS parameters can supply factors to identify the rows and columns within each slide. These are required only if background corrections are to be made.
The ESTIMATES parameter must supply a variate to save the estimated expression value for each slide and probe combination. The IDPROBES and IDSLIDES parameters must supply factors to identify the probes and slides, respectively, in the ESTIMATES variate. You can also set option SPREADSHEET=results to save these in a GenStat spreadsheet. The SE parameter can supply a variate to save approximate standard errors; if this is set, the standard errors are also included in the spreadsheet.
Options: PRINT, METHOD, BMETHOD, BWEIGHTING, TRANSFORMATION, NMETHOD, REPLACEDATA, SPREADSHEET, MAXCYCLE, TOLERANCE.
Parameters: DATA, SLIDES, PROBES, ATOMS, PMMM, TYPEPROBES, ROWS, COLUMNS, ESTIMATES, SE, IDSLIDES, IDPROBES.
References
Irizarry, R.A., Hobbs, B., Collin, F., Beazer-Barclay, Y.D., Antonellis, K.J., Scherf, U. & Speed, T.P. (2003). Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics, 4(2), 249-264.