Two Channel Microarray Example
The data are stored in the file ApoAISlides.csv, which is located in the Data folder of the GenStat installation. The location of this folder depends on where GenStat was installed, but it is usually found under Program Files\Gen11ed on the C: drive. The file can be opened in GenStat by selecting Open from the File menu and then navigating to and selecting the file.

When a CSV file is opened in GenStat you have the option of opening it into a text window or into a spreadsheet. In this example the data are to be opened into a spreadsheet, which can be done by clicking on the Read button as shown below.

When opening CSV files you are prompted with two dialogs where additional options can be specified to control how the data are opened. The first dialog (as shown below) has options for controlling which rows of the data are to be opened. For this example the whole file is read by clicking on the OK button.

The second dialog contains further options for controlling how the data are opened, including data type conversion and the location of column names. For this example the default settings can be used by clicking on the OK button.

Opening the file should result in the following spreadsheet:

Within the spreadsheet the data have two columns for each slide. To analyse the data they need to be in stacked format, where all the red values are in one column and all the green values are stacked in another column. The Stack menu can be used to reorganise the data. To open this dialog select Stack from the Manipulate section of the Spread menu. The menu below shows the settings for stacking the columns together. There are 16 columns being stacked together in each stacked column. All the green columns are selected into the stacked list first, followed by all the red columns. The name Slide has been entered for the factor that indexes the stacked columns. The column ID has been selected for the Repeat Columns list.
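
Outside the menus, the stacking step can be pictured as a wide-to-long reshape. The following is a minimal sketch in plain Python, not GenStat code. The column names (c1G, c1R and so on) and all values are hypothetical and are used only for illustration.

```python
# Illustrative wide-to-long stacking; column names and values are hypothetical.
wide = [
    {"ID": 1, "c1G": 100.0, "c1R": 120.0, "k1G": 90.0, "k1R": 45.0},
    {"ID": 2, "c1G": 200.0, "c1R": 210.0, "k1G": 180.0, "k1R": 175.0},
]
green_cols = ["c1G", "k1G"]  # green columns, selected first
red_cols = ["c1R", "k1R"]    # red columns, in the same slide order


def stack(rows, green_cols, red_cols, repeat="ID"):
    """Return one row per (slide, spot) with Green and Red side by side."""
    stacked = []
    for g_name, r_name in zip(green_cols, red_cols):
        for row in rows:
            stacked.append({
                "Slide": g_name,      # factor label taken from the column name
                repeat: row[repeat],  # identifier repeated for every slide
                "Green": row[g_name],
                "Red": row[r_name],
            })
    return stacked


long_rows = stack(wide, green_cols, red_cols)
for r in long_rows:
    print(r)
```

Each original row appears once per slide, which is why the identifier column has to be repeated alongside the stacked values.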

Note, the stacked columns can be renamed by double clicking the old name in the Stacked Columns name list and entering the new names in the rename dialog (see below).

Clicking OK on the stack menu should produce a spreadsheet as follows:

The labels of the factor Slide have been created using the original column names. However, it may be preferable to remove the 'G' from these labels so that they display just c1...c8, k1...k8. The simplest way to do this is to select Edit Levels and Labels from the Factor item on the Spread menu, or to click on the corresponding toolbar button. This will open the dialog shown below, where the 'G' can be removed from the labels by editing the appropriate cells.

Additional information on the genes and the layout of the slides is located within another file, ApoAIGeneNames.tab. The file can be opened using the Open item on the File menu, and should result in the following spreadsheet:

The information from the ApoAIGeneNames.tab data set needs to be merged into the stacked spreadsheet. To merge two spreadsheets click on the spreadsheet that the data are to be merged into (in this case the stacked spreadsheet). Then select Merge from the Manipulate item on the Spread menu, which should open a dialog as shown below. The two spreadsheets are to be merged using the column ID to match rows between the spreadsheets.

The columns X1, X2, ROW, COL and NAME can be merged into the original spreadsheet by clicking on the Select Columns to Transfer button and then copying these names to the Selected Columns list.
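
Conceptually, the merge is a keyed lookup on ID. The sketch below illustrates this in plain Python. It is not what GenStat runs internally, and the gene names and values are hypothetical.

```python
# Illustrative merge on ID; the annotation values are hypothetical.
stacked = [
    {"Slide": "c1", "ID": 1, "Green": 100.0, "Red": 120.0},
    {"Slide": "c1", "ID": 2, "Green": 200.0, "Red": 210.0},
    {"Slide": "k1", "ID": 1, "Green": 90.0, "Red": 45.0},
]
gene_info = [
    {"ID": 1, "X1": 1, "X2": 1, "ROW": 1, "COL": 1, "NAME": "geneA"},
    {"ID": 2, "X1": 1, "X2": 1, "ROW": 1, "COL": 2, "NAME": "geneB"},
]
transfer = ["X1", "X2", "ROW", "COL", "NAME"]  # columns to transfer


def merge_on_id(target, source, columns, key="ID"):
    """Copy the chosen columns from source into every target row with the same key."""
    lookup = {row[key]: row for row in source}
    merged = []
    for row in target:
        match = lookup.get(row[key], {})
        extra = {c: match.get(c) for c in columns}  # None when there is no match
        merged.append({**row, **extra})
    return merged


merged = merge_on_id(stacked, gene_info, transfer)
for r in merged:
    print(r)
```

Because the stacked spreadsheet holds every ID once per slide, each annotation row is copied to several rows of the result.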

The column X1 is the row position of the pins across the slide, and X2 is the column position. These can be renamed to the more informative names Meta_Row and Meta_Col by clicking on the start of the column name (the cursor should change to a pencil when you hover at the start of the column name) and entering the new name. The columns which index the rows and columns of the pins (Meta_Row and Meta_Col), the rows and columns within pins (ROW and COL) and the gene names (NAME) should all be converted to factors. To convert a column to a factor, right-click anywhere within the column and then select the Convert to Factor item on the pop-up menu. Once this has been done for each of the columns, the factor columns will be indicated by an exclamation mark at the start of the column name (see below).

The row and column positions across the whole slide are required for the analysis. These can be formed by using the factor product of Meta_Row with ROW and Meta_Col with COL respectively. To form the product of factors select Product/Combine from the Factor item on the Spread menu. This opens the dialog shown below, where the two factors Meta_Row and ROW have been selected, and the new name SRow has been entered for the product. Similarly, this dialog can be used to form the product of Meta_Col with COL with a new name SCol. Note that if the data are to be analysed using the normalization menu the factors Meta_Row and Meta_Col will need to be combined to form a factor Pin representing the pins.
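
The idea behind a factor product can be sketched as follows: every combination of the two factors receives its own level. This is an illustrative Python sketch. The numbers of levels are hypothetical, and the ordering of the combined levels (first factor varying slowest) is an assumption rather than a statement about GenStat's output.

```python
def factor_product(outer, inner, n_inner):
    """Combine 1-based levels so each (outer, inner) pair gets a unique level."""
    return [(o - 1) * n_inner + i for o, i in zip(outer, inner)]


# Hypothetical layout: 2 pin rows, each containing 3 spot rows.
meta_row = [1, 1, 2, 2]
row = [1, 3, 1, 3]
srow = factor_product(meta_row, row, n_inner=3)
print(srow)  # [1, 3, 4, 6]

# The same idea gives a pin index from the pin row and pin column.
meta_col = [1, 2, 1, 2]
pin = factor_product(meta_row, meta_col, n_inner=2)
print(pin)  # [1, 2, 3, 4]
```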

To measure the level of differential expression between the two treatments on a slide the log-ratios can be calculated. To calculate the log-ratios select Log Ratios from the Calculate sub-menu from the Microarrays item on the Stats menu. The menu below shows the settings that can be used to calculate the log-ratios for this data set.
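
As a rough illustration of what such a calculation involves, the sketch below computes a log-ratio and an average log-intensity for one spot. The base-2 logarithm, the red-over-green direction and the definition of intensity as the mean of the two log values are assumptions made for this example. The settings chosen in the menu determine what GenStat actually calculates.

```python
import math


def log_ratio_and_intensity(red, green):
    """Return (log-ratio, average log-intensity) for one spot, using base 2."""
    m = math.log2(red / green)                       # assumed: log2(R/G)
    a = 0.5 * (math.log2(red) + math.log2(green))    # assumed: mean log2 intensity
    return m, a


m, a = log_ratio_and_intensity(400.0, 100.0)
print(m, round(a, 3))  # 2.0 7.644
```

Under these assumptions a log-ratio of 2 means the red channel is four times as bright as the green channel, and a log-ratio of 0 means no difference.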

If the newly calculated log-ratio and intensity columns are not automatically added to your existing spreadsheet, you can append them by selecting the Data in GenStat item from Add on the Spread menu. In the corresponding dialog select the two columns to be added to the spreadsheet and click on the Add button.

The data on the slides can be explored by using the graphical menus available within the Explore sub-menu. For example, the Histograms item can be selected to produce histograms of the data. The following shows the settings for plotting a histogram of the log ratios by slide.

You can plot the histograms in a trellis layout using the options. To use the trellis layout click on the Options button and set the resulting dialog options as follows:


The spatial variation across the slides can be examined by selecting the Spatial Plot item from the Explore sub-menu. The following menu shows the settings that can be used to produce a spatial plot for each slide.

This is the image of the first slide:

Dye intensity and spatial effects (pins, rows and columns) can be removed from the slides by using the Normalization menu. To open the normalization menu select Two Channel from the Normalize item on the Microarrays menu. The menu below shows the settings that can be used to normalize the data. The factor Pin should have been created as the factor product of Meta_Row and Meta_Col, in the same way that Meta_Row was combined with ROW in the section above to give the factor SRow indexing all the rows on a slide.
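
To give a feel for what removing a pin effect means, the sketch below subtracts the median log-ratio of each pin from the spots printed by that pin. This is a deliberately simplified stand-in, not the model fitted by the Normalization menu, and the values are hypothetical.

```python
from collections import defaultdict
from statistics import median


def centre_by_pin(log_ratios, pins):
    """Subtract each pin's median log-ratio from the spots printed by that pin."""
    groups = defaultdict(list)
    for m, p in zip(log_ratios, pins):
        groups[p].append(m)
    medians = {p: median(values) for p, values in groups.items()}
    return [m - medians[p] for m, p in zip(log_ratios, pins)]


log_ratios = [0.5, 0.7, 0.9, -0.2, 0.0, 0.2]  # hypothetical log-ratios
pins = [1, 1, 1, 2, 2, 2]
normalized = centre_by_pin(log_ratios, pins)
print([round(x, 2) for x in normalized])
```

After centring, both pins have a median log-ratio of zero, so a systematic offset between pins no longer looks like differential expression.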

Note that the option to automatically include plots is available within the options for this menu. Clicking on the Options button will open a dialog and the option can be set as follows:

The resulting graphs display the effects that have been estimated:




To analyse the results across the 16 slides, a small data set is required which provides the treatments applied to the slides. To do this an empty spreadsheet with 3 columns and 16 rows can be created using the menu shown below. This menu can be opened by selecting New from the File menu and then selecting the Spreadsheet tab.

The data can easily be entered into the empty spreadsheet. The spreadsheet below shows the data that should be entered, with the three columns named SlideName, Red_Treat and Green_Treat. Note that the columns Red_Treat and Green_Treat have been converted to factors. The factor columns Green_Treat and Red_Treat must contain the same set of factor levels or labels. In this example the columns should be created such that they both have 3 labels (KnockOut, Normal and Reference).
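
The structure of this small data set can be sketched as below. The assignment of treatments to slides is an assumption made for illustration (slides c1...c8 carrying Normal and k1...k8 carrying KnockOut on the red channel, with Reference on the green channel throughout). Use the values from the spreadsheet shown in the documentation when entering the real data.

```python
# Shared label set that both treatment factors must use.
LABELS = ("KnockOut", "Normal", "Reference")

# Assumed design: red channel carries the sample, green carries the reference.
design = []
for prefix, red_treatment in (("c", "Normal"), ("k", "KnockOut")):
    for i in range(1, 9):
        design.append({
            "SlideName": f"{prefix}{i}",
            "Red_Treat": red_treatment,
            "Green_Treat": "Reference",
        })

# Both factors are declared over the same labels, even if a label is unused.
for row in design:
    assert row["Red_Treat"] in LABELS and row["Green_Treat"] in LABELS

print(len(design), design[0], design[-1])
```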

To analyse the data select the Estimate Two Channel Effects item from Analyse on the Microarrays menu. The picture below shows the resulting menu containing the settings to run the analysis.

Results can be saved into data structures at the same time as the analysis is run by clicking the Store button and specifying the names of the new structures. These structures can also be displayed in spreadsheets by selecting Display in Spreadsheet.

To estimate the difference between the control and knock out treatments a contrast can be defined by clicking on the Contrasts button. This prompts for the contrast matrix name (KOvsN) and the number of contrasts (1). Clicking OK opens a spreadsheet where the contrast matrix values can be supplied. The spreadsheet below shows a contrast for control versus knock out. Note that the reference level is specified as 0 because it is not required in this contrast.
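
A contrast is a weighted sum of the treatment effects, with weights that add up to zero. The sketch below shows the arithmetic for one gene. The sign convention (KnockOut minus Normal) and the effect values are hypothetical, so the coefficients should be checked against the contrast spreadsheet before being reused.

```python
labels = ["KnockOut", "Normal", "Reference"]

# Assumed coefficients for a knock out versus control contrast.
contrast = {"KnockOut": 1.0, "Normal": -1.0, "Reference": 0.0}

# Hypothetical estimated treatment effects (log2 scale) for a single gene.
effects = {"KnockOut": -3.1, "Normal": 0.1, "Reference": 0.0}

estimate = sum(contrast[label] * effects[label] for label in labels)
print(round(estimate, 2))  # -3.2
```

A coefficient of 0 removes the Reference treatment from the comparison without removing it from the model.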

Note that a column will need to be added to the default matrix, because the factor Red_Treat does not contain the label Reference, which only occurs in the Green_Treat factor. A column can be added to a matrix by selecting Column from the Add item on the Spread menu.

For this example the Estimate dye bias from dye swaps option is not required. Click on the Options button to view the menu options and make sure that this option is not selected. Clicking on the Run button will run the analysis and display the results in a spreadsheet.

It is useful to sort the spreadsheet by the contrast, which can be done by selecting Sort on the Spread menu. When sorted it can be seen that the APO gene has the largest level of differential expression (as below).

The empirical Bayes error estimation menu can be used to adjust the estimated standard error of each gene using the information across all genes. This shrinks the standard errors towards an estimated prior distribution, making the t values and probabilities more stable. To open this menu select Empirical Bayes Error Estimation from the Analyse section of the Microarrays menu. The following menu shows the settings that can be used to do this.
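
The shrinkage idea can be sketched with a moderated-variance formula of the kind commonly used in empirical Bayes analyses of microarrays: each gene's variance is replaced by a weighted average of its own variance and a prior variance. The prior variance and prior degrees of freedom below are hypothetical inputs (in practice they are estimated from all genes), and this sketch is not necessarily the exact computation performed by the menu.

```python
import math


def moderated_t(estimate, se, s2, df, prior_s2, prior_df):
    """Shrink a gene's variance towards the prior and rescale its t value."""
    post_s2 = (prior_df * prior_s2 + df * s2) / (prior_df + df)
    moderated_se = se * math.sqrt(post_s2 / s2)  # se scales with the std. deviation
    return post_s2, estimate / moderated_se, prior_df + df


# Hypothetical gene with an unusually small variance estimate.
post_s2, t_mod, df_total = moderated_t(
    estimate=1.0, se=0.05, s2=0.01, df=14, prior_s2=0.05, prior_df=4
)
print(round(post_s2, 4), round(t_mod, 2), df_total)
```

Here the ordinary t value would be 1.0 / 0.05 = 20. The moderated value is smaller because the gene's variance is pulled up towards the prior, which protects against genes that look significant only because their variance was underestimated.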

The false discovery rate can be examined by selecting False Discovery Rate from the Analyse item on the Microarrays menu. The menu below shows the settings that can be used along with the resulting graphs.
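
For readers unfamiliar with false discovery rates, the sketch below applies the Benjamini-Hochberg adjustment to a short list of p-values. It is shown only to illustrate the concept. The False Discovery Rate menu may use a different estimation method, and the p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # work down from the largest p-value
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.001, 0.04, 0.03, 0.2]  # hypothetical p-values for four genes
adjusted = bh_adjust(pvals)
print([round(q, 4) for q in adjusted])  # [0.004, 0.0533, 0.0533, 0.2]
```

Selecting the genes whose adjusted value is below a threshold such as 0.05 controls the expected proportion of false positives among the selected genes at that level.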



