Using GenStat with Microsoft Excel
See Also
GenStat contains a spreadsheet facility that can be used to view and manipulate data. The behaviour of a GenStat spreadsheet is not exactly the same as within other spreadsheet packages such as Microsoft Excel. However, GenStat spreadsheets contain many features similar to Excel, plus some additional enhancements for ease of use.

Similarities between GenStat and Excel

GenStat provides a spreadsheet that can be used to view and enter data cell by cell, with the ability to scroll around the cells, examining and changing the data. The shortcut keys for cutting (Ctrl+C) and pasting (Ctrl+V) data from one range of cells work in both GenStat and Excel. Many other shortcut keys are in common between the two applications such as Ctrl+A for selecting all cells, Ctrl+O for opening files, Ctrl+P for printing, Ctrl+S for saving and Ctrl+Z to undo and changes. An example Excel workbook and the corresponding GenStat spreadsheet are displayed below:

The GenStat spreadsheet provides similar functionality to Excel, but also includes some additional enhancements for ease of use. All the normal cursor and selection keys work as in Excel. The File and Edit menus have similar functionality in both applications. In GenStat all of the functionality for manipulating the data in a spreadsheet can be found in the Spread menu.

Within GenStat the rules governing the selection of columns or cells can be customized which can result in differences from the standard Excel behaviour. The cell or column selection can be customized using the Spreadsheet Column Options tab within the spreadsheet options. There are three options that essentially control the behaviour for selecting columns or cells: Select columns with a single click, Keep selected columns, and Clear selection on moving current cell.

Differences between Excel and GenStat spreadsheets

In Excel, the whole focus is on the workbook, and all results are displayed within the workbook. In GenStat the data is one aspect of the analysis, and so there are a range of other windows for output of results and graphs, and for specification of the analysis or editing of command scripts. Also in GenStat data does not need to be displayed within a spreadsheet, but can be directly imported into the statistical calculation engine.

Where is the GenStat Spreadsheet?

When GenStat starts up, unlike Excel, you do not have an empty spreadsheet open. However, you can always create an empty spreadsheet on start-up by selecting the New Spreadsheet option on the Options - Workspace dialog. To create an empty spreadsheet you can either select the New item on the File menu or alternatively select Create from the New item on the Spread menu. Selecting either of these menu items opens the File New - Spreadsheet dialog where you can specify the number of columns and rows to appear in the spreadsheet. If you want to import data stored within another package, you can select the Open item from the File menu, or transfer the data via the clipboard. GenStat has an additional facilities to help control how data are pasted to GenStat from the clipboard. To access these options when pasting from the clipboard select From Clipboard from the New item on the Spread menu (this opens the New Spreadsheet from Clipboard dialog).

Data Organisation

GenStat's spreadsheet is column based, in that all data on a particular measurement is normally stored in a single column. Each column has a particular type, either numerical (a variate), textual (a text) or categorical (a factor). You cannot mix text or numbers in a column. If you enter text in a numerical column you will get an error message, and if you enter a number on a text column, it will be treated as a piece of text. A factor has a limited set of values, known as levels, and any value entered into cell in a factor column must conform to one of the existing levels, or you will get a warning message. However, you can easily add a new item into the factor as a new level.

One of the main differences in data storage between GenStat and Excel is that GenStat has the FACTOR data type that is used to store group membership. If an Excel file contains categorical columns (numerical or text) these can easily be converted to factors within GenStat by selecting Convert to in the Factor item on the Spread menu. See understanding factors within a spreadsheet for more details on factors.

Within GenStat there is a restriction that all structures in a spreadsheet must have the same number of rows. These are the same requirements for a table in a database. Thus while you can enter any data anywhere in Excel, GenStat requires that the data matches the type of the column that is used, and if you want to add new rows or columns, you are resizing the data structures, and need to specify type information for any new columns.

An Excel workbook is an unstructured set of cells containing numbers, text and formulae in any order, whereas GenStat spreadsheet displays a set of specific data structures with a particular size and attributes. The unstructured nature of Excel can be considered as both a strength and weakness as the organisation is specific to the user, and not inherent in the spreadsheet.

Column Names in Excel and GenStat

Another result of GenStat's data structure orientation is that the column name is not just another standard cell that can be edited like all others as in Excel. To edit the structure name, move the cursor to the start of the column name (where the cursor should change to a pencil) and click the mouse. The structure names have the limitation that duplicate names cannot be used within the one spreadsheet, unlike Excel which allows this. If you attempt to enter a name already used, then a warning prompt will appear.

Column Types

The type of a column within GenStat must be either a variate (numbers only), text (strings) or a factor as explained above. In Excel, you can have a mixture of cell types within a column. Any column containing a text item will be read into GenStat as a text column (and any numbers within the column will be converted to the text string representing its value). You can specify how the a column from Excel is imported to a particular GenStat column type by using a type marker on the Excel cell containing the column name. This can be done in two ways:
  1. As a suffix on the column name
  2. As a comment on the column name cell
The following type markers can be added to the column name/cell comment: The ! marker can be applied in conjunction with the Date :D or Time :T markers, so that !:D would indicate a factor with levels displayed as dates.

Cell Comments

Comments on the cells containing the column names can be used to transfer extra information to GenStat. The comments can contain the following information:
  1. The factor labels or levels and their ordering
  2. The column type (!,#,$ etc as above)
  3. The column description
The column type (! for factor, # for variate or $ for text) and display format (:D etc) can be given at the start of the comment as described above.

Factor labels should be supplied in the usual GenStat form, starting with !T( with the entries separated by commas, and terminated with a closing bracket ')'. If any of the labels contain one or more of the characters '!', ',' or ')' - exclamation mark, comma or left bracket, then the label must be contained in single quotes. Note that if a label contains a single quote this must be doubled up. For example, the string (Pierre David's) should be supplied as '(Pierre David''s)'. If labels do not include any of the reserved characters then you do not need to use single quote's. For example, the following could be used !T(A,B,C,'(D-E)'). Any numbers within the list will be interpreted as a string provided the !T is given at the start of the label list.

Factor levels should start with !( where the numbers are separated by commas and terminated with a closing bracket ). If any of the entries in the list is not a valid number, then the list will be interpreted as the labels of the factor, equivalent to starting the list with !T(.

Note the type marker ! is redundant if the labels or levels are given.

The column extra description can be given as a plain string of text. This should not start with one of the reserved characters '!', '$', '#' or ':'.

You can combine the factor label/level specification and an extra description into a single comment. These must be specified in the following order:r: factor levels or labels, followed by a new line (use Shift+Enter to enter this into the comment in Excel), followed by the column type/display symbols, followed by the column description. The following gives an example of a factor being given labels, but with the levels in date format with a description:
!T('Spring','Summer','Autumn','Winter')
:D Seasonal Summary

If factor values are supplied as ordinals (1,2,3...), then levels or labels can be used to label these within GenStat. For example, a column called Sex could have the levels (1,2) and labels !T(Male,Female) applied to it. Also, if the data for the factor are specified as levels e.g. (10,20,40,100), then the comment can be used to apply labels to the levels, so a comment of !T('Half','Standard','Double','High') could give these labels to the 4 groups.

If the factor has levels or labels in the column data, the cell comment can be used to specify the order that these given in the factor, so that a comment of !T(Nil,Wild,AR1,AR37) can be used to change the label ordering from the default alphabetical ordering of AR1,AR37,Nil,Wild. Similarly if levels are provided with the column, the a redorering such as!(10,5,2,1,0.5) could be used to specify the levels should be stored in decreasing order rather than ascending order as by default.

An example of an Excel cell comment is shown below.

Size of the Spreadsheet

In general a GenStat spreadsheet's size is set to fit the required data for a given analysis. GenStat does not create the maximum sized sheet just in case you want to enter a large amount of data, as this can be inefficient. You need to specify the size of the spreadsheet you will need when you open it, and you can change this as below. When you save a file GenStat will remove empty rows and columns if the option Remove Empty Rows and Columns on Save is selected in the Spreadsheet File Options tab (opened by selecting the Spreadsheet Options item on the Tools menu).

The default size of an empty GenStat spreadsheet opened on startup is 10 columns by 100 rows. You can change the default size of a GenStat spreadsheet using the Options - Workspace dialog (opened by selecting Options on the Tools menu). If a new spreadsheet is opened using the File menu then you can specify the size of the spreadsheet within the File New - Spreadsheet dialog.

Each Excel spreadsheet is the same size, that is 256 columns by 65536 rows. Within GenStat a single GenStat spreadsheet can hold up to 16000 columns and an unlimited number of rows (provided you have enough memory on your computer). You can specify the maximum number of columns allowed within the File tab on the Spreadsheet Options.

What if the Excel Data is not Column based?

Sometimes data within Excel may be entered across rows or in several columns. GenStat provides several menus under the Manipulate item in the Spread menu to help reorganise data into the correct format for GenStat. These facilities include the Stack Spreadsheet which can be used to stack multiple columns into a single column. The Transpose menu can be used to transpose rows and columns. In addition to these there is a Paste Special option in the Edit menu that can be used reorganise data before pasting from the clipboard into a spreadsheets cells.

Multiple Pages

An Excel workbook can have up to 256 pages. A GenStat book can contain multiple spreadsheets, where the limit can be set in the spreadsheet options up to a maximum of 2000 sheets. The Book menu provides facilities for managing multiple sheets, including copying, moving, deleting or splitting sheets. The tabs for the sheets are displayed across the top of the window, whereas in Excel the tabs are displayed at the bottom. The tabs can be dragged into new positions, onto other books or into a book on their own (depending on where the tab is dropped).

Spreadsheet Types

Excel has three workbook page types: data, chart and Visual Basic script. GenStat spreadsheets do not contain graphics. However, they can contain an analysis script of GenStat commands that can be entered and run using the Spreadsheet Analysis menu. GenStat has six types of data spreadsheets, corresponding to the range of data structures found in GenStat. Apart from the standard variate/text/factor spreadsheets, there are three matrix types (rectangular, symmetric and diagonal), a table type that holds cross-tabulations (like the Excel pivot tables) and a scalar spreadsheet that holds single valued constants.

Calculations

Within GenStat calculations are performed on whole columns rather than cell by cell. As with Excel, you enter a formula to perform the calculation which can be done using the calculations menu.

GenStat's calculations are not dynamic like Excel's, so if you change any of the values within cells and want to recalculate dependent cell values, you need to use the recalculate menu to redo the calculations.

GenStat maintains two copies of the data, one visible in a spreadsheet, and the other hidden in the statistical calculation engine (the server). The server data names and types can be displayed at any time using the Data Display menu. By default the two copies of the data are synchronized whenever the user moves focus off the spreadsheet. You can change to manually update GenStat from a spreadsheet by turning off the Auto update GenStat option within the spreadsheet option. To manually synchronize the server data with the spreadsheet select Changed data to GenStat from the the Update item on the Spread menu, or alternatively press the F10 key. This feature allows you to have multiple sheets with data structures having the same name, whilst retaining only the copy of the latest updated spreadsheet within the server. Also this allows reorganisation of the data in the spreadsheet without disturbing on-going calculations in the server (which can run concurrently with interface operations).

Dates

In Excel, dates are stored in a cell as the number of days since the 1st January 1900. A date format is applied to this number, often automatically if it has been initially typed in a date format (e.g dd/mm/yyyy).

GenStat stores dates as the number of days from either 1st January 1900 or 1st March 1600. A date format can also be applied to a complete column (not cell by cell as in Excel). Date formats are not applied automatically in GenStat, but must manually be applied using the Date Formats dialog (opened from the Attributes/Format item on the Column option of the Spread menu), by selecting a format type of Date and then clicking the Date Format button.

If dates are copied from Excel to GenStat via the clipboard, they are transferred as text, resulting in a text column. However, if GenStat detects that the text may be a date format, it will prompt to convert the text to a date format. Text columns can be converted to a numerical value displayed as a date by selecting the Convert item on the Column option of the Spread menu. This opens the Date Formats dialog where you can select the new column type as a Variate and select the Read Text as Date option. The Date Format button to specify the particular date format used by Excel.

The text :D can be appended to a column name in Excel to identify a column as a date column. When this text is appended the column will be automatically interpreted as a date format whether it is pasted from the clipboard or is imported from an Excel file. Similarly the text :T can be used to identify a column as a time column, and the column will be automatically interpreted as a time format.

Comments in cell containing the column name can also be used to set a Date

Saving Results

Generally the output of statistical calculations is displayed within the Output window or within Graphics windows. However, you can save results of statistical analyses to a spreadsheet using the Save buttons on the Stats menus. The Display as Spreadsheet option on the save dialogs will create spreadsheets containing the results.

Enhancements over Excel

GenStat has a number of drag and drop facilities for moving and deleting rows or columns within books or spreadsheets. Also, GenStat's clipboard facility is more general than Excel's, not only providing a menu of options, but recycling data to fill a selection when pasting from the clipboard.

GenStat provides a wide range of menus for reorganising and manipulating data that are not available within Excel. These can be found under the Manipulate and Calculate items on the Spread menu.

Moving data between Excel and GenStat spreadsheets

The Excel Import Wizard allows data to be imported from an Excel file. Note that if the file is currently open in Excel it should be saved to make sure that the file on the disk is up to date. Data can also be pasted from the clipboard by selecting the from Clipboard item on the New option of the Spread menu, or can be pasted directly into an existing GenStat spreadsheet using the Edit menu items. See Transfer data between Windows and Spreadsheets for more information on copying data using the clipboard.

Tables

Excel has pivot tables, where a range of data is selected, and then columns are selected to provide cross-tabulation groups and other columns selected for data summaries (averages, counts etc.). The resulting table is embedded into a sheet in the workbook.

GenStat has menus for creating tables. The Frequency Tables menu produces tables of counts, and the Summary of Groups menu produces cross-tabulations of the usual data summaries for selected variate columns.

The results from GenStat are shown in a special Table spreadsheet. The layout of the table may be changed by selecting the Reorder Table item on the Manipulate submenu of the Spread menu or by dragging the columns into new positions. The Restrict/Filter item on the Spread can also be used to subset the rows displayed in the table. The Copy Special menus may be used to copy the table to Rich Text Format (RTF Table) or to Excel (Tabbed Text) via the clipboard.

Filters

Excel has menu items to filter the rows displayed in a spreadsheet. GenStat provides many facilities for filtering the rows of a spreadsheet. These are found under the Restrict/Filter item of the Spread menu. In GenStat terminology, a filter is also known as a restriction as it also restricts the rows used in any analysis, whereas a filter in Excel does not effect the cell calculations. Also in GenStat the filtered rows can be displayed in another colour using the Display Restricted Rows item on the Restrict/Filter menu. See the Spreadsheet Restrict/Filter Menu for more details.

Graphics

Graphs in Excel are created with the graphics wizard, and then either embedded in a data page, or added in separate a chart page in the workbook. These graphs will dynamically update if the data is changed.

GenStat has a Graphics Wizard, which takes you through the steps of creating a graph. However the resulting graph is not dynamic, and it is not stored in the spreadsheet. Graphs within GenStat are displayed in a separate graphics window. The graph can be saved in different formats and the display attributes (fonts, colours, symbols, line styles, etc.) can be edited by double-clicking on the graph and using the various edit options under the Tools menu in the GenStat Graphics Viewer.

Limits in transferring data

The maximum number of sheets or pages in a GenStat spreadsheet is a user defined limit between 20 and 4000, which can be set on the Spreadsheet Options, Books tab. GenStat when the specified limit of pages is reached, GenStat will start adding subsequent pages to a new book. The maximum number of pages in Excel is 16384 for Excel 2007 and 256 for prior versions.

The maximum number of columns in Excel is 16384 for Excel 2007 and 256 for prior versions. GenStat will not be able to save more than this number of columns to Excel. The maximum number of columns in a GenStat spreadsheet is a user defined limit between 250 and 16000, which can be set on the Spreadsheet Options, File tab. GenStat will not read in more columns from Excel than the specified limit.

The maximum number of rows in Excel is 1048576 for Excel 2007 and 65535 for Excel 97-2003 and 16384 for prior versions. GenStat will not be able to save more than this number of rows to Excel, depending on the file version being saved. The only limit to the number of rows in a GenStat spreadsheet is the amount of memory available on the computer.

The maximum factor label length in GenStat is 60 characters. Reading cells in a factor which contain text longer than this will result in the labels being truncated. The maximum text length in a cell in Excel is 65535, and in GenStat it is a user defined limit between 90 and 10000, which can be set on the Spreadsheet Options, Conversions tab. Cells read from Excel longer than the specified limit will be truncated.

See Also