| Using GenStat with Microsoft Excel |
| See Also |
The GenStat spreadsheet provides similar functionality to Excel, but also includes some additional enhancements for ease of use. All the normal cursor and selection keys work as in Excel. The File and Edit menus have similar functionality in both applications. In GenStat all of the functionality for manipulating the data in a spreadsheet can be found in the Spread menu.
Within GenStat the rules governing the selection of columns or cells can be customized which can result in differences from the standard Excel behaviour. The cell or column selection can be customized using the Spreadsheet Column Options tab within the spreadsheet options. There are three options that essentially control the behaviour for selecting columns or cells: Select columns with a single click, Keep selected columns, and Clear selection on moving current cell.
One of the main differences in data storage between GenStat and Excel is that GenStat has the factor data type that is used to store group membership. If an Excel file contains categorical columns (numerical or text) these can easily be converted to factors within GenStat by selecting Convert to in the Factor item on the Spread menu. See understanding factors within a spreadsheet for more details on factors.
Within GenStat there is a restriction that all structures in a spreadsheet must have the same number of rows. These are the same requirements for a table in a database. Thus while you can enter any data anywhere in Excel, GenStat requires that the data matches the type of the column that is used, and if you want to add new rows or columns, you are resizing the data structures, and need to specify type information for any new columns.
An Excel workbook is an unstructured set of cells containing numbers, text and formulae in any order, whereas GenStat spreadsheet displays a set of specific data structures with a particular size and attributes. The unstructured nature of Excel can be considered as both a strength and weakness as the organisation is specific to the user, and not inherent in the spreadsheet.
Factor labels should be supplied in the usual GenStat form, starting with !T( with the entries separated by commas, and terminated with a closing bracket ')'. If any of the labels contain one or more of the characters '!', ',' or ')' - exclamation mark, comma or right bracket, then the label must be contained in single quotes. Note that if a label contains a single quote this must be doubled up. For example, the string (Pierre David's) should be supplied as '(Pierre David''s)'. If labels do not include any of the reserved characters then you do not need to use single quote's. For example, the following could be used !T(A,B,C,'(D-E)'). Any numbers within the list will be interpreted as a string provided the !T is given at the start of the label list.
Factor levels should start with !( where the numbers are separated by commas and terminated with a closing bracket ). If any of the entries in the list is not a valid number, then the list will be interpreted as the labels of the factor, equivalent to starting the list with !T(.
Note the type marker ! is redundant if the labels or levels are given.
The column extra description can be given as a plain string of text. This should not start with one of the reserved characters '!', '$', '#' or ':'.
You can combine the factor label/level specification and an extra description into a single comment.
These must be specified in the following order:r: factor levels or labels,
followed by a new line (use Shift+Enter to enter this into the comment in Excel), followed
by the column type/display symbols, followed by the column description.
The following gives an example of a factor being given labels, but with the levels in date format
with a description:
!T('Spring','Summer','Autumn','Winter')
:D Seasonal Summary
If factor values are supplied as ordinals (1,2,3...), then levels or labels can be used to label these within GenStat. For example, a column called Sex could have the levels (1,2) and labels !T(Male,Female) applied to it. Also, if the data for the factor are specified as levels e.g. (10,20,40,100), then the comment can be used to apply labels to the levels, so a comment of !T('Half','Standard','Double','High') could give these labels to the 4 groups.
If the factor has levels or labels in the column data, the cell comment can be used to specify the order that these given in the factor, so that a comment of !T(Nil,Wild,AR1,AR37) can be used to change the label ordering from the default alphabetical ordering of AR1,AR37,Nil,Wild. Similarly if levels are provided with the column, the a redorering such as!(10,5,2,1,0.5) could be used to specify the levels should be stored in decreasing order rather than ascending order as by default.
An example of an Excel cell comment is shown below.
The default size of an empty GenStat spreadsheet opened on startup is 10 columns by 100 rows. You can change the default size of a GenStat spreadsheet using the Options - Workspace dialog (opened by selecting Options on the Tools menu). If a new spreadsheet is opened using the File menu then you can specify the size of the spreadsheet within the File New - Spreadsheet dialog.
Each Excel spreadsheet is the same size, that is 256 columns by 65536 rows. Within GenStat a single GenStat spreadsheet can hold up to 16000 columns and an unlimited number of rows (provided you have enough memory on your computer). You can specify the maximum number of columns allowed within the File tab on the Spreadsheet Options.
GenStat's calculations are not dynamic like Excel's, so if you change any of the values within cells and want to recalculate dependent cell values, you need to use the recalculate menu to redo the calculations.
GenStat maintains two copies of the data, one visible in a spreadsheet, and the other hidden in the statistical calculation engine (the server). The server data names and types can be displayed at any time using the Data Display menu. By default the two copies of the data are synchronized whenever the user moves focus off the spreadsheet. You can change to manually update GenStat from a spreadsheet by turning off the Auto update GenStat option within the spreadsheet option. To manually synchronize the server data with the spreadsheet select Changed data to GenStat from the the Update item on the Spread menu, or alternatively press the F10 key. This feature allows you to have multiple sheets with data structures having the same name, whilst retaining only the copy of the latest updated spreadsheet within the server. Also this allows reorganisation of the data in the spreadsheet without disturbing on-going calculations in the server (which can run concurrently with interface operations).
GenStat stores dates as the number of days from either 1st January 1900 or 1st March 1600. A date format can also be applied to a complete column (not cell by cell as in Excel). Date formats are not applied automatically in GenStat, but must manually be applied using the Date Formats dialog (opened from the Attributes/Format item on the Column option of the Spread menu), by selecting a format type of Date and then clicking the Date Format button.
If dates are copied from Excel to GenStat via the clipboard, they are transferred as text, resulting in a text column. However, if GenStat detects that the text may be a date format, it will prompt to convert the text to a date format. Text columns can be converted to a numerical value displayed as a date by selecting the Convert item on the Column option of the Spread menu. This opens the Date Formats dialog where you can select the new column type as a Variate and select the Read Text as Date option. The Date Format button to specify the particular date format used by Excel.
The text :D can be appended to a column name in Excel to identify a column as a date column. When this text is appended the column will be automatically interpreted as a date format whether it is pasted from the clipboard or is imported from an Excel file. Similarly the text :T can be used to identify a column as a time column, and the column will be automatically interpreted as a time format.
Comments in cell containing the column name can also be used to set a Date
GenStat provides a wide range of menus for reorganising and manipulating data that are not available within Excel. These can be found under the Manipulate and Calculate items on the Spread menu.
GenStat has menus for creating tables. The Frequency Tables menu produces tables of counts, and the Summary of Groups menu produces cross-tabulations of the usual data summaries for selected variate columns.
The results from GenStat are shown in a special table spreadsheet. The layout of the table may be changed by selecting the Reorder Table item on the Manipulate submenu of the Spread menu or by dragging the columns into new positions. The Restrict/Filter item on the Spread can also be used to subset the rows displayed in the table. The Copy Special menus may be used to copy the table to Rich Text Format (RTF Table) or to Excel (Tabbed Text) via the clipboard.
GenStat has a Graphics Wizard, which takes you through the steps of creating a graph. However the resulting graph is not dynamic, and it is not stored in the spreadsheet. Graphs within GenStat are displayed in a separate graphics window. The graph can be saved in different formats and the display attributes (fonts, colours, symbols, line styles, etc.) can be edited by double-clicking on the graph and using the various edit options under the Tools menu in the GenStat Graphics Viewer.
The maximum number of columns in Excel is 16384 for Excel 2007 and 256 for prior versions. GenStat will not be able to save more than this number of columns to Excel. The maximum number of columns in a GenStat spreadsheet is a user defined limit between 250 and 16000, which can be set on the Spreadsheet Options, File tab. GenStat will not read in more columns from Excel than the specified limit.
The maximum number of rows in Excel is 1048576 for Excel 2007 and 65535 for Excel 97-2003 and 16384 for prior versions. GenStat will not be able to save more than this number of rows to Excel, depending on the file version being saved. The only limit to the number of rows in a GenStat spreadsheet is the amount of memory available on the computer.
The maximum factor label length in GenStat is 60 characters. Reading cells in a factor which contain text longer than this will result in the labels being truncated. The maximum text length in a cell in Excel is 65535, and in GenStat it is a user defined limit between 90 and 10000, which can be set on the Spreadsheet Options, Conversions tab. Any cells read from Excel that have text longer than the specified limit will be truncated.