IMPORT procedure

Read data from a foreign file format and load it or convert it to a spreadsheet file, PC Windows only (D.B. Baird).


Options

PRINT = string
What information to print (catalogue); default cata

OUTTYPE = string
Output file type (GEN, GSH, XLS, TXT, SHEETS); default GSH.

METHOD = string
Whether to load data into the GenStat server after creating the file or whether merely to create the file (create, load); default load

IMETHOD = string
How identifiers are to be specified for the columns (read, supply, none); default read if COLUMNS is unset, supply otherwise

ENDSTATEMENT = string
Ending statement for a type GEN output file (RETURN, ENDBREAK); default RETURN

SPSSMV = string
What to do with SPSS missing value codes (ignore, convert); default conv

MISSING = string
What label represents missing values in Excel, Quattro or Lotus files; default '*'

FORDER = string
The order in which to define the labels or levels of a factor read in (sorted, unsorted); default sort

TEXTCONVERSION = string
How to convert text to numbers for the columns (strict, single, common, standard, lax); default stan

KEEPEMPTY = strings
Whether to retain any empty rows or columns found in the data (rows, columns, none); default none

NAMEROW = number
The row number within an Excel or Quattro spreadsheet which contains the column names (IMETHOD must be unset or set to read); default, the first row in CELLRANGE

EMETHOD = string
Whether to read in column descriptions/extra from Excel, SigmaPlot or Quattro spreadsheets (read, none); default none

EXTRAROW = number
The row number within an Excel or Quattro spreadsheet which contains the column descriptions (EMETHOD must be set to read); default, the second row in CELLRANGE

PREFIX = string
The string with which to prefix numerical column names; default '%'

TEMPMISSING = string
Whether to read in temporarily missing values as missing (yes, no); default no

INOPTIONS = string
Optional input file arguments to be passed to the DATALOAD.DLL

OUTOPTIONS = string
Optional output file arguments to be passed to the DATALOAD.DLL


Parameters

FILE = texts
Input file to be read

OUTFILE = texts
Name of the output file to be created; if this is not provided a temporary file will be created, and then deleted if the data is loaded

SHEETNAME = texts
Name of a spreadsheet worksheet or named range; default is the first sheet in spreadsheet

CELLRANGE = texts
Cell range within worksheet, giving the top left and bottom right cell in the format XXNN:XXNN where XX = A - IV, NN = 1 -64384; default * requests all data on the sheet

COLUMNS = texts
Names and/or type codes for the columns read (the type of column can be forced by ending the column name, if supplied, with the code ! for a factor, # for a variate, and $ for a text), using a name of '*' will cause a column to be dropped

ISAVE = texts or pointers
A pointer or the name of a pointer to save the column identifiers

START = texts
Contents of a cell in a spreadsheet file or a line in a text file from which to start reading

END = texts
Contents of a cell in a spreadsheet file or a line in a text file at which to end reading


Description

Data in the supported file formats will be extracted and saved in the specified file format, depending on the extension of OUTFILE. If this is not provided, the type is indicated by the OUTTYPE option, as either GEN (GenStat Command file), GSH (GenStat Spreadsheet), XLS (Excel 5 Spreadsheet) or TXT (ASCII Text file); the default is GSH. Setting OUTTYPE=SHEETS reads in the worksheet names (in upper case) in a spreadsheet file (Excel/Quattro/Simgaplot or SAS Transport) into a text named Worksheets. If PRINT=catalogue, then the worksheets names will be printed in the output window. The ENDSTATEMENT option specifies the ending statement type for a type GEN output file: either RETURN (the default) or ENDBREAK). You can set ENDSTATEMENT=* of you do not want an ending statement.

   If METHOD=load, the resulting file will be read in to GenStat data structures. In spreadsheet files (Excel, Quattro, 123, SigmaPlot), the SHEETNAME and CELLRANGE parameters can be used to read in just a specified section of the data in the file. The COLUMNS parameter can be used to set the names and types (see below) of the structures or to receive back a pointer to the structures created.

   In spreadsheet files, the data extracted are labels, numerical values and the results from formulae. A label of * in an otherwise numerical column is taken as a missing value, unless a different missing value marker is specified with the MISSING option. Empty cells are taken as missing values. Empty rows at the start, middle and end of a block are removed. Empty columns are also ignored by default.   You can set the option KEEPEMPTY=rows or KEEPEMPTY=columns to retain empty rows or columns respectively, or KEEPEMPTY=rows,columns to keep both. A single row of labels as the first non-empty row in the block will be taken as column names, unless the IMETHOD option is set. Empty cells in this row will have a default names C1, C2 etc. assigned. If IMETHOD=read and a column name cell contains a numerical value rather than a label, the column name will given as the numerical value prefixed with a % character. The prefix can be changed using the PREFIX option, so a column named '15' will be given a name of %15 by default, and if PREFIX='X', this would be X15.

   Using the COLUMNS parameter, the type of a column can be forced by providing a !, # or $ character on the end of the text items provided for the column names. A string '*' can be given as a name in COLUMNS to remove a column from the data read in. If only a single type character is given, the types of the columns and not their names can be changed. The extension :D on a column name specifies the values are to be read as dates. Similarly, when the column names are being read from a spreadsheet, their types can be given by using ! for a factor, # for a variate, $ for a text and :D for a date. The option FORDER controls the order that the labels or levels of a factors are placed in. If FORDER=sorted the levels will be in ascending numerical order and the labels in alphabetical order, while if FORDER=unsorted the levels and labels will be in the order in which they are first met in a column.

   The TEXTCONVERSION option controls how labels are converted to numbers in a column marked as a variate:

    strict
only labels that contain numeric data only are converted (e.g. '10' becomes 10; '1O' becomes *)

    single
a single character substitution is read as a number (o or O become 0; i, I, l or L become 1; s or S become 2; z or Z become 5; comma becomes decimal point)

    common
multiple substitutions as in single are made (e.g. 'Io' becomes 10; '23X' becomes *)

    standard
as in common but extra text is ignored at the end of the number (e.g. '23X' becomes 23; 'A2X3' becomes *)

    lax
any digits are read from the text (e.g. 'A2X3' becomes 23).

   The EMETHOD=read option can be set to read in a row of column descriptions/extra from a spreadsheet file. By default, this row is taken as the second row in CELLRANGE. The EXTRAROW option can be used to modify the row where the description is read from. The row number is relative to the start of the cell range, but if an neagtive row number is provided, the descriptions will be read from the absolute row in the spreadsheet corresponding to the absolute value of the row number provided. If EXTRAROW=1 then the column names will be read from the second row.

   The START parameter can supply a text to indicate where to start reading within a spreadsheet or text file. In a spreadsheet file (Excel, Quattro, Lotus) the cells from A1 will be searched row by row until a label is found that matches the text. Only cells below and to the right of this cell will then be imported. The text could thus be the name of the first variable to be read. Note that the text must not contain spaces or the division symbol (/). Similarly the END parameter can supply a text to indicate where to stop reading a spreadsheet or text file.

   The INOPTION and OUTOPTION options allow extra options to be passed to DATALOAD.DLL. For example setting INOPTION='/k' keeps leading and trailing and doubled blanks in strings, OUTOPTION='/u' creates undecorated names in a CSV file (i.e. 'Factor', rather than 'Factor!') OUTOPTION='/c' combines the three columns Red, Green and Blue in an BMP file into a single column RGB and INOPTION='/m' loads the data as a matrix rather than as separate columns.

   (Note: IMPORT replaces the procedure DATALOAD from earlier editions of GenStat.)

 

Options: PRINT, OUTTYPE, METHOD, IMETHOD, ENDSTATEMENT, SPSSMV, MISSING, FORDER, TEXTCONVERSION, NAMEROW, KEEPEMPTY, EMETHOD, EXTRAROW, PREFIX, TEMPMISSING, INOPTIONS, OUTOPTIONS.

Parameters: FILE, OUTFILE, SHEETNAME, CELLRANGE, COLUMNS, ISAVE, START, END.


Method

The request is passed to the DATALOAD.DLL library which reads the foreign file and returns any valid data found in a temporary GEN or GSH file. The following file types are supported: Excel 2-5,95,97,2000,XP,2003, Lotus WK1, Quattro (WQ1, WB*, QPW), dBase 2-5, Paradox 3-9, GenStat GSH, SAS PC 6.03-12, 7-9, SAS Transport, SAS JMP, Minitab 8-13, Statistica 5 and 6, Systat, MStat, Instat, Epi-Info, SPSS/Win, Gauss Data/Matrix (PC/Win/Unix), MatLab, S+ (PC/Unix), Stata 4-8, R data frames, Weka Attribute files, SigmaPlot 7-9, OSIRIS, Limdep, Comma delimited text files (*.CSV), ArcView/Info Shapefiles, MapInfo Exchange files, Windows Bitmap (*.BMP), Windows Sound (*.WAV) and NMR Binary files. The file type is worked out from the file contents, so the usual extension need not be used with the exception of the following file types which do not contain a unique signature: Epi-Info (.REC), S+ (.SDD) and Paradox (.DB). Any files not containing a unique file signature, but ending in these extensions will be classified as above. Any other file extensions will attempted to be read as a comma, space or tab delimited text file.

   There is a known problem that using the OUTTYPE=GEN inside a FOR loop (or another other procedure) ties up input channels until exiting the FOR loop. Thus it may exhaust the available input channels. Either use the OUTTYPE=GSH or set LOAD=no and write code to input the files created outside the loop (you will need to provide an output file name to do this).


Action with RESTRICT

Restrictions are not applicable to any of the parameters.