IMPORT procedure
Read data from a foreign file format and load it or convert it to a spreadsheet file, PC Windows only (D.B. Baird).
Options
Parameters
Description
Data in the supported file formats will be extracted and saved in the specified file format, depending on the extension of OUTFILE. If this is not provided, the type is indicated by the OUTTYPE option, as either GEN (GenStat Command file), GSH (GenStat Spreadsheet), XLS (Excel 5 Spreadsheet) or TXT (ASCII Text file); the default is GSH. Setting OUTTYPE=SHEETS reads in the worksheet names (in upper case) in a spreadsheet file (Excel/Quattro/Simgaplot or SAS Transport) into a text named Worksheets. If PRINT=catalogue, then the worksheets names will be printed in the output window. The ENDSTATEMENT option specifies the ending statement type for a type GEN output file: either RETURN (the default) or ENDBREAK). You can set ENDSTATEMENT=* of you do not want an ending statement.
If METHOD=load, the resulting file will be read in to GenStat data structures. In spreadsheet files (Excel, Quattro, 123, SigmaPlot), the SHEETNAME and CELLRANGE parameters can be used to read in just a specified section of the data in the file. The COLUMNS parameter can be used to set the names and types (see below) of the structures or to receive back a pointer to the structures created.
In spreadsheet files, the data extracted are labels, numerical values and the results from formulae. A label of * in an otherwise numerical column is taken as a missing value, unless a different missing value marker is specified with the MISSING option. Empty cells are taken as missing values. Empty rows at the start, middle and end of a block are removed. Empty columns are also ignored by default. You can set the option KEEPEMPTY=rows or KEEPEMPTY=columns to retain empty rows or columns respectively, or KEEPEMPTY=rows,columns to keep both. A single row of labels as the first non-empty row in the block will be taken as column names, unless the IMETHOD option is set. Empty cells in this row will have a default names C1, C2 etc. assigned. If IMETHOD=read and a column name cell contains a numerical value rather than a label, the column name will given as the numerical value prefixed with a % character. The prefix can be changed using the PREFIX option, so a column named '15' will be given a name of %15 by default, and if PREFIX='X', this would be X15.
Using the COLUMNS parameter, the type of a column can be forced by providing a !, # or $ character on the end of the text items provided for the column names. A string '*' can be given as a name in COLUMNS to remove a column from the data read in. If only a single type character is given, the types of the columns and not their names can be changed. The extension :D on a column name specifies the values are to be read as dates. Similarly, when the column names are being read from a spreadsheet, their types can be given by using ! for a factor, # for a variate, $ for a text and :D for a date. The option FORDER controls the order that the labels or levels of a factors are placed in. If FORDER=sorted the levels will be in ascending numerical order and the labels in alphabetical order, while if FORDER=unsorted the levels and labels will be in the order in which they are first met in a column.
The TEXTCONVERSION option controls how labels are converted to numbers in a column marked as a variate:
The EMETHOD=read option can be set to read in a row of column descriptions/extra from a spreadsheet file. By default, this row is taken as the second row in CELLRANGE. The EXTRAROW option can be used to modify the row where the description is read from. The row number is relative to the start of the cell range, but if an neagtive row number is provided, the descriptions will be read from the absolute row in the spreadsheet corresponding to the absolute value of the row number provided. If EXTRAROW=1 then the column names will be read from the second row.
The START parameter can supply a text to indicate where to start reading within a spreadsheet or text file. In a spreadsheet file (Excel, Quattro, Lotus) the cells from A1 will be searched row by row until a label is found that matches the text. Only cells below and to the right of this cell will then be imported. The text could thus be the name of the first variable to be read. Note that the text must not contain spaces or the division symbol (/). Similarly the END parameter can supply a text to indicate where to stop reading a spreadsheet or text file.
The INOPTION and OUTOPTION options allow extra options to be passed to DATALOAD.DLL. For example setting INOPTION='/k' keeps leading and trailing and doubled blanks in strings, OUTOPTION='/u' creates undecorated names in a CSV file (i.e. 'Factor', rather than 'Factor!') OUTOPTION='/c' combines the three columns Red, Green and Blue in an BMP file into a single column RGB and INOPTION='/m' loads the data as a matrix rather than as separate columns.
(Note: IMPORT replaces the procedure DATALOAD from earlier editions of GenStat.)
Options: PRINT, OUTTYPE, METHOD, IMETHOD, ENDSTATEMENT, SPSSMV, MISSING, FORDER, TEXTCONVERSION, NAMEROW, KEEPEMPTY, EMETHOD, EXTRAROW, PREFIX, TEMPMISSING, INOPTIONS, OUTOPTIONS.
Parameters: FILE, OUTFILE, SHEETNAME, CELLRANGE, COLUMNS, ISAVE, START, END.
Method
The request is passed to the DATALOAD.DLL library which reads the foreign file and returns any valid data found in a temporary GEN or GSH file. The following file types are supported: Excel 2-5,95,97,2000,XP,2003, Lotus WK1, Quattro (WQ1, WB*, QPW), dBase 2-5, Paradox 3-9, GenStat GSH, SAS PC 6.03-12, 7-9, SAS Transport, SAS JMP, Minitab 8-13, Statistica 5 and 6, Systat, MStat, Instat, Epi-Info, SPSS/Win, Gauss Data/Matrix (PC/Win/Unix), MatLab, S+ (PC/Unix), Stata 4-8, R data frames, Weka Attribute files, SigmaPlot 7-9, OSIRIS, Limdep, Comma delimited text files (*.CSV), ArcView/Info Shapefiles, MapInfo Exchange files, Windows Bitmap (*.BMP), Windows Sound (*.WAV) and NMR Binary files. The file type is worked out from the file contents, so the usual extension need not be used with the exception of the following file types which do not contain a unique signature: Epi-Info (.REC), S+ (.SDD) and Paradox (.DB). Any files not containing a unique file signature, but ending in these extensions will be classified as above. Any other file extensions will attempted to be read as a comma, space or tab delimited text file.
There is a known problem that using the OUTTYPE=GEN inside a FOR loop (or another other procedure) ties up input channels until exiting the FOR loop. Thus it may exhaust the available input channels. Either use the OUTTYPE=GSH or set LOAD=no and write code to input the files created outside the loop (you will need to provide an output file name to do this).
Action with
RESTRICT
Restrictions are not applicable to any of the parameters.