Generate a Random Subset from a Spreadsheet
This menu can be used to create new spreadsheets based on a random subset/sample of rows
from a spreadsheet.
Number of Samples
Provides a space to specify the number of random samples to be used. Alternatively, you
can provide a percentage of the number of rows to be used by selecting the % option.
Note that if the Sample with Replacement option is not selected, then the number of samples
cannot exceed the number of rows in the spreadsheet (or 100%).
Sample with Replacement
When selected, sampling with replacement will be used when forming the subset. That is, at each random
selection of a row, all rows are eligible for selection, so a row can appear more than once in the subset.
If this option is not selected, then only the rows that have not been previously selected are eligible for selection.
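The difference between the two modes can be sketched in a few lines of Python. This is only an illustration of the sampling rules described above, not the code the dialog runs; the function name `sample_rows` and its parameters are hypothetical.

```python
import random

def sample_rows(n_rows, n_samples, with_replacement, seed=None):
    """Return 1-based row numbers for a random subset (illustrative only)."""
    rng = random.Random(seed)
    rows = range(1, n_rows + 1)
    if with_replacement:
        # Every row is eligible at every draw, so repeats can occur
        # and n_samples may exceed n_rows.
        return [rng.choice(rows) for _ in range(n_samples)]
    # Each row can be drawn at most once, so n_samples cannot exceed n_rows.
    if n_samples > n_rows:
        raise ValueError("cannot draw more rows than exist without replacement")
    return rng.sample(rows, n_samples)
```

For example, `sample_rows(10, 4, with_replacement=False)` always returns four distinct row numbers, whereas the same call with `with_replacement=True` may return the same row more than once.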
Weighting
If a column in the drop down list is selected, then the values in the
selected column will be used in a weighted random sample.
The default is the <Equal> setting, where all rows have an equal chance
of being selected.
Rows with a weight value ≤ 0 will not be included
in the random sample.
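As a rough illustration of weighted selection, the Python sketch below drops rows with a weight ≤ 0 and then draws the remaining rows with probability proportional to their weights. It assumes sampling with replacement and a hypothetical function name `weighted_sample`; how the dialog itself combines weights with sampling without replacement is not described here.

```python
import random

def weighted_sample(weights, n_samples, seed=None):
    """Draw 1-based row numbers with probability proportional to weight."""
    rng = random.Random(seed)
    # Rows with a weight <= 0 are never eligible.
    eligible = [(row, w) for row, w in enumerate(weights, start=1) if w > 0]
    if not eligible:
        raise ValueError("no rows have a positive weight")
    rows = [row for row, _ in eligible]
    wts = [w for _, w in eligible]
    # random.choices samples with replacement using the given weights.
    return rng.choices(rows, weights=wts, k=n_samples)
```

With weights `[0, 5, -1, 1]`, only rows 2 and 4 can be selected, and row 2 is five times as likely as row 4 at each draw.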
Seed
The Seed option is used to specify an integer value that will
be used to start the randomization. If a value of * is given for the seed, a
value from the computer's clock will be used.
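The effect of the seed can be illustrated with Python's own random number generator (not the generator Genstat uses). In this sketch the hypothetical helper `make_rng` treats `"*"` as a request for a clock-based seed, mirroring the behaviour described above.

```python
import random
import time

def make_rng(seed):
    """Return a generator seeded with an integer, or from the clock for '*'."""
    if seed == "*":
        # Clock-based seed: a different sequence on each run.
        seed = time.time_ns()
    return random.Random(int(seed))

# The same integer seed reproduces the same sample.
first = make_rng(42).sample(range(1, 101), 5)
second = make_rng(42).sample(range(1, 101), 5)
```

Here `first` and `second` contain the same five row numbers, which is why a fixed seed is useful when a subset needs to be reproduced later.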
Create Unique column names
When selected, columns in the new spreadsheet will have
new names generated for them so that they are unique. Otherwise, the
columns will have the same names as in the original spreadsheet.
Randomize Rows
When selected, rows in the resulting spreadsheet will be sorted into a random order.
OK
Generate a random subset into a new spreadsheet and close the dialog.
Cancel
Close the dialog without creating any new spreadsheets.
See Also
Split or Subset a Spreadsheet
Randomize Rows in a Spreadsheet
Duplicate a Spreadsheet
Spreadsheet Manipulate Menu
Spreadsheet Calculate Menu
The SUBSET procedure can be used in conjunction with the
GRUNIFORM function of the CALCULATE command in the command
language to do sampling with or without replacement.
To sample without replacement P rows out of N, the following commands could be
used:
FSORT [INDEX=GRUNIFORM(N;0;1)] !(1...N); Pos
SUBSET [Pos <= P] X,Y; Sample_X,Sample_Y
To sample with replacement P rows out of N, the following commands could be used:
CALC Row = INT(N*GRUNIFORM(P;0;1)) + 1
VARIATE Sample_X,Sample_Y; (X,Y)$[Row]
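The logic behind both sets of commands can be sketched in Python. This is an analogy to the Genstat commands above rather than a translation of them, and the function names are hypothetical. The first function orders 1...N by random keys and keeps the rows whose permuted value is at most P; the second converts uniform random numbers into row numbers.

```python
import random

def sample_without_replacement(n, p, rng):
    """Select p of n rows by sorting 1..n on random keys."""
    keys = [rng.random() for _ in range(n)]
    # pos is 1..n reordered by the random keys, i.e. a random permutation.
    pos = [value for _, value in sorted(zip(keys, range(1, n + 1)))]
    # Exactly p entries of a permutation of 1..n are <= p.
    return [row for row, value in enumerate(pos, start=1) if value <= p]

def sample_with_replacement(n, p, rng):
    """Draw p row numbers from 1..n, allowing repeats."""
    # rng.random() lies in [0, 1), so int(n * u) + 1 lies in 1..n.
    return [int(n * rng.random()) + 1 for _ in range(p)]
```

Note that the first approach returns the selected rows in their original order, which matches the behaviour of restricting on a condition such as `Pos <= P`.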