Calculates numerical values for data structures.
The CALCULATE directive allows you to perform transformations and other calculations. It has the form:
CALCULATE expression
The expression specifies what calculation is to be done, and where the results are to be stored. For example, the command
CALCULATE Area = Length * Breadth
specifies that the structure Area is to store the results of Length multiplied by Breadth. All the usual arithmetic operators are available, including +, -, * and /.
CALCULATE can operate on any numerical data structure and it will automatically declare the structure to hold the results if you have not declared it already. So, if Area has not yet been defined and Length and Breadth are scalars, Area will become a scalar too.
Generally the structures involved in the calculation must have the same "shape" (for example, variates must have the same length) and the operators operate element-by-element over all their values. So, if Length and Breadth were variates, Area would become a variate each of whose units contained the product of the corresponding units of Length and Breadth. However, scalars and ordinary numbers can be included with calculations on any type of data structure. So
CALCULATE Kilo = Pound / 2.2
would be valid whatever the type of the structures Kilo and Pound.
If any of the values involved in a numerical expression is missing, the result will be missing too.
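As a minimal sketch of the rules above (the variate names and values are illustrative and not part of the original text), the following commands show element-by-element arithmetic, an ordinary number combined with a variate, and a missing value (written * in the data) carrying through to the result:

```genstat
"Illustrative data: the third unit of Length is missing"
VARIATE [VALUES=2,3,*,5] Length
VARIATE [VALUES=4,5,6,7] Breadth
"Element-by-element product; Area is declared automatically"
CALCULATE Area = Length * Breadth
"An ordinary number is combined with every unit of Area"
CALCULATE Half = Area / 2
PRINT Length,Breadth,Area,Half
```

Under the rules described above, Area should contain 8, 15, a missing value and 35, and the third unit of Half should be missing too.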
GenStat has operators for relational tests. These include the symbolic forms <, >, ==, <=, >=, /= and <>, their named synonyms .LT., .GT., .EQ., .LE., .GE. and .NE., and the operators .EQS. and .NES., which can be used with texts.
These generate a result of zero if the test is false, and one if it is true. (In fact any non-zero value is taken to represent a true value.) With most of these operators, a missing value in either operand (or in both) will generate a missing result. The exceptions are .EQ. and .NE. (and their synonyms), and .EQS. and .NES.: when both operands are missing .EQ. and .EQS. give a true result, while .NE. and .NES. give a false result.
There are also logical operators (.NOT., .AND., .OR. and .EOR.) that can be used to combine the results of expressions involving relational operators.
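The treatment of missing values in relational tests can be sketched as follows. The variates A and B and their values are hypothetical, chosen so that the second unit is missing in both operands:

```genstat
"Illustrative data: unit 2 is missing in both variates"
VARIATE [VALUES=3,*,7] A
VARIATE [VALUES=5,*,7] B
"Ordinary comparison: a missing operand gives a missing result"
CALCULATE Less = A < B
"Equality and non-equality treat two missing values as equal"
CALCULATE Same = A == B
CALCULATE Diff = A /= B
PRINT A,B,Less,Same,Diff
```

Following the rules above, Less should contain 1, missing and 0, Same should contain 0, 1 and 1, and Diff should contain 1, 0 and 0.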
The precedence rules of the operators are very similar (but possibly not identical) to those in computer languages like C or Fortran. The list below shows the order in which the operators are evaluated when they are used in expressions, if brackets are not used to make the order explicit:
1) .NOT. Monadic -
2) .IS. .ISNT. .IN. .NI. *+
3) **
4) * /
5) + Dyadic -
6) < > == <= >= /= <> .LT. .GT. .EQ. .LE. .GE. .NE. .NES.
7) .AND. .OR. .EOR.
(Monadic minus means the use of the minus sign in a negative number: for example, -1.) Within each class, operations are done from left to right within an expression, unless brackets are used to indicate some other order. So
A > B+C/D*E
is the same as
A > ( B + ( (C/D) * E ) )
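A small numerical check of the left-to-right rule within a class may help. The scalar names R1 and R2 are illustrative only:

```genstat
"Division and multiplication share a class, so they are done left to right"
CALCULATE R1 = 2 + 12/4*3
"Brackets force the multiplication to be done first"
CALCULATE R2 = 2 + 12/(4*3)
PRINT R1,R2
```

By the precedence list above, R1 is evaluated as 2 + ((12/4)*3) = 11, whereas R2 is 2 + (12/12) = 3.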
Expressions can contain lists, to specify that the same calculation is to be done for several sets of structures. For example
CALCULATE Pay1,Pay2 = Hours1,Hours2 * Rate + Bonus
This has the same effect as the two commands
CALCULATE Pay1 = Hours1 * Rate + Bonus
CALCULATE Pay2 = Hours2 * Rate + Bonus
Notice that, if any of the lists on the right-hand side of the expression is shorter than the list on the left-hand side, the list is re-used. So the value of Bonus is used for both calculations. To take a more complicated example
CALCULATE X,Y,Z = A,B,C + 1,2
is the same as the three calculations
CALCULATE X = A + 1
CALCULATE Y = B + 2
CALCULATE Z = C + 1
However, the lists on the right-hand side must not be longer than the list on the left-hand side.
When the calculation contains lists, you can set the INDEX option to a scalar which will contain the index of the current calculation. For example
CALCULATE [INDEX=i] X,Y,Z = i * A,B,C
is the same as the three calculations
CALCULATE X = 1 * A
CALCULATE Y = 2 * B
CALCULATE Z = 3 * C
as X and A are the first items of their lists, Y and B are the second, and Z and C are the third.
GenStat provides a wide range of functions for use in expressions. Many of these, known as transformations, produce a result that is the same type of structure as the argument of the function. For example,
CALCULATE Logsulph = LOG(Sulphur)
uses the LOG function to take natural logarithms of the values in the data structure Sulphur. If Sulphur is a variate Logsulph will also be a variate with the same number of values.
Scalar functions produce a scalar summary of all the values in a structure. For example, we can use the SUM function to calculate the total Sulphur values:
CALCULATE Totsulph = SUM(Sulphur)
There are also vector functions that produce summaries across the values of a set of variates (or of scalars). The set of variates must be put into a pointer. So, we could form a variate M each of whose units contains the mean of the values in the corresponding units of the variates A, B and C by
POINTER [VALUES=A,B,C] Vars
CALCULATE M = VMEAN(Vars)
This can be done more succinctly using an unnamed pointer:
CALCULATE M = VMEAN(!p(A,B,C))
When a function has more than one argument, each is separated from the next by a semicolon. For example
CALCULATE Corr = CORRELATION(X; Y)
calculates the correlation between the values in X and Y.
Function arguments can also be lists, running in parallel with the other lists in the expression. For example, to calculate Corr1 as the correlation between X1 and Y1, and Corr2 as the correlation between X2 and Y2:
CALCULATE Corr1,Corr2 = CORRELATION(X1,X2; Y1,Y2)
When a factor occurs in an expression on the right-hand side, GenStat usually works with its levels. The exception is when the factor occurs as the first operand of the operators .IN. or .NI. and the second operand is a text; the factor labels are then used instead. A factor can also occur on the left-hand side of an expression and receive the results of a calculation; an error is reported if any of the resulting values is not one of the levels of the factor. Two functions are provided especially for factors: NLEVELS(F) gives the number of levels of the factor F, and NEWLEVELS(F; V) forms a variate from the factor F, using variate V to define values for the levels.
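The following sketch illustrates these rules for factors. The factor Dose, its levels and labels, and the result names are all hypothetical; the declaration assumes the usual LEVELS, LABELS and VALUES options of the FACTOR directive:

```genstat
"Hypothetical factor with levels 10, 20, 30 and matching labels"
FACTOR [LEVELS=!(10,20,30); LABELS=!t(low,medium,high); VALUES=10,30,20,30] Dose
"On the right-hand side the levels are used"
CALCULATE Double = 2 * Dose
"With .IN. and a text as second operand, the labels are used instead"
CALCULATE IsHigh = Dose .IN. !t(high)
"Number of levels of the factor"
CALCULATE Nlev = NLEVELS(Dose)
"Map each level to a new value supplied by a variate"
CALCULATE Resp = NEWLEVELS(Dose; !(1.5,2.5,4))
PRINT Dose,Double,IsHigh,Resp
PRINT Nlev
```

If the declaration behaves as assumed, Double is formed from the levels (20, 60, 40, 60), IsHigh flags the units labelled high, and Resp replaces each level by the corresponding value of the unnamed variate.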
Text structures are allowed only with the relational operators .EQS., .NES., .IN. and .NI. or in the string functions. The result of any expression is a number, so you cannot create a text with CALCULATE, even if the structures on which the operations are being done are texts.
All the arithmetic, relational and logical operators and transformation functions can also be used with matrix structures, symmetric matrices and diagonal matrices. The basic rule when using these with different types of matrix is that their dimensions must conform. This means that, for each pair of matrices, row dimension must match row dimension, and column dimension must match column dimension. So, for example, you can add a diagonal matrix to a matrix structure provided the number of rows and columns of the matrix equals the number of rows (and columns) of the diagonal matrix. The multiplication operator (*) performs element-by-element multiplication of two matrices: for matrix multiplication, there is the compound operator *+ or the function PRODUCT, which is one of the many specialised matrix functions.
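The difference between element-by-element and matrix multiplication can be sketched with two small matrices. The names A, B, Elem, Prod1 and Prod2 and the values are illustrative, and the declarations assume the ROWS, COLUMNS and VALUES options of the MATRIX directive:

```genstat
"Two illustrative 2 by 2 matrices"
MATRIX [ROWS=2; COLUMNS=2; VALUES=1,2,3,4] A
MATRIX [ROWS=2; COLUMNS=2; VALUES=5,6,7,8] B
"Element-by-element product: each element of A times the matching element of B"
CALCULATE Elem = A * B
"Matrix product, using the compound operator"
CALCULATE Prod1 = A *+ B
"Matrix product, using the PRODUCT function"
CALCULATE Prod2 = PRODUCT(A; B)
PRINT Elem,Prod1,Prod2
```

Elem should contain the values 5, 12, 21 and 32, while Prod1 and Prod2 should be identical to each other and generally different from Elem.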
You can use tables in expressions in much the same way as you would any other numerical structure. Tables in expressions must be either all without margins or all with margins. If you try to mix tables with and without margins, GenStat will report an error. Calculations with tables are very straightforward when they have the same factors in their classifying sets. The tables then have identical "shapes", and the arithmetic, relational, and logical operators and the transformation functions act element-by-element, in the usual way. When tables have different classifying sets, there are two cases to consider. The first case is when the table on the left-hand side has a factor in its classifying set that is not in the classifying set of the table on the right-hand side. In this case, the right-hand table is expanded to include that factor, by duplicating its values across the levels of the factor and any margin. The second case is when the table on the right-hand side has a factor in its classifying set that is not in the classifying set of the table on the left-hand side. Now the values in the margin over that factor are taken for the left-hand table. If the table has no margins, they must be calculated first. By default GenStat forms marginal totals, but you can use the special table functions to form other types of margin.
Dummies can be used with the relational operators .IS. and .ISNT. which test whether or not a dummy points to a particular identifier. For example, to store in Sca the result of a test to check whether dummy D points to Va, you would put
CALCULATE Sca = D.IS.Va
while to test that D does not point to Vb, you would put
CALCULATE Sca = D.ISNT.Vb
There are also the functions SET and UNSET to test if a dummy has or has not been set to any value. Other specialised functions include subset functions, statistical functions and random number generation functions.
CALCULATE has six options: PRINT, ZDZ, TOLERANCE, SEED, INDEX and RESTRICTEDUNITS. If you set the PRINT option to summary, GenStat will print some summary information every time that values are assigned to a structure. The information has the same form as in the READ directive: identifier, minimum value, mean value, maximum value, number of values, number of missing values, and whether or not the set of values is skew.
If you try to use CALCULATE to do something invalid, such as the logarithm or the square root of a negative number, GenStat generates a warning diagnostic and inserts a missing value in the offending unit. The one exception is the division of zero by zero, which is regarded as deliberate. GenStat thus does not print a diagnostic, but uses option ZDZ to determine whether the result should be a missing value (ZDZ=missing) or zero (ZDZ=zero); the default is missing.
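As a short sketch of the ZDZ option (the variates Num and Den and the result names are hypothetical), the first unit below involves zero divided by zero:

```genstat
"Illustrative data: unit 1 gives zero divided by zero"
VARIATE [VALUES=0,4,0] Num
VARIATE [VALUES=0,2,5] Den
"Default setting: zero divided by zero gives a missing value"
CALCULATE RatioM = Num / Den
"ZDZ=zero: zero divided by zero gives zero"
CALCULATE [ZDZ=zero] RatioZ = Num / Den
PRINT Num,Den,RatioM,RatioZ
```

According to the description above, RatioM should contain missing, 2 and 0, while RatioZ should contain 0, 2 and 0, with no diagnostic printed in either case.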
The SEED option provides the seed to generate random numbers for the functions GRBETA, GRBINOMIAL, GRCHISQUARE, GRF, GRGAMMA, GRHYPERGEOMETRIC, GRLOGNORMAL, GRNORMAL, GRPOISSON, GRT and GRUNIFORM if these occur in the expression. The seed can be any non-negative number, but only the last six digits of its integer part are used. Thus the seeds 2144556 and 7144556.3 are both equivalent to the seed 144556. The default value of zero continues an existing sequence of random numbers, if either these functions or the function URAND (which has its own argument to set the seed) has already been used in the current GenStat run. If, however, this is the first time that these functions have been used, GenStat picks a random seed.
The RESTRICTEDUNITS option allows you to apply a "restriction" to the vectors in the expression. Its setting is a variate containing a list of the unit numbers on which you want the calculation to be done (the other units are then ignored). This works in the same way as if you had applied a restriction on one of these vectors explicitly, using the RESTRICT directive (see below). However, if RESTRICTEDUNITS is set, restrictions on the vectors themselves are ignored. By default, when RESTRICTEDUNITS is unset, CALCULATE will look for restrictions in the vectors, as usual. Note, though, that you can set RESTRICTEDUNITS=* to make the calculation work on all the units, regardless of whether any of the vectors is restricted.
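A minimal sketch of RESTRICTEDUNITS, assuming a hypothetical variate Sulphur with four units and supplying the unit numbers as an unnamed variate:

```genstat
"Illustrative data"
VARIATE [VALUES=5,12,8,20] Sulphur
"Declare the result first so that its length is explicit"
VARIATE [NVALUES=4] Logsulph
"Do the calculation only on units 2 and 4"
CALCULATE [RESTRICTEDUNITS=!(2,4)] Logsulph = LOG(Sulphur)
PRINT Sulphur,Logsulph
```

Under the description above, only units 2 and 4 of Logsulph should receive values; units 1 and 3 are ignored by the calculation and so should keep whatever they held before (here, missing values).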
Options: PRINT, ZDZ, TOLERANCE, SEED, INDEX, RESTRICTEDUNITS.
Action with RESTRICT
If you are calculating values for a variate or factor, you can restrict the operation to only a subset of the units by applying a restriction to any of the variates, factors or texts involved in that calculation. The values in the other units are left unchanged. If more than one of these vectors is restricted, they must all be restricted in the same way. Note, though, that restrictions on a variate within a scalar function (for example MEAN), or within the RESTRICTION function, operate independently from the main calculation outside. Also, restrictions in the main calculation are ignored if it contains qualified identifiers or the ELEMENTS function.
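The effect of a restriction can be sketched as follows. The variates and values are hypothetical, and the example assumes the standard form of the RESTRICT directive, in which a CONDITION parameter defines the subset and RESTRICT with no condition removes the restriction:

```genstat
"Illustrative data; Logsulph starts with zero in every unit"
VARIATE [VALUES=5,12,8,20] Sulphur
VARIATE [VALUES=0,0,0,0] Logsulph
"Restrict Sulphur to the units with values above 10"
RESTRICT Sulphur; CONDITION=Sulphur > 10
"Only the units in the restricted set are calculated"
CALCULATE Logsulph = LOG(Sulphur)
"Remove the restriction again before printing"
RESTRICT Sulphur
PRINT Sulphur,Logsulph
```

If this behaves as described above, units 2 and 4 of Logsulph should hold the logarithms of 12 and 20, while units 1 and 3 should be left unchanged at zero.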
Directives: EXPRESSION, SETCALCULATE, NAG, FLRV, QRD, SVD.