Help for MF4
PURPOSE
MF4 is an improved and debugged version of MF3.
MF4 allows the user to create C -like
expressions to perform general mathematical operations on
one or more IBIS/graphics file columns. The expressions
are written as a parameter string. The parameter is
interpreted to determine the input and output columns
and operations to be performed. Applies a user
specified arithmetic expression to columns of a cagis table.
All results are computed in double precision (15 decimal
places) even if the input columns are single precision or
integer.
MF4 allows for multiple column assignments by using a $
separator.
MATH AND FUNCTIONS
The math functions available are: @sqrt, @alog, @alog10,
@aint, @sin, @cos, @tan, @asin, @acos, @atan,
@atan2, @abs, @min, @max, @mod.
Standard binary operations are: +,- *, / and ^ (pow).
Logic operations <, >, <=, >=, ==, !=, && (and), || (or),
and ! (not). The main difference with C vs the FORTRAN
conventions used by program MF is the use of ^ for power
of two integers or reals.
Note 1: ^^ (xor) is not implemented (use (a||b)&&(!(a&&b))).
Note 2: The old FORTRAN constructs .EQ., .NE. and **
are no longer allowed.
Note 3: When && is entered in a function, the internal
print shows only 1 &. Thats because TAE uses & for variable
names and traps it out. Putting in 3 & shows the correct
&& in the function. It doesnt really matter, the function
is evaluated properly.
ABOUT MATH AND LOGIC OPERATORS IN FUCTION STATEMENTS
You need to be careful about 2 operators following each
other. They can lead to incorrect code and not give
warnings about what is happening. You should be especially
careful about occurences where negative numbers might
inadvertently enter into the function. This might happen
in scripts where a large number of mf4 calls are being
made.
Suppose you have a function like
func=("c42=(c36 > &sigtest)") (1)
As long as sigtest is a positive number everything proceeds
OK, but if it is negative than what you will have is
func=("c42=(c36 > -&sigtest)") (2)
Internally this is translated into two operators and will
give bad results.
Therefore, if neg numbers are possible then the function
should be written
func=("c42=(c36 > (&sigtest))") (3)
For example if the code=1 parameter is set, you will
see the pseudo code for (3):
<< original value in row 1 col 42 = 0.000000
<< original value in row 1 col 36 = 0.239406
LCMP 103 reg = -1.000000
STOR 1062 reg = -1.000000
LOAD 1 reg = 0.239406
GT 1062 reg = 1.000000
STOR 1063 reg = 1.000000
RETN 0 reg = 1.000000
>> output value in row 6 col 42 = 1.000000
But for (1) and (2) the code will be:
<< original value in row 1 col 42 = 0.000000
<< original value in row 1 col 36 = 0.239406
LOAD 9 reg = 0.000000
ADD 34 reg = 0.000000
STOR 1062 reg = 0.000000
SUB 103 reg = -1.000000
STOR 1063 reg = -1.000000
RETN 0 reg = -1.000000
>> output value in row 6 col 42 = -1.000000
which is incorrect.
The internal parser will provide some validation, but
it isnt optimal yet.
STRING FUNCTIONS
String functions are also available. The arguments
can be column names (must contain strings) or string
constants enclosed in single quotes, except for some arguments
which are numeric (e.g., see fstr below).
Examples:
@cat(a,b) or @cat(a,'xxx')
The string functions are:
@cat(a,b) concatenates a to b
@break(a,b) outputs a up to first occurrence of a
character in b (e.g., @break(a,'.,;:?'))
@fstr(a,m) outputs the first m characters of a
@bstr(a,m) outputs from the m'th character to the end of a
@adelete(a,b) deletes any of b's characters from a
(e.g., @adelete(a,'.,;:?'))
@sdelete(a,b) deletes occurrences of the whole string b from a
(e.g., @sdelete(a,'dog')
@trim(a,b) trims from the low order end of a, all characters
in b, but stops trimming at the first non-b char
@ucase(a) outputs a in upper case
@lcase(a) outputs a in lower case
@ljust(a,n) left justifies a in an n-character field. if too
long, keeps high order part of a
@rjust(a,n) right justifies a in an n-character field. if too
long, keeps high order part of a
@replace(a,'dog=cat') replaces all occurrences of the string before the
= with the string after the =
@strlen(a) outputs the length of the string a
@pos(a,b) finds the pattern b in a and returns its starting
position. ^ is left anchor % is right anchor
? matches any single character * matches a run
(e.g., @pos(a,'^a??.*z*%'))
@streq(a,b) returns TRUE or 1 if a equals b else FALSE or 0
@strsub(a,b) returns TRUE or 1 if a contains b else FALSE or 0
@strpat(a,b) returns TRUE or 1 if a contains the pattern b
else FALSE or 0. see the syntax for @pos(a,b)
@num(a) returns the numeric value of string a, which must
contain an integer or floating number, can have exponent
such as 2.73e-06 (use e, E, d, or D).
@i2str(n) converts the integer n to a string; zero goes to 0
@f2str(f,n) converts the float or integer f to a floating
string with n digits of precision to the right of
the decimal; n=0 omits the decimal; rounding is
performed
@dmsstr(a) converts the string degree-min-second into a
degree number. Acceptable formats include
1332727.666, 1332727.666W, 1.332727E+06W,
133d27m27.666 where the - can be any non-numeric
separator other than .+-Ee. The EWNS can be
lower case and can be at the front e1332727.666.
A minor point, exponent e or E can be followed
by a number or a sign, but d or D must be
followed by a sign.
@dmsnum(f) converts the number degree-min-second into a
degree number. Acceptable formats include
1332727.666, -1332727 (real or integer).
All operations work as in the c language with
except for the column operations described below.
SPECIAL FUNCTIONS
The special variable @index may be used to insert
the record number into an expression. The special
variable @rand may be used to put a random number
between 0 and 1 in the column. If @rand is used, the
parameter seed can be used to vary the random sequence
Multiple formulas may be given by separating them with the
$ character.
COLUMN OPERATIONS
Column operations are added features that perform
specialized functions to the table. Two restrictions
must be observed:
1. Column operations cannot be used in a formula.
2. The arguments must be column names, not constants
or expressions.
They perform an operation on columns placing
results in a column.
There are two varieties of column operations;
those that replace all the values in the entire
column of the table with one value and those that
modify segments of the column based upon a control
number in another column.
Note: The operations @fill and @interp require a column
of values separated by zeros.
In the following operations note that the use of
col requires a c preceeding the column number.
Example:
@sum(c14) or @diff(c2)
The column operations are:
@average(col) calculates the average of
the column and replaces all
values with the average.
@diff(col) subtracts the value in the previous
record from the value in the current
record
@fill(col) fill the zeros in the column with the
previous non-zero value in the column
(requires a column of values separated
by zeros)
@rsum(col) computes running sum of values in the
column
@sigma(col) calculates the standard deviation in
the column and replaces all
values with the standard deviation
@sum(col) sum the values in the column
@vmax(col) calculates the maximum in the column
@vmin(col) calculates the minimum in the column
The column operations with control columns are:
@cavg(col1,col2) Replace the values in col1 with
the average using col2 as control.
@count(col1,col2) Count values in col1 using col2
as control column
@csigma(col1,col2) Replacd the values in col1 with
the standard deviation of the
values using col2 as control.
@cdiff(col1,col2) subtracts the value in the previous
record from the value in the current
record; restarts the operation for a
change in the value in col2
@csum(col1,col2) controlled sum; sum the values
in col1 using col2 as a control
column, restarts the sum for a
change in the value in col2
@crsum(col1,col2) controlled running sum; running
sum of values in col1 but restarts
the sum for a change in the value
in col2
@cvmax(col1,col2) controlled maximum; calculates the
maximum in col1 using col2 as a control
column, restarts the max for a change
in the value in col2
@cvmin(col1,col2) controlled minimum; calculates the
minimum in col1 using col2 as a control
column, restarts the min for a change
in the value in col2
@cdiff(col1,col2) subtracts the value in the previous
record from the value in the current
record; restarts the operation for a
change in the value in col2
@shift(col,n) shifts downward n records,
negative n for upward shift;
downward shift replicates first
value in column while upward
shift replicates last value
@rotate(col,n) same as shift except values that
are rotated off the end of the
column are wrapped around to the
other end
@interp(col1,col2) replace zero values between non-zero
values in col1 by interpolating
between the non-zero values in col1
to corresponding values in col2;
col2 may contain @index in which case
interpolation is linear or it may
contain some other function
(i.e. logarithmic or exponential)
GEOPHYSICAL Column Operations
@dist(lon1,lat1,lon2,lat2,dist) calculate the distance in meters
between the two geographic points
on the Earth. A spherical formula
is used above 1.05 degrees and a
plane formula is used below .95
degrees of central arc. Between
these values, both formulas are used
and the result is a linear
interpolation of both formulas.
This is done to give a continuous
result. Results near the poles
are not guaranteed accurate.
@head(lon1,lat1,lon2,lat2,head) calculate the heading of the line
from the first to the second point
in degrees clockwise from north.
The interpolation technique used
in @dist is applied here.
@bear(lon1,lat1,lon2,lat2,lon3,lat3,bear) calculate the bearing of the
line from the first to the
second point clockwise in degrees
from the line from the first to the
third point. The interpolation
technique used in @dist is
applied here.
Note: The geophysical column operations can have numbers in the fields
instead columns. For example.
mf4 xxe2qq f="@dist(-1.130000000000e+02,4.100000000000e+01,c3,c4,c11)"
where,
lon1 = -113.0
lat1 = 41.0
lon2 = c3
lat2 = c4
dist = c11
This will take the first two values in combination with the two values
in column 3 and column 4 and place the result (the distance) in column 11.
None of the other column functions are allowed to do this.
Earlier implementions than May 9, 2008 would reject this structure
with an abend.
FSTRING EXAMPLE
A full example of an fstring to calculate a
time increment dt from a column t is
fstring="dt=t$shift(dt,-1)$dt=t-dt"
TAE COMMAND LINE FORMAT
MF3 INP=int PARAMS
where
int is a random access file. Since it
is used for both input and output,
no output file is specified.
PARAMS is a standard VICAR parameter
field.
FUNCTION is a string of math, logical, string, and column
operations given in examples below.
SEED is used to set a column to random values, or to
use with a function involving a random values,
The DEBUG parameter will show the pseudo instructions for
math, string and logic functions that arise from the
internal routine sp_xknuth as well as other information.
The CODE parameter will show the pseudo instructions for
math and logic functions. The nmenomics are the same
as for the CODE parameter for mf but have different
operands. No pseudo instructions are generated for
column opearations.
METHOD
MF3 performs arithmetic operations on an interface file.
The program uses two library routines SP_KNUTH and
SP_XKNUTH, to compile and interpret C-like
expressions entered by the parameters in an expression
such as:
C135 = (100*C34)/C4
In this expression, C34 and C4 are the input columns.
SP_KNUTH compiles the expression into pseudo-machine
instructions. The expression is applied to the input
column in SP_XKNUTH to produce the output column, C135.
RESTRICTIONS
1. Maximum number of columns in one execution is 100. (oops, 9 until bug is fixed)
2. The number of columns in the IBIS file is not limited here.
3. Maximum input string length is 10,000 (40 x 250).
4. Maximum number of operations is 3000.
5. Maximum number of temp locations is 938.
6. Maximum number of constants from the expression is 960.
7. Operators must be separated by parentheses.
notes:
1. Column numbers greater than 100 are mapped sequentially 1,2,3...
so there is no limit on the number of columns in the IBIS file.
3. The input parameter is a string array (40) each with 250 chars.
The array is concatenated by the program into a single array.
4. These can be counted by setting debug to one and counting the
lines that begin with "xknuth:op,opnd". The count is not
easily determined by looking at a long input.
5. These can be counted by setting debug to one and counting the
lines that begin with "xknuth:op,opnd" and having an opnd
value above 1061. The count is not easily determined by
looking at a long input.
6. These can be counted in the input, or by setting debug to one
and counting the lines that begin with "xknuth:op,opnd" and
having an opnd value between 103 and 1061, inclusive.
EXAMPLES
MF3 INP=FILE.INT FUNCTION=("C5 = C2/C3+100+@SQRT(C2)")
In this example, C2 is divided by C3 and added to 100
plus the square root of C2. The results are placed in
C5. Further examples of allowable functions follow:
FUNCTION=("C5 = !(C3 || C2)")
Logical operations are performed bitwise on the
operands. The logical values T and F are converted to 1. and 0.
for storage in column C5
FUNCTION=("X5 = X3<=INDEX")
Column 5 is 1.0 if column 3 has a value < its row value (INDEX).
FUNCTION=("@average(C3)")
In this example, the mean of column 3 is calculated and
that value is placed in every row entry in column 3.
This operation is different than the arithmetic and
logic operations given earlier because it operates on a
vertical column instead of horizontally across a row.
These operations cannot be used in an arithmetic
expression such as C5 = @average(C3)*10. See the FUNCTION
help for more examples.
MULTIPLE FUNCTION EXAMPLE
MF3 INP=FILE.INT FUNCTION=("c42=((64*(c36>(-.40))) || (c42*(c42>0))) -1"$"c41=3.0")
C42 is set to 64 if c36 is greater than -0.4, else is set to whatever
is aleady in C42. C41 is set to 3.0.
CODE example:
mf4 INP=FILE.INT CODE=1 FUNCTION=("c42=(((64)*(c36>(-.40))) || (c42*(c42>0))) -1")
where col 36 = 0.239406
col 42 = 32.0
reg is the value to be placed in col 42
will produce:
mf4 version Jun 18, 2010 - RJB
function string = c42=(((64)*(c36>(-.40))) || (c42*(c42>0))) -1
<< original value in row 1 col 42 = 0.000000
<< original value in row 1 col 36 = 0.239406
LCMP 104 reg = -0.400000
STOR 1062 reg = -0.400000
LOAD 1 reg = 0.239406
GT 1062 reg = 1.000000
STOR 1063 reg = 1.000000
LOAD 103 reg = 64.000000
MUL 1063 reg = 64.000000
STOR 1064 reg = 64.000000
LOAD 0 reg = 0.000000
GT 105 reg = 0.000000
STOR 1065 reg = 0.000000
LOAD 0 reg = 0.000000
MUL 1065 reg = 0.000000
STOR 1066 reg = 0.000000
LOAD 1064 reg = 64.000000
OR 1066 reg = 64.000000
STOR 1067 reg = 64.000000
SUB 106 reg = 63.000000
STOR 1068 reg = 63.000000
RETN 0 reg = 63.000000
>> output value in row 1 col 42 = 63.000000
One small item about using the CODE function. It shouldn't be used in
long procedures or ibis table files with thousands of rows. It
is really a debugging parameter and should be only used with snippets
of tables where problems are suspected.
HISTORY
Original Programmer: A. L. Zobrist, 15 December 1976
Cognizant Programmer: R. J. Bambery
Revision:
1999-12-12 A. L. Zobrist - Double precision and strings, etc.
2000-02-06 A. L. Zobrist - Enlarge all Function restrictions
2007-05-02 R. Bambery - Add 2 new control column operators
@csigma(col1,col2 and @cavg(col1,col2)
Add @count(col1,col2)
2007-10-13 R. Bambery - Change all internal printf statements to
sprintf/zvmessage combinations to print out to log files
Fixed debug parameter to show symbolic dump of code
produced like program f2.
2007-10-18 R. Bambery - added CODE parameter and improved
error detecting for parentheses
2007-10-25 R. Bambery - cleaned up debugging msgs, documentation
2007-11-06 R. Bambery - increased internal string sizes for long
function strings
2008-05-09 R. Bambery - geophysical columns can have values or
col numbers in the fields, subroutine mtchfield2 added
to handle these cases since the program would abend
with subroutine mtchfield
2008-06-14 R. Bambery - processing for @interp and @fill used
same code. This caused abends with
[TAE-PRCSTRM] Abnormal process termination; process status code = 11.;
@fill processing was separated from @interp
@fill(col) and @interp(col1,col2) have different parameter processing
2008-07-28 R. Bambery - merged the following:
2008-02-28 R. Bambery - Fixes for ANSI_C compiler in Linux
2008-03-21 R. Bambery - merge with svn version 43 of mf3.c
by Walt Bunch (Dec 28, 2007 version)
2008-03-26 R. Bambery - add error message warning of ibis
files of 0 rows or 0 columns
(*** glibc detected *** free(): invalid next size (fast): 0x000000000063f400 ***)
2008-07-26 R. Bambery - merged pkim's svn version 50 dated 04 Apr 2008
removed routines assoc with libcarto
replace solaris mystrnicmp with strncasecmp in main44
2008-08-20 R. Bambery - Incorporate consistencies with ifthen program
2009-12-03 R. Bambery - made compatible with 64-bit linux (removed cartoVicarProtos.h)
(Makefile.mf3)
2010-01-29 R. Bambery - Made compatible with 64-bit afids Build 793
Linux, MacOSX (both Intel/PowerPC)
2010-06-18 R. Bambery - This version was renamed mf4 in accordance with
wishes of users of mf3 because it is more restrictive with parentheses
2011-05-06 R. J. Bambery - Removed all warning messages generated from gcc 4.4.4
Build 1009
2011-06-20 R. J. Bambery - Removed warnings from gcc 4.5.2 on mac
2012-12-09 R. J. Bambery - Removed unneeded variables ptr, mxddwid, debugrec1
PARAMETERS:
INP
Input IBIS interface file
FUNCTION
Specifies function and columns,
case insensitive
SEED
Use to vary the random sequence
DEBUG
Set 1 to see symbol fetch,
pseudocode, other information
CODE
Set 1 to see pseudo code
See Examples:
Cognizant Programmer: