Help for MF4

PURPOSE

MF4 is an improved and debugged version of MF3.
MF4 allows the user to create C-like expressions to
perform general mathematical operations on one or more
IBIS/graphics file columns.  The expressions are written
as a parameter string, which is interpreted to determine
the input and output columns and the operations to be
performed.  In short, MF4 applies a user-specified
arithmetic expression to the columns of a cagis table.
All results are computed in double precision (about 15
significant decimal digits) even if the input columns are
single precision or integer.

MF4 allows for multiple column assignments by using a $
separator.

MATH AND FUNCTIONS

The math functions available are: @sqrt, @alog, @alog10,
@aint, @sin, @cos, @tan, @asin, @acos, @atan,
@atan2, @abs, @min, @max, @mod.

Standard binary operations are: +, -, *, / and ^ (pow).

Logic operations are: <, >, <=, >=, ==, !=, && (and), || (or),
and ! (not).  The main difference between the C-like
conventions used here and the FORTRAN conventions used by
program MF is the use of ^ (rather than **) to raise an
integer or real to a power.

Note 1: ^^ (xor) is not implemented (use (a||b)&&(!(a&&b))).
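
The substitute expression in Note 1 can be checked against a truth
table.  The following is only an illustrative Python sketch, not MF4
code; the function name xor is chosen here for the example.

```python
from itertools import product


def xor(a: bool, b: bool) -> bool:
    """Exclusive or built only from or, and, not, mirroring (a||b)&&(!(a&&b))."""
    return (a or b) and (not (a and b))


# Evaluate the expression for all four combinations of the two inputs.
truth_table = {(a, b): xor(a, b) for a, b in product([False, True], repeat=2)}

for (a, b), result in truth_table.items():
    print(a, b, result)
```

The result is true only when exactly one of the two inputs is true,
which is the definition of xor.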

Note 2: The old FORTRAN constructs .EQ., .NE. and **
are no longer allowed.

Note 3: When && is entered in a function, the internal
print shows only one &. That is because TAE uses & for
variable names and traps it out. Entering three & characters
shows the correct && in the printout. It does not really
matter; the function is evaluated properly either way.

ABOUT MATH AND LOGIC OPERATORS IN FUNCTION STATEMENTS

You need to be careful when two operators follow each
other.  This can lead to incorrect code without any
warning about what is happening. Be especially careful
about occurrences where negative numbers might
inadvertently enter into the function.  This might happen
in scripts where a large number of mf4 calls are being
made.

Suppose you have a function like 
       func=("c42=(c36 > &sigtest)")             (1)

As long as sigtest is a positive number everything proceeds
OK, but if it is negative then what you will have is

       func=("c42=(c36 > -&sigtest)")            (2)

Internally this is translated into two operators and will
give bad results.

Therefore, if negative numbers are possible, the function
should be written

       func=("c42=(c36 > (&sigtest))")           (3)

For example if the code=1 parameter is set, you will
see the pseudo code for (3):

<<   original value in row 1 col 42 = 0.000000
<<   original value in row 1 col 36 = 0.239406
LCMP     103    reg = -1.000000
STOR    1062    reg = -1.000000
LOAD       1    reg = 0.239406
GT      1062    reg = 1.000000
STOR    1063    reg = 1.000000
RETN       0    reg = 1.000000
>>   output value in row 6 col 42 = 1.000000


But for (1) and (2) the code will be:

<<   original value in row 1 col 42 = 0.000000
<<   original value in row 1 col 36 = 0.239406
LOAD       9    reg = 0.000000
ADD       34    reg = 0.000000
STOR    1062    reg = 0.000000
SUB      103    reg = -1.000000
STOR    1063    reg = -1.000000
RETN       0    reg = -1.000000
>>   output value in row 6 col 42 = -1.000000

which is incorrect.

The internal parser provides some validation, but
it is not optimal yet.
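
When mf4 calls are generated by a script, one way to follow the
advice in form (3) is to always wrap substituted values in
parentheses.  The sketch below assumes a hypothetical Python helper,
comparison_func, that builds the function string; it is not part of
MF4 or TAE.

```python
def comparison_func(out_col: int, in_col: int, threshold: float) -> str:
    """Build 'cN=(cM > (value))' with the value always parenthesized.

    The parentheses keep a leading minus sign from sitting directly
    next to the > operator.
    """
    return f"c{out_col}=(c{in_col} > ({threshold!r}))"


# The same helper is safe for positive and negative thresholds.
print(comparison_func(42, 36, 1.0))
print(comparison_func(42, 36, -1.0))
```

The second call yields c42=(c36 > (-1.0)), so the sign never follows
the comparison operator directly.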

STRING FUNCTIONS

String functions are also available.  The arguments
can be column names (must contain strings) or string
constants enclosed in single quotes, except for some arguments
which are numeric (e.g., see fstr below).  

Examples:
	      @cat(a,b)  or @cat(a,'xxx')

The string functions are:

@cat(a,b)               concatenates a to b

@break(a,b)             outputs a up to first occurrence of a
			            character in b   (e.g., @break(a,'.,;:?'))

@fstr(a,m)              outputs the first m characters of a

@bstr(a,m)              outputs from the m'th character to the end of a

@adelete(a,b)           deletes any of b's characters from a
			 (e.g., @adelete(a,'.,;:?'))

@sdelete(a,b)           deletes occurrences of the whole string b from a
			 (e.g., @sdelete(a,'dog'))

@trim(a,b)              trims from the low order end of a, all characters
			            in b, but stops trimming at the first non-b char

@ucase(a)               outputs a in upper case

@lcase(a)               outputs a in lower case

@ljust(a,n)             left justifies a in an n-character field.  if too
			            long, keeps high order part of a

@rjust(a,n)             right justifies a in an n-character field.  if too
			            long, keeps high order part of a

@replace(a,'dog=cat')   replaces all occurrences of the string before the
			                = with the string after the =

@strlen(a)              outputs the length of the string a

@pos(a,b)               finds the pattern b in a and returns its starting
			            position.  ^ is left anchor % is right anchor
			            ? matches any single character * matches a run
			 (e.g., @pos(a,'^a??.*z*%'))

@streq(a,b)             returns TRUE or 1 if a equals b else FALSE or 0

@strsub(a,b)            returns TRUE or 1 if a contains b else FALSE or 0

@strpat(a,b)            returns TRUE or 1 if a contains the pattern b
			            else FALSE or 0.  see the syntax for @pos(a,b)

@num(a)                 returns the numeric value of string a, which must
			            contain an integer or floating number, can have exponent
			            such as 2.73e-06 (use e, E, d, or D).

@i2str(n)               converts the integer n to a string; zero goes to 0

@f2str(f,n)             converts the float or integer f to a floating
			            string with n digits of precision to the right of
			            the decimal; n=0 omits the decimal; rounding is
			            performed

@dmsstr(a)              converts the string degree-min-second into a
                        degree number.   Acceptable formats include
                        1332727.666, 1332727.666W, 1.332727E+06W,
                        133d27m27.666, where the separators (d and m
                        here) can be any non-numeric character other
                        than .+-Ee.  The EWNS can be lower case and can
                        be at the front, e.g. e1332727.666.
                        A minor point: exponent e or E can be followed
                        by a number or a sign, but d or D must be
                        followed by a sign.

@dmsnum(f)              converts the number degree-min-second into a
                        degree number.   Acceptable formats include
                        1332727.666, -1332727 (real or integer).
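
The packed degree-min-second number used by @dmsnum can be unpacked
as shown in the following Python sketch.  It assumes the layout
DDDMMSS.sss and that a negative input gives a negative result; the
function name is hypothetical and the actual MF4 routine may differ
in detail.

```python
import math


def dms_number_to_degrees(packed: float) -> float:
    """Convert a packed DDDMMSS.sss number to decimal degrees."""
    sign = -1.0 if packed < 0 else 1.0
    value = abs(packed)
    degrees = math.floor(value / 10000)              # leading DDD digits
    minutes = math.floor((value - degrees * 10000) / 100)  # MM digits
    seconds = value - degrees * 10000 - minutes * 100       # SS.sss remainder
    return sign * (degrees + minutes / 60 + seconds / 3600)


print(dms_number_to_degrees(1332727.666))
print(dms_number_to_degrees(-1332727))
```

For 1332727.666 this gives 133 degrees plus 27 minutes plus 27.666
seconds expressed in decimal degrees.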

All operations work as in the C language, except
for the column operations described below.
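
A few of the string functions listed above can be modeled in Python
to make their descriptions concrete.  This is a sketch only: the
function names are hypothetical, and it assumes that the "low order
end" in @trim means the right-hand end of the string.

```python
def str_break(a: str, b: str) -> str:
    """Model of @break: a up to the first occurrence of any character in b."""
    for i, ch in enumerate(a):
        if ch in b:
            return a[:i]
    return a


def adelete(a: str, b: str) -> str:
    """Model of @adelete: remove every character of a that appears in b."""
    return "".join(ch for ch in a if ch not in b)


def sdelete(a: str, b: str) -> str:
    """Model of @sdelete: remove occurrences of the whole string b from a."""
    return a.replace(b, "")


def trim(a: str, b: str) -> str:
    """Model of @trim: strip trailing characters found in b, stopping at
    the first character that is not in b (assumes low order = right end)."""
    return a.rstrip(b)


print(str_break("hello, world", ".,;:?"))
print(adelete("a.b,c", ".,;:?"))
print(sdelete("hotdog dogma", "dog"))
print(trim("a.b..", "."))
```

Note the difference between @adelete, which treats b as a set of
characters, and @sdelete, which treats b as one string.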

SPECIAL FUNCTIONS

The special variable @index may be used to insert
the record number into an expression.  The special
variable @rand may be used to put a random number
between 0 and 1 in the column.  If @rand is used, the
parameter seed can be used to vary the random sequence.
Multiple formulas may be given by separating them with the
$ character.
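
The roles of @index, @rand and the seed parameter can be pictured
with the following Python sketch.  It assumes record numbers start
at 1 and uses Python's own random generator; MF4's generator and the
way it uses the seed are not described here, so the actual values
will differ.

```python
import random


def index_column(nrows: int) -> list:
    """Model of @index: the record number of each row, starting at 1 (assumed)."""
    return list(range(1, nrows + 1))


def rand_column(nrows: int, seed: int) -> list:
    """Model of @rand: one random number in [0, 1) per row.

    The same seed always reproduces the same sequence; a different
    seed varies it.
    """
    rng = random.Random(seed)
    return [rng.random() for _ in range(nrows)]


print(index_column(3))
print(rand_column(3, seed=7) == rand_column(3, seed=7))
```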

COLUMN OPERATIONS 

Column operations are added features that perform
specialized functions to the table.  Two restrictions
must be observed:

1. Column operations cannot be used in a formula.
2. The arguments must be column names, not constants
   or expressions.

They perform an operation on columns, placing the
results in a column.

There are two varieties of column operations:
those that replace all the values in the entire
column of the table with one value and those that 
modify segments of the column based upon a control
number in another column.

Note: The operations @fill and @interp require a column
of values separated by zeros.

In the following operations note that the use of
col requires a c preceding the column number.

Example:
    @sum(c14) or @diff(c2)


The column operations are:

@average(col)           calculates the average of
			            the column and replaces all
                        values with the average.

@diff(col)              subtracts the value in the previous
                        record from the value in the current
                        record

@fill(col)              fill the zeros in the column with the
                        previous non-zero value in the column
                        (requires a column of values separated
                        by zeros)

@rsum(col)              computes running sum of values in the
                        column

@sigma(col)             calculates the standard deviation in
			            the column and replaces all
                        values with the standard deviation

@sum(col)               sum the values in the column


@vmax(col)              calculates the maximum in the column
                                                                                                                                                       
@vmin(col)              calculates the minimum in the column
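
Three of the column operations above can be modeled on a plain
Python list to show how they differ from row-wise formulas.  This is
a sketch with hypothetical function names; it assumes that zeros
before the first non-zero value are left unchanged by @fill.

```python
from itertools import accumulate


def col_average(values: list) -> list:
    """Model of @average: every entry is replaced by the column mean."""
    mean = sum(values) / len(values)
    return [mean] * len(values)


def col_rsum(values: list) -> list:
    """Model of @rsum: running sum down the column."""
    return list(accumulate(values))


def col_fill(values: list) -> list:
    """Model of @fill: each zero takes the previous non-zero value."""
    filled, last = [], 0.0
    for v in values:
        if v != 0:
            last = v
        filled.append(last)
    return filled


print(col_average([1.0, 2.0, 3.0]))
print(col_rsum([1, 2, 3]))
print(col_fill([5, 0, 0, 7, 0]))
```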



The column operations with control columns are:

@cavg(col1,col2)        Replace the values in col1 with
                        the average using col2 as control.

@count(col1,col2)       Count values in col1 using col2
                        as control column

@csigma(col1,col2)      Replace the values in col1 with
                        the standard deviation of the
                        values using col2 as control.
 
@cdiff(col1,col2)       subtracts the value in the previous
                        record from the value in the current
                        record; restarts the operation for a
                        change in the value in col2

@csum(col1,col2)        controlled sum; sum the values
            			in col1 using col2 as a control
			            column, restarts the sum for a
			            change in the value in col2

@crsum(col1,col2)       controlled running sum; running
            			sum of values in col1 but restarts
			            the sum for a change in the value
			            in col2

@cvmax(col1,col2)       controlled maximum; calculates the
            			maximum in col1 using col2 as a control
			            column, restarts the max for a change
			            in the value in col2

@cvmin(col1,col2)       controlled minimum; calculates the
               			minimum in col1 using col2 as a control
			            column, restarts the min for a change
            			in the value in col2

@shift(col,n)           shifts downward n records,
			            negative n for upward shift;
			            downward shift replicates first
			            value in column while upward
			            shift replicates last value

@rotate(col,n)          same as shift except values that
			            are rotated off the end of the
			            column are wrapped around to the
			            other end

@interp(col1,col2)      replace zero values between non-zero
			            values in col1 by interpolating
			            between the non-zero values in col1
			            to corresponding values in col2;
			            col2 may contain @index in which case
			            interpolation is linear or it may
			            contain some other function
			            (i.e. logarithmic or exponential)
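
The controlled running sum and the shift and rotate operations can
be modeled in Python as follows.  This is only a sketch of the
descriptions above, with hypothetical function names; positive n is
taken to mean a downward move, as stated for @shift.

```python
def col_crsum(values: list, control: list) -> list:
    """Model of @crsum: running sum that restarts when control changes."""
    out, total, prev = [], 0.0, None
    for i, (v, c) in enumerate(zip(values, control)):
        if i == 0 or c != prev:
            total = 0.0          # new segment starts here
        total += v
        out.append(total)
        prev = c
    return out


def col_shift(values: list, n: int) -> list:
    """Model of @shift: move values down n records (up if n is negative).

    Vacated positions replicate the first value (downward shift) or
    the last value (upward shift).
    """
    size = len(values)
    return [values[min(max(i - n, 0), size - 1)] for i in range(size)]


def col_rotate(values: list, n: int) -> list:
    """Model of @rotate: like shift, but values wrap around the ends."""
    size = len(values)
    return [values[(i - n) % size] for i in range(size)]


print(col_crsum([1, 2, 3, 4, 5], [1, 1, 2, 2, 2]))
print(col_shift([1, 2, 3, 4], 1), col_shift([1, 2, 3, 4], -1))
print(col_rotate([1, 2, 3, 4], 1), col_rotate([1, 2, 3, 4], -1))
```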

GEOPHYSICAL Column Operations

    
@dist(lon1,lat1,lon2,lat2,dist)     calculate the distance in meters
				    between the two geographic points
				    on the Earth.  A spherical formula
				    is used above 1.05 degrees and a
				    plane formula is used below .95
				    degrees of central arc.  Between
				    these values, both formulas are used
				    and the result is a linear
				    interpolation of both formulas.
				    This is done to give a continuous
				    result.  Results near the poles
				    are not guaranteed accurate.

@head(lon1,lat1,lon2,lat2,head)     calculate the heading of the line
				    from the first to the second point
				    in degrees clockwise from north.
				    The interpolation technique used
				    in @dist is applied here.

@bear(lon1,lat1,lon2,lat2,lon3,lat3,bear) calculate the bearing of the
					  line from the first to the
				    second point clockwise in degrees
				    from the line from the first to the
				    third point.  The interpolation
				    technique used in @dist is
				    applied here.
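
The blending of the spherical and plane formulas described for @dist
can be sketched in Python.  The formulas below (haversine for the
sphere, an equirectangular approximation for the plane) and the Earth
radius are assumptions made for illustration; the source does not
state which formulas or radius MF4 uses.  Only the 0.95 and 1.05
degree limits and the linear interpolation come from the description
above.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius, not from MF4


def central_arc_deg(lon1, lat1, lon2, lat2):
    """Central arc between two points in degrees (haversine, assumed)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    h = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def spherical_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters on the assumed sphere."""
    return EARTH_RADIUS_M * math.radians(central_arc_deg(lon1, lat1, lon2, lat2))


def plane_m(lon1, lat1, lon2, lat2):
    """Flat-earth distance in meters (equirectangular, assumed)."""
    mean_lat = math.radians((lat1 + lat2) / 2)
    x = math.radians(lon2 - lon1) * math.cos(mean_lat)
    y = math.radians(lat2 - lat1)
    return EARTH_RADIUS_M * math.hypot(x, y)


def blended_distance_m(lon1, lat1, lon2, lat2):
    """Plane below 0.95 deg, sphere above 1.05 deg, linear blend between."""
    arc = central_arc_deg(lon1, lat1, lon2, lat2)
    if arc >= 1.05:
        return spherical_m(lon1, lat1, lon2, lat2)
    if arc <= 0.95:
        return plane_m(lon1, lat1, lon2, lat2)
    w = (arc - 0.95) / (1.05 - 0.95)  # 0 at 0.95 deg, 1 at 1.05 deg
    return (1 - w) * plane_m(lon1, lat1, lon2, lat2) + w * spherical_m(lon1, lat1, lon2, lat2)


print(blended_distance_m(-113.0, 41.0, -113.1, 41.0))
print(blended_distance_m(-113.0, 41.0, -110.0, 41.0))
```

Because the weight moves linearly from the plane result to the
spherical result across the 0.95 to 1.05 degree band, the output has
no jump at either limit.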

Note: The geophysical column operations can have numbers in the fields
    instead of columns. For example:

mf4 xxe2qq f="@dist(-1.130000000000e+02,4.100000000000e+01,c3,c4,c11)"

where,
    lon1 = -113.0 
    lat1 = 41.0
    lon2 = c3
    lat2 = c4
    dist = c11

This will take the first two values in combination with the two values
in column 3 and column 4 and place the result (the distance) in column 11.

None of the other column functions are allowed to do this.

Implementations earlier than May 9, 2008 would reject this structure
with an abend.

FSTRING EXAMPLE

A full example of an fstring to calculate a
time increment dt from a column t is
fstring="dt=t$shift(dt,-1)$dt=t-dt"

TAE COMMAND LINE FORMAT

     MF4 INP=int PARAMS

     where

     int                 is a random access file.  Since it
                         is used for both input and  output, 
                         no output file is specified.

     PARAMS              is   a  standard  VICAR   parameter 
                         field.

    FUNCTION is a string of math, logical, string, and column
    operations given in examples below.

    SEED is used to set a column to random values, or with
    a function involving random values.

    The DEBUG parameter will show the pseudo instructions for
    math, string and logic functions that arise from the 
    internal routine sp_xknuth as well as other information.

    The CODE parameter will show the pseudo instructions for
    math and logic functions. The mnemonics are the same
    as for the CODE parameter for mf but have different
    operands. No pseudo instructions are generated for
    column operations.


METHOD

     MF4 performs arithmetic operations on an interface file.
     The program uses two library routines, SP_KNUTH and
     SP_XKNUTH, to compile and interpret C-like expressions
     entered by the parameters in an expression such as:

                     C135 = (100*C34)/C4

     In this expression, C34 and C4 are the input columns.
     SP_KNUTH compiles the expression into pseudo-machine
     instructions.  The expression is applied to the input
     columns in SP_XKNUTH to produce the output column, C135.


RESTRICTIONS

1.     Maximum number of columns in one execution is 100 (currently 9 until a bug is fixed).
2.     The number of columns in the IBIS file is not limited here.
3.     Maximum input string length is 10,000 (40 x 250).
4.     Maximum number of operations is 3000.
5.     Maximum number of temp locations is 938.
6.     Maximum number of constants from the expression is 960.
7.     Operators must be separated by parentheses.

notes:

1.  Column numbers greater than 100 are mapped sequentially 1,2,3...
    so there is no limit on the number of columns in the IBIS file.
3.  The input parameter is a string array (40) each with 250 chars.
    The array is concatenated by the program into a single array.
4.  These can be counted by setting debug to one and counting the
    lines that begin with "xknuth:op,opnd".  The count is not
    easily determined by looking at a long input.
5.  These can be counted by setting debug to one and counting the
    lines that begin with "xknuth:op,opnd" and having an opnd
    value above 1061.  The count is not easily determined by
    looking at a long input.
6.  These can be counted in the input, or by setting debug to one
    and counting the lines that begin with "xknuth:op,opnd" and
    having an opnd value between 103 and 1061, inclusive.

EXAMPLES

     MF4 INP=FILE.INT FUNCTION=("C5 = C2/C3+100+@SQRT(C2)")

     In this example, C2 is divided by C3 and added to 100
     plus the square root of C2.  The results are placed in
     C5.  Further examples of allowable functions follow:

                FUNCTION=("C5 = !(C3  || C2)")

     Logical operations are performed bitwise on the
     operands.  The logical values T and F are converted to 1. and 0.
     for storage in column C5.

                FUNCTION=("C5 = C3<=@index")

     Column 5 is 1.0 if column 3 has a value <= its row number (@index).
     
                FUNCTION=("@average(C3)")

     In this example, the mean of column 3 is calculated and
     that value is placed in every row entry in column 3.
     This operation is different from the arithmetic and
     logic operations given earlier because it operates on a
     vertical column instead of horizontally across a row.
     These operations cannot be used in an arithmetic
     expression such as C5 = @average(C3)*10.  See the FUNCTION
     help for more examples.

MULTIPLE FUNCTION EXAMPLE

    MF4 INP=FILE.INT FUNCTION=("c42=((64*(c36>(-.40))) || (c42*(c42>0))) -1"$"c41=3.0")

    C42 is built from 64 when c36 is greater than -0.4, combined by || with
    whatever is already in C42 when that value is positive; 1 is then
    subtracted from the result. C41 is set to 3.0.

CODE example:

    mf4 INP=FILE.INT CODE=1 FUNCTION=("c42=(((64)*(c36>(-.40))) || (c42*(c42>0))) -1")

    where col 36 = 0.239406
          col 42 = 0.0

    reg is the value to be placed in col 42

will produce:

mf4 version Jun 18, 2010 - RJB
function string = c42=(((64)*(c36>(-.40))) || (c42*(c42>0))) -1

<<   original value in row 1 col 42 = 0.000000
<<   original value in row 1 col 36 = 0.239406
LCMP     104    reg = -0.400000
STOR    1062    reg = -0.400000
LOAD       1    reg = 0.239406
GT      1062    reg = 1.000000
STOR    1063    reg = 1.000000
LOAD     103    reg = 64.000000
MUL     1063    reg = 64.000000
STOR    1064    reg = 64.000000
LOAD       0    reg = 0.000000
GT       105    reg = 0.000000
STOR    1065    reg = 0.000000
LOAD       0    reg = 0.000000
MUL     1065    reg = 0.000000
STOR    1066    reg = 0.000000
LOAD    1064    reg = 64.000000
OR      1066    reg = 64.000000
STOR    1067    reg = 64.000000
SUB      106    reg = 63.000000
STOR    1068    reg = 63.000000
RETN       0    reg = 63.000000
>>   output value in row 1 col 42 = 63.000000
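
The trace above can be replayed with a small register machine to see
how each pseudo instruction changes reg.  The Python sketch below is
not the SP_XKNUTH code.  The operand meanings are inferred from the
trace and are assumptions: 0 and 1 hold the column values, 103 to 106
hold the constants 64, 0.40, 0 and 1, addresses above 1061 are
temporaries, LCMP loads the negated operand, and OR works bitwise on
the integer parts.

```python
def run(program, memory):
    """Execute (opcode, address) pairs on a single-register machine."""
    mem = dict(memory)
    reg = 0.0
    for op, addr in program:
        if op == "LOAD":
            reg = mem[addr]
        elif op == "LCMP":               # load complement (negated value)
            reg = -mem[addr]
        elif op == "STOR":
            mem[addr] = reg
        elif op == "GT":                 # comparison yields 1.0 or 0.0
            reg = 1.0 if reg > mem[addr] else 0.0
        elif op == "MUL":
            reg *= mem[addr]
        elif op == "SUB":
            reg -= mem[addr]
        elif op == "OR":                 # bitwise or on integer parts
            reg = float(int(reg) | int(mem[addr]))
        elif op == "RETN":
            return reg
        else:
            raise ValueError(f"unknown opcode {op}")
    return reg


# Column values and constants as inferred from the trace.
MEMORY = {0: 0.0, 1: 0.239406, 103: 64.0, 104: 0.40, 105: 0.0, 106: 1.0}

PROGRAM = [
    ("LCMP", 104), ("STOR", 1062), ("LOAD", 1), ("GT", 1062), ("STOR", 1063),
    ("LOAD", 103), ("MUL", 1063), ("STOR", 1064),
    ("LOAD", 0), ("GT", 105), ("STOR", 1065),
    ("LOAD", 0), ("MUL", 1065), ("STOR", 1066),
    ("LOAD", 1064), ("OR", 1066), ("STOR", 1067),
    ("SUB", 106), ("STOR", 1068), ("RETN", 0),
]

print(run(PROGRAM, MEMORY))
```

Under these assumptions the replay ends with reg = 63.0, matching
the output value shown for col 42.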


A note on using the CODE parameter: it should not be used in
long procedures or on ibis table files with thousands of rows. It
is really a debugging parameter and should only be used with snippets
of tables where problems are suspected.

HISTORY
Original Programmer:  A. L. Zobrist, 15 December 1976

Cognizant Programmer:  R. J. Bambery

Revision:
  1999-12-12 A. L. Zobrist - Double precision and strings, etc. 
  2000-02-06 A. L. Zobrist - Enlarge all Function restrictions
  2007-05-02 R. Bambery  - Add 2 new control column operators
             @csigma(col1,col2) and @cavg(col1,col2)  
             Add @count(col1,col2)
  2007-10-13 R. Bambery  - Change all internal printf statements to 
             sprintf/zvmessage combinations to print out to log files
             Fixed debug parameter to show symbolic dump of code
             produced like program f2.
  2007-10-18 R. Bambery - added CODE parameter and improved 
             error detecting for parentheses                        
  2007-10-25 R. Bambery - cleaned up debugging msgs, documentation
  2007-11-06 R. Bambery - increased internal string sizes for long
                    function strings
  2008-05-09 R. Bambery - geophysical columns can have values or 
             col numbers in the fields, subroutine mtchfield2 added
             to handle these cases since the program would abend
             with subroutine mtchfield
  2008-06-14 R. Bambery - processing for @interp and @fill used
             same code. This caused abends with 
             [TAE-PRCSTRM] Abnormal process termination; process status code = 11.;
             @fill processing was separated from @interp
             @fill(col) and @interp(col1,col2) have different parameter processing
  2008-07-28 R. Bambery - merged the following:
  2008-02-28 R. Bambery - Fixes for ANSI_C compiler in Linux
  2008-03-21 R. Bambery - merge with svn version 43 of mf3.c 
             by Walt Bunch (Dec 28, 2007 version)
  2008-03-26 R. Bambery - add error message warning of ibis
             files of 0 rows or 0 columns 
             (*** glibc detected *** free(): invalid next size (fast): 0x000000000063f400 ***)
  2008-07-26 R. Bambery - merged pkim's svn version 50 dated 04 Apr 2008
             removed routines assoc with libcarto   
             replace solaris mystrnicmp with strncasecmp in main44
  2008-08-20 R. Bambery - Incorporate consistencies with ifthen program
  2009-12-03 R. Bambery - made compatible with 64-bit linux (removed cartoVicarProtos.h)
             (Makefile.mf3)
  2010-01-29 R. Bambery - Made compatible with 64-bit afids Build 793      
             Linux, MacOSX (both Intel/PowerPC)     
  2010-06-18 R. Bambery -  This version was renamed mf4 in accordance with
             wishes of users of mf3 because it is more restrictive with parentheses   
  2011-05-06 R. J. Bambery - Removed all warning messages generated from gcc 4.4.4
             Build 1009
  2011-06-20 R. J. Bambery - Removed warnings from gcc 4.5.2 on mac
  2012-12-09 R. J. Bambery - Removed unneeded variables ptr, mxddwid, debugrec1