Help for MARSCOR3
PURPOSE:
Given a stereo pair of images and a low-resolution or low-quality disparity
map representing line and sample disparities for every pixel in the scene,
this program refines the disparities to create a higher-quality map.
The input typically comes from a 1-D, fast correlator such as MARSJPLSTEREO.
The output is typically run through the program MARSXYZ to generate X,Y,Z
coordinates over the entire image.
This is a general-purpose correlation program, which does not make use of
any pointing or camera model information. The input images do not have to
be registered or epipolar aligned, as long as a lower-quality disparity map
is available. However, MARSJPLSTEREO requires aligned images with camera
models, so the inputs will typically be so.
This program is derived from MARSCORR and MARSCOR2.
EXECUTION:
marscor3 inp=(left,right) out=disparity in_disp=input_disparity params
where:
left is the left eye image of a stereo pair
right is the corresponding right eye image of a stereo pair
disparity is the output disparity map (one or two files).
input_disparity is the input disparity map (one or two files).
Although "left" and "right" are used above, there is no actual restriction
that the first image be left and the second be right. Correlations are done
from the first image to the second. Output images (disparity and mask)
correspond to the first image, containing matching coordinates in the second
image. However, for convenience, "left" and "right" are used throughout
this help. Note that the input disparity map must use the same convention.
Additionally, out_quality can be used to name an output file which will
contain the correlation qualities for each pixel.
METHOD:
The program loops through every pixel in the image. For each pixel, the
corresponding input disparity is obtained. This input disparity may be
adjusted based on the input zoom factor (derived from DISP_PYRAMID).
This point is then used as a starting point to obtain a refined correlation.
If there is no input disparity for the pixel, no correlation is performed.
Previous versions truncated the input disparity to an integer; now sub-pixel
disparity information is retained.
If CHECK is non-zero, the point is then correlated a second time, this time
going right-to-left (right is the reference). The result must be within
CHECK pixels of the original left point or the pixel is rejected. (This
only happens if the quality is less than the CHECK_QUALITY threshhold).
Once that is complete, if GORES is turned on, then the program makes an
additional set of passes through the image. In each pass, pixels for which
no disparity exists are examined. For each such pixel, the 8 neighbors are
examined to find the one with the best correlation value so far. This
neighbor is used as the initial guess for the pixel in question (modified
+/-1 as appropriate). If no correlated neighbors are found, the pixel is
skipped. This process allows holes caused by failure of the input (1-D)
correlator to be filled. The program keeps making passes through the image
until GORE_PASSES is exhausted (and non-0), or no gores are filled in a pass.
The reverse check is again performed for each new pixel if CHECK is non-zero.
Note that the image may be pre-processed due to MULTIPASS (downsampled)
or FILTER (low-pass filtered) before the correlation takes place.
Correlation Modes
-----------------
All correlations are performed using the gruen correlation routine. See
the gruen documentation for details of the algorithm. The TEMPLATE parameter
specifies the size of the correlation window and SEARCH specifies the size
of the area in which to search for a match.
The MODE parameter is used to select the gruen mode:
linear_only: gruen mode 0
annealing: gruen mode 1
amoeba: gruen mode 2
linear_amoeba: gruen mode 3
annealing_amoeba: gruen mode 4
amoeba2: gruen mode 5
linear_amoeba2: gruen mode 6
amoeba8: gruen mode 7
amoeba4: gruen mode 8
amoeba5: gruen mode 9
The separate keyword LINEAR can be used to add linear pre-searching to
any of the above modes (it sends -(mode) to gruen()). Linear searching is
done only once even if e.g. both -linear and -linear_amoeba are specified.
Of the above, the recommended modes are the five starting with "amoeba".
The others are provided only for experimentation, and may significantly
increase the execution time of the program. The annealing modes are not
fully implemented in the code at this time, but it would be trivial to do
so if necessary.
The amoeba modes work with 2, 4, 5, 6, or 8 degrees of freedom (for historical
reasons, 6 is called simply "amoeba"). In general, the left template is
mapped to the right using the following equations:
x' = a*x + b*y + c + g*x*y
y' = d*x + e*y + f + h*x*y
This implements a generic perspective transform (it maps the square to any
quadrilateral). Each of the amoeba modes works as described below. For
parameters not adjustable in a given mode, they are fixed as follows:
a = 1, b = 0, c = 0, d = 0, e = 1, f = 0, g = 0, h = 0
amoeba2: Adjusts c and f only. This allows for translation of the correlation
window only. It is the fastest mode, but generally the least accurate.
amoeba4: Adjusts b, c, f, and g. This allows for translation, shear, and
trapezoid in x, but only translation in y. This models a pair of epipolar-
aligned stereo cameras looking out at a flat plane (such as from a rover).
amoeba5: Adjusts a, b, c, f, and g. This allows for any transform in x
(translation, shear, trapezoid, and scale) but only translation in y. This
models a pair of epipolar-aligned stereo cameras looking at a more general
scene (not constrained to a flat plane). This is generally the best mode for
epipolar-aligned cameras.
amoeba: Adjusts a, b, c, d, e, and f. This implements a generic affine
transform with no trapezoidal parameters. Translation, scale, shear, and
rotation are modeled. This mode is historically the primary mode, but does
not model in-situ cameras very well.
amoeba8: Adjusts all parameters. This implements a generic perspective
transform with translation, scale, shear, rotation, and trapezoid (or
perspective) transforms. This is the most general mode, but may be
underconstrained for small window sizes.
The best mode to use is the one that best models the transform from the
left to right cameras, without providing too many degrees of freedom.
Sometimes, execution speed may drive one to a lower degree of freedom,
trading accuracy for speed.
Historically, amoeba has been initialized with an identity transform,
plus the disparity translation. There are now two ways to affect the
input transform: The ROTATION, XSCALE, and YSCALE parameters, and the
IN_COEFS parameter.
The ROTATION, XSCALE, and YSCALE parameters allow the initial transform to
be set to a given rotation and scale. This is useful if there is an overall
frame rotation between the images (such as a mast camera looking straight
down and being repointed, or an arm camara), or a scale difference (such as
when correlating different instruments). This starting point makes amoeba's
job easier, since it doesn't have to determine the rotation and scale each
time (which often fails with marginal correlations, or it "looks" in the
wrong direction). Thus the results are more stable, when the rotation or
scale difference is severe.
The ROT_RANGE and SCALE_RANGE parameters, if set, establish limits on what
the rotation and scale are allowed to be. This is accomplished by setting
limits on the transform coefficients based on the parameters, so it's not an
absolute limit on rotation or scale (certain transforms, generally pathological
cases, could have rotation or scale outside the range yet have coefficients
within the limits). Limits are only applied if either are given; if only
one set is given, the other defaults (rotation +/- 5 degrees, scale 0.9-1.1).
Determining the rotation and scale is currently an exercise for the user.
It would be possible to determine these given the camera models; this is
done for rotation in marsautotie. However, since marscor3 does not use
camera models, that is not done here.
The IN_COEFS parameter allows the user to directly specify the initial
transform on a per-pixel basis. The input file should be a 6-band file the
same size as the input disparity file, containing the parameters a,b,d,e,g,h.
Note that the translation terms, c and f, are not included because they come
from the input disparity file. Note that OUT_COEF can be used to save the
final coefficients, in the same file format.
Important Parameters
--------------------
The template (correlation patch) and search area sizes may be rectangular
instead of square. For landed images such as Mars Pathfinder or MER,
it may be advantageous to specify areas that are much wider than they are
tall.
The FTOL parameter deserves special discussion, as it plays a huge factor
in execution time. FTOL is a parameter passed in to the amoeba function
minimization routine. It represents the tolerance in the function being
minimized. When the function (in this case, the inverse of correlation
quality) starts changing by amounts less than FTOL, the algorithm assumes
it has found a minimum and returns.
Thus the size of FTOL has a direct correlation with the accuracy of the
result. Smaller values mean a more accurate result, with commensurately
longer runtimes. However, FTOL is NOT the actual accuracy. While there
is a correlation with accuracy, it is not a simple relationship. Setting
it to, say, 0.01 most definitely does NOT imply a correlation accuracy of
0.01 pixels. It represents how accurate the correlation quality is, not
the pixel locations.
Historically, FTOL has been fixed to 0.000001. This led to very accurate
results, but horribly long runtimes. Experimentation has shown that a
value around 0.001, or slightly higher, should be sufficient for most
purposes, but this should be re-verified for each new type of data.
Input disparities
-----------------
The input disparities are typically generated by MARSJPLSTEREO, but might
also come from MARSUNLINEARIZE or MARSDISPINVERT. They can be generated
using a "pyramid" value. This value must be input to MARSCOR3 via the
DISP_PYRAMID parameter.
The pyramid value represents a zoom down of the image, by a factor of
2^pyramid. So a pyramid of 0 implies full scale, 1 means half-scale,
2 represents a quarter-scale image, etc. The values IN the file are
similarly zoomed down - they map from e.g. a quarter-scale image to
a quarter-scale image.
Furthermore, the pixel in the file actually represents the upper-left
pixel in the zoom box. For a pyramid of 3 (zoom of 8), the upper left
pixel of each 8x8 box is the one mapped in the file.
As with output files, the input disparity may be a single 2-band file, or
two separate single-band files. The first contains line disparities, and
the second contains sample disparities.
If the pyramid value is more than 1, then it is quite likely that the
initial point obtained from the input disparity is off by more than amoeba
can handle, giving very "patchy" results. The amoeba algorithm really
needs to start within a pixel or so for reliable results.
To handle this case, MULTIPASS may be specified. If MULTIPASS is turned on
and DISP_PYRAMID is greater than 1, then the entire correlation process is
repeated multiple times. At each iteration, the input image is downsampled
so it is twice the size of the input correlation map, for an effective
pyramid level of 1. The result then becomes the input disparity map for
the next iteration.
So, for example, with MULTIPASS and DISP_PYRAMID=3, then input disparity
map is 1/8 the size of the input. The first correlation is performed with
a 1/4 scale image pair (the full correlation, including gore passes).
The resulting 1/4 scale disparity is then correlated with a 1/2 scale image,
then the result of that is correlated with the full scale image to produce
the final result.
The STOP_PYRAMID parameter can be used in conjunction with DISP_PYRAMID and
MULTIPASS to stop the pyramid process prematurely. Normally STOP_PYRAMID
is 0, meaning go all the way to full-res. But if STOP_PYRAMID is non-0,
the process stops there. A value of 1 would produce a half-scale output,
2 would produce quarter-scale, etc. At least one pass is done regardless
of STOP_PYRAMID. Note that the output can be fed back into another run of
marscor3 by adjusting DISP_PYRAMID to the prior STOP_PYRAMID value. This
would allow, for example, use of different correlation parameters at
different pyramid levels.
If FEEDBACK_COEFS is on, the perspective transform coefficients are
"remembered" from one pyramid pass to the next. So for the next stage
of the pyramid, the starting point uses the coefficients derived from the
previous stage. This helps retain the "shape" of the correlation window
mapping across pyramid levels.
Output file contents
--------------------
All output files are in the coordinates of the first (left eye) image.
For example if the sample disparity file has a value of 56.67 at sample 50
it means that the sample location of a tiepoint located at sample 50 in the
left eye image corresponds to sample 56.67 in the right eye image.
The line and sample disparities can be in two separate files. However,
normally (if only one filename is provided), both are written to a single,
2-banded file, with line in the first band and sample in the second.
First output (or band): A REAL image containing the line disparity. The
line/sample coordinate of each pixel in the output defines the position
in the left eye. The value of the pixel contains the line coordinate of
the corresponding pixel in the right eye image.
Second output (or band): A REAL image containing the sample disparity. The
line/sample coordinate of each pixel in the output defines the position
in the left eye. The value of the pixel contains the sample coordinate of
the corresponding pixel in the right eye image.
If both line & sample disparity values are 0.000 the point has no value
(meaning correlation failed for that point).
Note that all disparity values use 1-based coordinates for the right
image, per VICAR conventions.
Mask file (optional): A BYTE image showing the coverage of tiepoints.
0 dn means the pixel could not be reached in order to be correlated
(i.e. there were no neighbors to supply an initial value, or TPTLIMIT
was reached).
128 dn means a correlation was successfully performed at this location.
255 dn means a correlation was attempted at this location but it failed,
usually because of low correlation quality.
Output quality file (optional): A REAL image containing the correlation
quality of each pixel attempted, from 0 to 1. 0 indicates either a
correlation failure unrelated to the QUALITY setting, or a pixel that was
not attempted due to lack of input disparity.
Output coefficients file (optional): A 6-band REAL image containing the
rest of the perspective transform (except the translation terms). See
OUT_COEF.
Quality of Results Discussion
-----------------------------
A close look at the results shows a certain "bias" in the disparities
away from integer values. This is most easily seen by running F2 to
subtract off the line or sample to get a "raw" disparity:
$R2LIB/f2 disp.img disp_line.img func='"(in1.ne.0)*(in1-line)"' nb=1 sb=1
$R2LIB/f2 disp.img disp_samp.img func='"(in1.ne.0)*(in1-samp)"' nb=1 sb=2
Then run a histogram on the results. The histogram will tend to peak on
half-integer values, with valleys on the integers. This is most obvious
when the degrees of freedom are restricted, say using -amoeba2.
Extensive analysis leads us to believe that this result is caused by
the bilinear interpolation that is performed in the heart of gruen()
(to warp the right image to match the left template). The interpolation
process tends to "smooth out" high-frequency random noise that is typically
in the input images. Because the noise does not match between the right
and left images, the "best" fit tends to be where the noise is least.
Bilinear interpolation has the effect of smoothing out noise when the
interpolation is the highest - in between pixels or offsets of 0.5. At
integral pixel locations, it passes the original data through unchanged,
causing no interpolation smoothing.
Thus the least-squares fit in the correlator is "pulled" slightly away
from the integer values, toward the half-integer. This has been somewhat
verified by correlating the same image against itself, introducing random
noise into the "right" side. As more noise is added, the histogram pulls
away from the integers.
This effect can be mitigated in two ways, in the current version:
1) Use more degrees of freedom. The effect is most pronounced with only 2
degrees of freedom. It is greatly reduced with 4, 5, 6, or 8. The higher
degrees of freedom cause more interpolation to be done (as the transform
does more extensive warping). Of course, the more degrees of freedom, the
slower the algorithm.
2) Use the -filter option to prefilter the images using a low-pass filter.
This reduces the high-frequency noise and reduces the effect (but does not
eliminate it, at least not in all cases). Doing the filter should have
minimal impact on the precision of the results, while improving accuracy
due to the lower noise.
An avenue for future exploration would be to try a method other than bilinear
to do the interpolation, to see if that avoids the integer anti-bias effect.
That would slow down the program tremendously, however.
Parallel Processing
-------------------
This program has been parallelized using Open MP (OMP), which is built in
to the g++ compiler.
By default the number of threads used equals the number of cores on the machine
where the program is being run. Each image line is assigned to a different
core, with "dynamic" scheduling to keep the workload for eeach core similar.
Parallel processing can be disabled via the -OMP_OFF keyword. The number
of threads can be controlled by setting the OMP_NUM_THREADS environment
variable before running the program. There are numerous other OMP variables
that can be set; see the OMP documentation. However, the number of threads
is the only one that is likely to be useful in most cases.
Note that parallel processing necessitated a change in the gore filling
algorithm, but the results should be identical.
Decimation of the left template for processing performances
-----------------------------------------------------------
Depending on the resolution difference (superior than 2) between the left and
right, DECILEFT parameter differs.
In summary:
The TEMPLATE parameter sets the size of left image (the reference image)
patch.
If left is low resolution compared to right, then a larger right patch is
retrieved and automatically "decimated" by the affine transformation to match
the left patch resolution and size.
If, however, the left is high resolution compared to right, then the user has
to specifically ask for similar behavior where a larger left patch is retrieved
and decimated (via DECILEFT). If not asked for, then the actual right patch
size is smaller than TEMPLATE. DECILEFT makes sure that no retrieved patch
(left or right) gets smaller than TEMPLATE.
In details:
The TEMPLATE parameter sets the size of left image (the reference image)
patch. If the left and right images are approximately of the same resolution,
the size of the patch retrieved from the right image will be approximately
identical to the left patch (not accounting for the search area here).
If however the left image is low resolution compared to the right image, the
size of the right patch retrieved from the image will be larger than the left
patch (need a larger patch to cover a similar "area" imaged by the low
resolution left patch). Note that the correlation will still be run on identical
patch size, i.e., of left patch size, but the affine transformation will
automatically decimate the right image. For instance, if left patch is 5x5 and
difference of resolution between left and right is 2 (left has 2 times less
resolution than right image), then a ~10x10 patch from the right image will be
retrieved and the affine transformation will point to every other pixel
to obtain a 5x5 patch from right image. If the left image is high resolution
compared to the right image, the behavior is the same, and the size of the patch
retrieved from the right image is according to the difference of resolution.
For instance if the left image has twice more resolution than the right image,
a 5x5 template (left) patch will correspond to a ~3x3 (2.5x2.5 really, but
increased to next integer) right patch. One can see that large difference of
resolution can be problematic as with increasing difference in resolution,
we get a smaller and smaller right patch, which means that correlation will be
done with fewer and fewer actual pixels from the right image. To alleviate the
problem, the traditional solution is to increase the left window size (template
parameter) such that the resulting right size patch is about the size we
initially wanted for the template. In the previous example that would correspond
in setting the template to a 10x10 size. The problem in increasing the left
patch size is an increase of processing time.
To correct for this unnecessary large patch correlation, DECILEFT keyword
(if ON) allows the decimation of the left patch such that the correlated patches
are not unnecessarily inflated. Again, going back to the previous example, the
template would be set to 5x5; the patch size retrieved from the left image would
be 10x10, then decimated to 5x5. The corresponding right patch would be about
5x5 too. Correlation would be run on 5x5 patches instead of 10x10.
There are two decimation strategies available:
-GLOBAL_DECI: If set, a unique decimation factor is defined and applied to all
the correlation pixels. The decimation value is defined from the ROTATION,
XSCALE and YSCALE parameters. The (unique) decimation factor is applied whether
or not COEFS_IN and/or FEEDBACK are set. In they are set, the affine parameters
of each pixel are adjusted according to the decimation factor.
If ROTATION, XSCALE and YSCALE are either not supplied or if the resulting
affine transformation does not allow for decimation, then decimation of the
left image is turned OFF.
-LOCAL_DECI: If set, the decimation factor is defined for each correlation
pixel. That means that, potentially, the decimation factor may vary accross the
image. Possibly, the decimation factor might also oscillate/jitter from one
pixel to the next if the decimation factor is close to an integer pixel. For
instance, if the difference of resolution (given by the affine parameters) is
close to 3, then one pixel might have a decimation factor of 2.9 (which would
trigger a decimation factor of 2) and the next one a decimation factor of 3.001
(which would trigger a decimation factor of 3).
LOCAL_DECI is only useful if COEFS_IN and/or FEEDBACK are ON, otherwise
each pixel is subject to the same affine transformation, which leads to the same
decimation factor, which is equivalent to GLOBAL_DECI.
Notes:
It is STRONGLY advised (it should be mandatory really) to low pass a high
resolution image that is going to be decimated to match a low resolution image.
This can be done with the FILTER parameter. The filter kernel size should be set
according to the difference in resolution such that the frequency content of the
two images are identical.
DECILEFT should not be run with CHECK ON as the reverse correlation is missing
correct affine transform parameters inversion.
The check on the affine transform parameters to activate the decimation is
done on the a,b,d,and e parameters. The g and h parameters are not checked.
In theory they should be accounted for, but this is more complex as they
introduce an irregular sampling (linearly varying) inside the patch. It is
likely that their values are small enough to be neglected. This could be
investigated further.
HISTORY:
12-1-95 Initial MPFTPT program by J Lorre.
July 99 Signficant internal restructuring and addition of features.
Program renamed to marscorr. Bob Deen
July 03 Creation of marscor3 from marscorr.
Oct 03 Addition of -MULTIPASS, -FILTER, -CHECK, and several other new
features (rgd).
May 16 Parallelization of code (rgd)
May 17 Addition of -DECILEFT and -GLOBAL_DECI (ayoubfra)
COGNIZANT PROGRAMMER: Bob Deen
PARAMETERS:
INP
input images.
first: left eye
second: right eye
OUT
Output disparity
file(s)
IN_DISP
Input disparity
file(s)
MASK
Output mask file
(optional)
OUT_QUALITY
Output quality image
(optional)
IN_COEFS
Input coefficients file
(optional)
OUT_COEFS
Output coefficients file
(optional)
BAND
The vicar image
band number.
Defaults to 1
TEMPLATE
correlation size
SEARCH
correlation search
area
QUALITY
Minimum acceptable
correlation quality
Square of correlation
coefficient.
TPTLIMIT
Limit number of
tiepoints to TPTLIMIT
GORES
Whether or not to
fill in gores
GORE_QUALITY
Minimum correlation
quality for gore passes
GORE_PASSES
Max gore passes (or 0
for unlimited)
GORE_REVERSE
Whether or not to add
reverse gore passes
DISP_PYRAMID
Pyramid level of
input disparities
STOP_PYRAMID
Pyramid level at
which to stop
MODE
Correlation mode.
Use one of the amoeba*
modes
LINEAR
Adds linear search
to any MODE.
FTOL
Accuracy tolerance for
correlation quality
CHECK
Turns on reverse correlation
checking; also acceptable
radius.
CHECK_QUALITY
Threshhold below which
reverse checking is done.
MULTIPASS
Turns on multiple passes
for pyramids > 1
FILTER
Turns on pre-correlation
low-pass filter
FILTER_SIZE
Size of low-pass filter
window. If two values, size
of filter for left and right
sides
PI_SOURCE
Where the primary input
labels come from.
ROTATION
Overall frame rotation.
XSCALE
Overall scale difference,
in the X direction.
YSCALE
Overall scale difference,
in the Y direction.
FEEDBACK_COEFS
Turns on coefficient feedback
between pyramid levels.
ROT_RANGE
Establishes rotation
limits for the transform.
SCALE_RANGE
Establishes scale
limits for the transform.
OMP_ON
Turns on or off parallel
processing (default: on)
DATA_SET_NAME
Specifies the full name given
to a data set or a data product.
DATA_SET_ID
Specifies a unique alphanumeric
identifier for a data set or data
product.
RELEASE_ID
Specifies the unique identifier
associated with the release to the
public of all or part of a data set.
The release number is associated with
the data set, not the mission.
PRODUCT_ID
Specifies a permanent, unique
identifier assigned to a data
product by its producer.
PRODUCER_ID
Specifies the unique identifier
of an entity associated with the
production a data set.
PRODUCER_INST
Specifies the full name of the
identity of an entity associated
with the production of a data set.
TARGET_NAME
Specifies a target.
TARGET_TYPE
Specifies the type of a named target.
DECILEFT
Decimate the left image template if
difference of left and right resolution
allows it.
GLOBAL_DECI
Define a unique (or for each pixel)
decimation factor for entire left image
See Examples:
Cognizant Programmer: