Example Levenberg-Marquardt

Levenberg-Marquardt (LM) is a popular non-linear optimization algorithm. This example demonstrates how a basic implementation of Levenberg-Marquardt can be created using EJML's [[Procedural|procedural]] interface. Unnecessary allocation of new memory is avoided by reshaping matrices: when a matrix is reshaped its width and height are changed, but new memory is declared only when the new shape requires more memory than is already available.
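
For illustration, here is a minimal sketch (not part of the original example; the class name is made up) of the reshape behavior being relied on, using DMatrixRMaj directly:

<syntaxhighlight lang="java">
import org.ejml.data.DMatrixRMaj;

public class ReshapeDemo {
    public static void main( String[] args ) {
        DMatrixRMaj m = new DMatrixRMaj(100, 100);
        System.out.println(m.data.length); // 10000 elements allocated
        m.reshape(10, 10);                 // shrink: the backing array is reused
        System.out.println(m.data.length); // still 10000, no new memory declared
        m.reshape(200, 100);               // needs 20000 elements, so it must grow
        System.out.println(m.data.length); // 20000
    }
}
</syntaxhighlight>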

LM works by being provided a function which computes the residual error. The residual is defined as the difference between the predicted output and the actual observed output, e.g. f(x)-y. Optimization works by finding a set of parameters which minimizes the magnitude of the residuals, as measured by the F2-norm (the Frobenius norm, which for a column vector is the usual Euclidean norm).
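
As a tiny hypothetical sketch (the numbers are made up), the residual and the magnitude that LM drives toward zero can be computed like this:

<syntaxhighlight lang="java">
import org.ejml.data.DMatrixRMaj;
import org.ejml.dense.row.CommonOps_DDRM;
import org.ejml.dense.row.NormOps_DDRM;

public class ResidualDemo {
    public static void main( String[] args ) {
        // hypothetical predicted outputs f(x) and observed outputs y
        DMatrixRMaj predicted = new DMatrixRMaj(2, 1, true, 1.0, 2.0);
        DMatrixRMaj observed = new DMatrixRMaj(2, 1, true, 1.5, 1.0);

        // residual = f(x) - y
        DMatrixRMaj residual = new DMatrixRMaj(2, 1);
        CommonOps_DDRM.subtract(predicted, observed, residual);

        // ||f(x) - y|| = sqrt(0.25 + 1.0)
        System.out.println("residual magnitude = " + NormOps_DDRM.normF(residual));
    }
}
</syntaxhighlight>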

'''Note:''' This is a simple, straightforward implementation of Levenberg-Marquardt and is not as robust as Minpack's implementation. If you are looking for a robust non-linear least-squares minimization library in Java, check out [http://ddogleg.org DDogleg].

External Resources:

* [https://github.com/lessthanoptimal/ejml/blob/v0.41/examples/src/org/ejml/example/LevenbergMarquardt.java LevenbergMarquardt.java code]

== Example Code ==

<syntaxhighlight lang="java">
import org.ejml.data.DMatrixRMaj;
import org.ejml.dense.row.CommonOps_DDRM;
import org.ejml.dense.row.NormOps_DDRM;

/**
 * <p>
 * This is a straightforward implementation of the Levenberg-Marquardt (LM) algorithm. LM is used to minimize
 * non-linear cost functions:<br>
 * <br>
 * S(P) = Sum{ i=1:m , [y<sub>i</sub> - f(x<sub>i</sub>,P)]<sup>2</sup>}<br>
 * <br>
 * where P is the set of parameters being optimized.
 * </p>
 *
 * <p>
 * In each iteration the parameters are updated using the following equations:<br>
 * <br>
 * P<sub>i+1</sub> = P<sub>i</sub> - (H + &lambda; I)<sup>-1</sup> d <br>
 * d =  (1/N) Sum{ i=1..N , (f(x<sub>i</sub>;P<sub>i</sub>) - y<sub>i</sub>) * jacobian(:,i) } <br>
 * H =  (1/N) Sum{ i=1..N , jacobian(:,i) * jacobian(:,i)<sup>T</sup> }
 * </p>
 * <p>
 * Whenever possible the allocation of new memory is avoided.  This is accomplished by reshaping matrices.
 * A matrix that is reshaped won't grow unless the new shape requires more memory than it has available.
 * </p>
 *
 * @author Peter Abeles
 */
public class LevenbergMarquardt {
    // Convergence criteria
    private int maxIterations = 100;
    private double ftol = 1e-12;
    private double gtol = 1e-12;

    // How much the numerical Jacobian calculation perturbs the parameters by.
    // In a better implementation there are smarter ways to compute this delta. See Numerical Recipes.
    private final static double DELTA = 1e-8;

    // Dampening. Larger values make the step behave more like gradient descent
    private double initialLambda;

    // the function that is optimized
    private ResidualFunction function;

    // the optimized parameters and associated costs
    private DMatrixRMaj candidateParameters = new DMatrixRMaj(1, 1);
    private double initialCost;
    private double finalCost;

    // used by matrix operations
    private DMatrixRMaj g = new DMatrixRMaj(1, 1);            // gradient
    private DMatrixRMaj H = new DMatrixRMaj(1, 1);            // Hessian approximation
    private DMatrixRMaj Hdiag = new DMatrixRMaj(1, 1);
    private DMatrixRMaj negativeStep = new DMatrixRMaj(1, 1);

    // variables used by the numerical jacobian algorithm
    private DMatrixRMaj temp0 = new DMatrixRMaj(1, 1);
    private DMatrixRMaj temp1 = new DMatrixRMaj(1, 1);
    // used when computing d and H variables
    private DMatrixRMaj residuals = new DMatrixRMaj(1, 1);

    // Where the numerical Jacobian is stored.
    private DMatrixRMaj jacobian = new DMatrixRMaj(1, 1);

    public double getInitialCost() {
        return initialCost;
    }

    public double getFinalCost() {
        return finalCost;
    }

    /**
     * @param initialLambda Initial value of dampening parameter. Try 1 to start
     */
    public LevenbergMarquardt( double initialLambda ) {
        this.initialLambda = initialLambda;
    }

    /**
     * Specifies convergence criteria
     *
     * @param maxIterations Maximum number of iterations
     * @param ftol convergence based on relative change in the cost function. Try 1e-12
     * @param gtol convergence based on the gradient's magnitude. Try 1e-12
     */
    public void setConvergence( int maxIterations, double ftol, double gtol ) {
        this.maxIterations = maxIterations;
        this.ftol = ftol;
        this.gtol = gtol;
    }

    /**
     * Finds the best fit parameters.
     *
     * @param function The function being optimized
     * @param parameters (Input/Output) initial parameter estimate and storage for optimized parameters
     * @return true if it succeeded and false if it did not.
     */
    public boolean optimize( ResidualFunction function, DMatrixRMaj parameters ) {
        configure(function, parameters.getNumElements());

        // save the cost of the initial parameters so that it knows if it improves or not
        double previousCost = initialCost = cost(parameters);

        // iterate until the difference between the costs is insignificant
        double lambda = initialLambda;

        // if it should recompute the Jacobian in this iteration or not
        boolean computeHessian = true;

        for (int iter = 0; iter < maxIterations; iter++) {
            if (computeHessian) {
                // compute the gradient and Hessian approximation at the current parameters
                computeGradientAndHessian(parameters);
                computeHessian = false;

                // check for convergence using gradient test
                boolean converged = true;
                for (int i = 0; i < g.getNumElements(); i++) {
                    if (Math.abs(g.data[i]) > gtol) {
                        converged = false;
                        break;
                    }
                }
                if (converged) {
                    finalCost = previousCost; // record the converged cost before returning
                    return true;
                }
            }

            // H = H + lambda*I
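            // Hdiag caches the original diagonal of H so the dampening term can be
            // re-applied with a new lambda without recomputing the Hessian approximation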
            for (int i = 0; i < H.numRows; i++) {
                H.set(i, i, Hdiag.get(i) + lambda);
            }

            // In a robust implementation a failure to solve would be handled more gracefully
            if (!CommonOps_DDRM.solve(H, g, negativeStep)) {
                return false;
            }

            // compute the candidate parameters
            CommonOps_DDRM.subtract(parameters, negativeStep, candidateParameters);

            double cost = cost(candidateParameters);
            if (cost <= previousCost) {
                // the candidate parameters produced better results so use them
                computeHessian = true;
                parameters.setTo(candidateParameters);

                // check for convergence
                // converged when (cost(k) - cost(k+1))/cost(k) <= ftol
                boolean converged = ftol*previousCost >= previousCost - cost;

                previousCost = cost;
                lambda /= 10.0;

                if (converged) {
                    finalCost = previousCost; // record the converged cost before returning
                    return true;
                }
            } else {
                lambda *= 10.0;
            }
        }
        finalCost = previousCost;
        return true;
    }

    /**
     * Performs sanity checks on the input data and reshapes internal matrices. Reshaping
     * a matrix declares new memory only when needed.
     */
    protected void configure( ResidualFunction function, int numParam ) {
        this.function = function;
        int numFunctions = function.numFunctions();

        // reshaping a matrix means that new memory is only declared when needed
        candidateParameters.reshape(numParam, 1);
        g.reshape(numParam, 1);
        H.reshape(numParam, numParam);
        negativeStep.reshape(numParam, 1);

        // Normally these variables are thought of as row vectors, but it works out easier if they are column
        temp0.reshape(numFunctions, 1);
        temp1.reshape(numFunctions, 1);
        residuals.reshape(numFunctions, 1);
        jacobian.reshape(numFunctions, numParam);
    }

    /**
     * Computes the d and H parameters.
     *
     * d = J'*(f(x)-y)    <--- that's also the gradient
     * H = J'*J
     */
    private void computeGradientAndHessian( DMatrixRMaj param ) {
        // residuals = f(x) - y
        function.compute(param, residuals);

        computeNumericalJacobian(param, jacobian);

        CommonOps_DDRM.multTransA(jacobian, residuals, g);
        CommonOps_DDRM.multTransA(jacobian, jacobian, H);

        CommonOps_DDRM.extractDiag(H, Hdiag);
    }

    /**
     * Computes the "cost" for the parameters given.
     *
     * cost = (1/N) Sum (f(x) - y)^2
     */
    private double cost( DMatrixRMaj param ) {
        function.compute(param, residuals);

        double error = NormOps_DDRM.normF(residuals);

        return error*error/(double)residuals.numRows;
    }

    /**
     * Computes a simple numerical Jacobian.
     *
     * @param param (input) The set of parameters that the Jacobian is to be computed at.
     * @param jacobian (output) Where the jacobian will be stored
     */
    protected void computeNumericalJacobian( DMatrixRMaj param,
                                             DMatrixRMaj jacobian ) {
        double invDelta = 1.0/DELTA;

        function.compute(param, temp0);

        // compute the jacobian by perturbing the parameters slightly
        // then seeing how that affects the results.
        for (int i = 0; i < param.getNumElements(); i++) {
            param.data[i] += DELTA;
            function.compute(param, temp1);
            // compute the difference between the two function outputs and divide by the delta
            // temp1 = (temp1 - temp0)/delta
            CommonOps_DDRM.add(invDelta, temp1, -invDelta, temp0, temp1);

            // copy the results into column i of the jacobian matrix
            // J(:,i) = temp1
            CommonOps_DDRM.insert(temp1, jacobian, 0, i);

            param.data[i] -= DELTA;
        }
    }

    /**
     * The function being optimized. Computes the residual f(x) - y.
     */
    public interface ResidualFunction {
        /**
         * Computes the residual vector given the set of input parameters.
         * The function maps N inputs to M outputs.
         *
         * @param param (Input) N by 1 parameter vector
         * @param residual (Output) M by 1 output vector to store the residual = f(x)-y
         */
        void compute( DMatrixRMaj param, DMatrixRMaj residual );

        /**
         * Number of functions in output
         *
         * @return function count
         */
        int numFunctions();
    }
}
</syntaxhighlight>
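
Below is a usage sketch, not part of the original example: it fits the hypothetical model f(t) = a*exp(b*t) to made-up data. The class name, variable names (t, y, a, b), initial guess, and observations are assumptions for illustration only.

<syntaxhighlight lang="java">
import org.ejml.data.DMatrixRMaj;

public class FitExponential {
    public static void main( String[] args ) {
        // made-up observations of roughly y = 2*exp(0.5*t)
        final double[] t = {0, 1, 2, 3, 4};
        final double[] y = {2.0, 3.3, 5.4, 9.0, 14.8};

        // residual(i) = a*exp(b*t[i]) - y[i], with params = [a, b]'
        LevenbergMarquardt.ResidualFunction func = new LevenbergMarquardt.ResidualFunction() {
            @Override public void compute( DMatrixRMaj param, DMatrixRMaj residual ) {
                double a = param.get(0);
                double b = param.get(1);
                for (int i = 0; i < t.length; i++) {
                    residual.set(i, 0, a*Math.exp(b*t[i]) - y[i]);
                }
            }

            @Override public int numFunctions() {
                return t.length;
            }
        };

        LevenbergMarquardt lm = new LevenbergMarquardt(1.0); // initial lambda, try 1
        DMatrixRMaj params = new DMatrixRMaj(2, 1, true, 1.0, 0.0); // initial guess: a=1, b=0
        if (lm.optimize(func, params)) {
            System.out.println("a = " + params.get(0) + "  b = " + params.get(1));
            System.out.println("cost: " + lm.getInitialCost() + " -> " + lm.getFinalCost());
        }
    }
}
</syntaxhighlight>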