Example Levenberg-Marquardt

Levenberg-Marquardt (LM) is a popular non-linear optimization algorithm. This example demonstrates how a basic implementation of Levenberg-Marquardt can be created using EJML's [[Procedural|procedural]] interface. Unnecessary allocation of new memory is avoided by reshaping matrices. When a matrix is reshaped its width and height are changed, but new memory is not declared unless the new shape requires more memory than is available.
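
As a minimal sketch of that reshape behavior (this just exercises EJML's <code>DMatrixRMaj</code>, whose backing array is exposed as <code>data</code>):

<syntaxhighlight lang="java">
import org.ejml.data.DMatrixRMaj;

public class ReshapeDemo {
    public static void main( String[] args ) {
        DMatrixRMaj m = new DMatrixRMaj(4, 4);  // backing array holds 16 elements
        m.reshape(2, 3);                        // shrinking reuses the existing array
        System.out.println(m.data.length);      // prints 16
        m.reshape(5, 5);                        // growing past 16 elements declares a new array
        System.out.println(m.data.length);      // prints 25
    }
}
</syntaxhighlight>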

LM works by being provided a function which computes the residual error. The residual error is defined as the difference between the predicted output and the actual observed output, e.g. f(x)-y. Optimization works by finding a set of parameters which minimizes the magnitude of the residuals, measured by the squared Euclidean (Frobenius) norm.
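
For concreteness, here is a sketch of a residual function for fitting a line f(x) = p<sub>0</sub> + p<sub>1</sub>x to a handful of points. The class name and the data are made up for illustration; <code>ResidualFunction</code> is the interface defined at the bottom of the example code below.

<syntaxhighlight lang="java">
import org.ejml.data.DMatrixRMaj;

public class LineResidual implements LevenbergMarquardt.ResidualFunction {
    // made-up observations (x[i], y[i]) that the line is fit against
    private final double[] x = {0, 1, 2, 3};
    private final double[] y = {1.1, 2.9, 5.2, 6.8};

    @Override
    public void compute( DMatrixRMaj param, DMatrixRMaj residual ) {
        // residual[i] = f(x[i]) - y[i] for the model f(x) = p0 + p1*x
        for (int i = 0; i < x.length; i++) {
            double predicted = param.get(0) + param.get(1)*x[i];
            residual.set(i, 0, predicted - y[i]);
        }
    }

    @Override
    public int numFunctions() {
        return x.length; // one residual per observation
    }
}
</syntaxhighlight>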

'''Note:''' This is a simple, straightforward implementation of Levenberg-Marquardt and is not as robust as Minpack's implementation. If you are looking for a robust non-linear least-squares minimization library in Java check out [http://ddogleg.org DDogleg].

External Resources:

* [https://github.com/lessthanoptimal/ejml/blob/v0.41/examples/src/org/ejml/example/LevenbergMarquardt.java LevenbergMarquardt.java code]

== Example Code ==

<syntaxhighlight lang="java">
import org.ejml.data.DMatrixRMaj;
import org.ejml.dense.row.CommonOps_DDRM;
import org.ejml.dense.row.NormOps_DDRM;

/**
 * <p>
 * This is a straight forward implementation of the Levenberg-Marquardt (LM) algorithm. LM is used to minimize
 * non-linear cost functions:<br>
 * <br>
 * S(P) = Sum{ i=1:m , [y<sub>i</sub> - f(x<sub>i</sub>,P)]<sup>2</sup>}<br>
 * <br>
 * where P is the set of parameters being optimized.
 * </p>
 *
 * <p>
 * In each iteration the parameters are updated using the following equations:<br>
 * <br>
 * P<sub>i+1</sub> = P<sub>i</sub> - (H + &lambda; I)<sup>-1</sup> d <br>
 * d =  Sum{ i=1..N , (f(x<sub>i</sub>;P<sub>i</sub>) - y<sub>i</sub>) * jacobian(:,i) } <br>
 * H =  Sum{ i=1..N , jacobian(:,i) * jacobian(:,i)<sup>T</sup> }
 * </p>
 * <p>
 * Whenever possible the allocation of new memory is avoided.  This is accomplished by reshaping matrices.
 * A matrix that is reshaped won't grow unless the new shape requires more memory than it has available.
 * </p>
 *
 * @author Peter Abeles
 */
public class LevenbergMarquardt {
    // Convergence criteria
    private int maxIterations = 100;
    private double ftol = 1e-12;
    private double gtol = 1e-12;

    // how much the numerical Jacobian calculation perturbs the parameters by.
    // Better implementations have better ways to compute this delta.  See Numerical Recipes.
    private final static double DELTA = 1e-8;

    // Damping. Larger values make the step behave more like gradient descent
    private double initialLambda;

    // the function that is optimized
    private ResidualFunction function;

    // the optimized parameters and associated costs
    private DMatrixRMaj candidateParameters = new DMatrixRMaj(1, 1);
    private double initialCost;
    private double finalCost;

    // used by matrix operations
    private DMatrixRMaj g = new DMatrixRMaj(1, 1);            // gradient
    private DMatrixRMaj H = new DMatrixRMaj(1, 1);            // Hessian approximation
    private DMatrixRMaj Hdiag = new DMatrixRMaj(1, 1);
    private DMatrixRMaj negativeStep = new DMatrixRMaj(1, 1);

    // variables used by the numerical jacobian algorithm
    private DMatrixRMaj temp0 = new DMatrixRMaj(1, 1);
    private DMatrixRMaj temp1 = new DMatrixRMaj(1, 1);
    // used when computing d and H variables
    private DMatrixRMaj residuals = new DMatrixRMaj(1, 1);

    // Where the numerical Jacobian is stored.
    private DMatrixRMaj jacobian = new DMatrixRMaj(1, 1);

    public double getInitialCost() {
        return initialCost;
    }

    public double getFinalCost() {
        return finalCost;
    }

    /**
     * @param initialLambda Initial value of the damping parameter. Try 1 to start
     */
    public LevenbergMarquardt( double initialLambda ) {
        this.initialLambda = initialLambda;
    }

    /**
     * Specifies convergence criteria
     *
     * @param maxIterations Maximum number of iterations
     * @param ftol convergence based on change in function value. try 1e-12
     * @param gtol convergence based on residual magnitude. Try 1e-12
     */
    public void setConvergence( int maxIterations, double ftol, double gtol ) {
        this.maxIterations = maxIterations;
        this.ftol = ftol;
        this.gtol = gtol;
    }

    /**
     * Finds the best fit parameters.
     *
     * @param function The function being optimized
     * @param parameters (Input/Output) initial parameter estimate and storage for optimized parameters
     * @return true if it succeeded and false if it did not.
     */
    public boolean optimize( ResidualFunction function, DMatrixRMaj parameters ) {
        configure(function, parameters.getNumElements());

        // save the cost of the initial parameters so that it knows if it improves or not
        double previousCost = initialCost = cost(parameters);

        // iterate until the difference between the costs is insignificant
        double lambda = initialLambda;

        // if it should recompute the Jacobian in this iteration or not
        boolean computeHessian = true;

        for (int iter = 0; iter < maxIterations; iter++) {
            if (computeHessian) {
                // compute the gradient and approximate Hessian at the current parameters
                computeGradientAndHessian(parameters);
                computeHessian = false;

                // check for convergence using gradient test
                boolean converged = true;
                for (int i = 0; i < g.getNumElements(); i++) {
                    if (Math.abs(g.data[i]) > gtol) {
                        converged = false;
                        break;
                    }
                }
                if (converged) {
                    // record the cost of the accepted parameters before returning
                    finalCost = previousCost;
                    return true;
                }
            }

            // H = J'J + lambda*I. Only the diagonal is touched, and it is rebuilt
            // from Hdiag each iteration so changes to lambda don't accumulate
            for (int i = 0; i < H.numRows; i++) {
                H.set(i, i, Hdiag.get(i) + lambda);
            }

            // In robust implementations failure to solve is handled much better
            if (!CommonOps_DDRM.solve(H, g, negativeStep)) {
                return false;
            }

            // compute the candidate parameters
            CommonOps_DDRM.subtract(parameters, negativeStep, candidateParameters);

            double cost = cost(candidateParameters);
            if (cost <= previousCost) {
                // the candidate parameters produced a better result, so accept them
                computeHessian = true;
                parameters.setTo(candidateParameters);

                // check for convergence. Converged when the relative decrease in cost
                // is small:  (cost(k) - cost(k+1))/cost(k) <= ftol
                boolean converged = ftol*previousCost >= previousCost - cost;

                previousCost = cost;
                lambda /= 10.0;

                if (converged) {
                    finalCost = previousCost;
                    return true;
                }
            } else {
                lambda *= 10.0;
            }
        }
        finalCost = previousCost;
        return true;
    }

    /**
     * Performs sanity checks on the input data and reshapes internal matrices.  By reshaping
     * a matrix it will only declare new memory when needed.
     */
    protected void configure( ResidualFunction function, int numParam ) {
        this.function = function;
        int numFunctions = function.numFunctions();

        // reshaping a matrix means that new memory is only declared when needed
        candidateParameters.reshape(numParam, 1);
        g.reshape(numParam, 1);
        H.reshape(numParam, numParam);
        negativeStep.reshape(numParam, 1);

        // Normally these variables are thought of as row vectors, but it works out easier if they are column vectors
        temp0.reshape(numFunctions, 1);
        temp1.reshape(numFunctions, 1);
        residuals.reshape(numFunctions, 1);
        jacobian.reshape(numFunctions, numParam);
    }

    /**
     * Computes the d and H parameters.
     *
     * d = J'*(f(x)-y)    <--- that's also the gradient
     * H = J'*J
     */
    private void computeGradientAndHessian( DMatrixRMaj param ) {
        // residuals = f(x) - y
        function.compute(param, residuals);

        computeNumericalJacobian(param, jacobian);

        CommonOps_DDRM.multTransA(jacobian, residuals, g);
        CommonOps_DDRM.multTransA(jacobian, jacobian, H);

        CommonOps_DDRM.extractDiag(H, Hdiag);
    }

    /**
     * Computes the "cost" for the parameters given.
     *
     * cost = (1/N) Sum (f(x) - y)^2
     */
    private double cost( DMatrixRMaj param ) {
        function.compute(param, residuals);

        double error = NormOps_DDRM.normF(residuals);

        return error*error/(double)residuals.numRows;
    }

    /**
     * Computes a simple numerical Jacobian.
     *
     * @param param (input) The set of parameters that the Jacobian is to be computed at.
     * @param jacobian (output) Where the jacobian will be stored
     */
    protected void computeNumericalJacobian( DMatrixRMaj param,
                                             DMatrixRMaj jacobian ) {
        double invDelta = 1.0/DELTA;

        function.compute(param, temp0);

        // compute the Jacobian by perturbing the parameters slightly
        // then seeing how it affects the results.
        for (int i = 0; i < param.getNumElements(); i++) {
            param.data[i] += DELTA;
            function.compute(param, temp1);
            // compute the difference between the two function outputs and divide by the delta
            // temp1 = (temp1 - temp0)/delta
            CommonOps_DDRM.add(invDelta, temp1, -invDelta, temp0, temp1);

            // copy the results into column i of the jacobian matrix
            // J(:,i) = temp1
            CommonOps_DDRM.insert(temp1, jacobian, 0, i);

            param.data[i] -= DELTA;
        }
    }

    /**
     * The function that is being optimized. Returns the residual. f(x) - y
     */
    public interface ResidualFunction {
        /**
         * Computes the residual vector given the set of input parameters.
         * The function goes from N inputs to M outputs.
         *
         * @param param (Input) N by 1 parameter vector
         * @param residual (Output) M by 1 output vector to store the residual = f(x)-y
         */
        void compute( DMatrixRMaj param, DMatrixRMaj residual );

        /**
         * Number of functions in output
         *
         * @return function count
         */
        int numFunctions();
    }
}
</syntaxhighlight>
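
Putting it together, a usage sketch with the hypothetical <code>LineResidual</code> class from above:

<syntaxhighlight lang="java">
import org.ejml.data.DMatrixRMaj;

public class FitLineExample {
    public static void main( String[] args ) {
        // initial guess for (p0, p1); optimize() refines it in place
        DMatrixRMaj parameters = new DMatrixRMaj(2, 1);

        // initial lambda of 1, as suggested by the constructor's javadoc
        LevenbergMarquardt lm = new LevenbergMarquardt(1.0);

        if (lm.optimize(new LineResidual(), parameters)) {
            System.out.println("p0 = " + parameters.get(0) + "  p1 = " + parameters.get(1));
            System.out.println("cost: " + lm.getInitialCost() + " -> " + lm.getFinalCost());
        } else {
            System.out.println("the linear solver failed inside the optimization");
        }
    }
}
</syntaxhighlight>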