Difference between revisions of "Example Levenberg-Marquardt"

From Efficient Java Matrix Library
Jump to navigation Jump to search
Line 1: Line 1:
Writing succinct and readable linear algebra code in Java, using any library, is problematic.  Originally EJML just offered two API's for performing linear algebraA procedural API which provided complete control over memory and which algorithms were used, but was verbose and has a sharper learning curveAlternatively you could use an object oriented API (SimpleMatrix) but lose control over memory and it has a limited set of operators.  Neither of these API's produces code which is all that similar to how equations are written mathematically.
+
Levenberg-Marquardt is a popular non-linear optimization algorithm. This example demonstrate how a basic implementation of Levenberg-Marquardt can be created using EJML's [[Procedural|procedural]] interfaceUnnecessary allocation of new memory is avoided by reshaping matricesWhen a matrix is reshaped its width and height is changed but new memory is not declared unless the new shape requires more memory than is available.
  
Languages, such as Matlab, are specifically designed for processing matrices and are much closer to mathematical notationC++ offers the ability to overload operators allowing for more natural code, see [http://eigen.tuxfamily.org Eigen]To overcome this problem EJML now provides the _Equation_ API, which allows a Matlab/Octave like notation to be used.
+
The algorithm is provided a function, set of inputs, set of outputs, and an initial estimate of the parameters (this often works with all zeros)It finds the parameters that minimize the difference between the computed output and the observed outputA numerical Jacobian is used to estimate the function's gradient.
  
This is achieved by parsing text strings with equations and converting it into a set of executable instructions, see the usage example below:
+
'''Note:''' This is a simple straight forward implementation of Levenberg-Marquardt and is not as robust as Minpack's implementation.  If you are looking for a robust non-linear least-squares minimization library in Java check out [http://ddogleg.org DDogleg].
 +
 
 +
Github Code:
 +
[https://github.com/lessthanoptimal/ejml/blob/v0.27/examples/src/org/ejml/example/LevenbergMarquardt.java LevenbergMarquardt]
 +
 
 +
== Example Code ==
  
 
<syntaxhighlight lang="java">
 
<syntaxhighlight lang="java">
eq.process("K = P*H'*inv( H*P*H' + R )");
+
/**
</syntaxhighlight>
+
* <p>
 +
* This is a straight forward implementation of the Levenberg-Marquardt (LM) algorithm. LM is used to minimize
 +
* non-linear cost functions:<br>
 +
* <br>
 +
* S(P) = Sum{ i=1:m , [y<sub>i</sub> - f(x<sub>i</sub>,P)]<sup>2</sup>}<br>
 +
* <br>
 +
* where P is the set of parameters being optimized.
 +
* </p>
 +
*
 +
* <p>
 +
* In each iteration the parameters are updated using the following equations:<br>
 +
* <br>
 +
* P<sub>i+1</sub> = (H + &lambda; I)<sup>-1</sup> d <br>
 +
* d =  (1/N) Sum{ i=1..N , (f(x<sub>i</sub>;P<sub>i</sub>) - y<sub>i</sub>) * jacobian(:,i) } <br>
 +
* H =  (1/N) Sum{ i=1..N , jacobian(:,i) * jacobian(:,i)<sup>T</sup> }
 +
* </p>
 +
* <p>
 +
* Whenever possible the allocation of new memory is avoided.  This is accomplished by reshaping matrices.
 +
* A matrix that is reshaped won't grow unless the new shape requires more memory than it has available.
 +
* </p>
 +
* @author Peter Abeles
 +
*/
 +
public class LevenbergMarquardt {
 +
    // how much the numerical jacobian calculation perturbs the parameters by.
 +
    // In better implementation there are better ways to compute this delta.  See Numerical Recipes.
 +
    private final static double DELTA = 1e-8;
  
It is easy to see that the equations code is the compact and readable.  While the syntax is heavily inspired by Matlab and its kin, it does not attempt to replicate their functionality.  It is also not a replacement for SimpleMatrix or the procedural API.  There are situations where those other interfaces are easier to use and most programs would need to use a mix.
+
    private double initialLambda;
  
Equations is designed to have minimal overhead.  It runs almost as fast as the procedural API and can be used such that all memory is predeclared.
+
    // the function that is optimized
 +
    private Function func;
  
----
+
    // the optimized parameters and associated costs
 +
    private DenseMatrix64F param;
 +
    private double initialCost;
 +
    private double finalCost;
  
__TOC__
+
    // used by matrix operations
 +
    private DenseMatrix64F d;
 +
    private DenseMatrix64F H;
 +
    private DenseMatrix64F negDelta;
 +
    private DenseMatrix64F tempParam;
 +
    private DenseMatrix64F A;
  
= Quick Start =
+
    // variables used by the numerical jacobian algorithm
 +
    private DenseMatrix64F temp0;
 +
    private DenseMatrix64F temp1;
 +
    // used when computing d and H variables
 +
    private DenseMatrix64F tempDH;
  
The syntax used in Equation is very similar to Matlab and other computer algebra systems (CAS).  It is assumed the reader is already familiar with these systems and can quickly pick up the syntax through these examples.
+
    // Where the numerical Jacobian is stored.
 +
    private DenseMatrix64F jacobian;
  
Let's start with a complete simple example then explain what's going on line by line.
+
    /**
 +
    * Creates a new instance that uses the provided cost function.
 +
    *
 +
    * @param funcCost Cost function that is being optimized.
 +
    */
 +
    public LevenbergMarquardt( Function funcCost )
 +
    {
 +
        this.initialLambda = 1;
  
<pre>
+
        // declare data to some initial small size. It will grow later on as needed.
01: public void updateP( DenseMatrix64F P , DenseMatrix64F F , DenseMatrix64F Q ) {
+
        int maxElements = 1;
02:    Equation eq = new Equation();
+
        int numParam = 1;
03:    eq.alias(P,"P",F,"F",Q,"Q");
 
04:    eq.process("S = F*P*F'");
 
05:    eq.process("P = S + Q");
 
06: }
 
</pre>
 
  
'''Line 02:''' Declare the Equation class.<br>
+
        this.temp0 = new DenseMatrix64F(maxElements,1);
'''Line 03:''' Create aliases for each variable. This allowed Equation to reference and manipulate those classes.<br>
+
        this.temp1 = new DenseMatrix64F(maxElements,1);
'''Line 04:''' Process() is called and passed in a text string with an equation in it. The variable 'S' is lazily created and set to the result of F*P*F'.<br>
+
        this.tempDH = new DenseMatrix64F(maxElements,1);
'''Line 05:''' Process() is called again and P is set to the result of adding S and Q together.  Because P is aliased to the input matrix P that matrix is changed.
+
        this.jacobian = new DenseMatrix64F(numParam,maxElements);
  
Three types of variables are supported, matrix, double, and integer.  Results can be stored in each type and all can be aliased.  The example below uses all 3 data types and to compute the likelihood of "x" from a multivariable normal distribution defined by matrices 'mu' and 'P'.
+
        this.func = funcCost;
  
<syntaxhighlight lang="java">
+
        this.param = new DenseMatrix64F(numParam,1);
eq.alias(x.numRows,"k",P,"P",x,"x",mu,"mu");
+
        this.d = new DenseMatrix64F(numParam,1);
eq.process("p = (2*pi)^(-k/2)/sqrt(det(P))*exp(-0.5*(x-mu)'*inv(P)*(x-mu))");
+
        this.H = new DenseMatrix64F(numParam,numParam);
</syntaxhighlight>
+
        this.negDelta = new DenseMatrix64F(numParam,1);
 +
        this.tempParam = new DenseMatrix64F(numParam,1);
 +
        this.A = new DenseMatrix64F(numParam,numParam);
 +
    }
  
The end result 'p' will be a double.  There was no need to alias 'pi' since it's a built in constant.  Since 'p' is lazily defined how do you access the result?
 
  
<syntaxhighlight lang="java">
+
    public double getInitialCost() {
double p = eq.lookupDouble("p");
+
        return initialCost;
</syntaxhighlight>
+
    }
  
For a matrix you could use eq.lookupMatrix() and eq.lookupInteger() for integers.  If you don't know the variable's type then eq.lookupVariable() is what you need.
+
    public double getFinalCost() {
 +
        return finalCost;
 +
    }
  
It is also possible to define a matrix inline:
+
    public DenseMatrix64F getParameters() {
 +
        return param;
 +
    }
  
<syntaxhighlight lang="java">
+
    /**
eq.process("P = [10 0 0;0 10 0;0 0 10]");
+
    * Finds the best fit parameters.
</syntaxhighlight>
+
    *
Will assign P to a 3x3 matrix with 10's all along it's diagonal. Other matrices can also be included inside:
+
    * @param initParam The initial set of parameters for the function.
<syntaxhighlight lang="java">
+
    * @param X The inputs to the function.
eq.process("P = [A ; B]");
+
    * @param Y The "observed" output of the function
</syntaxhighlight>
+
    * @return true if it succeeded and false if it did not.
will concatenate A and B horizontally.
+
    */
 +
    public boolean optimize( DenseMatrix64F initParam ,
 +
                            DenseMatrix64F X ,
 +
                            DenseMatrix64F Y )
 +
    {
 +
        configure(initParam,X,Y);
  
Submatrices are also supported for assignment and reference.
+
        // save the cost of the initial parameters so that it knows if it improves or not
<syntaxhighlight lang="java">
+
        initialCost = cost(param,X,Y);
eq.process("P(2:5,0:3) = 10*A(1:4,10:13)");
 
</syntaxhighlight>
 
P(2:5,0:3) references the sub-matrix inside of P from rows 2 to 5 (inclusive) and columns 0 to 3 (inclusive).
 
  
This concludes the quick start tutorial.  The remaining sections will go into more detail on each of the subjects touched upon above.
+
        // iterate until the difference between the costs is insignificant
 +
        // or it iterates too many times
 +
        if( !adjustParam(X, Y, initialCost) ) {
 +
            finalCost = Double.NaN;
 +
            return false;
 +
        }
  
= The Compiler =
+
        return true;
 +
    }
  
The current compiler is very basic and performs very literal translations of equations into code.  For example, "A = 2.5*B*C'" could be executed with a single call to CommonOps.multTransB().  Instead it will transpose C, save the result, then scale B by 2.5, save the result, multiply the results together, save that, and finally copy the result into A.  In the future the compiler will become smart enough to recognize such patterns.
+
    /**
 +
    * Iterate until the difference between the costs is insignificant
 +
    * or it iterates too many times
 +
    */
 +
    private boolean adjustParam(DenseMatrix64F X, DenseMatrix64F Y,
 +
                                double prevCost) {
 +
        // lambda adjusts how big of a step it takes
 +
        double lambda = initialLambda;
 +
        // the difference between the current and previous cost
 +
        double difference = 1000;
  
Compiling the text string contains requires a bit of overhead but once compiled it can be run with very fast.  When dealing with larger matrices the overhead involved is insignificant, but for smaller ones it can have a noticeable impact.  This is why the ability to precompile an equation is provided.
+
        for( int iter = 0; iter < 20 || difference < 1e-6 ; iter++ ) {
 +
            // compute some variables based on the gradient
 +
            computeDandH(param,X,Y);
  
Original:
+
            // try various step sizes and see if any of them improve the
<syntaxhighlight lang="java">
+
            // results over what has already been done
eq.process("K = P*H'*inv( H*P*H' + R )");
+
            boolean foundBetter = false;
</syntaxhighlight>
+
            for( int i = 0; i < 5; i++ ) {
 +
                computeA(A,H,lambda);
  
Precompiled:
+
                if( !solve(A,d,negDelta) ) {
<syntaxhighlight lang="java">
+
                    return false;
// precompile the equation
+
                }
Sequence s = eq.compile("K = P*H'*inv( H*P*H' + R )");
+
                // compute the candidate parameters
// execute the results with out needing to recompile
+
                subtract(param, negDelta, tempParam);
s.perform();
 
</syntaxhighlight>
 
  
Both are equivalent, but if an equation is invoked multiple times the precompiled version can have a noticable improvement in performance. Using precompiled sequences also means that means that internal arrays are only declared once and allows the user to control when memory is created/destroyed.
+
                double cost = cost(tempParam,X,Y);
 +
                if( cost < prevCost ) {
 +
                    // the candidate parameters produced better results so use it
 +
                    foundBetter = true;
 +
                    param.set(tempParam);
 +
                    difference = prevCost - cost;
 +
                    prevCost = cost;
 +
                    lambda /= 10.0;
 +
                } else {
 +
                    lambda *= 10.0;
 +
                }
 +
            }
  
To make it clear, precompiling is only recommended when dealing with smaller matrices or when tighter control over memory is required.
+
            // it reached a point where it can't improve so exit
 +
            if( !foundBetter )
 +
                break;
 +
        }
 +
        finalCost = prevCost;
 +
        return true;
 +
    }
  
When an equation is precompiled you can still change the alias for a variable.
+
    /**
<syntaxhighlight lang="java">
+
    * Performs sanity checks on the input data and reshapes internal matrices.  By reshaping
eq.alias(0,"sum",0,"i");
+
    * a matrix it will only declare new memory when needed.
Sequence s = eq.compile("sum = sum + i");
+
    */
for( int i = 0; i < 10; i++ ) {
+
    protected void configure( DenseMatrix64F initParam , DenseMatrix64F X , DenseMatrix64F Y )
    eq.alias(i,"i");
+
    {
    s.perform();
+
        if( Y.getNumRows() != X.getNumRows() ) {
}
+
            throw new IllegalArgumentException("Different vector lengths");
</syntaxhighlight>
+
        } else if( Y.getNumCols() != 1 || X.getNumCols() != 1 ) {
This will sum up the numbers from 0 to 9.
+
            throw new IllegalArgumentException("Inputs must be a column vector");
 +
        }
  
== Debugging ==
+
        int numParam = initParam.getNumElements();
 +
        int numPoints = Y.getNumRows();
  
There will be times when you pass in an equation and it throws some weird exception or just doesn't do what you expected. To see the tokens and sequence of operations set the second parameter in compile() or peform() to true.
+
        if( param.getNumElements() != initParam.getNumElements() ) {
 
+
            // reshaping a matrix means that new memory is only declared when needed
For example:
+
            this.param.reshape(numParam,1, false);
<syntaxhighlight lang="java">
+
            this.d.reshape(numParam,1, false);
eq.process("y = z - H*x",true);
+
            this.H.reshape(numParam,numParam, false);
</syntaxhighlight>
+
            this.negDelta.reshape(numParam,1, false);
When application is run it will print out
+
            this.tempParam.reshape(numParam,1, false);
<syntaxhighlight lang="java">
+
            this.A.reshape(numParam,numParam, false);
Parsed tokens:
+
        }
------------
 
VarMATRIX
 
ASSIGN
 
VarMATRIX
 
MINUS
 
VarMATRIX
 
TIMES
 
VarMATRIX
 
  
Operations:
+
        param.set(initParam);
------------
 
multiply-mm
 
subtract-mm
 
copy-mm
 
</syntaxhighlight>
 
  
= Alias =
+
        // reshaping a matrix means that new memory is only declared when needed
 +
        temp0.reshape(numPoints,1, false);
 +
        temp1.reshape(numPoints,1, false);
 +
        tempDH.reshape(numPoints,1, false);
 +
        jacobian.reshape(numParam,numPoints, false);
  
To manipulate matrices in equations they need to be aliased.  Both DenseMatrix64F and SimpleMatrix can be aliased.  A copy of scalar numbers can also be aliased.  When a variable is aliased a reference to the data is saved and a name associated with it.
 
<syntaxhighlight lang="java">
 
DenseMatrix64F x = new DenseMatrix64F(6,1);
 
eq.alias(x,"x");
 
</syntaxhighlight>
 
Multiple variables can be aliased at the same time too
 
<syntaxhighlight lang="java">
 
eq.alias(x,"x",P,"P",h,"Happy");
 
</syntaxhighlight>
 
As is shown above the string name for a variable does not have to be the same as Java name of the variable.  Here is an example where an integer and double is aliased.
 
<syntaxhighlight lang="java">
 
int a = 6;
 
eq.alias(2.3,"distance",a,"a");
 
</syntaxhighlight>
 
  
After a variable has been aliased you can alias the same name again to change it.  Here is an example of just that:
+
     }
<syntaxhighlight lang="java">
 
for( int i = 0; i < 10; i++ ) {
 
     eq.alias(i,"i");
 
    // do stuff with i
 
}
 
</syntaxhighlight>
 
  
If after benchmarking your code and discover that the alias operation is slowing it down (a hashmap lookup is done internally) then you should consider the following faster, but uglier, alternative.
+
    /**
<syntaxhighlight lang="java">
+
    * Computes the d and H parameters.  Where d is the average error gradient and
VariableInteger i =  eq.lookupVariable("i");
+
    * H is an approximation of the hessian.
for( i.value = 0; i.value < 10; i.value++ ) {
+
    */
    // do stuff with i
+
    private void computeDandH( DenseMatrix64F param , DenseMatrix64F x , DenseMatrix64F y )
}
+
    {
</syntaxhighlight>
+
        func.compute(param,x, tempDH);
 +
        subtractEquals(tempDH, y);
  
= Submatrices =
+
        computeNumericalJacobian(param,x,jacobian);
  
Sub-matrices can be read from and written to.  It's easy to reference a sub-matrix inside of any matrix. A few examples are below.
+
        int numParam = param.getNumElements();
<syntaxhighlight lang="java">
+
        int length = x.getNumElements();
A(1:4,0:5)
 
</syntaxhighlight>
 
Here rows 1 to 4 (inclusive) and columns 0 to 5 (inclusive) compose the sub-matrix of A.  The notation "a:b" indicates an integer set from 'a' to 'b', where 'a' and 'b' must be integers themselves.  To specify every row or column do ":" or all rows and columns past a certain 'a' can be referenced with "a:".  Finally, you can reference just a number by typeing it, e.g. "a".
 
<syntaxhighlight lang="java">
 
A(3:,3)  <-- Rows from 3 to the last row and just column 3
 
A(:,:)   <-- Every element in A
 
A(1,2)  <-- The element in A at row=1,col=2
 
</syntaxhighlight>
 
The last example is a special case in that A(1,2) will return a double and not 1x1 matrix. Consider the following:
 
<syntaxhighlight lang="java">
 
A(0:2,0:2) = C/B(1,2)
 
</syntaxhighlight>
 
The results of dividing the elements of matrix C by the value of B(1,2) is assigned to the submatrix in A.
 
  
A named variable can also be used to reference elements as long as it's an integer.
+
        // d = average{ (f(x_i;p) - y_i) * jacobian(:,i) }
<syntaxhighlight lang="java">
+
        for( int i = 0; i < numParam; i++ ) {
a = A(i,j)
+
            double total = 0;
</syntaxhighlight>
+
            for( int j = 0; j < length; j++ ) {
 +
                total += tempDH.get(j,0)*jacobian.get(i,j);
 +
            }
 +
            d.set(i,0,total/length);
 +
        }
  
= Inline Matrix =
+
        // compute the approximation of the hessian
 +
        multTransB(jacobian,jacobian,H);
 +
        scale(1.0/length,H);
 +
    }
  
Matrices can be created inline and are defined inside of brackets. The matrix is specified in a row-major format, where a space separates elements in a row and a semi-colon indicates the end of a row.
+
    /**
<syntaxhighlight lang="java">
+
    * A = H + lambda*I <br>
[5 0 0;0 4.0 0.0 ; 0 0 1]
+
    * <br>
</syntaxhighlight>  
+
    * where I is an identity matrix.
Defines a 3x3 matrix with 5,4,1 for it's diagonal elements.  Visually this looks like:
+
    */
<syntaxhighlight lang="java">
+
    private void computeA( DenseMatrix64F A , DenseMatrix64F H , double lambda )
[ 5 0 0 ]
+
    {
[ 0 4 0 ]
+
        final int numParam = param.getNumElements();
[ 0 0 1 ]
 
</syntaxhighlight>
 
An inline matrix can be used to concatenate other matrices together.
 
<syntaxhighlight lang="java">
 
[ A ; B ; C ]
 
</syntaxhighlight>
 
Will concatenate matrices A, B, and C along their rows.  They must have the same number of columns.  As you might guess, to concatenate along columns you would
 
<syntaxhighlight lang="java">
 
[ A B C ]
 
</syntaxhighlight>
 
and each matrix must have the same number of rows. Inner matrices are also allowed
 
<syntaxhighlight lang="java">
 
[ [1 2;2 3] [4;5] ; A ]
 
</syntaxhighlight>
 
which will result in
 
<syntaxhighlight lang="java">
 
[ 1 2 4 ]
 
[ 2 3 5 ]
 
[  A  ]
 
</syntaxhighlight>
 
  
= Built in Functions and Variables =
+
        A.set(H);
 +
        for( int i = 0; i < numParam; i++ ) {
 +
            A.set(i,i, A.get(i,i) + lambda);
 +
        }
 +
    }
  
'''Constants'''
+
    /**
<pre>
+
    * Computes the "cost" for the parameters given.
pi = Math.PI
+
    *
= Math.E
+
    * cost = (1/N) Sum (f(x;p) - y)^2
</pre>
+
    */
 +
    private double cost( DenseMatrix64F param , DenseMatrix64F X , DenseMatrix64F Y)
 +
    {
 +
        func.compute(param,X, temp0);
  
'''Functions'''
+
        double error = diffNormF(temp0,Y);
  
<pre>
+
        return error*error / (double)X.numRows;
eye(N)      Create an identity matrix which is N by N.
+
     }
eye(A)      Create an identity matrix which is A.numRows by A.numCols
 
normF(A)    Frobenius normal of the matrix.
 
det(A)      Determinant of the matrix
 
inv(A)      Inverse of a matrix
 
pinv(A)      Pseudo-inverse of a matrix
 
rref(A)      Reduced row echelon form of A
 
trace(A)    Trace of the matrix
 
zeros(r,c)  Matrix full of zeros with r rows and c columns.
 
ones(r,c)    Matrix full of ones with r rows and c columns.
 
diag(A)      If a vector then returns a square matrix with diagonal elements filled with vector
 
diag(A)      If a matrix then it returns the diagonal elements as a column vector
 
dot(A,B)    Returns the dot product of two vectors as a double.  Does not work on general matrices.
 
solve(A,B)   Returns the solution X from A*X = B.
 
kron(A,B)    Kronecker product
 
abs(A)      Absolute value of A.
 
max(A)      Element with the largest value in A.
 
min(A)      Element with the smallest value in A.
 
pow(a,b)     Scalar power of a to b.  Can also be invoked with "a^b".
 
sin(a)      Math.sin(a) for scalars only
 
cos(a)      Math.cos(a) for scalars only
 
atan(a)      Math.atan(a) for scalars only
 
atan2(a,b)  Math.atan2(a,b) for scalars only
 
exp(a)      Math.exp(a) for scalars and element-wise matrices
 
log(a)      Math.log(a) for scalars and element-wise matrices
 
</pre>
 
  
'''Symbols'''
+
    /**
 +
    * Computes a simple numerical Jacobian.
 +
    *
 +
    * @param param The set of parameters that the Jacobian is to be computed at.
 +
    * @param pt The point around which the Jacobian is to be computed.
 +
    * @param deriv Where the jacobian will be stored
 +
    */
 +
    protected void computeNumericalJacobian( DenseMatrix64F param ,
 +
                                            DenseMatrix64F pt ,
 +
                                            DenseMatrix64F deriv )
 +
    {
 +
        double invDelta = 1.0/DELTA;
  
<pre>
+
        func.compute(param,pt, temp0);
'*'        multiplication (Matrix-Matrix, Scalar-Matrix, Scalar-Scalar)
 
'+'        addition (Matrix-Matrix, Scalar-Matrix, Scalar-Scalar)
 
'-'        subtraction (Matrix-Matrix, Scalar-Matrix, Scalar-Scalar)
 
'/'        divide (Matrix-Scalar, Scalar-Scalar)
 
'/'        matrix solve "x=b/A" is equivalent to x=solve(A,b) (Matrix-Matrix)
 
'^'        Scalar power. a^b is a to the power of b.
 
'\'        left-divide.  Same as divide but reversed.  e.g. x=A\b is x=solve(A,b)
 
'.*'      element-wise multiplication (Matrix-Matrix)
 
'./'      element-wise division (Matrix-Matrix)
 
'.^'      element-wise power. (scalar-scalar) (matrix-matrix) (scalar-matrix) (matrix-scalar)
 
'^'        Scalar power.  a^b is a to the power of b.
 
'''        matrix transpose
 
'='        assignment by value (Matrix-Matrix, Scalar-Scalar)
 
</pre>
 
  
= User Defined Functions =  
+
        // compute the jacobian by perturbing the parameters slightly
 +
        // then seeing how it effects the results.
 +
        for( int i = 0; i < param.numRows; i++ ) {
 +
            param.data[i] += DELTA;
 +
            func.compute(param,pt, temp1);
 +
            // compute the difference between the two parameters and divide by the delta
 +
            add(invDelta,temp1,-invDelta,temp0,temp1);
 +
            // copy the results into the jacobian matrix
 +
            System.arraycopy(temp1.data,0,deriv.data,i*pt.numRows,pt.numRows);
  
It's easy to add your own custom functions too.  A custom function implements ManagerFunctions.Input1 or ManagerFunctions.InputN, depending on the number of inputs it takes.  It is then added to the ManagerFunctions in Equation by call add().  The output matrix should also be resized.
+
            param.data[i] -= DELTA;
 +
        }
 +
    }
  
[[Example Customizing Equations]]
+
    /**
 +
    * The function that is being optimized.
 +
    */
 +
    public interface Function {
 +
        /**
 +
        * Computes the output for each value in matrix x given the set of parameters.
 +
        *
 +
        * @param param The parameter for the function.
 +
        * @param x the input points.
 +
        * @param y the resulting output.
 +
        */
 +
        public void compute( DenseMatrix64F param , DenseMatrix64F x , DenseMatrix64F y );
 +
    }
 +
}
 +
</syntaxhighlight>

Revision as of 19:39, 31 March 2015

Levenberg-Marquardt is a popular non-linear optimization algorithm. This example demonstrate how a basic implementation of Levenberg-Marquardt can be created using EJML's procedural interface. Unnecessary allocation of new memory is avoided by reshaping matrices. When a matrix is reshaped its width and height is changed but new memory is not declared unless the new shape requires more memory than is available.

The algorithm is provided a function, set of inputs, set of outputs, and an initial estimate of the parameters (this often works with all zeros). It finds the parameters that minimize the difference between the computed output and the observed output. A numerical Jacobian is used to estimate the function's gradient.

Note: This is a simple straight forward implementation of Levenberg-Marquardt and is not as robust as Minpack's implementation. If you are looking for a robust non-linear least-squares minimization library in Java check out DDogleg.

Github Code: LevenbergMarquardt

Example Code

/**
 * <p>
 * This is a straight forward implementation of the Levenberg-Marquardt (LM) algorithm. LM is used to minimize
 * non-linear cost functions:<br>
 * <br>
 * S(P) = Sum{ i=1:m , [y<sub>i</sub> - f(x<sub>i</sub>,P)]<sup>2</sup>}<br>
 * <br>
 * where P is the set of parameters being optimized.
 * </p>
 *
 * <p>
 * In each iteration the parameters are updated using the following equations:<br>
 * <br>
 * P<sub>i+1</sub> = (H + &lambda; I)<sup>-1</sup> d <br>
 * d =  (1/N) Sum{ i=1..N , (f(x<sub>i</sub>;P<sub>i</sub>) - y<sub>i</sub>) * jacobian(:,i) } <br>
 * H =  (1/N) Sum{ i=1..N , jacobian(:,i) * jacobian(:,i)<sup>T</sup> }
 * </p>
 * <p>
 * Whenever possible the allocation of new memory is avoided.  This is accomplished by reshaping matrices.
 * A matrix that is reshaped won't grow unless the new shape requires more memory than it has available.
 * </p>
 * @author Peter Abeles
 */
public class LevenbergMarquardt {
    // how much the numerical jacobian calculation perturbs the parameters by.
    // In better implementation there are better ways to compute this delta.  See Numerical Recipes.
    private final static double DELTA = 1e-8;

    private double initialLambda;

    // the function that is optimized
    private Function func;

    // the optimized parameters and associated costs
    private DenseMatrix64F param;
    private double initialCost;
    private double finalCost;

    // used by matrix operations
    private DenseMatrix64F d;
    private DenseMatrix64F H;
    private DenseMatrix64F negDelta;
    private DenseMatrix64F tempParam;
    private DenseMatrix64F A;

    // variables used by the numerical jacobian algorithm
    private DenseMatrix64F temp0;
    private DenseMatrix64F temp1;
    // used when computing d and H variables
    private DenseMatrix64F tempDH;

    // Where the numerical Jacobian is stored.
    private DenseMatrix64F jacobian;

    /**
     * Creates a new instance that uses the provided cost function.
     *
     * @param funcCost Cost function that is being optimized.
     */
    public LevenbergMarquardt( Function funcCost )
    {
        this.initialLambda = 1;

        // declare data to some initial small size. It will grow later on as needed.
        int maxElements = 1;
        int numParam = 1;

        this.temp0 = new DenseMatrix64F(maxElements,1);
        this.temp1 = new DenseMatrix64F(maxElements,1);
        this.tempDH = new DenseMatrix64F(maxElements,1);
        this.jacobian = new DenseMatrix64F(numParam,maxElements);

        this.func = funcCost;

        this.param = new DenseMatrix64F(numParam,1);
        this.d = new DenseMatrix64F(numParam,1);
        this.H = new DenseMatrix64F(numParam,numParam);
        this.negDelta = new DenseMatrix64F(numParam,1);
        this.tempParam = new DenseMatrix64F(numParam,1);
        this.A = new DenseMatrix64F(numParam,numParam);
    }


    public double getInitialCost() {
        return initialCost;
    }

    public double getFinalCost() {
        return finalCost;
    }

    public DenseMatrix64F getParameters() {
        return param;
    }

    /**
     * Finds the best fit parameters.
     *
     * @param initParam The initial set of parameters for the function.
     * @param X The inputs to the function.
     * @param Y The "observed" output of the function
     * @return true if it succeeded and false if it did not.
     */
    public boolean optimize( DenseMatrix64F initParam ,
                             DenseMatrix64F X ,
                             DenseMatrix64F Y )
    {
        configure(initParam,X,Y);

        // save the cost of the initial parameters so that it knows if it improves or not
        initialCost = cost(param,X,Y);

        // iterate until the difference between the costs is insignificant
        // or it iterates too many times
        if( !adjustParam(X, Y, initialCost) ) {
            finalCost = Double.NaN;
            return false;
        }

        return true;
    }

    /**
     * Iterate until the difference between the costs is insignificant
     * or it iterates too many times
     */
    private boolean adjustParam(DenseMatrix64F X, DenseMatrix64F Y,
                                double prevCost) {
        // lambda adjusts how big of a step it takes
        double lambda = initialLambda;
        // the difference between the current and previous cost
        double difference = 1000;

        for( int iter = 0; iter < 20 || difference < 1e-6 ; iter++ ) {
            // compute some variables based on the gradient
            computeDandH(param,X,Y);

            // try various step sizes and see if any of them improve the
            // results over what has already been done
            boolean foundBetter = false;
            for( int i = 0; i < 5; i++ ) {
                computeA(A,H,lambda);

                if( !solve(A,d,negDelta) ) {
                    return false;
                }
                // compute the candidate parameters
                subtract(param, negDelta, tempParam);

                double cost = cost(tempParam,X,Y);
                if( cost < prevCost ) {
                    // the candidate parameters produced better results so use it
                    foundBetter = true;
                    param.set(tempParam);
                    difference = prevCost - cost;
                    prevCost = cost;
                    lambda /= 10.0;
                } else {
                    lambda *= 10.0;
                }
            }

            // it reached a point where it can't improve so exit
            if( !foundBetter )
                break;
        }
        finalCost = prevCost;
        return true;
    }

    /**
     * Performs sanity checks on the input data and reshapes internal matrices.  By reshaping
     * a matrix it will only declare new memory when needed.
     */
    protected void configure( DenseMatrix64F initParam , DenseMatrix64F X , DenseMatrix64F Y )
    {
        if( Y.getNumRows() != X.getNumRows() ) {
            throw new IllegalArgumentException("Different vector lengths");
        } else if( Y.getNumCols() != 1 || X.getNumCols() != 1 ) {
            throw new IllegalArgumentException("Inputs must be a column vector");
        }

        int numParam = initParam.getNumElements();
        int numPoints = Y.getNumRows();

        if( param.getNumElements() != initParam.getNumElements() ) {
            // reshaping a matrix means that new memory is only declared when needed
            this.param.reshape(numParam,1, false);
            this.d.reshape(numParam,1, false);
            this.H.reshape(numParam,numParam, false);
            this.negDelta.reshape(numParam,1, false);
            this.tempParam.reshape(numParam,1, false);
            this.A.reshape(numParam,numParam, false);
        }

        param.set(initParam);

        // reshaping a matrix means that new memory is only declared when needed
        temp0.reshape(numPoints,1, false);
        temp1.reshape(numPoints,1, false);
        tempDH.reshape(numPoints,1, false);
        jacobian.reshape(numParam,numPoints, false);


    }

    /**
     * Computes the d and H parameters.  Where d is the average error gradient and
     * H is an approximation of the hessian.
     */
    private void computeDandH( DenseMatrix64F param , DenseMatrix64F x , DenseMatrix64F y )
    {
        func.compute(param,x, tempDH);
        subtractEquals(tempDH, y);

        computeNumericalJacobian(param,x,jacobian);

        int numParam = param.getNumElements();
        int length = x.getNumElements();

        // d = average{ (f(x_i;p) - y_i) * jacobian(:,i) }
        for( int i = 0; i < numParam; i++ ) {
            double total = 0;
            for( int j = 0; j < length; j++ ) {
                total += tempDH.get(j,0)*jacobian.get(i,j);
            }
            d.set(i,0,total/length);
        }

        // compute the approximation of the hessian
        multTransB(jacobian,jacobian,H);
        scale(1.0/length,H);
    }

    /**
     * A = H + lambda*I <br>
     * <br>
     * where I is an identity matrix.
     */
    private void computeA( DenseMatrix64F A , DenseMatrix64F H , double lambda )
    {
        final int numParam = param.getNumElements();

        A.set(H);
        for( int i = 0; i < numParam; i++ ) {
            A.set(i,i, A.get(i,i) + lambda);
        }
    }

    /**
     * Computes the "cost" for the parameters given.
     *
     * cost = (1/N) Sum (f(x;p) - y)^2
     */
    private double cost( DenseMatrix64F param , DenseMatrix64F X , DenseMatrix64F Y)
    {
        func.compute(param,X, temp0);

        double error = diffNormF(temp0,Y);

        return error*error / (double)X.numRows;
    }

    /**
     * Computes a simple numerical Jacobian.
     *
     * @param param The set of parameters that the Jacobian is to be computed at.
     * @param pt The point around which the Jacobian is to be computed.
     * @param deriv Where the jacobian will be stored
     */
    protected void computeNumericalJacobian( DenseMatrix64F param ,
                                             DenseMatrix64F pt ,
                                             DenseMatrix64F deriv )
    {
        double invDelta = 1.0/DELTA;

        func.compute(param,pt, temp0);

        // compute the jacobian by perturbing the parameters slightly
        // then seeing how it effects the results.
        for( int i = 0; i < param.numRows; i++ ) {
            param.data[i] += DELTA;
            func.compute(param,pt, temp1);
            // compute the difference between the two parameters and divide by the delta
            add(invDelta,temp1,-invDelta,temp0,temp1);
            // copy the results into the jacobian matrix
            System.arraycopy(temp1.data,0,deriv.data,i*pt.numRows,pt.numRows);

            param.data[i] -= DELTA;
        }
    }

    /**
     * The function that is being optimized.
     */
    public interface Function {
        /**
         * Computes the output for each value in matrix x given the set of parameters.
         *
         * @param param The parameter for the function.
         * @param x the input points.
         * @param y the resulting output.
         */
        public void compute( DenseMatrix64F param , DenseMatrix64F x , DenseMatrix64F y );
    }
}