Example Large Dense Matrices

From Efficient Java Matrix Library
Jump to navigation Jump to search

Different approaches are required when writing high performance dense matrix operations for large matrices. For the most part, EJML will automatically switch to using these different approaches. A key parameter that needs to be tuned for specific systems is block size. It can also make sense to work directly with block matrices instead of assuming EJML does the best for your system.

External Resources:

Example Code

 * For many operations EJML provides block matrix support. These block or tiled matrices are designed to reduce
 * the number of cache misses which can kill performance when working on large matrices. A critical tuning parameter
 * is the block size and this is system specific. The example below shows you how this parameter can be optimized.
 * @author Peter Abeles
public class OptimizingLargeMatrixPerformance {
    public static void main( String[] args ) {
        // Create larger matrices to experiment with
        var rand = new Random(0xBEEF);
        DMatrixRMaj A = RandomMatrices_DDRM.rectangle(3000, 3000, -1, 1, rand);
        DMatrixRMaj B = A.copy();
        DMatrixRMaj C = A.createLike();

        // Since we are dealing with larger matrices let's use the concurrent implementation. By default
        UtilEjml.printTime("Row-Major Multiplication:", () -> CommonOps_MT_DDRM.mult(A, B, C));

        // Converts A into a block matrix and creates a new matrix while leaving A unmodified
        DMatrixRBlock Ab = MatrixOps_DDRB.convert(A);
        // Converts A into a block matrix, but modifies it's internal array inplace. The returned block matrix
        // will share the same data array as the input. Much more memory efficient, but you need to be careful.
        DMatrixRBlock Bb = MatrixOps_DDRB.convertInplace(B, null, null);
        DMatrixRBlock Cb = Ab.createLike();

        // Since we are dealing with larger matrices let's use the concurrent implementation. By default
        UtilEjml.printTime("Block Multiplication:    ", () -> MatrixOps_MT_DDRB.mult(Ab, Bb, Cb));

        // Can we make this faster? Probably by adjusting the block size. This is system dependent so let's
        // try a range of values
        int defaultBlockWidth = EjmlParameters.BLOCK_WIDTH;
        System.out.println("Default Block Size: " + defaultBlockWidth);
        for (int block : new int[]{10, 20, 30, 50, 70, 100, 140, 200, 500}) {
            EjmlParameters.BLOCK_WIDTH = block;

            // Need to create the block matrices again since we changed the block size
            DMatrixRBlock Ac = MatrixOps_DDRB.convert(A);
            DMatrixRBlock Bc = MatrixOps_DDRB.convert(B);
            DMatrixRBlock Cc = Ac.createLike();
            UtilEjml.printTime("Block " + EjmlParameters.BLOCK_WIDTH + ": ", () -> MatrixOps_MT_DDRB.mult(Ac, Bc, Cc));

        // On my system the optimal block size is around 100 and has an improvement of about 5%
        // On some architectures the improvement can be substantial in others the default value is very reasonable

        // Some decompositions will switch to a block format automatically. Matrix multiplication might in the
        // future and others too. The main reason this hasn't happened for it to be memory efficient it would
        // need to modify then undo the modification for input matrices which would be very confusion if you're
        // writing concurrent code.