The Great Refactoring

From Efficient Java Matrix Library

In version 0.31 a massive refactoring is going to be performed. This was brought about by the introduction of 32-bit data types and sparse matrices. The old naming scheme was inconsistent throughout the code base and didn't provide an easy way to describe all these different data types and the operations performed on them. Fortunately, the SimpleMatrix and Equations APIs have been left unchanged. Under the hood SimpleMatrix has changed a little: it can now handle 32-bit and 64-bit floats, depending on what the user passes in.

Summary of Changes:

  • Introduction of 32-bit float matrices
  • Introduction of sparse matrices
  • Names of matrices
  • Names of classes in procedural API
  • Packages and location of some classes
  • SimpleMatrix is starting to be able to support multiple matrix types internally

What has NOT changed:

  • SimpleMatrix API
  • Equations API

To make this transition easier, a Python script is being/has been written which will recursively perform the refactoring on older code written for version 0.30.

Feedback is Welcome! It's not too late to change what's listed here or to introduce other changes that can improve usability.

Why These Names?

There are a few reasons for the naming scheme below.

  • The new 32-bit float code is auto generated from the 64-bit double code. The translation is done through a mostly dumb search and replace.
    • This is actually a difficult requirement.
    • Need to know which files should be translated by their file name only.
    • Client libraries need to be able to auto generate their code from EJML without difficulty, e.g. can use Double as a keyword.
  • The procedural interface follows the philosophy that everything is strongly typed and provides as much control to the user as possible
    • Functions should take in a specific data type and not a general purpose one
    • SimpleMatrix is provided for those who don't want to deal with this complexity
  • Attempting to keep class names for commonly used data structures short to enable concise code
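
The first requirement above, a file-name-driven search and replace, can be sketched roughly as follows. This is only an illustration and not EJML's actual generator: the class name FloatCodeGenerator, the "64.java" file-name test, and the handful of replacement rules are all assumptions, and the identifiers in the sample text use the Proposal 1 names listed later on this page.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of a "mostly dumb" 64-bit to 32-bit source translator. */
class FloatCodeGenerator {
    // Illustrative rules only; a real generator would need a complete list.
    // LinkedHashMap keeps the rules in the order they were added.
    private static final Map<String, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put("F64", "F32");       // real matrix names, e.g. DMatrixRow_F64
        RULES.put("C64", "C32");       // complex matrix names, e.g. DMatrixRow_C64
        RULES.put("R64", "R32");       // op suffixes _R64 and _CR64
        RULES.put("double", "float");  // primitive type
        RULES.put("Double", "Float");  // boxed type and keyword in names
    }

    /** Decide from the file name alone whether a file is 64-bit code (assumed convention). */
    static boolean shouldTranslate(String fileName) {
        return fileName.endsWith("64.java");
    }

    /** Apply every rule as a plain, context-free string replacement. */
    static String translate(String text) {
        String out = text;
        for (Map.Entry<String, String> rule : RULES.entrySet()) {
            out = out.replace(rule.getKey(), rule.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(translate("CommonOps_R64.java"));
        System.out.println(translate("double d = CommonOps_R64.det(A);"));
    }
}
```

Because the replacement has no understanding of the code, it only works when every 64-bit name carries an unambiguous marker, which is why the naming scheme matters so much here.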

Thoughts and comments about alternative approaches to meet these goals are welcome!

Proposal 1

Matrix Name       Op-Suffix               Data Type
------------------------------------------------------------------------
DMatrixRow_F64       _R64      Dense Row-Major real double
DMatrixRow_C64       _CR64     Dense Row-Major complex double
DMatrixRow_C32       _CR32     Dense Row-Major complex float
DMatrixBlock_F64     _B64      Dense Block Real double
DMatrixFixed3x3_F64            Dense Fixed Sized 3x3 real double
SMatrixCsc_F64       _O64      Sparse Compressed Column real double
SMatrixCsc_C64       _CO64
SMatrixTriplet_F64   _T64      Sparse Triplet real double
SMatrixTriplet_C64   _CT64
  • CSC = Compressed Sparse Column (typical name) aka Compact Column

Matrix Class Names

A matrix class name will strictly follow this pattern:

<S/D>Matrix<Data Structure>_<C/F><32/64>
  • <S/D> The first character is 'S' for sparse or 'D' for dense.
  • <Data Structure> This section specifies how the matrix is encoded internally.
  • <C/F> Whether the matrix stores complex ('C') or real ('F') numbers. Just accept that 'F' is for real numbers.
  • <32/64> Whether the matrix uses 32-bit float or 64-bit double
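
One way to check the pattern is to express it as a regular expression and pull the four parts back out of a class name. The sketch below does that; the class MatrixNameP1 and its fields are hypothetical helpers written for this page and are not part of EJML.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for Proposal 1 matrix class names. */
class MatrixNameP1 {
    // <S/D>Matrix<Data Structure>_<C/F><32/64>
    private static final Pattern NAME =
            Pattern.compile("([SD])Matrix([A-Za-z0-9]+)_([CF])(32|64)");

    final boolean sparse;    // 'S' = sparse, 'D' = dense
    final String structure;  // e.g. "Row", "Csc", "Fixed3x3"
    final boolean complex;   // 'C' = complex, 'F' = real
    final int bits;          // 32 or 64

    private MatrixNameP1(boolean sparse, String structure, boolean complex, int bits) {
        this.sparse = sparse;
        this.structure = structure;
        this.complex = complex;
        this.bits = bits;
    }

    /** Split a class name into its parts, rejecting names that break the pattern. */
    static MatrixNameP1 parse(String className) {
        Matcher m = NAME.matcher(className);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a Proposal 1 matrix name: " + className);
        }
        return new MatrixNameP1(
                m.group(1).equals("S"),
                m.group(2),
                m.group(3).equals("C"),
                Integer.parseInt(m.group(4)));
    }

    public static void main(String[] args) {
        MatrixNameP1 n = parse("SMatrixCsc_C64");
        System.out.println(n.structure + " sparse=" + n.sparse
                + " complex=" + n.complex + " bits=" + n.bits);
    }
}
```

An old-style name such as DenseMatrix64F is rejected by parse, which is the kind of strictness the pattern is meant to give.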

Operation Class Names

<Class Name>_<Type><32/64>
  • <Class Name> This has not changed. CommonOps and NormOps are two examples
  • <Type> A single character indicates the internal data structure. It is preceded by 'C' if the matrix is complex; otherwise assume real.
  • <32/64> Whether the matrix uses 32-bit float or 64-bit double

Matrix Data Structures

Character   Type
R           dense row-major
B           dense row-major block
N/A         Classes for fixed-size matrices follow their own naming scheme
T           sparse triplet
O           sparse compact column

Examples:

  • CommonOps_R64 for dense row major
  • CommonOps_O64 for sparse compact column
  • CommonOps_CR64 for complex dense row major
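
The rule behind these examples can be written down as a small function that assembles an operation class name from its parts. This is a sketch only; OpNameP1 and its Structure enum are hypothetical names made up for illustration, with the letters taken from the table above.

```java
/** Hypothetical builder for Proposal 1 operation class names. */
class OpNameP1 {
    /** Data structure letters from the table above. */
    enum Structure {
        ROW('R'), BLOCK('B'), TRIPLET('T'), CSC('O');

        final char letter;

        Structure(char letter) {
            this.letter = letter;
        }
    }

    /** Build <Class Name>_<Type><32/64>, where <Type> gets a leading 'C' if complex. */
    static String opClass(String baseName, Structure structure, boolean complex, int bits) {
        if (bits != 32 && bits != 64) {
            throw new IllegalArgumentException("bits must be 32 or 64");
        }
        return baseName + "_" + (complex ? "C" : "") + structure.letter + bits;
    }

    public static void main(String[] args) {
        System.out.println(opClass("CommonOps", Structure.ROW, false, 64));
        System.out.println(opClass("CommonOps", Structure.CSC, false, 64));
        System.out.println(opClass("CommonOps", Structure.ROW, true, 64));
    }
}
```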

Historical

  • DenseMatrix64F -> DMatrixRow_F64
  • CDenseMatrix64F -> DMatrixRow_C64
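
When applying these renames to existing source, note that CDenseMatrix64F contains DenseMatrix64F, so a naive replacement of the shorter name would corrupt the longer one. The sketch below avoids that by matching whole identifiers only. It is not the migration script mentioned earlier; LegacyRenamer is a hypothetical class, and it covers only the two renames listed here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: rename 0.30 matrix classes to their Proposal 1 names. */
class LegacyRenamer {
    private static final Map<String, String> RENAMES = new LinkedHashMap<>();
    static {
        RENAMES.put("DenseMatrix64F", "DMatrixRow_F64");
        RENAMES.put("CDenseMatrix64F", "DMatrixRow_C64");
    }

    /** Replace each old name only where it appears as a complete identifier. */
    static String rename(String source) {
        String out = source;
        for (Map.Entry<String, String> e : RENAMES.entrySet()) {
            // \b word boundaries stop "DenseMatrix64F" from matching inside "CDenseMatrix64F".
            String wholeWord = "\\b" + Pattern.quote(e.getKey()) + "\\b";
            out = out.replaceAll(wholeWord, Matcher.quoteReplacement(e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(rename("DenseMatrix64F A; CDenseMatrix64F B;"));
    }
}
```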

Proposal 2

Matrix Name       Op-Suffix               Data Type
------------------------------------------------------------------------
MatrixDense_64           _DR64      Dense Row-Major real double
MatrixDense_64C          _DR64C     Dense Row-Major complex double
MatrixDense_32C          _DR32C     Dense Row-Major complex float
MatrixBlock_64           _DB64      Dense Block Row-Major Real double
MatrixBlock_64C          _DB64C
Matrix3x3_64                        Dense Fixed Sized 3x3 real double
Matrix3_64                          Dense Fixed Sized 3 real double
MatrixSparseCsc_64       _SC64      Sparse Compressed Column real double
MatrixSparseCsc_64C      _SC64C
MatrixSparseTriplet_64   _ST64      Sparse Triplet real double
MatrixSparseTriplet_64C  _ST64C
MatrixSparseSkyline_64   _SK64      Sparse Skyline real double  (future)

Matrix Class Names

All matrices start with "Matrix" in their name, followed by their structure and number format. 'Dense' is an exception: the dense row-major matrix is named MatrixDense without spelling out its structure, since it is the most commonly used matrix type and should have a more recognizable name.

Operation Class Names

The only difference between this and the other proposal is which suffixes are used. The suffix pattern is:

_<D or S><Structure Letter><bits><C for complex>

Prefix letter is D for dense and S for sparse. Each matrix data structure will be assigned a letter for its operations. If the matrix type is complex then a C will be appended to the end.
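
As a sanity check, the suffix pattern can be reproduced with a short function and compared against the table for this proposal. The class OpSuffixP2 is a hypothetical helper; the structure letters passed in (R, B, C, T, K) are read off the Op-Suffix column above.

```java
/** Hypothetical builder for Proposal 2 operation suffixes. */
class OpSuffixP2 {
    /** Build _<D or S><Structure Letter><bits><C for complex>. */
    static String suffix(boolean sparse, char structureLetter, int bits, boolean complex) {
        if (bits != 32 && bits != 64) {
            throw new IllegalArgumentException("bits must be 32 or 64");
        }
        // Starting with a String makes the chars and int concatenate as text.
        return "_" + (sparse ? 'S' : 'D') + structureLetter + bits + (complex ? "C" : "");
    }

    public static void main(String[] args) {
        System.out.println("CommonOps" + suffix(false, 'R', 64, false));
        System.out.println("CommonOps" + suffix(true, 'C', 64, true));
    }
}
```

Unlike Proposal 1, the complex marker trails the bit count here, so a real and a complex class for the same structure share a common prefix.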

Proposal 3

This is just a list of other ideas:

MDRD  (Matrix-Dense-RowMajor-Double)
MatrixDense64
MatrixDense_64
DMatrixRow_64
DMatrixR_64
Matrix64rm
CMatrix64rm
MatrixDRM        D=double RM = row-major
MatrixZRM        Z=double complex
MatrixSR_64
MatrixDense_64C
MatrixDoubleDense
MatrixDoubleComplexDense
MatrixDoubleDenseSingleRow
MatrixDoubleComplexDenseSingleRow

MSCD (Matrix-Sparse-CSC-Double)
MatrixCsc64
MatrixCsc_64
SMatrixCsc_64
MatrixS64cc
CMatrixS64cc
MatrixSparseDCC        D=double CC = compact column
MatrixSparseZCC        Z=double complex
MatrixSparseCsc_64
MatrixSparseCSC_64
MatrixDoubleSparseCsc
MatrixDoubleComplexSparseCsc
  • Single refers to a single array being used to store the data, as opposed to multiple arrays, e.g. double[] rather than double[][]
  • LAPACK: F for single real float, D for double real, C for single complex, and Z for double complex

Packages

Use your IDE to figure these out if the automated script misses them.