Replace uBLAS backend with Eigen3

Issue #424 resolved
Prof Garth Wells created an issue

DOLFIN depends on both uBLAS and Eigen3, but Eigen3 is richer and more performant, and it provides sparse data structures.

Replace uBLAS classes with Eigen3 for enhanced performance and features.

Comments (22)

  1. Chris Richardson

    Some performance figures for the EigenLUSolver compared to the uBLASKrylovSolver for a 200x200 UnitSquareMesh solving Poisson:

    Summary of timings               |  Average time  Total time  Reps
    ------------------------------------------------------------------
    Eigen LU solver (cholesky)       |       0.44687     0.44687     1
    Eigen LU solver (cholmod)        |       0.34772     0.34772     1
    Eigen LU solver (sparselu)       |       0.91114     0.91114     1
    Eigen LU solver (umfpack)        |       0.67976     0.67976     1
    
    uBLAS gmres                      |        2.8954      2.8954     1
    
    PETSc mumps                      |        0.9546      0.9546     1
    PETSc pastix                     |        1.3718      1.3718     1
    PETSc petsc                      |       0.60991     0.60991     1
    PETSc superlu_dist               |       0.66945     0.66945     1
    PETSc umfpack                    |       0.66618     0.66618     1
    
  2. Anders Logg (Chalmers)

    This seems to compare different methods, rather than different backends. What about Eigen gmres for example?

  3. Chris Richardson

    uBLAS does not have an LUSolver class, and the built-in uBLASMatrix::solve() is so slow that it does not finish in a reasonable time (minutes). I haven't implemented an EigenKrylovSolver yet.

  4. Prof Garth Wells reporter

    LU solver speed is not so interesting because Eigen has an interface to all the 'good' LU solvers. What is interesting is how assembly and mat-vec speed compare to uBLAS.

  5. Chris Richardson

    They seem to be comparable. Making those LU solvers available for serial use, without requiring PETSc, is pretty useful.

    Eigen:
    Assemble system                  |       0.12315     0.12315     1
    Eigen mult                       |        0.7103      0.7103     1
    
    uBLAS:
    Assemble system                  |       0.12659     0.12659     1
    uBLAS mult                       |       0.57375     0.57375     1
    
  6. Prof Garth Wells reporter

    I'm surprised by the mat-vec. How big is the matrix? Eigen is usually reported to be many times faster than uBLAS.

  7. Chris Richardson

    That's 1000 iterations on a 40000x40000 matrix. Maybe my implementation needs optimising. There are also various issues around RowMajor (preferred by DOLFIN) and ColMajor (preferred by Eigen) storage.

  8. Anders Logg (Chalmers)

    How come the timings for assembly are so close? Is it dominated by something else, or are the implementations that similar?

  9. Chris Richardson

    @logg - yes, I was thinking the same, but was typing away at EigenKrylovSolver, so I haven't had time to investigate yet...

  10. Chris Richardson

    Yes, the assembly time is all taken up in SystemAssembler::cell_wise_assembly(), not in the LA backends.

  11. Anders Logg (Chalmers)

    What equation is it? Make sure it's something simple like Poisson, and perhaps do the timings with the regular assembler only for the matrix to get the cost of insertion.

  12. Martin Sandve Alnæs

    Another option is of course to use a profiler. I bet we could find some overhead in the assembler.

  13. Chris Richardson

    I think assemble_system() is rather inefficient. Using assemble() and bc.apply() is a bit faster. Interestingly, the BC apply seems slow for uBLAS.

    uBLAS:
    Assemble cells                   |      0.028031    0.056061     2
    DirichletBC apply                |      0.014973    0.029946     2
    DirichletBC compute bc           |      0.005567    0.011134     2
    DirichletBC init facets          |     0.0051221    0.010244     2
    
    Eigen:
    Assemble cells                   |      0.023649    0.047298     2
    DirichletBC apply                |     0.0056521    0.011304     2
    DirichletBC compute bc           |     0.0055655    0.011131     2
    DirichletBC init facets          |     0.0051165    0.010233     2
    
  14. Martin Sandve Alnæs

    Nice. When that's finished we can profile again and perhaps revisit ufc signatures and the UFC class.

  15. Prof Garth Wells reporter

    I'm getting quite different assembly timing. Assembling the matrix for Poisson on a 1024x1024 mesh:

    uBLAS

    1st assembly: 1.68275s

    2nd assembly: 0.759649s

    Eigen

    1st assembly: 1.37295s

    2nd assembly: 0.445719s

    PETSc

    1st assembly: 1.34851s

    2nd assembly: 0.68317s

    Have you used the NDEBUG flag for Eigen and turned optimisations on?
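    For anyone reproducing these numbers: Eigen's internal assertions are disabled by defining NDEBUG, and its expression templates rely on compiler optimisation, so benchmarks should be built release-style. A typical GCC/Clang invocation (bench.cpp is a placeholder name, not a file from this issue):

```shell
# -DNDEBUG disables Eigen's range/size assertions;
# -O3 enables the optimisations Eigen's templates depend on.
g++ -O3 -DNDEBUG bench.cpp -o bench
```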
