Replace uBLAS backend with Eigen3
DOLFIN currently depends on both uBLAS and Eigen3, but Eigen3 is richer in features and better performing, and it provides sparse data structures.
Replace the uBLAS classes with Eigen3 for improved performance and features.
Comments (22)
-
reporter -
Some performance figures for the EigenLUSolver compared to the uBLASKrylovSolver for a 200x200 UnitSquareMesh solving Poisson:

Summary of timings          |  Average time  Total time  Reps
--------------------------------------------------------------
Eigen LU solver (cholesky)  |  0.44687       0.44687     1
Eigen LU solver (cholmod)   |  0.34772       0.34772     1
Eigen LU solver (sparselu)  |  0.91114       0.91114     1
Eigen LU solver (umfpack)   |  0.67976       0.67976     1
uBLAS gmres                 |  2.8954        2.8954      1
PETSc mumps                 |  0.9546        0.9546      1
PETSc pastix                |  1.3718        1.3718      1
PETSc petsc                 |  0.60991      0.60991      1
PETSc superlu_dist          |  0.66945      0.66945      1
PETSc umfpack               |  0.66618      0.66618      1
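For reference, a minimal sketch of how such a timing run might look from the legacy DOLFIN Python interface. The backend name "Eigen" and the LU method string "sparselu" are assumptions based on the names in this thread, not confirmed API, and the list_timings() signature varies between DOLFIN versions:

    from dolfin import *

    # Select the (proposed) Eigen linear algebra backend; "uBLAS" or "PETSc"
    # could be substituted here to time the other backends.
    parameters["linear_algebra_backend"] = "Eigen"

    mesh = UnitSquareMesh(200, 200)
    V = FunctionSpace(mesh, "Lagrange", 1)
    u, v = TrialFunction(V), TestFunction(V)
    a = inner(grad(u), grad(v))*dx
    L = Constant(1.0)*v*dx
    bc = DirichletBC(V, Constant(0.0), "on_boundary")

    A, b = assemble_system(a, L, bc)
    uh = Function(V)

    # Time a single LU solve; the method name is a guess at the Eigen
    # backend's naming, mirroring the table above.
    t = Timer("Eigen LU solver (sparselu)")
    solver = LUSolver(A, "sparselu")
    solver.solve(uh.vector(), b)
    t.stop()

    list_timings()  # signature differs in newer DOLFIN versions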
-
This seems to compare different methods, rather than different backends. What about Eigen gmres for example?
-
uBLAS does not have an LUSolver, and the built-in solver uBLASMatrix::solve() is so slow that it does not finish in a reasonable time (minutes). I haven't implemented an EigenKrylovSolver yet.
-
reporter LU solver speed is not so interesting, because Eigen has an interface to all the 'good' LU solvers. What is interesting is how assembly and mat-vec speed compare to uBLAS.
-
They seem to be comparable. Making those LU solvers available for serial use, without requiring PETSc, is pretty useful.
Eigen: Assemble system  |  0.12315   0.12315   1
Eigen mult              |  0.7103    0.7103    1
uBLAS: Assemble system  |  0.12659   0.12659   1
uBLAS mult              |  0.57375   0.57375   1
-
reporter I'm surprised by the mat-vec. How big is the matrix? Eigen is usually reported to be many times faster than uBLAS.
-
That's 1000 iterations on a 40000x40000 matrix. Maybe my implementation needs optimising. There are also various issues around RowMajor (preferred by DOLFIN) and ColMajor (preferred by Eigen) storage.
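For context, a rough sketch of the kind of mat-vec timing loop being discussed, assuming the legacy DOLFIN Python interface and that both backends are registered as "Eigen" and "uBLAS"; the iteration count and problem size mirror this comment, and the RowMajor/ColMajor storage choice lives inside the C++ backend and is not visible at this level:

    from dolfin import *

    for backend in ["Eigen", "uBLAS"]:
        parameters["linear_algebra_backend"] = backend

        # Same Poisson setup as above: a 200x200 mesh gives roughly a
        # 40000x40000 matrix (one dof per vertex for P1 elements).
        mesh = UnitSquareMesh(200, 200)
        V = FunctionSpace(mesh, "Lagrange", 1)
        u, v = TrialFunction(V), TestFunction(V)
        A = assemble(inner(grad(u), grad(v))*dx)

        x = Function(V).vector()
        x[:] = 1.0

        t = Timer("%s mult" % backend)
        for i in range(1000):
            y = A*x  # sparse mat-vec through the selected backend
        t.stop()

    list_timings()  # signature differs in newer DOLFIN versions
-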
How come the timings for assembly are so close? Is it dominated by something else, or are the implementations that similar?
-
@logg - yes, I was thinking the same, but was typing away at EigenKrylovSolver, so I haven't had time to investigate yet...
-
Yes, the assembly time is all taken up in SystemAssembler::cell_wise_assembly(), not in the LA backends.
-
What equation is it? Make sure it's something simple like Poisson, and perhaps do the timings with the regular assembler only for the matrix to get the cost of insertion.
-
Better to use the mass matrix than Poisson.
-
Yes, even better.
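To make the suggestion concrete, a sketch of timing bare matrix insertion with the regular assembler and a mass matrix, assuming the legacy DOLFIN Python interface (mesh size arbitrary):

    from dolfin import *

    parameters["linear_algebra_backend"] = "Eigen"

    mesh = UnitSquareMesh(200, 200)
    V = FunctionSpace(mesh, "Lagrange", 1)
    u, v = TrialFunction(V), TestFunction(V)
    m = u*v*dx  # mass matrix form: cheap tabulation, so insertion dominates

    t = Timer("Assemble mass matrix")
    M = assemble(m)
    t.stop()

    list_timings()  # signature differs in newer DOLFIN versions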
-
Another option is of course to use a profiler. I bet we could find some overhead in the assembler.
-
reporter @martinal We've done this before, which is why we added mesh re-ordering to improve data locality; the ongoing work in https://bitbucket.org/fenics-project/dolfin/branch/garth/fix-issue-350-dofmap-type is about getting better cache performance when fetching dofmaps.
-
I think assemble_system() is rather inefficient. Using assemble() and bc.apply() is a bit faster. Interestingly, the BC apply seems slow for uBLAS.

uBLAS: Assemble cells    |  0.028031   0.056061   2
DirichletBC apply        |  0.014973   0.029946   2
DirichletBC compute bc   |  0.005567   0.011134   2
DirichletBC init facets  |  0.0051221  0.010244   2
Eigen: Assemble cells    |  0.023649   0.047298   2
DirichletBC apply        |  0.0056521  0.011304   2
DirichletBC compute bc   |  0.0055655  0.011131   2
DirichletBC init facets  |  0.0051165  0.010233   2
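For reference, a sketch of the two code paths being compared, assuming the legacy DOLFIN Python interface and the same Poisson setup as earlier:

    from dolfin import *

    mesh = UnitSquareMesh(200, 200)
    V = FunctionSpace(mesh, "Lagrange", 1)
    u, v = TrialFunction(V), TestFunction(V)
    a = inner(grad(u), grad(v))*dx
    L = Constant(1.0)*v*dx
    bc = DirichletBC(V, Constant(0.0), "on_boundary")

    # Path 1: assembly and BC application in one symmetric pass
    A, b = assemble_system(a, L, bc)

    # Path 2: plain assembly followed by BC application
    A2 = assemble(a)
    b2 = assemble(L)
    bc.apply(A2, b2)

Note that assemble_system() applies the Dirichlet conditions symmetrically, whereas bc.apply() only modifies rows, so the two paths are not fully interchangeable.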
-
Nice. When that's finished we can profile again and perhaps revisit ufc signatures and the UFC class.
-
reporter I'm getting quite different assembly timings. Assembling the matrix for Poisson on a 1024x1024 mesh:
uBLAS
1st assembly: 1.68275s
2nd assembly: 0.759649s
Eigen
1st assembly: 1.37295s
2nd assembly: 0.445719s
PETSc
1st assembly: 1.34851s
2nd assembly: 0.68317s
Have you used the NDEBUG flag for Eigen and turned optimisations on?
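As an aside on the first-versus-second assembly gap: the first assembly includes building the sparsity pattern and initialising the tensor, which can be separated out by reassembling into an existing matrix. A sketch assuming the legacy DOLFIN Python interface (the NDEBUG and optimisation flags mentioned above are C++ compile-time settings for the DOLFIN/Eigen build itself, typically set by a Release build, and do not appear at this level):

    from dolfin import *

    mesh = UnitSquareMesh(1024, 1024)
    V = FunctionSpace(mesh, "Lagrange", 1)
    u, v = TrialFunction(V), TestFunction(V)
    a = inner(grad(u), grad(v))*dx

    # 1st assembly: includes building the sparsity pattern
    t1 = Timer("1st assembly")
    A = assemble(a)
    t1.stop()

    # 2nd assembly: reuse the already-initialised tensor
    t2 = Timer("2nd assembly")
    assemble(a, tensor=A)
    t2.stop()

    list_timings()  # signature differs in newer DOLFIN versions
-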
@garth-wells - clearly not... Looking good now.
-
reporter - changed status to resolved
Implemented in 9bc0423.
-
- removed milestone
Removing milestone: 1.6 (automated comment)