SNES performance inferior to Newton solver for a sequence of nonlinear problems

Issue #786 resolved

rambausek created an issue 2016-11-16

For problems of sufficient size (number of DoFs), PETScSNESSolver takes far longer to solve a sequence of nonlinear problems than the Newton solver provided by dolfin. The same applies when these implementations are selected through NonlinearVariationalSolver.

In detail, I compared PETScSNESSolver and NewtonSolver on problems that do not require any line search or other stabilization techniques. In the problems investigated, the steps computed by SNES are identical to the Newton steps. One prototype example is the Cahn-Hilliard Python demo. I have also applied both solvers to some of my research problems, where the difference is even more pronounced. The modified Cahn-Hilliard demo and some primitive performance measurements (using time.time()) are attached. The significant performance difference is highlighted with "###". There seems to be an operation that NewtonSolver performs only for the first nonlinear problem, while SNES performs it for each new nonlinear problem.
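
A minimal sketch of this kind of measurement, not taken from the attached demo: it times each solve in a sequence so that a one-off setup cost can be told apart from a cost repeated at every time step. The names `time_solves` and `fake_solve` are hypothetical; in the real demo the callable would wrap the actual solver call.

```python
import time


def time_solves(solve, n_steps):
    """Call solve(step) n_steps times; return wall-clock seconds per call."""
    durations = []
    for step in range(n_steps):
        start = time.perf_counter()
        solve(step)
        durations.append(time.perf_counter() - start)
    return durations


def fake_solve(step):
    # Hypothetical stand-in for the real call, e.g.
    # solver.solve(problem, u.vector()) in the Cahn-Hilliard demo.
    time.sleep(0.01)


for step, seconds in enumerate(time_solves(fake_solve, 3)):
    print(f"time step {step}: {seconds:.3f} s")
```

Running the same harness once with NewtonSolver and once with PETScSNESSolver would give two lists of per-step timings that can be compared directly.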

Comments (15)

Prof Garth Wells
- changed milestone to 2017.1
SNES interface needs a thorough review.
- 2016-11-27T09:48:18+00:00
Prof Garth Wells
Could you add a brief summary of the differences to your report? E.g., confirm that the convergence behaviour is the same for the two Newton solvers? (This isn't entirely clear to me from your report.)
- 2017-02-08T09:33:52+00:00
rambausek reporter
- attached perf_cahn_hilliard_newton.txt
- attached perf_cahn_hilliard_snes.txt
You are right, my report is not as clear as it could be. I hope I can clarify:

The difference shown by my "time measurements" is as follows. With the plain Newton solver, there is one interval, after assembly of the tangent "J" and before assembly of the residual "F", that takes ~7.5 seconds. All subsequent corresponding intervals take ~4.3 seconds. That 7.5-second interval occurs after the first time the tangent matrix is assembled. When I run the Newton solver again for another time step, the 7.5-second interval does not appear again. For the SNES solver, on the other hand, I observe such a 7.5-second interval in the first iteration of each SNES solve (for all time steps).

The attached files are more concise versions of the two perf_XXX.txt files, annotated with hopefully helpful information.
- 2017-02-08T10:34:51+00:00
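
The pattern described above can be made concrete with a small sketch. It is not derived from the attached perf files: the lists below are illustrative values built from the approximate 7.5 s and 4.3 s figures in the comment, and the number of time steps and of iterations per time step (three each) is an assumption. The helper name `first_iteration_overhead` is hypothetical.

```python
from statistics import median


def first_iteration_overhead(intervals):
    """Return the first interval minus the median of the remaining ones."""
    if len(intervals) < 2:
        raise ValueError("need at least two intervals per time step")
    return intervals[0] - median(intervals[1:])


# Illustrative J-to-F intervals in seconds, one inner list per time step.
# Values are the approximate figures quoted in the comment; the counts of
# time steps and iterations are assumptions made for this sketch.
newton = [[7.5, 4.3, 4.3], [4.3, 4.3, 4.3], [4.3, 4.3, 4.3]]
snes = [[7.5, 4.3, 4.3], [7.5, 4.3, 4.3], [7.5, 4.3, 4.3]]

newton_overheads = [first_iteration_overhead(step) for step in newton]
snes_overheads = [first_iteration_overhead(step) for step in snes]

print("Newton:", [round(x, 1) for x in newton_overheads])
print("SNES:  ", [round(x, 1) for x in snes_overheads])
```

With these values the extra cost is roughly 3.2 seconds: paid once for the Newton solver, but at every time step for SNES.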
rambausek reporter
Concerning the convergence behaviour: both solvers report the same residuals, so I suppose they are doing the same math. They take the same number of iterations to converge. To me it looks as if SNES releases some memory after having solved the nonlinear problem, while the Newton solver does not and can thus reuse some storage for the next run (time step...). Note that neither solver is destroyed between time steps.
- 2017-02-08T10:45:18+00:00
Jan Blechta
Check the copying done here: https://bitbucket.org/fenics-project/dolfin/src/664a4501bda637ea49519c54a4e0287eeda836b9/dolfin/nls/PETScSNESSolver.cpp?at=master&fileviewer=file-view-default#PETScSNESSolver.cpp-307. It might be possible to circumvent it for plain Newton. There might be more.
- 2017-02-08T12:08:34+00:00
rambausek reporter
Maybe I am wrong about the memory stuff; at least I would not expect copying a vector to take 3 seconds at that problem size. I am not very familiar with debugging C++ code called from Python, so I have not yet figured out at which point the time is lost...
- 2017-02-08T15:11:47+00:00
Lawrence Mitchell
It looks like in the PETSc case, the symbolic factorisation of the operator is happening for every SNES solve (whereas it ought to be reusable). Possibly because the assembly is setting a flag that changes the nonzero pattern?
- 2017-02-08T15:34:57+00:00
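
The hypothesis above can be illustrated with a toy model. This is not PETSc or DOLFIN code: `ToyDirectSolver`, `run`, and the `pattern_may_have_changed` flag are hypothetical names, and the assumption that the flag is raised on the first iteration of each new solve is only a guess at the mechanism. The sketch counts how often the expensive symbolic phase runs when the sparsity pattern is in fact constant.

```python
class ToyDirectSolver:
    """Toy model of a sparse direct solver's setup phases (not PETSc)."""

    def __init__(self):
        self.pattern = None
        self.symbolic_count = 0  # expensive, depends on the pattern only
        self.numeric_count = 0   # needed whenever the values change

    def factorize(self, pattern, pattern_may_have_changed):
        # Redo the symbolic phase if there is none yet, or if the caller
        # signals that the nonzero pattern might be different.
        if self.pattern is None or pattern_may_have_changed:
            self.pattern = frozenset(pattern)
            self.symbolic_count += 1
        self.numeric_count += 1


def run(time_steps, newton_iterations, flag_on_new_solve):
    """Return (symbolic, numeric) factorisation counts for a solve sequence."""
    solver = ToyDirectSolver()
    pattern = {(0, 0), (0, 1), (1, 0), (1, 1)}  # never actually changes
    for _ in range(time_steps):
        for iteration in range(newton_iterations):
            changed = flag_on_new_solve and iteration == 0
            solver.factorize(pattern, changed)
    return solver.symbolic_count, solver.numeric_count


print("pattern reused: ", run(3, 4, flag_on_new_solve=False))
print("flag every solve:", run(3, 4, flag_on_new_solve=True))
```

In this model the numeric work is identical in both runs; only the number of symbolic factorisations differs, growing with the number of time steps when the flag is raised at the start of every solve.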
rambausek reporter
This means the problem should not occur with iterative solvers (MUMPS was used in the comparison). I have also attached plots of my memory consumption, where one can observe some valleys in the SNES case, which probably also support Lawrence's point.
- 2017-02-08T16:20:30+00:00
rambausek reporter
- attached mem_usage_snes.png
- attached mem_usage_newton.png
Memory consumption of SNES and Newton. There are clearly valleys or even plateaus in the graph for SNES, which indicate additional computation rather than allocation of memory.
- 2017-02-08T16:22:20+00:00
Lawrence Mitchell
The problem (rebuilding part of the preconditioner) will likely also occur with iterative solvers; the setup is just potentially not quite as expensive. It will, however, break things like reusing the interpolation operators in an AMG preconditioner.
- 2017-02-08T17:42:39+00:00
Jan Blechta
- assigned issue to Jan Blechta
- edited description
Might have a fix for this.
- 2017-02-21T20:10:56+00:00
Jan Blechta
@rambausek, can you test whether branch jan/fix-issue-714 fixes your problem? You can pull the docker image quay.io/fenicsproject_dev/dolfin:jan-fix-issue-714 to test.
- 2017-02-22T00:56:10+00:00
rambausek reporter
@blechta thanks! I tested the docker image and can confirm that the SNES issue does not occur there.
- 2017-02-24T12:42:18+00:00
Jan Blechta
Should be fixed by pull request #330, commit fc5621655379d08d9fb4ee3e48915e7e8077154d.
- 2017-02-28T11:52:26+00:00
Jan Blechta
- changed status to resolved
- 2017-02-28T11:53:11+00:00
Assignee: Jan Blechta

Type: bug

Priority: minor

Status: resolved

Component: –

Milestone: 2017.1

Version: 2016.1

Votes: 0

Watchers: 1