Inefficient implementation of PETScSNESSOLVER::FormJacobian

Issue #132 resolved

Corrado Maurini created an issue 2013-10-09

Currently the function PETScSNESSolver::FormJacobian uses an inefficient MatCopy of matrices after assembling, instead of passing them as reference.

The problem is to let Mat* A to point to the mat object of (PETScMatrix dA), which is assembled by nonlinear_problem->form(dA, f, *dx). I report below the current (inefficient) FormJacobian function. Note that FormJacobian is called at each iteration of the SNES and for large problem its efficient implementation is important.

PetscErrorCode PETScSNESSolver::FormJacobian(SNES snes, Vec x, Mat* A, Mat* B,
                                                 MatStructure* flag, void* ctx)
{
  struct snes_ctx_t snes_ctx = *(struct snes_ctx_t*) ctx;
  NonlinearProblem* nonlinear_problem = snes_ctx.nonlinear_problem;
  PETScVector* dx = snes_ctx.dx;

  PETScVector f;
  PETScMatrix dA;


  // Cast from the PETSc Vec type to dolfin's PETScVector
  boost::shared_ptr<Vec> vptr(&x, NoDeleter());
  PETScVector x_wrap(vptr);

  *dx = x_wrap;

  nonlinear_problem->form(dA, f, *dx);
  nonlinear_problem->J(dA, *dx);

  MatCopy(*dA.mat(), *A, SAME_NONZERO_PATTERN);
  if (B != A)
  {
    MatCopy(*dA.mat(), *B, SAME_NONZERO_PATTERN);
  }
  *flag = SAME_NONZERO_PATTERN;  

  return 0;
}

Comments (16)

Johan Hake
- edited description
- 2013-10-09T07:53:09+00:00
Prof Garth Wells
- edited description
- 2013-10-09T07:57:35+00:00
Prof Garth Wells
- marked as bug
- marked as major
- 2013-10-09T07:58:03+00:00
Corrado Maurini reporter
A similar, but less penalizing, issue is present in PETScSNESSolver::FormFunction where a vector is copied.

Another point: if I understand correctly the logic of nonlinear_problem, nonlinear_problem-> form(dA, *dx) duplicates the matrix assembly. May you confirm?
- 2013-10-09T08:27:45+00:00
Jed Brown
The description sounds backward. You want dA to point to *A. Best to make nonlinear_problem->form and nonlinear_problem->J stateless, just using the matrix that is passed in.

Now I see the constructor
```
explicit PETScMatrix(boost::shared_ptr<Mat> A, bool use_gpu=false);
```
which makes no sense to me. Mat is already a reference counted pointer type
```
typedef struct _p_Mat*           Mat;
```
(reference counting is internal). Similarly,
```
boost::shared_ptr<Mat> _Acopy(new Mat, PETScMatrixDeleter());
```
which is making a heap allocation to hold the pointer (struct _p_Mat*).

I suggest making the PETScMatrix wrapper have a constructor that takes a normal Mat. Call PetscObjectReference in that constructor and call MatDestroy (or PetscObjectDereference) in the destructor. No boost::shared_ptr<Mat>.
- 2013-10-09T11:41:24+00:00
Corrado Maurini reporter
I agree. Indeed, I tried using the natural strategy suggested by Jed with explicit PETScMatrix(boost::shared_ptr<Mat> A, bool use_gpu=false); but i was unable to let it working. The constructor of PETScMatrix from Mat was unclear to me. But changing this may have various implications in other places and I do not feel confident in proceeding. Note that the problem goes back also to PETScBaseMatrix and there are boost::shared_ptr<Mat> or similar everywhere in dolfin. For this reason I was just looking for a workaround for PETScSNESSolver. I found a solution in the TAO interface with a constant hessian (see TAOLinearBoundSolver::__TAOFormHessianQuadraticProblem).

Garth or some other core developer, may you have a look to this?

Jed: A partially related question. Has SNES an option to do not call at each iteration the assembly routine when solving a minimization problem for a quadratic functional with bound constraints? It is useless to assemble the hessian of the functional at each iteration when it is constant.
- 2013-10-09T12:43:01+00:00
Jed Brown
Yes, getting rid of boost::shared_ptr<Mat> is a bigger task, but I it should be done and it will make these interfaces simpler.

For your second question, see:

http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESSetLagJacobian.html
- 2013-10-09T12:55:14+00:00
Corrado Maurini reporter
Thanks. Regarding the second question, the documentation appears quite cryptic to me. In the case above (quadratic problem with bound constraints), should I set -snes_lag_jacobian -2 or something like -snes_lag_jacobian 1000? I will add this as an explicit parameter in the fenics interface.
- 2013-10-09T15:08:09+00:00
Simone Pezzuto
I've started looking into this issue, following Jed suggestion on getting rid of all boost::shared_ptr and on using instead internal petsc reference counter. Per se, the issue is not difficult to address, but it involves several files and lines of code, so it could potentially introduce new bugs.

At the moment I've ported PETSc{Vector,Matrix,SNESSolver} to the new "reference counting" strategy, and Form{Function,Jacobian} don't need a copy anymore, with an exception: since FormFunction takes a Vec (and not a Vec*) for the residual, when a reset_sparsity occurs a copy cannot be avoided, since a VecCreate is called. For FormJacobian, the argument is a Mat*, so it doesn't matter.

Next week I will provide a preliminary pull request on this issue.

Another question: when Form{Function,Jacobian} are called, the state solution is also provided. Generally this is not used by NonlinearProblem, since the Forms have a reference of it. In the former implementation, SNESSolver was copying the state into the solution:
```
  struct snes_ctx_t snes_ctx = *(struct snes_ctx_t*) ctx;

  // Cast from the PETSc Vec type to dolfin's PETScVector
  boost::shared_ptr<Vec> vptr(&x, NoDeleter());
  PETScVector x_wrap(vptr);

  *dx = x_wrap;  // <== operator= calls a VecCopy
```
I think this is not efficient, and also misleading. Suggestions?
- 2013-11-08T16:00:18+00:00
Prof Garth Wells
I don't see the reason to not use shared_ptr. One can supply a custom deleter, if required, and one can unwrap the shared_ptr. I would like to minimise PETSc syntax because it's pretty ugly and follows a design different from what we strive for.
- 2013-11-08T16:19:56+00:00
Simone Pezzuto
Here is my rationale: since we want to wrap some data (in this case a PETSc object, which is just a pointer), the main structure should be:
```
template <typename T>
struct PETScWrapper
{
public:
// [...]
private:
  T __obj;
  // or
  boost::scoped_ptr<T> __obj;
  // or C++11
  std::unique_ptr<T> __obj;
};
```
In my opinion, a shared_ptr doesn't make any sense here because the wrapped object is mine and I don't want to share it. If I really have to share it, I'll give you a copy of it and, for this particular case, it's not very expensive (as I said, a pointer).

When I initialise the object, <T>Create is called, and when I have to destroy it, <T>Destroy is called. The class is opaque in this respect, and the ugly PETSc syntax is hidden.

Other behaviours:
- copy constructor: if a deep copy is requested, there is no problem, since the two PETSc objects are different. When the copy is shallow, the value of __obj is the same, so PetscObjectReference must be called;
- construction from raw T: as before, deep copy doesn't harm, while then shallow copy is the same as before;
- conversion from Wrapper to T: __obj is returned.
The only criticality is the last point but, since the user is requesting the raw object, she is also leaving DOLFIN wonderland, and she has to be careful. Indeed, when the wrapper is destroyed, the previously returned object becomes invalid, unless PetscObjectReference is called. On the other hand, I don't see how a shared_ptr can heal these issues.
- 2013-11-08T18:24:05+00:00

Simone Pezzuto

Change of plans. I figured out that a switch from shared_ptr to raw data can be harmful to merge, since it would yield several changes in every PETSc...-related class (but I still consider a bad design choice the usage of shared_ptr in this context).

So I propose a shared_ptr-based solution with the attached patch. This also fix the assertion raised when a reset_sparsity is not performed.

It is still open how to deal with the preconditioner (this is not only limited to SNES): a possibility is to introduced a virtual method in the NonlinearProblem class.

diff --git a/dolfin/nls/PETScSNESSolver.cpp b/dolfin/nls/PETScSNESSolver.cpp
index d29a262..34dbb98 100644
--- a/dolfin/nls/PETScSNESSolver.cpp
+++ b/dolfin/nls/PETScSNESSolver.cpp
@@ -39,7 +39,6 @@
 #include <dolfin/common/timing.h>
 #include <dolfin/common/Timer.h>

-
 using namespace dolfin;

 // Utility function
@@ -397,18 +396,17 @@ PetscErrorCode PETScSNESSolver::FormFunction(SNES snes, Vec x, Vec f, void* ctx)
   NonlinearProblem* nonlinear_problem = snes_ctx.nonlinear_problem;
   PETScVector* dx = snes_ctx.dx;

-  PETScMatrix A;
   PETScVector df;

-  // Cast from the PETSc Vec type to dolfin's PETScVector
-  boost::shared_ptr<Vec> vptr(&x, NoDeleter());
-  PETScVector x_wrap(vptr);
+  boost::shared_ptr<Vec> x_ptr(reference_to_no_delete_pointer(x));
+  PETScVector x_wrap(x_ptr);

   *dx = x_wrap;

   // Compute F(u)
-  nonlinear_problem->form(A, df, *dx);
-  nonlinear_problem->F(df, *dx);
+  PETScMatrix A;
+  nonlinear_problem->form(A, df, x_wrap);
+  nonlinear_problem->F(df, x_wrap);

   VecCopy(*df.vec(), f);

@@ -422,24 +420,60 @@ PetscErrorCode PETScSNESSolver::FormJacobian(SNES snes, Vec x, Mat* A, Mat* B,
   NonlinearProblem* nonlinear_problem = snes_ctx.nonlinear_problem;
   PETScVector* dx = snes_ctx.dx;

-  PETScVector f;
-  PETScMatrix dA;
+  boost::shared_ptr<Mat> A_ptr(reference_to_no_delete_pointer(*A));
+  PETScMatrix A_wrap(A_ptr);
+  A_ptr.reset();

-  // Cast from the PETSc Vec type to dolfin's PETScVector
-  boost::shared_ptr<Vec> vptr(&x, NoDeleter());
-  PETScVector x_wrap(vptr);
+  boost::shared_ptr<Vec> x_ptr(reference_to_no_delete_pointer(x));
+  PETScVector x_wrap(x_ptr);

   *dx = x_wrap;

-  nonlinear_problem->form(dA, f, *dx);
-  nonlinear_problem->J(dA, *dx);
+  PETScVector f;
+  nonlinear_problem->form(A_wrap, f, x_wrap);
+  nonlinear_problem->J(A_wrap, x_wrap);
+
+  if (*A != *A_wrap.mat())
+  {
+    // Matrix has been resetted during assembly
+    MatDestroy(A);
+    PetscObjectReference(PetscObject(*A_wrap.mat()));
+    *A = *A_wrap.mat();
+    *flag = DIFFERENT_NONZERO_PATTERN;
+  }
+  else
+  {
+    *flag = SAME_NONZERO_PATTERN;
+  }
+
+  if (*A != *B)
+  {
+    boost::shared_ptr<Mat> B_ptr(reference_to_no_delete_pointer(*B));
+    PETScMatrix B_wrap(B_ptr);
+    B_ptr.reset();
+
+    // Uncomment these two lines when a proper
+    // preconditioner constructor is provided,
+    // and comment out the third one.
+
+    // nonlinear_problem->Jprec(B_wrap, x_wrap);
+    // Mat B_out(*B_wrap.mat())

-  MatCopy(*dA.mat(), *A, SAME_NONZERO_PATTERN);
-  if (B != A)
+    Mat B_out(*A_wrap.mat());
+
+    if (*B != B_out)
+    {
+      MatDestroy(B);
+      PetscObjectReference(PetscObject(B_out));
+      *B = B_out;
+    }
+  }
+  else
   {
-    MatCopy(*dA.mat(), *B, SAME_NONZERO_PATTERN);
+    MatDestroy(B);
+    PetscObjectReference(PetscObject(*A_wrap.mat()));
+    *B = *A_wrap.mat();
   }
-  *flag = SAME_NONZERO_PATTERN;

   return 0;
 }

2013-11-08T22:41:12+00:00

Prof Garth Wells
- changed milestone to 1.4
- 2013-11-11T20:34:36+00:00
Prof Garth Wells
- changed version to dev
- 2014-01-18T13:51:54+00:00
Prof Garth Wells
- changed status to resolved
Fix Issue ~~#132~~.

This commit removes the copying, but shows up the design flaw that is GenericTensor::resize. GenericTensor::resize should be remove.

→ <<cset c24aa5ceaafc>>
- 2014-01-18T20:41:09+00:00
Prof Garth Wells
- removed milestone
Removing milestone: 1.4 (automated comment)
- 2015-02-21T09:54:12+00:00
Log in to comment

Assignee: –

Type: bug

Priority: major

Status: resolved

Component: –

Milestone: –

Version: dev

Votes: 3

Watchers: 5