ufc::finite_element::evaluate_basis_derivatives is terribly slow

Issue #11 resolved
Prof Garth Wells created an issue

The FFC generated function body for ufc::finite_element::evaluate_basis_derivatives is terribly slow. An obvious problem is that it uses dynamic memory allocation (even inside loops!). UFC functions that are possibly called repeatedly should not use dynamic memory allocation. If scratch space is required, this should be passed in as a function argument.

The reason for the dynamic allocation is that the derivative degree is set by the user. A possibility would be to make the derivative order a template argument.

Comments (31)

  1. Martin Sandve Alnæs

    No need to make it that complicated. Just allocate sufficient space on the stack for the highest order derivatives that code is generated for. I.e. for each array that is allocated with new today, ffc computes the largest size it may need and uses that for an automatic allocation.

  2. Prof Garth Wells reporter

    A user might want to compute a derivative of an order that is higher than that in the weak form.

  3. Martin Sandve Alnæs

    And in any case, all elements have a finite polynomial degree p which gives a finite maximum nonzero derivative order p. If higher order derivatives (>p) are asked for we can use memset to fill with an appropriate amount of zeros, no need for any allocations then.

  4. Prof Garth Wells reporter

    OK, should be pretty easy then to get some speed-up by dealing with the memory allocations.

  5. Kristian Ølgaard

    I got rid of the dynamic arrays and got a 16% speed-up for vector P2 Lagrange in 3D.

    • no g++ optimisation N=1e5 old: 54s new: 54s

    • g++ -O2 N=1e5 old: 11.2s new: 10.2s

    • g++ -O2 -funroll-loops N=1e5 old: 9.2s new: 8.1s

    • g++ -O3 N=1e5 old: 7.5s new: 6.3s

    I also modified the code to only allow n <= max_degree order derivatives, with max_degree being the highest degree of all subelements (in case of mixed elements). I think the code can be made a lot faster if we generate code for each specific value of n (perhaps only generate n=1 by default).

  6. Johan Hake

    This generates code that wont compile. See the next buildbot. In particular:

    // Declare two dimensional array that holds combinations of derivatives and initialise
    unsigned int combinations[1][0];
    
  7. Kristian Ølgaard

    This is related to the max([e.degree() for e in elements]), so for constant elements max degree is 0.

    For constant elements, would it be an option to not generate code for evaluating derivatives and simply return a warning:

    std::cerr << "*** FFC warning: " << "evaluate_basis_derivatives: Function not supported/implemented for constant elements." << std::endl;

    or is it better to have 1 as the minimum order of derivatives?

  8. Martin Sandve Alnæs

    Returning zero is good. Its nice to be able to write generic code where values become e.g. zero instead of crashing.

  9. Kristian Ølgaard

    I won't crash the program, the function just returns without computing anything. The size of the array 'values' is unknown, so how should a zero be returned.

  10. Martin Sandve Alnæs

    I don't understand, the size of the array 'values' does not depend on the degree of the basis functions.

  11. Martin Sandve Alnæs

    The actual output size is computed at run-time from the degree argument and the known value size of the element. Just memzero that number of doubles, same behaviour as for degree larger than the known max degree. Except that for degree 0 elements we can just skip generating code for computing, just always use the memzero version because the degree is always larger than the max degree.

  12. Kristian Ølgaard

    No, but it depends on the order of derivatives (and the geometric dimension) that the user is asking for. For Lagrange in 2D and with n=1, the function will return: values[0] = u,x values[1] = u,x

    for n=2 values[0] = u,xx values[1] = u,xy values[2] = u,yx values[3] = u,yy

    so the user must be responsible for making the call with an array 'values' which is sufficiently big. So what does 'returning zero' mean if one does not know the size of the incoming array?

  13. Martin Sandve Alnæs

    But the size of the incoming array does not depend on the degree of the element. It has the same size for DG0, DG1, CG1, CG7.

    For both Lagrange and DG0 in 2D and with n=1, the function will return:

    values[0] = u,x values[1] = u,x
    

    for n=2

    values[0] = u,xx values[1] = u,xy values[2] = u,yx values[3] = u,yy
    

    The only difference is that for DG0 (and R) we know at compile time that u,x = 0 etc.

  14. Kristian Ølgaard

    I agree, but I never claimed that the values array depends on the degree of the element, it only depends on n as you write above.

    So in conclusion, we agree on setting 'values' equal to zero where return values should have been had the computation been carried out.

    What if a user ask for n=0, should a call be directed to evaluate_basis then? (The regression/test code in FFC actually calls evaluate_basis_derivatives with n=1)

  15. Martin Sandve Alnæs

    Good!

    Redirecting to evaluate_basis will make the interface more general, maybe we can avoid some code duplication in dolfin if we do that.

  16. Prof Garth Wells reporter
    • changed status to open

    I'm re-opening this because I've reverted the merge into next because it breaks the buildbot on DOLFIN next.

  17. Kristian Ølgaard

    BTW, code containing

    unsigned int combinations[1][0];
    

    compiles fine when I run the FFC regression tests with compiler options:

    g++ -I/home/oelgaard/software/fenics/include -I/usr/include -L/usr/lib -Wall -Werror
    

    g++ version 4.7.3

    is the buildbot using a different set of options?

  18. Johan Hake

    The build broke during compilation of the demos. I believe these use the CMAKE_CXX_FLAGS used to compile DOLFIN and for DEVELOPER -pedantic is used, and that will raise a warning for null sized arrays.

  19. Kristian Ølgaard

    Something went wrong when merging the fix into next, so I deleted the branch and starting anew.

    Kristian

  20. Kristian Ølgaard

    Fix issue 11: ufc::finite_element::evaluate_basis_derivatives is terribly slow.

    Avoid using dynamic arrays. If n == 0, call evaluate_basis instead. If n > max_degree, return zeros (with max_degree being the maximum polynomial degree of the finite element(s)).

    → <<cset 8ddcf54df96c>>

  21. Log in to comment