Performance regression in UFLACS

Issue #134 new
Jan Blechta created an issue

The following form compiled with UFLACS shows 10 times slow down in dev compared to 2016.1.0. (Note that the form is not very useful as a bilinear form would result in dense UFC element matrix with dimensions (2**8)**2.)

from six.moves import xrange

element = VectorElement("R", triangle, 0, dim=2**8)
v = TestFunction(element)

summands = [v[i]*dx for i in xrange(len(v))]
L = sum(summands)

#L = Form([i for s in summands for i in s.integrals()])

(The last line is a workaround for circumventing a deep recursion in summation (...(s0 + s1) + s2) + s3) + s4) ...) which helps in other cases but is irrelevant here.)

Profiling the code shows 18 million calls to finiteelementbase.py:96(__eq__):

$ python2 -mffc -r uflacs -v -p bug.ufl
$ python2 -mpstats ffc_bug.profile
Welcome to the profile statistics browser.
ffc_bug.profile% sort cumtime
ffc_bug.profile% stats 30
Sun Nov 20 16:21:44 2016    ffc_bug.profile

         35394056 function calls (35334210 primitive calls) in 17.419 seconds

   Ordered by: cumulative time
   List reduced from 1613 to 30 due to restriction <30>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   17.238   17.238 /home/jan/dev/fenics-master/src/ffc/ffc/__main__.py:84(compile_ufl_data)
        1    0.001    0.001   17.238   17.238 /home/jan/dev/fenics-master/src/ffc/ffc/compiler.py:137(compile_form)
        1    0.000    0.000   17.237   17.237 /home/jan/dev/fenics-master/src/ffc/ffc/compiler.py:158(compile_ufl_objects)
        1    0.000    0.000   16.732   16.732 /home/jan/dev/fenics-master/src/ffc/ffc/representation.py:129(compute_ir)
        1    0.000    0.000   16.637   16.637 /home/jan/dev/fenics-master/src/ffc/ffc/representation.py:425(_compute_integral_ir)
        1    0.000    0.000   16.637   16.637 /home/jan/dev/fenics-master/src/ffc/ffc/uflacs/uflacsrepresentation.py:31(compute_integral_ir)
        1    0.006    0.006   16.627   16.627 /home/jan/dev/fenics-master/src/ffc/ffc/uflacs/representation/build_uflacs_ir.py:40(build_uflacs_ir)
        1    0.000    0.000   16.245   16.245 /home/jan/dev/fenics-master/src/ffc/ffc/uflacs/elementtables/terminaltables.py:359(build_optimized_tables)
        1    0.021    0.021   15.700   15.700 /home/jan/dev/fenics-master/src/ffc/ffc/uflacs/elementtables/terminaltables.py:114(build_element_tables)
      262    0.120    0.000   11.111    0.042 /home/jan/dev/fenics-master/src/ufl/ufl/algorithms/analysis.py:168(sort_elements)
      262    3.836    0.015   10.560    0.040 /home/jan/dev/fenics-master/src/ufl/ufl/utils/sorting.py:26(topological_sorting)
 18222126    4.049    0.000    5.061    0.000 /home/jan/dev/fenics-master/src/ufl/ufl/finiteelement/finiteelementbase.py:96(__eq__)
    66064    0.047    0.000    4.496    0.000 /home/jan/dev/fenics-master/src/ffc/ffc/uflacs/elementtables/terminaltables.py:148(add_table)
      263    0.005    0.000    4.207    0.016 /home/jan/dev/fenics-master/src/ffc/ffc/uflacs/elementtables/table_utils.py:109(get_ffc_table_values)
      260    0.336    0.001    4.184    0.016 /home/jan/dev/fenics-master/src/ffc/ffc/mixedelement.py:66(tabulate)
    65794    0.041    0.000    3.683    0.000 /home/jan/dev/fenics-master/src/ffc/ffc/enrichedelement.py:109(tabulate)
    65809    0.316    0.000    3.644    0.000 /home/jan/dev/fenics-master/src/fiat/FIAT/finite_element.py:175(tabulate)
    65809    0.250    0.000    3.238    0.000 /home/jan/dev/fenics-master/src/fiat/FIAT/polynomial_set.py:84(tabulate)
    65814    0.155    0.000    2.322    0.000 /home/jan/dev/fenics-master/src/fiat/FIAT/expansions.py:202(tabulate)
    65815    0.927    0.000    2.017    0.000 /home/jan/dev/fenics-master/src/fiat/FIAT/expansions.py:208(_tabulate)
   132702    0.470    0.000    1.268    0.000 {sum}
  2598597    0.423    0.000    1.204    0.000 /home/jan/dev/fenics-master/src/ufl/ufl/finiteelement/finiteelementbase.py:86(_ufl_hash_data_)
2913599/2913593    0.604    0.000    0.871    0.000 {repr}
   131385    0.771    0.000    0.771    0.000 {method 'remove' of 'list' objects}
   136854    0.758    0.000    0.758    0.000 {method 'pop' of 'list' objects}
   394890    0.587    0.000    0.588    0.000 /home/jan/dev/fenics-master/src/fiat/FIAT/expansions.py:212(<genexpr>)
        1    0.001    0.001    0.544    0.544 /home/jan/dev/fenics-master/src/ffc/ffc/uflacs/elementtables/terminaltables.py:196(optimize_element_tables)
      260    0.086    0.000    0.527    0.002 /home/jan/dev/fenics-master/src/ffc/ffc/uflacs/elementtables/table_utils.py:57(strip_table_zeros)
    65814    0.252    0.000    0.441    0.000 /usr/lib/python2.7/dist-packages/numpy/linalg/linalg.py:1924(norm)
    65833    0.121    0.000    0.415    0.000 /home/jan/dev/fenics-master/src/fiat/FIAT/polynomial_set.py:201(form_matrix_product)

Comments (7)

  1. Martin Sandve Alnæs

    Thanks for profiling, I guess I'm doing something O(n^2) for n subelements in the new element table computation.

  2. Log in to comment