Non-FACS Expressions/Units could be optimized to improve performance

Issue #461 resolved
Xin created an issue

FACS expressions are already faster than in the previous 1.5.1 Python implementation, which didn’t even fully support FACS, but non-FACS expressions/units could still be improved. Even if they are legacy expressions now that the FACS system is in place, it might be worth optimizing them a little.

So I tried grouping the single-term drivers of each expression slider into “scripted” drivers, whose outputs are then summed in the final “sum” driver. On my computer, which has 4 physical cores, grouping around 10 of these terms together gave a boost of around 8 FPS over 1.5.1 (to almost 40 FPS). Without grouping, which is what the current implementation does, performance is worse. This might be because my processor doesn’t have enough cores, or because of a problem in Blender’s implementation (see: https://developer.blender.org/T86658).

Attached is the modified “load_morph.py” that implements this grouping. Search for “MAX_TERMS_” and change its value to try alternatives. I believe the optimal value might depend on the number of available cores and on the Blender version (I tested on Blender 2.92, which is potentially buggy).
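
The grouping described above can be sketched in plain Python, outside Blender. This is a minimal, hypothetical sketch, not the code in the attached load_morph.py. The helper names (`make_batches`, `batch_expression`, `evaluate_batched`) are invented for illustration. Terms are assumed to be (factor, variable name) pairs whose variable names are valid Python identifiers, and Python's `eval` stands in for Blender's evaluation of a scripted driver expression. Each batch plays the role of one scripted driver, and adding the batch results plays the role of the final “sum” driver.

```python
# Hypothetical sketch of batching single-term drivers; it does not use bpy
# and is not the actual load_morph.py implementation.

MAX_TERMS_ = 10  # terms per scripted batch driver; the best value may vary by CPU/Blender version


def make_batches(terms, max_terms=MAX_TERMS_):
    """Split (factor, variable) pairs into consecutive batches of at most max_terms."""
    if max_terms < 1:
        raise ValueError("max_terms must be >= 1")
    return [terms[i:i + max_terms] for i in range(0, len(terms), max_terms)]


def batch_expression(batch):
    """Build one scripted-driver style expression, e.g. '0.5*a + -1*b'."""
    return " + ".join("%g*%s" % (factor, var) for factor, var in batch)


def evaluate_batched(terms, values, max_terms=MAX_TERMS_):
    """Evaluate every batch expression, then add the results like a 'sum' driver."""
    batch_results = [
        # eval() is only a stand-in for Blender evaluating a scripted driver
        eval(batch_expression(batch), {"__builtins__": {}}, values)
        for batch in make_batches(terms, max_terms)
    ]
    return sum(batch_results)


terms = [(0.5, "a"), (-1.0, "b"), (2.0, "c")]
values = {"a": 1.0, "b": 0.25, "c": 0.5}
print(evaluate_batched(terms, values, max_terms=2))  # batches: "0.5*a + -1*b" and "2*c"
```

Because the batches are just partial sums, any batch size gives the same total (up to floating-point rounding). Only the number and size of the drivers that Blender has to schedule changes, which is what makes MAX_TERMS_ a pure tuning knob.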

To test, use “Import Expressions” or “Import Units” (the older expressions/units, not FACS) and play back an animation.

A good idea would be to expose MAX_TERMS_ as a global option, at least until Blender fixes the problems with multithreading.

Comments (7)

  1. Alessandro Padovani

    Just to make this clear for anyone reading who isn’t so technical, including me: is this a workaround for the 2.92 bug reported in #452, or a general performance gain that will still apply in 2.93 once the bug is fixed?

  2. Thomas Larsson repo owner

    Implemented in the last commit. The number of variables in each batch is currently limited to four for debugging reasons, but it will be increased once we are confident that the code works.

  3. Xin reporter

    With the current commit (3dd9133) I get almost the same FPS gain as in my own test. If people run into FPS issues with other CPUs, it might be a good idea to expose the number of variables per batch for testing purposes.

    Alessandro, if Blender handled this well (the current bugs may be part of why it doesn’t), then no, this shouldn’t give any gain. Blender could and should do this “batching” internally to make better use of the CPU when it evaluates drivers on multiple threads: it has access to the dependency graph, so it knows which drivers can safely be evaluated in parallel. A ‘SUM’ driver type with a lot of inputs could also be split across threads by Blender, with the partial results combined afterwards.

    But Blender’s implementation may simply not be that polished yet (apart from the current bugs), in which case this would still give a performance gain in later versions. We will see.

  4. Xin reporter

    Oh, and a small note, Thomas: you can make Python always print the sign (whether negative or positive) like this:

    return "%+g*%s" % (factor, term)
    

    Notice the + after the %. With it you can get rid of a few if/else branches.
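
Comment 4’s one-liner can be shown in context. This is a minimal sketch, assuming each term is a (factor, variable name) pair. `format_term` and `build_expression` are hypothetical names, not functions from the add-on. Since "%+g" always emits a sign, the terms can simply be concatenated. The leading + on the first term is a valid unary plus in Python syntax, so the result needs no stripping.

```python
def format_term(factor, term):
    # "%+g" always prints a sign, so positive factors need no if/else branch
    return "%+g*%s" % (factor, term)


def build_expression(terms):
    """Concatenate signed terms, e.g. [(0.5, 'a'), (-1, 'b')] -> '+0.5*a-1*b'."""
    return "".join(format_term(factor, term) for factor, term in terms)


expr = build_expression([(0.5, "a"), (-1, "b")])
print(expr)                                # +0.5*a-1*b
print(eval(expr, {"a": 1.0, "b": 0.25}))  # 0.25
```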
