NUMA .... --pools ....

Issue #108 resolved
Selur created an issue

Would be nice to see some explanation: a. How does the previous '--threads X' map to the current '--pools XXX'? b. How best to utilize X logical cores? c. How do all the 'thread'-related settings interact?

At the moment nobody seems to really know why the old '--threads X' option was dropped and the new '--pools XXXX' option was introduced.

Comments (4)

  1. Steve Borho

    Existing docs:

    Previously the worker threads had no processor affinity, so they were free to float across sockets (though operating systems rarely do this). Now (if libx265 was compiled with NUMA support) the threads are isolated to just one socket. With NUMA awareness, work on a given picture will never cross sockets.

    Otherwise the calculus hasn't really changed. WPP, pmode, and pme all require a thread pool to work. Frame threading does not.

    Regarding the option name change: x265 --threads has never worked the same as x264 --threads, and this caused a lot of confusion. Now x265 --pools doesn't work like x265 --threads or x264 --threads, so I felt it was best for it to have a different name to avoid confusion going forward.

    On single socket machines, x265 --pools N will behave like x265 --threads N did, but the new name will be unambiguously distinct from x264 --threads frame/slice threading setting.

    What further questions do you have?

  2. Selur reporter

    On single socket machines, x265 --pools N will behave like x265 --threads N did ...

    Strange: a. --threads 0 and --pools 0 don't perform the same way here, and b. --threads X also seems to be a bit slower than --pools X (but that might be due to some other change in the recent builds). Running on OS: Windows 7 (x64), CPU: Intel Core i7-4770K @ 3.50GHz [TB: 4.00GHz] (4C/8T), GPU: GeForce GTX 660 Ti (7 EU) @ 980 MHz (347.52), RAM: total 32575 MB.

    Even after reading all the documentation, what's missing is some sort of guideline on which setting is recommended for an X-socket system where each CPU has Y logical cores. It might be a good idea to add such a guideline to the '--pools' description and make the recommended option the automatic default.

  3. Steve Borho

    Yes, --threads 0 meant autodetect, but --pools 0 means no threads on NUMA node 0. --threads 1 meant disable the thread pool; --pools 1 means 1 thread on NUMA node 0. I think the new behavior is more discoverable.
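The mapping described in this comment can be sketched as a small helper for scripts that still carry an old --threads value. This is only an illustration for single-socket machines: the function name `legacy_threads_to_pools_args` is hypothetical, and treating `--pools 0` as the replacement for the old `--threads 1` relies on the statement above that it places no threads on NUMA node 0.

```python
from typing import List


def legacy_threads_to_pools_args(threads: int) -> List[str]:
    """Translate an old x265 --threads value into --pools arguments.

    Hypothetical helper; only meaningful on a single-socket machine.
    """
    if threads < 0:
        raise ValueError("thread count must be non-negative")
    if threads == 0:
        # Old autodetect. The encoder default is already one thread per
        # logical core, so no --pools argument is passed at all.
        return []
    if threads == 1:
        # Old "disable the thread pool". Per the comment above, --pools 0
        # means no threads on NUMA node 0 (the only node on one socket).
        return ["--pools", "0"]
    # Otherwise the count carries over unchanged on a single socket.
    return ["--pools", str(threads)]


if __name__ == "__main__":
    for old_value in (0, 1, 8):
        print(old_value, legacy_threads_to_pools_args(old_value))
```

Note that passing the old value through unchanged is wrong for exactly the two special cases, 0 and 1, which is why they are handled separately.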

    The default, both before and after the NUMA pools, is for the encoder to create one thread per logical core in the system. The recommendation (before and after) is to let the encoder use those defaults. The only time you wouldn't want the defaults is if you have multiple sockets and want to isolate individual encoders to one or more sockets.
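For the multi-socket case, one way to isolate one encoder per socket is to give each encoder a --pools string that enables a single NUMA node. The sketch below assumes the comma-separated per-node syntax from the x265 documentation for --pools, where "+" stands for all cores on a node and "-" for none; neither symbol appears in this thread, so check them against the docs for your build. The function name `isolated_pool_strings` is hypothetical.

```python
from typing import List


def isolated_pool_strings(num_nodes: int) -> List[str]:
    """Build one --pools string per NUMA node, each enabling only that node.

    Assumes '+' = all cores on the node and '-' = no cores on the node.
    """
    if num_nodes < 1:
        raise ValueError("need at least one NUMA node")
    return [
        ",".join("+" if node == target else "-" for node in range(num_nodes))
        for target in range(num_nodes)
    ]


if __name__ == "__main__":
    # One encoder instance per socket on a two-socket machine.
    for pools in isolated_pool_strings(2):
        print(f'x265 --pools "{pools}" ...')
```

On a single-socket machine this yields just "+", which should match the default, so the option can simply be omitted there.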

    The patches which introduced the NUMA pools also changed the lookahead threading quite a bit. That generally improved performance, but it could be reducing performance for some use cases. You might be encountering that, or you might be seeing problems with the new assembly code.
