strange '--pools' behaviour

Issue #267 resolved
Selur created an issue

With x265 HEVC encoder version 1.9+150-00ea3784bd36c164 on 64-bit Windows 10 with two Xeon E5640 CPUs.

using pools 16,16:

"C:\PROGRA~1\Hybrid\x265.exe" --pools 16,16 --input - --y4m --limit-modes --no-open-gop --lookahead-slices 0 --crf 18.00 --cbqpoffs -2 --crqpoffs -2 --rdoq-level 1 --psy-rdoq 15.00 --cu-lossless --range limited --colormatrix bt709 --output "D:\00_01_28_2410_02.265"

I get:

x265 [info]: Thread pool 0 using 8 threads on numa nodes 0
x265 [info]: Thread pool 1 using 8 threads on numa nodes 1

Why 8 per node, I expected 16?

using no pools:

"C:\PROGRA~1\Hybrid\x265.exe" --pmode --pme --input - --y4m --limit-modes --no-open-gop --lookahead-slices 0 --crf 18.00 --cbqpoffs -2 --crqpoffs -2 --rdoq-level 1 --psy-rdoq 15.00 --cu-lossless --range limited --colormatrix bt709 --output "D:\23_44_40_3210_02.265"

I get:

x265 [info]: Thread pool 0 using 16 threads on numa nodes 0,1

So 16 threads is possible.

using pools 16:

"C:\PROGRA~1\Hybrid\x265.exe" --pmode --pme --pools 16 --input - --y4m --limit-modes --no-open-gop --lookahead-slices 0 --crf 18.00 --cbqpoffs -2 --crqpoffs -2 --rdoq-level 1 --psy-rdoq 15.00 --cu-lossless --range limited --colormatrix bt709 --output "D:\22_43_35_2810_02.265"

I get:

Thread pool 0 using 8 threads on numa nodes 0

Again: why 8 and not 16?

Is this a bug or is there some logic to this?

Comments (6)

  1. Selur reporter

    Then I don't understand the pools option at all: I don't see why 16 threads are sometimes supported and sometimes not.

  2. Ma0

    I agree that the current '--pools' logic is wrong. '--pools 16' currently means '--pools 16,0', which becomes '--pools min(16, #CPUs on NUMA node 0),0' (in your example, '--pools 8,0'). I will try to change this logic.

  3. Deepthi Nandakumar

    Right - your first 2 outputs are expected behaviour.

    The third command line should produce the same behaviour as the second.

  4. Pradeep Ramachandran Account Deactivated

    The current pools logic is based on the principle that the number of SW threads you create should equal the number of HW threads that the system can run. Hence, when you try to create 16 SW threads on one node, the count is clipped to 8, as any given node has only 8 HW threads. When you don't specify any --pools option, x265 creates one large pool of 16 SW threads that can be mapped to any of the 16 HW threads across both sockets.

    Can you please help me understand the idea behind trying to create more SW threads than the number of HW threads in the system? Note that each SW thread created in this fashion is a "worker thread" that can encode any row across all active frames in the system.

    Note that in reality, x265 creates a few additional lightweight threads for frame encoders and file reading, but the heavy-lifting worker threads are what this parameter controls, and they are what really impacts performance.
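
The clipping rule described in comments 2 and 4 can be modelled in a few lines. This is only a rough sketch of the behaviour reported in this thread, not x265 source code: the function name `effective_pools` and its arguments are hypothetical, and it assumes each NUMA node exposes 8 HW threads, as on the reporter's dual Xeon E5640 machine.

```python
def effective_pools(requested, cpus_per_node):
    """Hypothetical model of the pool sizing described in this thread.

    requested     -- list of per-node thread counts from --pools, or None
                     when --pools is not given at all
    cpus_per_node -- HW threads available on each NUMA node
    Returns a list of (thread_count, [numa_nodes]) tuples, one per pool.
    """
    if requested is None:
        # No --pools option: one large pool spanning every node.
        nodes = list(range(len(cpus_per_node)))
        return [(sum(cpus_per_node), nodes)]

    # Per comment 2, missing trailing entries are treated as 0,
    # so '--pools 16' behaves like '--pools 16,0'.
    padded = list(requested) + [0] * (len(cpus_per_node) - len(requested))

    pools = []
    for node, (want, have) in enumerate(zip(padded, cpus_per_node)):
        threads = min(want, have)  # clip to the node's HW thread count
        if threads > 0:
            pools.append((threads, [node]))
    return pools


machine = [8, 8]  # assumed: two NUMA nodes with 8 HW threads each

print(effective_pools([16, 16], machine))  # first command line
print(effective_pools(None, machine))      # second command line
print(effective_pools([16], machine))      # third command line
```

Under these assumptions the three calls give two pools of 8 threads, one pool of 16 threads on nodes 0 and 1, and a single pool of 8 threads on node 0, matching the three log outputs in the report.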
