strange '--pools' behaviour
Using HEVC encoder version 1.9+150-00ea3784bd36c164 on 64-bit Windows 10 with two Xeon E5640 CPUs.
Using '--pools 16,16':
"C:\PROGRA~1\Hybrid\x265.exe" --pools 16,16 --input - --y4m --limit-modes --no-open-gop --lookahead-slices 0 --crf 18.00 --cbqpoffs -2 --crqpoffs -2 --rdoq-level 1 --psy-rdoq 15.00 --cu-lossless --range limited --colormatrix bt709 --output "D:\00_01_28_2410_02.265"
I get:
x265 [info]: Thread pool 0 using 8 threads on numa nodes 0
x265 [info]: Thread pool 1 using 8 threads on numa nodes 1
Why 8 per node? I expected 16.
Using no '--pools' option:
"C:\PROGRA~1\Hybrid\x265.exe" --pmode --pme --input - --y4m --limit-modes --no-open-gop --lookahead-slices 0 --crf 18.00 --cbqpoffs -2 --crqpoffs -2 --rdoq-level 1 --psy-rdoq 15.00 --cu-lossless --range limited --colormatrix bt709 --output "D:\23_44_40_3210_02.265"
I get:
x265 [info]: Thread pool 0 using 16 threads on numa nodes 0,1
So 16 is possible.
Using '--pools 16':
"C:\PROGRA~1\Hybrid\x265.exe" --pmode --pme --pools 16 --input - --y4m --limit-modes --no-open-gop --lookahead-slices 0 --crf 18.00 --cbqpoffs -2 --crqpoffs -2 --rdoq-level 1 --psy-rdoq 15.00 --cu-lossless --range limited --colormatrix bt709 --output "D:\22_43_35_2810_02.265"
I get:
Thread pool 0 using 8 threads on numa nodes 0
Again: why 8 and not 16?
Is this a bug or is there some logic to this?
Comments (6)
-
reporter Then I don't understand the pools option at all, since I can't see why 16 threads are sometimes supported and sometimes not.
-
I agree that the current '--pools' logic is wrong. '--pools 16' currently means '--pools 16,0', which is '--pools min(16, #CPUs on NUMA node 0),0' (in your example, '--pools 8,0'). I will try to change this logic.
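To make the three log outputs easier to compare, here is a minimal Python sketch of the sizing rule as it is described in this thread. It is not taken from the x265 source: `resolve_pools` and its arguments are hypothetical names, it only handles plain integer entries, and it assumes one pool per NUMA node with 8 hardware threads on each node, as on the dual E5640 machine above.

```python
def resolve_pools(spec, cpus_per_node):
    """Model the pool sizing described in this thread (hypothetical helper,
    not x265 code). Returns a list of (threads, [numa nodes]) tuples."""
    if spec is None:
        # No --pools option: one large pool that spans every node.
        return [(sum(cpus_per_node), list(range(len(cpus_per_node))))]

    requested = [int(entry) for entry in spec.split(",")]
    # Missing trailing entries behave like 0, so "16" is read as "16,0".
    requested += [0] * (len(cpus_per_node) - len(requested))

    pools = []
    for node, (want, have) in enumerate(zip(requested, cpus_per_node)):
        threads = min(want, have)  # clip to the node's hardware threads
        if threads > 0:
            pools.append((threads, [node]))
    return pools


nodes = [8, 8]  # assumed: two NUMA nodes with 8 hardware threads each
for spec in ("16,16", None, "16"):
    for index, (threads, on_nodes) in enumerate(resolve_pools(spec, nodes)):
        node_list = ",".join(str(n) for n in on_nodes)
        print(f"--pools {spec}: pool {index} using {threads} threads on numa nodes {node_list}")
```

Under these assumptions the sketch reproduces the pattern in the logs: two pools of 8 for '--pools 16,16', one pool of 16 across both nodes when the option is omitted, and a single pool of 8 on node 0 for '--pools 16'.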
-
Right - your first 2 outputs are expected behaviour.
The third commandline should produce the same behaviour as the second.
-
Account Deactivated The current pools logic is based on the principle that the number of SW threads you create should equal the number of HW threads the system can run. Hence, when you try to create 16 SW threads on one node, the count is clipped to 8, because any given node has only 8 HW threads. When you don't specify any --pools option, x265 creates one large pool of 16 SW threads that can be mapped to any of the 16 HW threads across both sockets.
Can you please help me understand the idea behind trying to create more SW threads than there are HW threads in the system? Note that each SW thread created in this fashion is a "worker thread" that can encode any row across all active frames in the system.
Note that in reality x265 creates a few additional lightweight threads for frame encoders and file reading, but the heavy-lifting worker threads are what this parameter controls, and they are what really impacts performance.
-
- changed status to resolved
in commit 3994364db8e8
You are using dual Xeon E5640 CPUs (http://ark.intel.com/products/47923/Intel-Xeon-Processor-E5640-12M-Cache-2_66-GHz-5_86-GTs-Intel-QPI), each with 4 cores and Hyper-Threading enabled, so 8 threads per processor. All your outputs are valid.
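The thread counts in that comment can be checked with a few lines of Python. This is only an arithmetic sketch: `hw_threads` is a hypothetical helper, and it assumes each socket is exposed as one NUMA node, which matches the "numa nodes 0" and "numa nodes 1" lines in the logs.

```python
def hw_threads(sockets, cores_per_socket, threads_per_core):
    """Return (hardware threads per node, hardware threads in total),
    assuming one NUMA node per socket."""
    per_node = cores_per_socket * threads_per_core
    return per_node, sockets * per_node


# Dual Xeon E5640: 4 cores per CPU, Hyper-Threading gives 2 threads per core.
per_node, total = hw_threads(sockets=2, cores_per_socket=4, threads_per_core=2)
print(f"{per_node} hardware threads per node, {total} in total")
```

So a pool confined to one node tops out at 8 threads, while a pool spanning both nodes can use all 16.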