Still have issue in commit: 365f7ed
My CPUs are two E5 2595 V2, with 24 threads per CPU.
When I use options like:
--preset veryslow --crf 17.0 --tu-intra-depth 3 --tu-inter-depth 3 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 5 --weightb --keyint 720 --min-keyint 1 --bframes 9 --aq-mode 1 --aq-strength 1.1 --rd 6 --no-sao --no-open-gop --rc-lookahead 72 --scenecut 40 --max-merge 4 --qcomp 0.78 --no-strong-intra-smoothing --psy-rdoq 14.0 --psy-rd 1.2 --deblock -2:-2 --rdoq-level 2 --qg-size 32 --ctu 32 --max-tu-size 16 --rdpenalty 2 --qpstep 5 --colormatrix bt709 --pools -,+ --pmode --output-depth 10
x265 uses only 6 threads of NUMA node 1.
But when I go back to commit 1e61810, it works fine: all 24 threads in NUMA node 1 are used.
Comments (30)
-
reporter In NUMA node 0, all 24 threads work fine.
I tried --pools +,- and --pools 24,- and both worked fine.
But with something like --pools -,+ or --pools -,24, only the first six threads in NUMA node 1 are used.
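For readers unfamiliar with the notation, the --pools strings above carry one entry per NUMA node. The following is a minimal sketch of that reading, assuming the convention from the x265 documentation that "+" means all logical cores on the node, "-" means none, and a number means that many threads. The names parse_pools and cores_per_node are hypothetical, other forms such as "*" are not handled, and this is not x265's actual parser.

```python
def parse_pools(spec, cores_per_node):
    """Return the thread count requested for each NUMA node.

    Simplified illustration only, not x265's actual parser.
    """
    counts = []
    entries = spec.split(",")
    for node, cores in enumerate(cores_per_node):
        # In this sketch, a node without an entry is treated as unused.
        entry = entries[node].strip() if node < len(entries) else "-"
        if entry == "+":
            counts.append(cores)       # use every logical core on the node
        elif entry == "-":
            counts.append(0)           # leave the node idle
        else:
            counts.append(int(entry))  # explicit thread count
    return counts


# The reporter's machine: two nodes with 24 logical cores each.
print(parse_pools("-,+", [24, 24]))   # [0, 24]
print(parse_pools("24,-", [24, 24]))  # [24, 0]
```

Under this reading, --pools -,+ and --pools -,24 both ask for 24 threads on node 1 only, which is why the two options are expected to behave the same on this machine.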
-
Account Deactivated Hmm.. this is strange. I've tried this on a 64-bit Linux system and it worked just fine. I would like to replicate your problem to fix it. Can you please specify whether you're running Windows or Linux, and whether it is a 32- or 64-bit build? And what is your command line?
Pradeep
-
reporter Windows Server 2012 R2, 64-bit system.
"D:\MeGUI_x86\tools\x265\avs4x265.exe" --x265-binary "D:\Source code\x265\build\vc14-x86_64\x265.exe" --preset veryslow --crf 17.0 --tu-intra-depth 3 --tu-inter-depth 3 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 5 --weightb --keyint 720 --min-keyint 1 --bframes 9 --aq-mode 1 --aq-strength 1.1 --rd 6 --no-sao --no-open-gop --rc-lookahead 72 --scenecut 40 --max-merge 4 --qcomp 0.78 --no-strong-intra-smoothing --psy-rdoq 14.0 --psy-rd 1.2 --deblock -2:-2 --rdoq-level 2 --qg-size 32 --ctu 32 --max-tu-size 16 --rdpenalty 2 --qpstep 5 --colormatrix bt709 --pools -,+ --output-depth 10 --output "test.hevc" "test.avs"
I tested with --output-depth 10, --pmode --output-depth 10, --output-depth 12, and --pmode --output-depth 12.
All of them behaved the same way.
yuv [info]: 1920x1080 fps 24000/1001 i420p10 unknown frame count
raw [info]: output file: F:\test.hevc
x265 [info]: HEVC encoder version 1.7+478-365f7ed4d896
x265 [info]: build info [Windows][MSVC 1900][64 bit] 10bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX
x265 [info]: Main 10 profile, Level-5 (Main tier)
x265 [info]: Thread pool 0 using 24 threads with NUMA node mask 2
x265 [info]: frame threads / pool features : 6 / wpp(34 rows)
x265 [info]: Coding QT: max CU size, min CU size : 32 / 8
x265 [info]: Residual QT: max TU size, max depth : 16 / 3 inter / 3 intra
x265 [info]: ME / range / subpel / merge : star / 44 / 5 / 4
x265 [info]: Keyframe min / max / scenecut : 1 / 720 / 40
x265 [info]: Intra 32x32 TU penalty type : 2
x265 [info]: Lookahead / bframes / badapt : 72 / 9 / 2
x265 [info]: b-pyramid / weightp / weightb : 1 / 1 / 1
x265 [info]: References / ref-limit cu / depth : 5 / 0 / 0
x265 [info]: AQ: mode / str / qg-size / cu-tree : 1 / 1.1 / 32 / 1
x265 [info]: Rate Control / qCompress : CRF-17.0 / 0.78
x265 [info]: tools: rd=6 psy-rd=1.20 rdoq=2 psy-rdoq=14.00 signhide tmvp
x265 [info]: tools: b-intra deblock(tC=-2:B=-2)
A build compiled with GCC 5.2.0 + libmsvc120.a still has this problem.
yuv [info]: 1920x1080 fps 24000/1001 i420p12 unknown frame count
raw [info]: output file: test.hevc
x265 [info]: HEVC encoder version 1.7+478-365f7ed4d896
x265 [info]: build info [Windows][GCC 5.2.0][64 bit] 12bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX
x265 [info]: Main 12 profile, Level-5 (Main tier)
x265 [info]: Thread pool 0 using 24 threads with NUMA node mask 2
x265 [info]: frame threads / pool features : 6 / wpp(34 rows)
x265 [info]: Coding QT: max CU size, min CU size : 32 / 8
x265 [info]: Residual QT: max TU size, max depth : 16 / 3 inter / 3 intra
x265 [info]: ME / range / subpel / merge : star / 44 / 5 / 4
x265 [info]: Keyframe min / max / scenecut : 1 / 720 / 40
x265 [info]: Intra 32x32 TU penalty type : 2
x265 [info]: Lookahead / bframes / badapt : 72 / 9 / 2
x265 [info]: b-pyramid / weightp / weightb : 1 / 1 / 1
x265 [info]: References / ref-limit cu / depth : 5 / 0 / 0
x265 [info]: AQ: mode / str / qg-size / cu-tree : 1 / 1.1 / 32 / 1
x265 [info]: Rate Control / qCompress : CRF-17.0 / 0.78
x265 [info]: tools: rd=6 psy-rd=1.20 rdoq=2 psy-rdoq=14.00 signhide tmvp
x265 [info]: tools: b-intra deblock(tC=-2:B=-2)
-
Was this fixed by 365f7ed4d896?
-
reporter Title: Still have issue in commit: 365f7ed
Without commit 365f7ed4d896, it uses only one thread when encoding.
With commit 365f7ed4d896, it uses all the threads in NUMA node 0, but only threads 1-6 in NUMA node 1.
Threads 17-24 in NUMA node 1 still remain unused by x265.
-
Account Deactivated Changhao, I know the reason behind the problem.
From the screenshot of your log above, I can see that with --pools -,+ we create one thread pool with 24 threads, all of which go to NUMA node 1 (the log says "x265 [info]: Thread pool 0 using 24 threads with NUMA node mask 2", and mask 2 means NUMA node 1 only). So when using only one pool, things work OK.
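As a small illustration of how a node mask such as 2 or 3 in the log maps to node numbers, here is a minimal sketch that treats the mask as a bitfield with bit N standing for NUMA node N. The function name nodes_from_mask is hypothetical and is not taken from x265.

```python
def nodes_from_mask(mask):
    """List the NUMA node indices whose bits are set in the mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


print(nodes_from_mask(2))  # [1]     binary 10: node 1 only
print(nodes_from_mask(3))  # [0, 1]  binary 11: nodes 0 and 1
```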
Now, when you don't specify any --pools option, you want to create 48 threads in all. Assuming that you have a 32-bit compile, 48 % 32 = 16, which does not exceed 32/2. Based on the heuristic in the commit message of 365f7ed4d896 https://bitbucket.org/multicoreware/x265/commits/365f7ed4d896, we cap such that the total number of threads will be 32. I don't think this heuristic is required anymore, as we don't split into multiple pools unless specified. I will send a patch for this in a bit and you can test it out to see if it solves your problem.
-
Account Deactivated Can you see if the attached patch helps? It clips the # threads only if the last pool has <1/4th max threads which shouldn't be the case for you.
-
reporter Do you mean this patch?
-
Account Deactivated Yes
-
reporter The problem still remains.
In my tests:
1.7+479 (latest commit 365f7ed4d896 plus this patch: https://patches.videolan.org/patch/10023/)
no --pools
"avs4x265.exe" --x265-binary "D:\Source code\x265\build\vc14-x86_64\x265.exe" --preset veryslow --crf 17.0 --tu-intra-depth 3 --tu-inter-depth 3 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 5 --weightb --keyint 720 --min-keyint 1 --bframes 9 --aq-mode 1 --aq-strength 1.1 --rd 6 --no-sao --no-open-gop --rc-lookahead 72 --scenecut 40 --max-merge 4 --qcomp 0.78 --no-strong-intra-smoothing --psy-rdoq 14.0 --psy-rd 1.2 --deblock -2:-2 --rdoq-level 2 --qg-size 32 --ctu 32 --max-tu-size 16 --rdpenalty 2 --qpstep 5 --colormatrix bt709 --pmode --output-depth 10 --output "test.hevc" "test.avs"
https://bitbucket.org/repo/49Lnnp/images/2448750373-1.7+479%20no%20--pools.png
With --pools +,-
"avs4x265.exe" --x265-binary "D:\Source code\x265\build\vc14-x86_64\x265.exe" --preset veryslow --crf 17.0 --tu-intra-depth 3 --tu-inter-depth 3 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 5 --weightb --keyint 720 --min-keyint 1 --bframes 9 --aq-mode 1 --aq-strength 1.1 --rd 6 --no-sao --no-open-gop --rc-lookahead 72 --scenecut 40 --max-merge 4 --qcomp 0.78 --no-strong-intra-smoothing --psy-rdoq 14.0 --psy-rd 1.2 --deblock -2:-2 --rdoq-level 2 --qg-size 32 --ctu 32 --max-tu-size 16 --rdpenalty 2 --qpstep 5 --colormatrix bt709 --pmode --pools +,- --output-depth 10 --output "test.hevc" "test.avs"
https://bitbucket.org/repo/49Lnnp/images/872375026-1.7+479%20--pools+,-.png
With --pools -,+
"avs4x265.exe" --x265-binary "D:\Source code\x265\build\vc14-x86_64\x265.exe" --preset veryslow --crf 17.0 --tu-intra-depth 3 --tu-inter-depth 3 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 5 --weightb --keyint 720 --min-keyint 1 --bframes 9 --aq-mode 1 --aq-strength 1.1 --rd 6 --no-sao --no-open-gop --rc-lookahead 72 --scenecut 40 --max-merge 4 --qcomp 0.78 --no-strong-intra-smoothing --psy-rdoq 14.0 --psy-rd 1.2 --deblock -2:-2 --rdoq-level 2 --qg-size 32 --ctu 32 --max-tu-size 16 --rdpenalty 2 --qpstep 5 --colormatrix bt709 --pmode --pools -,+ --output-depth 10 --output "test.hevc" "test.avs"
https://bitbucket.org/repo/49Lnnp/images/4150634568-1.7+479%20--pools%20-,+.png
Rolled back to 1.7+473 (commit 1e6181090f1df6790d5914c18be4272aa1b54f69)
no --pools
"avs4x265.exe" --x265-binary "D:\Source code\x265\build\vc14-x86_64\x265.exe" --preset veryslow --crf 17.0 --tu-intra-depth 3 --tu-inter-depth 3 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 5 --weightb --keyint 720 --min-keyint 1 --bframes 9 --aq-mode 1 --aq-strength 1.1 --rd 6 --no-sao --no-open-gop --rc-lookahead 72 --scenecut 40 --max-merge 4 --qcomp 0.78 --no-strong-intra-smoothing --psy-rdoq 14.0 --psy-rd 1.2 --deblock -2:-2 --rdoq-level 2 --qg-size 32 --ctu 32 --max-tu-size 16 --rdpenalty 2 --qpstep 5 --colormatrix bt709 --pmode --output-depth 10 --output "test.hevc" "test.avs"
https://bitbucket.org/repo/49Lnnp/images/2785897438-1.7+473%20no%20--pools.png
With --pools +,-
"avs4x265.exe" --x265-binary "D:\Source code\x265\build\vc14-x86_64\x265.exe" --preset veryslow --crf 17.0 --tu-intra-depth 3 --tu-inter-depth 3 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 5 --weightb --keyint 720 --min-keyint 1 --bframes 9 --aq-mode 1 --aq-strength 1.1 --rd 6 --no-sao --no-open-gop --rc-lookahead 72 --scenecut 40 --max-merge 4 --qcomp 0.78 --no-strong-intra-smoothing --psy-rdoq 14.0 --psy-rd 1.2 --deblock -2:-2 --rdoq-level 2 --qg-size 32 --ctu 32 --max-tu-size 16 --rdpenalty 2 --qpstep 5 --colormatrix bt709 --pmode --pools +,- --output-depth 10 --output "test.hevc" "test.avs"
https://bitbucket.org/repo/49Lnnp/images/2136934142-1.7+473%20--pools+,-.png
With --pools -,+
"avs4x265.exe" --x265-binary "D:\Source code\x265\build\vc14-x86_64\x265.exe" --preset veryslow --crf 17.0 --tu-intra-depth 3 --tu-inter-depth 3 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 5 --weightb --keyint 720 --min-keyint 1 --bframes 9 --aq-mode 1 --aq-strength 1.1 --rd 6 --no-sao --no-open-gop --rc-lookahead 72 --scenecut 40 --max-merge 4 --qcomp 0.78 --no-strong-intra-smoothing --psy-rdoq 14.0 --psy-rd 1.2 --deblock -2:-2 --rdoq-level 2 --qg-size 32 --ctu 32 --max-tu-size 16 --rdpenalty 2 --qpstep 5 --colormatrix bt709 --pmode --pools -,+ --output-depth 10 --output "test.hevc" "test.avs"
https://bitbucket.org/repo/49Lnnp/images/3457745820-1.7+473%20--pools-,+.png
-
Account Deactivated I'm not able to see the screenshots - can you just paste them in the message directly like you'd done earlier?
-
reporter OK, I changed the image links to bitbucket.org.
This should work now.
PS: I need to show the pictures of the CPU thread usage.
-
Account Deactivated Thanks - I'm able to see the snapshots now.
With the command --pools -,+, your snapshot shows that 24 threads were indeed launched on NUMA node 1 with the new commit. You'd previously reported that this wasn't the case - can you please confirm?
Now, without the --pools option, we see that there are indeed 48 threads launched across both NUMA nodes (from the console log), but in your snapshot only 32 threads are active. I wonder if this is just to do with the overall change in the execution profile. Do you see a performance improvement with the 1.7+479 commit (with the new patch) over the 1.7+473 commit? If not, can you try adding --pools 24,24 to your command line explicitly and see if that helps performance?
-
reporter I am so sorry, I uploaded the wrong snapshots.
The 1.7+479 "--pools -,+" and "no --pools" images should be swapped with each other.
I have already fixed it.
1.7+479 With --pools 24,24
"avs4x265.exe" --x265-binary "D:\Source code\x265\build\vc14-x86_64\x265.exe" --preset veryslow --crf 17.0 --tu-intra-depth 3 --tu-inter-depth 3 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 5 --weightb --keyint 720 --min-keyint 1 --bframes 9 --aq-mode 1 --aq-strength 1.1 --rd 6 --no-sao --no-open-gop --rc-lookahead 72 --scenecut 40 --max-merge 4 --qcomp 0.78 --no-strong-intra-smoothing --psy-rdoq 14.0 --psy-rd 1.2 --deblock -2:-2 --rdoq-level 2 --qg-size 32 --ctu 32 --max-tu-size 16 --rdpenalty 2 --qpstep 5 --colormatrix bt709 --pmode --pools 24,24 --output-depth 10 --output "test.hevc" "test.avs"
https://bitbucket.org/repo/49Lnnp/images/2106491959-1.7+479%20--pools%2024,24.png
1.7+473 With --pools 24,24
"avs4x265.exe" --x265-binary "D:\Source code\x265\build\vc14-x86_64\x265.exe" --preset veryslow --crf 17.0 --tu-intra-depth 3 --tu-inter-depth 3 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 5 --weightb --keyint 720 --min-keyint 1 --bframes 9 --aq-mode 1 --aq-strength 1.1 --rd 6 --no-sao --no-open-gop --rc-lookahead 72 --scenecut 40 --max-merge 4 --qcomp 0.78 --no-strong-intra-smoothing --psy-rdoq 14.0 --psy-rd 1.2 --deblock -2:-2 --rdoq-level 2 --qg-size 32 --ctu 32 --max-tu-size 16 --rdpenalty 2 --qpstep 5 --colormatrix bt709 --pmode --pools 24,24 --output-depth 10 --output "test.hevc" "test.avs"
https://bitbucket.org/repo/49Lnnp/images/523583665-1.7+473%20--pools%2024,24.png
Still, when I use any of the following:
without the --pools option
--pools -,+
--pools +,+
--pools -,24
--pools 24,24
the console log says that all 48 (or 24) threads are launched, but only 32 (or 8) threads are actually active. In 1.7+479 I cannot use 12 of the threads on CPU1.
-
Account Deactivated Don't worry about the snapshots - figured that you'd interchanged them :-).
Maybe the problem is that DWORD is defined as unsigned long, which is 32 bits. Can you try with this patch? https://patches.videolan.org/patch/10054/
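To illustrate why a 32-bit mask type could produce the thread counts reported above, here is a minimal sketch. It assumes, purely for illustration, that logical processors 0-23 belong to NUMA node 0 and 24-47 to NUMA node 1, and that the affinity mask gets cut down to its low 32 bits. The helper names are hypothetical, and this reproduces neither x265's code nor the patch.

```python
DWORD_BITS = 32  # width of a Windows DWORD


def truncate_to_dword(mask):
    """Keep only the low 32 bits, as storing the mask in a DWORD would."""
    return mask & ((1 << DWORD_BITS) - 1)


def active_threads(mask):
    """Count the logical processors still selected by the mask."""
    return bin(mask).count("1")


node0 = (1 << 24) - 1  # logical processors 0-23 (assumed layout)
node1 = node0 << 24    # logical processors 24-47 (assumed layout)
both = node0 | node1

print(active_threads(truncate_to_dword(both)))   # 32
print(active_threads(truncate_to_dword(node1)))  # 8
print(active_threads(truncate_to_dword(node0)))  # 24
```

Under these assumptions, a truncated mask leaves 32 of 48 threads usable across both nodes and 8 of 24 on node 1 alone, while node 0 is unaffected. That is consistent with the 32/8 figures reported earlier in the thread.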
-
reporter 1.7+480 (with both https://patches.videolan.org/patch/10054/ and https://patches.videolan.org/patch/10023/ applied)
It works fine with:
--pools -,+
--pools +,-
--pools 24,24
But something strange happens with:
--pools +,+
no --pools option
With "--pools +,+" it reports "Thread pool 0 using 48 threads with NUMA node mask 2" and uses only the 24 threads of CPU1. No CPU0 threads are used by x265.
"avs4x265.exe" --x265-binary "D:\Source code\x265\build\vc14-x86_64\x265.exe" --preset veryslow --crf 17.0 --tu-intra-depth 3 --tu-inter-depth 3 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 5 --weightb --keyint 720 --min-keyint 1 --bframes 9 --aq-mode 1 --aq-strength 1.1 --rd 6 --no-sao --no-open-gop --rc-lookahead 72 --scenecut 40 --max-merge 4 --qcomp 0.78 --no-strong-intra-smoothing --psy-rdoq 14.0 --psy-rd 1.2 --deblock -2:-2 --rdoq-level 2 --qg-size 32 --ctu 32 --max-tu-size 16 --rdpenalty 2 --qpstep 5 --colormatrix bt709 --pmode --pools +,+ --output-depth 10 --output "test.hevc" "test.avs"
https://bitbucket.org/repo/49Lnnp/images/3753691839-1.7+480%20--pools%20+,+.png
Without the "--pools" option it reports "Thread pool 0 using 48 threads with NUMA node mask 3", and x265 uses all 24 threads of CPU1 but only some of the threads of CPU0. It is also slower than "--pools 24,24" in 1.7+480, and slower than the same "no --pools" run in 1.7+473.
"--pools 24,24": 2.3-2.6 FPS; "no --pools": 2.0-2.2 FPS
"avs4x265.exe" --x265-binary "D:\Source code\x265\build\vc14-x86_64\x265.exe" --preset veryslow --crf 17.0 --tu-intra-depth 3 --tu-inter-depth 3 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 5 --weightb --keyint 720 --min-keyint 1 --bframes 9 --aq-mode 1 --aq-strength 1.1 --rd 6 --no-sao --no-open-gop --rc-lookahead 72 --scenecut 40 --max-merge 4 --qcomp 0.78 --no-strong-intra-smoothing --psy-rdoq 14.0 --psy-rd 1.2 --deblock -2:-2 --rdoq-level 2 --qg-size 32 --ctu 32 --max-tu-size 16 --rdpenalty 2 --qpstep 5 --colormatrix bt709 --pmode --output-depth 10 --output "test.hevc" "test.avs"
https://bitbucket.org/repo/49Lnnp/images/2711194435-1.7+480%20no%20--pools.png
Some snapshots of the other options that worked fine
1.7+473 With --pools 24,24
"avs4x265.exe" --x265-binary "D:\Source code\x265\build\vc14-x86_64\x265.exe" --preset veryslow --crf 17.0 --tu-intra-depth 3 --tu-inter-depth 3 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 5 --weightb --keyint 720 --min-keyint 1 --bframes 9 --aq-mode 1 --aq-strength 1.1 --rd 6 --no-sao --no-open-gop --rc-lookahead 72 --scenecut 40 --max-merge 4 --qcomp 0.78 --no-strong-intra-smoothing --psy-rdoq 14.0 --psy-rd 1.2 --deblock -2:-2 --rdoq-level 2 --qg-size 32 --ctu 32 --max-tu-size 16 --rdpenalty 2 --qpstep 5 --colormatrix bt709 --pmode --pools 24,24 --output-depth 10 --output "test.hevc" "test.avs"
https://bitbucket.org/repo/49Lnnp/images/523583665-1.7+473%20--pools%2024,24.png
1.7+480 With --pools 24,24
"avs4x265.exe" --x265-binary "D:\Source code\x265\build\vc14-x86_64\x265.exe" --preset veryslow --crf 17.0 --tu-intra-depth 3 --tu-inter-depth 3 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 5 --weightb --keyint 720 --min-keyint 1 --bframes 9 --aq-mode 1 --aq-strength 1.1 --rd 6 --no-sao --no-open-gop --rc-lookahead 72 --scenecut 40 --max-merge 4 --qcomp 0.78 --no-strong-intra-smoothing --psy-rdoq 14.0 --psy-rd 1.2 --deblock -2:-2 --rdoq-level 2 --qg-size 32 --ctu 32 --max-tu-size 16 --rdpenalty 2 --qpstep 5 --colormatrix bt709 --pmode --pools 24,24 --output-depth 10 --output "test.hevc" "test.avs"
https://bitbucket.org/repo/49Lnnp/images/3845561199-1.7+480%20--pools%2024,24.png
1.7+473 With --pools +,+
"avs4x265.exe" --x265-binary "D:\Source code\x265\build\vc14-x86_64\x265.exe" --preset veryslow --crf 17.0 --tu-intra-depth 3 --tu-inter-depth 3 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 5 --weightb --keyint 720 --min-keyint 1 --bframes 9 --aq-mode 1 --aq-strength 1.1 --rd 6 --no-sao --no-open-gop --rc-lookahead 72 --scenecut 40 --max-merge 4 --qcomp 0.78 --no-strong-intra-smoothing --psy-rdoq 14.0 --psy-rd 1.2 --deblock -2:-2 --rdoq-level 2 --qg-size 32 --ctu 32 --max-tu-size 16 --rdpenalty 2 --qpstep 5 --colormatrix bt709 --pmode --pools +,+ --output-depth 10 --output "test.hevc" "test.avs"
https://bitbucket.org/repo/49Lnnp/images/20686848-1.7+473%20--pools%20+,+.png
1.7+473 Without --pools
"avs4x265.exe" --x265-binary "D:\Source code\x265\build\vc14-x86_64\x265.exe" --preset veryslow --crf 17.0 --tu-intra-depth 3 --tu-inter-depth 3 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 5 --weightb --keyint 720 --min-keyint 1 --bframes 9 --aq-mode 1 --aq-strength 1.1 --rd 6 --no-sao --no-open-gop --rc-lookahead 72 --scenecut 40 --max-merge 4 --qcomp 0.78 --no-strong-intra-smoothing --psy-rdoq 14.0 --psy-rd 1.2 --deblock -2:-2 --rdoq-level 2 --qg-size 32 --ctu 32 --max-tu-size 16 --rdpenalty 2 --qpstep 5 --colormatrix bt709 --pmode --output-depth 10 --output "test.hevc" "test.avs"
https://bitbucket.org/repo/49Lnnp/images/2785897438-1.7+473%20no%20--pools.png
-
Account Deactivated Another patch to fix the difference between the --pools +,+ and --pools 24,24: https://patches.videolan.org/patch/10059/. With this patch, these two command lines should behave the same. I've also improved the log a bit so that it is clear as to what nodes the various pools are being allocated.
Now, with regard to --pools 24,24 being faster than not specifying --pools, it isn't all that alarming, albeit a little surprising. By evaluating on multiple Ivy Bridge/Haswell Xeon servers and videos, we found empirically that using a single big pool with 2X the threads is more beneficial than using multiple pools with half the threads each, and hence moved the default to this setting. Your input and command line combination seems to go the other way. I would recommend just explicitly specifying --pools 24,24 on the command line after all these patches are applied, so that you go back to the previous default behavior, which seems to work better for your test case.
-
reporter Version 1.7+481.
The test uses a 5-minute, 24 fps, 1080p video.
The observation that "--pools 24,24 is faster than not specifying --pools" seems to hold on my PC.
With "--pools +,+" or no "--pools":
"avs4x265.exe" --x265-binary "D:\Source code\x265\build\vc14-x86_64\x265.exe" --preset veryslow --crf 17.0 --tu-intra-depth 3 --tu-inter-depth 3 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 5 --weightb --keyint 720 --min-keyint 1 --bframes 9 --aq-mode 1 --aq-strength 1.1 --rd 6 --no-sao --no-open-gop --rc-lookahead 72 --scenecut 40 --max-merge 4 --qcomp 0.78 --no-strong-intra-smoothing --psy-rdoq 14.0 --psy-rd 1.2 --deblock -2:-2 --rdoq-level 2 --qg-size 32 --ctu 32 --max-tu-size 16 --rdpenalty 2 --qpstep 5 --colormatrix bt709 --pmode --pools +,+ --output-depth 10 --output "test.hevc" "test.avs"
yuv [info]: 1920x1080 fps 24000/1001 i420p8 unknown frame count
raw [info]: output file: F:\test.hevc
x265 [info]: HEVC encoder version 1.7+481-046175f0d5ca
x265 [info]: build info [Windows][MSVC 1900][64 bit] 10bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX
x265 [info]: Main 10 profile, Level-5 (Main tier)
x265 [info]: Thread pool 0 using 48 threads on numa nodes ,0,1
x265 [info]: frame threads / pool features : 6 / wpp(34 rows)+pmode
x265 [info]: Coding QT: max CU size, min CU size : 32 / 8
x265 [info]: Residual QT: max TU size, max depth : 16 / 3 inter / 3 intra
x265 [info]: ME / range / subpel / merge : star / 44 / 5 / 4
x265 [info]: Keyframe min / max / scenecut : 1 / 720 / 40
x265 [info]: Intra 32x32 TU penalty type : 2
x265 [info]: Lookahead / bframes / badapt : 72 / 9 / 2
x265 [info]: b-pyramid / weightp / weightb : 1 / 1 / 1
x265 [info]: References / ref-limit cu / depth : 5 / 0 / 0
x265 [info]: AQ: mode / str / qg-size / cu-tree : 1 / 1.1 / 32 / 1
x265 [info]: Rate Control / qCompress : CRF-17.0 / 0.78
x265 [info]: tools: rd=6 psy-rd=1.20 rdoq=2 psy-rdoq=14.00 signhide tmvp
x265 [info]: tools: b-intra deblock(tC=-2:B=-2)
x265 [info]: frame I: 10, Avg QP:12.96 kb/s: 35506.36
x265 [info]: frame P: 1716, Avg QP:14.81 kb/s: 13039.52
x265 [info]: frame B: 5474, Avg QP:20.94 kb/s: 1475.69
x265 [info]: Weighted P-Frames: Y:4.8% UV:4.2%
x265 [info]: Weighted B-Frames: Y:3.0% UV:2.2%
x265 [info]: consecutive B-frames: 14.5% 6.0% 22.1% 18.8% 7.2% 20.8% 2.0% 3.5% 4.0% 1.0%
encoded 7200 frames in 1633.21s (4.41 fps), 4279.00 kb/s, Avg QP:19.46
With "--pools 24,24":
"avs4x265.exe" --x265-binary "D:\Source code\x265\build\vc14-x86_64\x265.exe" --preset veryslow --crf 17.0 --tu-intra-depth 3 --tu-inter-depth 3 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 5 --weightb --keyint 720 --min-keyint 1 --bframes 9 --aq-mode 1 --aq-strength 1.1 --rd 6 --no-sao --no-open-gop --rc-lookahead 72 --scenecut 40 --max-merge 4 --qcomp 0.78 --no-strong-intra-smoothing --psy-rdoq 14.0 --psy-rd 1.2 --deblock -2:-2 --rdoq-level 2 --qg-size 32 --ctu 32 --max-tu-size 16 --rdpenalty 2 --qpstep 5 --colormatrix bt709 --pmode --pools 24,24 --output-depth 10 --output "test.hevc" "test.avs"
yuv [info]: 1920x1080 fps 24000/1001 i420p8 unknown frame count
raw [info]: output file: F:\test.hevc
x265 [info]: HEVC encoder version 1.7+481-046175f0d5ca
x265 [info]: build info [Windows][MSVC 1900][64 bit] 10bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX
x265 [info]: Main 10 profile, Level-5 (Main tier)
x265 [info]: Thread pool 0 using 24 threads on numa nodes ,0
x265 [info]: Thread pool 1 using 24 threads on numa nodes ,1
x265 [info]: frame threads / pool features : 6 / wpp(34 rows)+pmode
x265 [info]: Coding QT: max CU size, min CU size : 32 / 8
x265 [info]: Residual QT: max TU size, max depth : 16 / 3 inter / 3 intra
x265 [info]: ME / range / subpel / merge : star / 44 / 5 / 4
x265 [info]: Keyframe min / max / scenecut : 1 / 720 / 40
x265 [info]: Intra 32x32 TU penalty type : 2
x265 [info]: Lookahead / bframes / badapt : 72 / 9 / 2
x265 [info]: b-pyramid / weightp / weightb : 1 / 1 / 1
x265 [info]: References / ref-limit cu / depth : 5 / 0 / 0
x265 [info]: AQ: mode / str / qg-size / cu-tree : 1 / 1.1 / 32 / 1
x265 [info]: Rate Control / qCompress : CRF-17.0 / 0.78
x265 [info]: tools: rd=6 psy-rd=1.20 rdoq=2 psy-rdoq=14.00 signhide tmvp
x265 [info]: tools: b-intra deblock(tC=-2:B=-2)
x265 [info]: frame I: 10, Avg QP:12.96 kb/s: 35506.36
x265 [info]: frame P: 1716, Avg QP:14.81 kb/s: 13039.52
x265 [info]: frame B: 5474, Avg QP:20.94 kb/s: 1475.69
x265 [info]: Weighted P-Frames: Y:4.8% UV:4.2%
x265 [info]: Weighted B-Frames: Y:3.0% UV:2.2%
x265 [info]: consecutive B-frames: 14.5% 6.0% 22.1% 18.8% 7.2% 20.8% 2.0% 3.5% 4.0% 1.0%
encoded 7200 frames in 1440.81s (5.00 fps), 4279.00 kb/s, Avg QP:19.46
With "--pools +,+" or no "--pools", without the "--pmode" option:
"avs4x265.exe" --x265-binary "D:\Source code\x265\build\vc14-x86_64\x265.exe" --preset veryslow --crf 17.0 --tu-intra-depth 3 --tu-inter-depth 3 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 5 --weightb --keyint 720 --min-keyint 1 --bframes 9 --aq-mode 1 --aq-strength 1.1 --rd 6 --no-sao --no-open-gop --rc-lookahead 72 --scenecut 40 --max-merge 4 --qcomp 0.78 --no-strong-intra-smoothing --psy-rdoq 14.0 --psy-rd 1.2 --deblock -2:-2 --rdoq-level 2 --qg-size 32 --ctu 32 --max-tu-size 16 --rdpenalty 2 --qpstep 5 --colormatrix bt709 --pools +,+ --output-depth 10 --output "test.hevc" "test.avs"
yuv [info]: 1920x1080 fps 24000/1001 i420p8 unknown frame count
raw [info]: output file: F:\test.hevc
x265 [info]: HEVC encoder version 1.7+481-046175f0d5ca
x265 [info]: build info [Windows][MSVC 1900][64 bit] 10bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX
x265 [info]: Main 10 profile, Level-5 (Main tier)
x265 [info]: Thread pool 0 using 48 threads on numa nodes ,0,1
x265 [info]: frame threads / pool features : 6 / wpp(34 rows)
x265 [info]: Coding QT: max CU size, min CU size : 32 / 8
x265 [info]: Residual QT: max TU size, max depth : 16 / 3 inter / 3 intra
x265 [info]: ME / range / subpel / merge : star / 44 / 5 / 4
x265 [info]: Keyframe min / max / scenecut : 1 / 720 / 40
x265 [info]: Intra 32x32 TU penalty type : 2
x265 [info]: Lookahead / bframes / badapt : 72 / 9 / 2
x265 [info]: b-pyramid / weightp / weightb : 1 / 1 / 1
x265 [info]: References / ref-limit cu / depth : 5 / 0 / 0
x265 [info]: AQ: mode / str / qg-size / cu-tree : 1 / 1.1 / 32 / 1
x265 [info]: Rate Control / qCompress : CRF-17.0 / 0.78
x265 [info]: tools: rd=6 psy-rd=1.20 rdoq=2 psy-rdoq=14.00 signhide tmvp
x265 [info]: tools: b-intra deblock(tC=-2:B=-2)
x265 [info]: frame I: 10, Avg QP:12.96 kb/s: 35506.36
x265 [info]: frame P: 1716, Avg QP:14.81 kb/s: 13039.52
x265 [info]: frame B: 5474, Avg QP:20.94 kb/s: 1475.69
x265 [info]: Weighted P-Frames: Y:4.8% UV:4.2%
x265 [info]: Weighted B-Frames: Y:3.0% UV:2.2%
x265 [info]: consecutive B-frames: 14.5% 6.0% 22.1% 18.8% 7.2% 20.8% 2.0% 3.5% 4.0% 1.0%
encoded 7200 frames in 1728.21s (4.17 fps), 4279.00 kb/s, Avg QP:19.46
With "--pools 24,24", without the "--pmode" option:
"avs4x265.exe" --x265-binary "D:\Source code\x265\build\vc14-x86_64\x265.exe" --preset veryslow --crf 17.0 --tu-intra-depth 3 --tu-inter-depth 3 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 5 --weightb --keyint 720 --min-keyint 1 --bframes 9 --aq-mode 1 --aq-strength 1.1 --rd 6 --no-sao --no-open-gop --rc-lookahead 72 --scenecut 40 --max-merge 4 --qcomp 0.78 --no-strong-intra-smoothing --psy-rdoq 14.0 --psy-rd 1.2 --deblock -2:-2 --rdoq-level 2 --qg-size 32 --ctu 32 --max-tu-size 16 --rdpenalty 2 --qpstep 5 --colormatrix bt709 --pools 24,24 --output-depth 10 --output "test.hevc" "test.avs"
yuv [info]: 1920x1080 fps 24000/1001 i420p8 unknown frame count
raw [info]: output file: F:\test.hevc
x265 [info]: HEVC encoder version 1.7+481-046175f0d5ca
x265 [info]: build info [Windows][MSVC 1900][64 bit] 10bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX
x265 [info]: Main 10 profile, Level-5 (Main tier)
x265 [info]: Thread pool 0 using 24 threads on numa nodes ,0
x265 [info]: Thread pool 1 using 24 threads on numa nodes ,1
x265 [info]: frame threads / pool features : 6 / wpp(34 rows)
x265 [info]: Coding QT: max CU size, min CU size : 32 / 8
x265 [info]: Residual QT: max TU size, max depth : 16 / 3 inter / 3 intra
x265 [info]: ME / range / subpel / merge : star / 44 / 5 / 4
x265 [info]: Keyframe min / max / scenecut : 1 / 720 / 40
x265 [info]: Intra 32x32 TU penalty type : 2
x265 [info]: Lookahead / bframes / badapt : 72 / 9 / 2
x265 [info]: b-pyramid / weightp / weightb : 1 / 1 / 1
x265 [info]: References / ref-limit cu / depth : 5 / 0 / 0
x265 [info]: AQ: mode / str / qg-size / cu-tree : 1 / 1.1 / 32 / 1
x265 [info]: Rate Control / qCompress : CRF-17.0 / 0.78
x265 [info]: tools: rd=6 psy-rd=1.20 rdoq=2 psy-rdoq=14.00 signhide tmvp
x265 [info]: tools: b-intra deblock(tC=-2:B=-2)
x265 [info]: frame I: 10, Avg QP:12.96 kb/s: 35506.36
x265 [info]: frame P: 1716, Avg QP:14.81 kb/s: 13039.52
x265 [info]: frame B: 5474, Avg QP:20.94 kb/s: 1475.69
x265 [info]: Weighted P-Frames: Y:4.8% UV:4.2%
x265 [info]: Weighted B-Frames: Y:3.0% UV:2.2%
x265 [info]: consecutive B-frames: 14.5% 6.0% 22.1% 18.8% 7.2% 20.8% 2.0% 3.5% 4.0% 1.0%
encoded 7200 frames in 1439.21s (5.00 fps), 4279.00 kb/s, Avg QP:19.46
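To compare the four 1.7+481 runs above at a glance, the following small sketch recomputes frames per second and the relative speedup of split pools over a single pool from the reported encode times. The frame count and times are copied from the logs; the dictionary labels and helper names are made up for this illustration.

```python
FRAMES = 7200  # frames encoded in every run, per the logs

# Encode times in seconds, copied from the "encoded 7200 frames in ..." lines.
runs = {
    "pmode, --pools +,+": 1633.21,
    "pmode, --pools 24,24": 1440.81,
    "no pmode, --pools +,+": 1728.21,
    "no pmode, --pools 24,24": 1439.21,
}


def fps(seconds, frames=FRAMES):
    """Average encoding speed in frames per second."""
    return frames / seconds


def speedup(slow_seconds, fast_seconds):
    """How many times faster the second run is than the first."""
    return slow_seconds / fast_seconds


for label, seconds in runs.items():
    print(f"{label}: {fps(seconds):.2f} fps")

with_pmode = speedup(runs["pmode, --pools +,+"], runs["pmode, --pools 24,24"])
without_pmode = speedup(runs["no pmode, --pools +,+"], runs["no pmode, --pools 24,24"])
print(f"split pools vs single pool, with pmode: {with_pmode:.2f}x")
print(f"split pools vs single pool, without pmode: {without_pmode:.2f}x")
```

On these numbers the split-pool runs finish roughly 13% faster with --pmode and roughly 20% faster without it.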
-
Account Deactivated Did you try without the --pmode option? What did you see?
-
This is mostly a function of utilization. With pmode, the utilization is fairly heavy, so you could expect better performance from 2 separate threadpools.
-
reporter Tested with 1.7+497 (commit 975352b2c0223b9139aad233b43eaf2113ac8167).
With "--pools +,+" or no "--pools", without the "--pmode" option:
"avs4x265.exe" --x265-binary "D:\Source code\x265\build\vc14-x86_64\x265.exe" --preset veryslow --crf 17.0 --tu-intra-depth 3 --tu-inter-depth 3 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 5 --weightb --keyint 720 --min-keyint 1 --bframes 9 --aq-mode 1 --aq-strength 1.1 --rd 6 --no-sao --no-open-gop --rc-lookahead 72 --scenecut 40 --max-merge 4 --qcomp 0.78 --no-strong-intra-smoothing --psy-rdoq 14.0 --psy-rd 1.2 --deblock -2:-2 --rdoq-level 2 --qg-size 32 --ctu 32 --max-tu-size 16 --rdpenalty 2 --qpstep 5 --colormatrix bt709 --pools +,+ --output-depth 10 --output "test.hevc" "test.avs"
yuv [info]: 1920x1080 fps 24000/1001 i420p8 unknown frame count
raw [info]: output file: F:\[philosophy-raws][Phi Brain:Puzzle of God]\test.hevc
x265 [info]: HEVC encoder version 1.7+497-975352b2c022
x265 [info]: build info [Windows][MSVC 1900][64 bit] 10bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX
x265 [info]: Main 10 profile, Level-5 (Main tier)
x265 [info]: Thread pool 0 using 48 threads on numa nodes 0,1
x265 [info]: frame threads / pool features : 6 / wpp(34 rows)
x265 [info]: Coding QT: max CU size, min CU size : 32 / 8
x265 [info]: Residual QT: max TU size, max depth : 16 / 3 inter / 3 intra
x265 [info]: ME / range / subpel / merge : star / 44 / 5 / 4
x265 [info]: Keyframe min / max / scenecut : 1 / 720 / 40
x265 [info]: Intra 32x32 TU penalty type : 2
x265 [info]: Lookahead / bframes / badapt : 72 / 9 / 2
x265 [info]: b-pyramid / weightp / weightb : 1 / 1 / 1
x265 [info]: References / ref-limit cu / depth : 5 / 0 / 0
x265 [info]: AQ: mode / str / qg-size / cu-tree : 1 / 1.1 / 32 / 1
x265 [info]: Rate Control / qCompress : CRF-17.0 / 0.78
x265 [info]: tools: rd=6 psy-rd=1.20 rdoq=2 psy-rdoq=14.00 signhide tmvp
x265 [info]: tools: b-intra deblock(tC=-2:B=-2)
x265 [info]: frame I: 10, Avg QP:12.96 kb/s: 35515.63
x265 [info]: frame P: 1707, Avg QP:14.79 kb/s: 13179.50
x265 [info]: frame B: 5483, Avg QP:20.94 kb/s: 1470.32
x265 [info]: Weighted P-Frames: Y:4.9% UV:4.3%
x265 [info]: Weighted B-Frames: Y:2.9% UV:2.1%
x265 [info]: consecutive B-frames: 14.4% 5.9% 21.7% 18.8% 7.2% 21.2% 2.2% 3.6% 4.0% 1.0%
encoded 7200 frames in 1736.41s (4.15 fps), 4293.66 kb/s, Avg QP:19.47
With "--pools 24,24", without the "--pmode" option:
"avs4x265.exe" --x265-binary "D:\Source code\x265\build\vc14-x86_64\x265.exe" --preset veryslow --crf 17.0 --tu-intra-depth 3 --tu-inter-depth 3 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 5 --weightb --keyint 720 --min-keyint 1 --bframes 9 --aq-mode 1 --aq-strength 1.1 --rd 6 --no-sao --no-open-gop --rc-lookahead 72 --scenecut 40 --max-merge 4 --qcomp 0.78 --no-strong-intra-smoothing --psy-rdoq 14.0 --psy-rd 1.2 --deblock -2:-2 --rdoq-level 2 --qg-size 32 --ctu 32 --max-tu-size 16 --rdpenalty 2 --qpstep 5 --colormatrix bt709 --pools 24,24 --output-depth 10 --output "test.hevc" "test.avs"
yuv [info]: 1920x1080 fps 24000/1001 i420p8 unknown frame count
raw [info]: output file: F:\[philosophy-raws][Phi Brain:Puzzle of God]\test.hevc
x265 [info]: HEVC encoder version 1.7+497-975352b2c022
x265 [info]: build info [Windows][MSVC 1900][64 bit] 10bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX
x265 [info]: Main 10 profile, Level-5 (Main tier)
x265 [info]: Thread pool 0 using 24 threads on numa nodes 0
x265 [info]: Thread pool 1 using 24 threads on numa nodes 1
x265 [info]: frame threads / pool features : 6 / wpp(34 rows)
x265 [info]: Coding QT: max CU size, min CU size : 32 / 8
x265 [info]: Residual QT: max TU size, max depth : 16 / 3 inter / 3 intra
x265 [info]: ME / range / subpel / merge : star / 44 / 5 / 4
x265 [info]: Keyframe min / max / scenecut : 1 / 720 / 40
x265 [info]: Intra 32x32 TU penalty type : 2
x265 [info]: Lookahead / bframes / badapt : 72 / 9 / 2
x265 [info]: b-pyramid / weightp / weightb : 1 / 1 / 1
x265 [info]: References / ref-limit cu / depth : 5 / 0 / 0
x265 [info]: AQ: mode / str / qg-size / cu-tree : 1 / 1.1 / 32 / 1
x265 [info]: Rate Control / qCompress : CRF-17.0 / 0.78
x265 [info]: tools: rd=6 psy-rd=1.20 rdoq=2 psy-rdoq=14.00 signhide tmvp
x265 [info]: tools: b-intra deblock(tC=-2:B=-2)
x265 [info]: frame I: 10, Avg QP:12.96 kb/s: 35515.63
x265 [info]: frame P: 1707, Avg QP:14.79 kb/s: 13179.50
x265 [info]: frame B: 5483, Avg QP:20.94 kb/s: 1470.32
x265 [info]: Weighted P-Frames: Y:4.9% UV:4.3%
x265 [info]: Weighted B-Frames: Y:2.9% UV:2.1%
x265 [info]: consecutive B-frames: 14.4% 5.9% 21.7% 18.8% 7.2% 21.2% 2.2% 3.6% 4.0% 1.0%
encoded 7200 frames in 1468.07s (4.90 fps), 4293.66 kb/s, Avg QP:19.47
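For anyone comparing runs like the two above, here is a minimal Python sketch that pulls the frame count and elapsed time out of the final "encoded ..." summary line and works out the relative speed. The helper names (`parse_summary`, `fps`) are made up for illustration, and the regular expression only assumes the summary format shown in these logs.

```python
import re

# Matches the closing summary line printed in the logs above, e.g.
# "encoded 7200 frames in 1736.41s (4.15 fps), 4293.66 kb/s, Avg QP:19.47"
SUMMARY = re.compile(r"encoded (\d+) frames in ([\d.]+)s")


def parse_summary(line):
    """Return (frames, seconds) from an 'encoded ...' summary line."""
    match = SUMMARY.search(line)
    if match is None:
        raise ValueError("not an encode summary line: %r" % line)
    return int(match.group(1)), float(match.group(2))


def fps(line):
    """Frames per second recomputed from the summary line."""
    frames, seconds = parse_summary(line)
    return frames / seconds


# Summary lines copied from the two logs above:
# one 48-thread pool across both nodes vs. two 24-thread pools.
unified = "encoded 7200 frames in 1736.41s (4.15 fps), 4293.66 kb/s, Avg QP:19.47"
split = "encoded 7200 frames in 1468.07s (4.90 fps), 4293.66 kb/s, Avg QP:19.47"

speedup = fps(split) / fps(unified)
print("split pools vs. unified pool: %.2fx" % speedup)
```

On these two logs the split-pool run comes out roughly 18% faster, with identical bitrate and average QP.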
-
Account Deactivated Maybe this has something to do with your video or machine. I just tried encoding with and without pmode on a few dual-socket Linux machines that I have access to, and for both 1080p and 4K videos in a variety of ABR settings, I see that using a unified pool (by not specifying --pools 32,32 or the like) is better than using split pools. I am now trying with CRF settings; will respond once I have that data.
In the meantime, can you share a link to the video that you're using here? Maybe there is something in that video that is causing such a huge difference...
-
Account Deactivated I tried 10-bit encoding with the exact command line above but with my 1080p videos. The results are the same as before: the unified pool (not specifying any --pools option) is better than split pools, with pmode both on and off. I would like to try next with your video - can you please send me a pointer?
For now, you can just add the --pools 24,24 to your command line as it is clearly much faster for you.
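To make the --pools strings used in this thread easier to compare, here is a rough Python sketch of how the per-node tokens map to thread counts. It is only my reading of the syntax as used above ("+" for all threads on a node, "-" for none, a number for that many threads) and not x265's actual parser; the function name `threads_per_node` is made up for illustration.

```python
def threads_per_node(pools_spec, cores_per_node):
    """Map a --pools string to a requested thread count per NUMA node.

    Simplified illustration only; this is NOT x265's real parsing code.
    pools_spec     -- e.g. "-,+", "24,24", or "" when --pools is not given
    cores_per_node -- logical cores available on each node, e.g. [24, 24]
    """
    tokens = [t.strip() for t in pools_spec.split(",")] if pools_spec else []
    counts = []
    for node, cores in enumerate(cores_per_node):
        if not tokens:
            counts.append(cores)  # no --pools given: use every node fully
        elif node >= len(tokens) or tokens[node] == "-":
            counts.append(0)      # "-" or no token: no threads on this node
        elif tokens[node] == "+":
            counts.append(cores)  # "+": all threads on this node
        else:
            counts.append(min(int(tokens[node]), cores))  # explicit count
    return counts


# The reporter's machine: two NUMA nodes with 24 threads each.
for spec in ["", "+,+", "+,-", "-,+", "-,24", "24,24"]:
    print(repr(spec), threads_per_node(spec, [24, 24]))
```

Note that this only models how many threads are requested per node. Whether they end up in one pool or several is a separate matter: in the logs above, the 48-thread run reports a single pool on nodes 0,1, while "--pools 24,24" reports two pools.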
-
reporter Should I close this issue?
-
Account Deactivated I would still like to try your video clip out to see if there is an artifact that I'm missing - do you have a public link for the video?
-
reporter Changed to a new video.
Ran a new test with a lower CRF.
With "--pools +,+" or no "--pools", without the "--pmode" option:
"D:\MeGUI_x86\tools\x265\avs4x265.exe" --x265-binary "D:\Source code\x265\build\vc14-x86_64\x265.exe" --preset veryslow --crf 15.5 --tu-intra-depth 3 --tu-inter-depth 3 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 6 --weightb --keyint 720 --min-keyint 1 --bframes 9 --aq-mode 1 --aq-strength 1.1 --rd 6 --no-sao --no-open-gop --rc-lookahead 72 --scenecut 40 --max-merge 4 --qcomp 0.8 --no-strong-intra-smoothing --psy-rdoq 4.0 --psy-rd 0.9 --deblock -2:-2 --rdoq-level 2 --qg-size 32 --qpstep 5 --ctu 32 --max-tu-size 16 --rdpenalty 2 --colormatrix bt709 --pools +,+ --output-depth 10 --output "F:[philosophy-raws][Phi Brain:Puzzle of God]\test.hevc" "F:[philosophy-raws][Phi Brain:Puzzle of God]\test.avs"
That gives 1.85 FPS.
With "--pools 24,24", without the "--pmode" option:
"D:\MeGUI_x86\tools\x265\avs4x265.exe" --x265-binary "D:\Source code\x265\build\vc14-x86_64\x265.exe" --preset veryslow --crf 15.5 --tu-intra-depth 3 --tu-inter-depth 3 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 6 --weightb --keyint 720 --min-keyint 1 --bframes 9 --aq-mode 1 --aq-strength 1.1 --rd 6 --no-sao --no-open-gop --rc-lookahead 72 --scenecut 40 --max-merge 4 --qcomp 0.8 --no-strong-intra-smoothing --psy-rdoq 4.0 --psy-rd 0.9 --deblock -2:-2 --rdoq-level 2 --qg-size 32 --qpstep 5 --ctu 32 --max-tu-size 16 --rdpenalty 2 --colormatrix bt709 --pools 24,24 --output-depth 10 --output "F:[philosophy-raws][Phi Brain:Puzzle of God]\test.hevc" "F:[philosophy-raws][Phi Brain:Puzzle of God]\test.avs"
In this test it's 2.14 FPS.
Test clip: https://mega.nz/#!IcwVFQwL!9TscOdHH2Bkhv8DMuu7IWsVojCwfyFZeUEriPY8P3cY
-
Account Deactivated Thanks for the clip. I ran the exact command line above on my Linux box that has two sockets of E5-2699v3 (36 threads per socket) and on a Windows box that has two sockets of E5-2650v2 (16 threads per socket), and in both cases not specifying the --pools option performs marginally better than specifying it.
So I am bound to say that your best fix is to just include a --pools option in your command line. This should ensure forward progress for you at no loss in performance :-).
However, the curious engineer in me isn't satisfied, so I'd like to dig further! I notice that my FPS figures are lower than what you see above (~1.3 on the Linux box, and ~0.6 on the Windows box). I'm not sure if that plays into the differences between what I'm seeing and what you're reporting (maybe the balance between compute and memory accesses is very different across the machines). If you can share that information, can you give me your hardware configuration so that I or someone else can chip in to track down the problem?
Pradeep.
-
reporter For me,
I usually run two encodes at once, one with --pools -,+ and one with --pools +,-.
That's the fastest way for me.
It seems that this issue can be closed.
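A minimal Python sketch of that workflow, assuming two independent encodes that are each confined to one NUMA node via --pools. The binary name, file names, and helper functions are hypothetical placeholders; only --pools, --output, and the encoder options are taken from the command lines earlier in this thread.

```python
import subprocess


def per_node_commands(x265_binary, common_args, jobs):
    """Build one encoder command line per job.

    jobs is a list of (pools_spec, input_path, output_path) tuples,
    e.g. ("+,-", ...) for node 0 only and ("-,+", ...) for node 1 only.
    """
    commands = []
    for pools_spec, input_path, output_path in jobs:
        commands.append(
            [x265_binary, *common_args,
             "--pools", pools_spec,
             "--output", output_path,
             input_path]
        )
    return commands


def run_in_parallel(commands):
    """Start every command at once and wait for all of them to finish."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [proc.wait() for proc in procs]


# Hypothetical binary and file names, for illustration only.
commands = per_node_commands(
    "x265.exe",
    ["--preset", "veryslow", "--crf", "17.0", "--output-depth", "10"],
    [("+,-", "episode1.y4m", "episode1.hevc"),   # encode 1 on NUMA node 0
     ("-,+", "episode2.y4m", "episode2.hevc")],  # encode 2 on NUMA node 1
)
for cmd in commands:
    print(" ".join(cmd))

# To actually launch both encodes (requires the binary and the inputs):
# exit_codes = run_in_parallel(commands)
```

Each encode then keeps its worker threads on its own node, so the two jobs do not compete for the same cores.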
-
- changed status to resolved
Is yours a 32-bit or 64-bit compile, and is this Windows or Linux? How many threads do you see on NUMA node 0? Can you try adding --pools 24,24 to the command line and see if it fixes the problem?
Pradeep.