Still have issue in commit: 365f7ed

Issue #185 resolved
changhao fu created an issue

My CPUs are 2x E5 2595 V2, with 24 threads per CPU.

When I use a command line like:

--preset veryslow --crf 17.0 --tu-intra-depth 3 --tu-inter-depth 3 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 5 --weightb --keyint 720 --min-keyint 1 --bframes 9 --aq-mode 1 --aq-strength 1.1 --rd 6 --no-sao --no-open-gop --rc-lookahead 72 --scenecut 40 --max-merge 4 --qcomp 0.78 --no-strong-intra-smoothing --psy-rdoq 14.0 --psy-rd 1.2 --deblock -2:-2 --rdoq-level 2 --qg-size 32 --ctu 32 --max-tu-size 16 --rdpenalty 2 --qpstep 5 --colormatrix bt709 --pools -,+ --pmode --output-depth 10

x265 uses only 6 threads of NUMA node 1.

But when I roll back to commit 1e61810, it works fine: all 24 threads in NUMA node 1 are used.

Comments (30)

  1. Pradeep Ramachandran Account Deactivated

    Is yours a 32-bit or 64-bit compile, and is this Windows or Linux? How many threads do you see on NUMA node 0? Can you try adding --pools 24,24 to the command line and see if it fixes the problem?

    Pradeep.

  2. changhao fu reporter

    On NUMA node 0, all 24 threads worked fine.

    I tried --pools +,- and --pools 24,-. Both worked fine.

    But with something like --pools -,+ or --pools -,24, only the first six threads in NUMA node 1 worked.
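
To make the --pools values used in this thread concrete, here is a minimal Python sketch that maps a pool string onto per-node thread counts. It is only an illustration and not x265's own parser: the function name `parse_pools`, the `cores_per_node` argument, and the rule that the string must name every node are assumptions made for this example. The meaning of "+" (all cores on the node), "-" (no threads on the node), and a plain number follows how the options are used here.

```python
def parse_pools(spec, cores_per_node):
    """Map a --pools style string to a thread count per NUMA node.

    Hypothetical helper for illustration only (not x265 code).
    "+" means all cores on the node, "-" means no threads on the node,
    and an integer means exactly that many threads.
    An empty string is treated as "use every core on every node".
    """
    if not spec:
        return list(cores_per_node)

    fields = [field.strip() for field in spec.split(",")]
    # Assumption for this sketch: one field per node, no shorthand.
    if len(fields) != len(cores_per_node):
        raise ValueError("expected one entry per NUMA node")

    counts = []
    for field, cores in zip(fields, cores_per_node):
        if field == "+":
            counts.append(cores)
        elif field == "-":
            counts.append(0)
        else:
            counts.append(int(field))
    return counts


if __name__ == "__main__":
    # Two NUMA nodes with 24 hardware threads each, as on the reporter's machine.
    for spec in ("", "+,-", "-,+", "24,-", "-,24", "24,24"):
        print(repr(spec), parse_pools(spec, [24, 24]))
```

With two 24-thread nodes, "-,+" and "-,24" both request 24 threads on node 1 and none on node 0, which is the configuration that misbehaves in this report.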

  3. Pradeep Ramachandran Account Deactivated

    Hmm, this is strange. I've tried this on a 64-bit Linux system and it worked just fine. I would like to replicate your problem to fix it. Can you please specify whether you're running Windows or Linux, with a 32- or 64-bit build? And what is your command line?

    Pradeep

  4. changhao fu reporter

    Windows Server 2012 R2, 64-bit system.

    "D:\MeGUI_x86\tools\x265\avs4x265.exe" --x265-binary "D:\Source code\x265\build\vc14-x86_64\x265.exe" --preset veryslow --crf 17.0 --tu-intra-depth 3 --tu-inter-depth 3 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 5 --weightb --keyint 720 --min-keyint 1 --bframes 9 --aq-mode 1 --aq-strength 1.1 --rd 6 --no-sao --no-open-gop --rc-lookahead 72 --scenecut 40 --max-merge 4 --qcomp 0.78 --no-strong-intra-smoothing --psy-rdoq 14.0 --psy-rd 1.2 --deblock -2:-2 --rdoq-level 2 --qg-size 32 --ctu 32 --max-tu-size 16 --rdpenalty 2 --qpstep 5 --colormatrix bt709 --pools -,+ --output-depth 10 --output "test.hevc" "test.avs"

    I tested with each of the following:

    --output-depth 10
    --pmode --output-depth 10
    --output-depth 12
    --pmode --output-depth 12

    All behaved the same way.

    yuv  [info]: 1920x1080 fps 24000/1001 i420p10 unknown frame count
    raw  [info]: output file: F:\test.hevc
    x265 [info]: HEVC encoder version 1.7+478-365f7ed4d896
    x265 [info]: build info [Windows][MSVC 1900][64 bit] 10bit
    x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX
    x265 [info]: Main 10 profile, Level-5 (Main tier)
    x265 [info]: Thread pool 0 using 24 threads with NUMA node mask 2
    x265 [info]: frame threads / pool features       : 6 / wpp(34 rows)
    x265 [info]: Coding QT: max CU size, min CU size : 32 / 8
    x265 [info]: Residual QT: max TU size, max depth : 16 / 3 inter / 3 intra
    x265 [info]: ME / range / subpel / merge         : star / 44 / 5 / 4
    x265 [info]: Keyframe min / max / scenecut       : 1 / 720 / 40
    x265 [info]: Intra 32x32 TU penalty type         : 2
    x265 [info]: Lookahead / bframes / badapt        : 72 / 9 / 2
    x265 [info]: b-pyramid / weightp / weightb       : 1 / 1 / 1
    x265 [info]: References / ref-limit  cu / depth  : 5 / 0 / 0
    x265 [info]: AQ: mode / str / qg-size / cu-tree  : 1 / 1.1 / 32 / 1
    x265 [info]: Rate Control / qCompress            : CRF-17.0 / 0.78
    x265 [info]: tools: rd=6 psy-rd=1.20 rdoq=2 psy-rdoq=14.00 signhide tmvp
    x265 [info]: tools: b-intra deblock(tC=-2:B=-2)
    

    A build compiled with GCC 5.2.0 + libmsvc120.a still has this problem.

    yuv  [info]: 1920x1080 fps 24000/1001 i420p12 unknown frame count
    raw  [info]: output file: test.hevc
    x265 [info]: HEVC encoder version 1.7+478-365f7ed4d896
    x265 [info]: build info [Windows][GCC 5.2.0][64 bit] 12bit
    x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX
    x265 [info]: Main 12 profile, Level-5 (Main tier)
    x265 [info]: Thread pool 0 using 24 threads with NUMA node mask 2
    x265 [info]: frame threads / pool features       : 6 / wpp(34 rows)
    x265 [info]: Coding QT: max CU size, min CU size : 32 / 8
    x265 [info]: Residual QT: max TU size, max depth : 16 / 3 inter / 3 intra
    x265 [info]: ME / range / subpel / merge         : star / 44 / 5 / 4
    x265 [info]: Keyframe min / max / scenecut       : 1 / 720 / 40
    x265 [info]: Intra 32x32 TU penalty type         : 2
    x265 [info]: Lookahead / bframes / badapt        : 72 / 9 / 2
    x265 [info]: b-pyramid / weightp / weightb       : 1 / 1 / 1
    x265 [info]: References / ref-limit  cu / depth  : 5 / 0 / 0
    x265 [info]: AQ: mode / str / qg-size / cu-tree  : 1 / 1.1 / 32 / 1
    x265 [info]: Rate Control / qCompress            : CRF-17.0 / 0.78
    x265 [info]: tools: rd=6 psy-rd=1.20 rdoq=2 psy-rdoq=14.00 signhide tmvp
    x265 [info]: tools: b-intra deblock(tC=-2:B=-2)
    
  5. changhao fu reporter

    Title:Still have issue in commit: 365f7ed

    Without commit 365f7ed4d896, it uses only one thread when encoding.

    With commit 365f7ed4d896, it uses all the threads in NUMA node 0, but only threads 1-6 in NUMA node 1.

    Threads 17-24 in NUMA node 1 still remain unused by x265.

  6. Pradeep Ramachandran Account Deactivated

    Changhao, I know the reason behind the problem.

    From the screen shot of your log above, I can see with the option of --pools -,+, we create one threadpool with 24 threads all of which go to NUMA node 1 (in the log, it says x265 [info]: Thread pool 0 using 24 threads with NUMA node mask 2 which is NUMA node 1 only). So when using only one pool things work ok.
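
The "NUMA node mask" in the log line is a bitmask with one bit per node, which is why mask 2 means node 1 only. A short Python sketch of that decoding follows; the helper name `nodes_from_mask` is made up for this illustration and is not part of x265.

```python
def nodes_from_mask(mask):
    """Return the NUMA node indices whose bits are set in the mask.

    Hypothetical helper for illustration: bit 0 stands for node 0,
    bit 1 for node 1, and so on.
    """
    if mask < 0:
        raise ValueError("mask must be non-negative")

    nodes = []
    node = 0
    while mask:
        if mask & 1:          # lowest bit set: this node is in the mask
            nodes.append(node)
        mask >>= 1            # move on to the next node's bit
        node += 1
    return nodes


if __name__ == "__main__":
    for mask in (1, 2, 3):
        print(mask, nodes_from_mask(mask))
```

Under this reading, mask 1 is node 0 only, mask 2 is node 1 only, and mask 3 covers both nodes.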

    Now, when you don't specify any --pools option, you want to create 48 threads in all. Assuming that you have a 32-bit compile, 48%32 = 8 which is < 32/2. Based on the heuristic in the commit message of 365f7ed4d896 https://bitbucket.org/multicoreware/x265/commits/365f7ed4d896, we cap such that the total # threads will be 32. I don't think this heuristic is required anymore as we don't split into multiple pools unless specified. I will send a patch for this in a bit and you can test it out to see if it solves your problem.

  7. Pradeep Ramachandran Account Deactivated

    Can you see if the attached patch helps? It clips the # threads only if the last pool has <1/4th max threads which shouldn't be the case for you.

  8. changhao fu reporter

    The problem still remains.

    In my tests:

    1.7+479 (latest commit 365f7ed4d896 plus this patch: https://patches.videolan.org/patch/10023/)

    no --pools

    "avs4x265.exe" --x265-binary "D:\Source code\x265\build\vc14-x86_64\x265.exe" --preset veryslow --crf 17.0 --tu-intra-depth 3 --tu-inter-depth 3 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 5 --weightb --keyint 720 --min-keyint 1 --bframes 9 --aq-mode 1 --aq-strength 1.1 --rd 6 --no-sao --no-open-gop --rc-lookahead 72 --scenecut 40 --max-merge 4 --qcomp 0.78 --no-strong-intra-smoothing --psy-rdoq 14.0 --psy-rd 1.2 --deblock -2:-2 --rdoq-level 2 --qg-size 32 --ctu 32 --max-tu-size 16 --rdpenalty 2 --qpstep 5 --colormatrix bt709 --pmode --output-depth 10 --output "test.hevc" "test.avs"

    https://bitbucket.org/repo/49Lnnp/images/2448750373-1.7+479%20no%20--pools.png

    With --pools +,-

    "avs4x265.exe" --x265-binary "D:\Source code\x265\build\vc14-x86_64\x265.exe" --preset veryslow --crf 17.0 --tu-intra-depth 3 --tu-inter-depth 3 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 5 --weightb --keyint 720 --min-keyint 1 --bframes 9 --aq-mode 1 --aq-strength 1.1 --rd 6 --no-sao --no-open-gop --rc-lookahead 72 --scenecut 40 --max-merge 4 --qcomp 0.78 --no-strong-intra-smoothing --psy-rdoq 14.0 --psy-rd 1.2 --deblock -2:-2 --rdoq-level 2 --qg-size 32 --ctu 32 --max-tu-size 16 --rdpenalty 2 --qpstep 5 --colormatrix bt709 --pmode --pools +,- --output-depth 10 --output "test.hevc" "test.avs"

    https://bitbucket.org/repo/49Lnnp/images/872375026-1.7+479%20--pools+,-.png

    With --pools -,+

    "avs4x265.exe" --x265-binary "D:\Source code\x265\build\vc14-x86_64\x265.exe" --preset veryslow --crf 17.0 --tu-intra-depth 3 --tu-inter-depth 3 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 5 --weightb --keyint 720 --min-keyint 1 --bframes 9 --aq-mode 1 --aq-strength 1.1 --rd 6 --no-sao --no-open-gop --rc-lookahead 72 --scenecut 40 --max-merge 4 --qcomp 0.78 --no-strong-intra-smoothing --psy-rdoq 14.0 --psy-rd 1.2 --deblock -2:-2 --rdoq-level 2 --qg-size 32 --ctu 32 --max-tu-size 16 --rdpenalty 2 --qpstep 5 --colormatrix bt709 --pmode --pools -,+ --output-depth 10 --output "test.hevc" "test.avs"

    https://bitbucket.org/repo/49Lnnp/images/4150634568-1.7+479%20--pools%20-,+.png

    Rolled back to 1.7+473 (commit 1e6181090f1df6790d5914c18be4272aa1b54f69)

    no --pools

    "avs4x265.exe" --x265-binary "D:\Source code\x265\build\vc14-x86_64\x265.exe" --preset veryslow --crf 17.0 --tu-intra-depth 3 --tu-inter-depth 3 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 5 --weightb --keyint 720 --min-keyint 1 --bframes 9 --aq-mode 1 --aq-strength 1.1 --rd 6 --no-sao --no-open-gop --rc-lookahead 72 --scenecut 40 --max-merge 4 --qcomp 0.78 --no-strong-intra-smoothing --psy-rdoq 14.0 --psy-rd 1.2 --deblock -2:-2 --rdoq-level 2 --qg-size 32 --ctu 32 --max-tu-size 16 --rdpenalty 2 --qpstep 5 --colormatrix bt709 --pmode --output-depth 10 --output "test.hevc" "test.avs"

    https://bitbucket.org/repo/49Lnnp/images/2785897438-1.7+473%20no%20--pools.png

    With --pools +,-

    "avs4x265.exe" --x265-binary "D:\Source code\x265\build\vc14-x86_64\x265.exe" --preset veryslow --crf 17.0 --tu-intra-depth 3 --tu-inter-depth 3 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 5 --weightb --keyint 720 --min-keyint 1 --bframes 9 --aq-mode 1 --aq-strength 1.1 --rd 6 --no-sao --no-open-gop --rc-lookahead 72 --scenecut 40 --max-merge 4 --qcomp 0.78 --no-strong-intra-smoothing --psy-rdoq 14.0 --psy-rd 1.2 --deblock -2:-2 --rdoq-level 2 --qg-size 32 --ctu 32 --max-tu-size 16 --rdpenalty 2 --qpstep 5 --colormatrix bt709 --pmode --pools +,- --output-depth 10 --output "test.hevc" "test.avs"

    https://bitbucket.org/repo/49Lnnp/images/2136934142-1.7+473%20--pools+,-.png

    With --pools -,+

    "avs4x265.exe" --x265-binary "D:\Source code\x265\build\vc14-x86_64\x265.exe" --preset veryslow --crf 17.0 --tu-intra-depth 3 --tu-inter-depth 3 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 5 --weightb --keyint 720 --min-keyint 1 --bframes 9 --aq-mode 1 --aq-strength 1.1 --rd 6 --no-sao --no-open-gop --rc-lookahead 72 --scenecut 40 --max-merge 4 --qcomp 0.78 --no-strong-intra-smoothing --psy-rdoq 14.0 --psy-rd 1.2 --deblock -2:-2 --rdoq-level 2 --qg-size 32 --ctu 32 --max-tu-size 16 --rdpenalty 2 --qpstep 5 --colormatrix bt709 --pmode --pools -,+ --output-depth 10 --output "test.hevc" "test.avs"

    https://bitbucket.org/repo/49Lnnp/images/3457745820-1.7+473%20--pools-,+.png

  9. Pradeep Ramachandran Account Deactivated

    I'm not able to see the screenshots - can you just paste them in the message directly like you'd done earlier?

  10. changhao fu reporter

    OK, I changed the photo links to bitbucket.org.

    This should work.

    PS: I need to show the pictures of the CPU threads.

  11. Pradeep Ramachandran Account Deactivated

    Thanks - I'm able to see the snapshots now.

    With the command --pools -,+, your snapshot shows that 24 threads were indeed launched on NUMA node 1 with the new commit. You'd previously reported that this wasn't the case - can you please confirm?

    Now, without the --pools command, we see that there are indeed 48 threads launched across both NUMA nodes (from the console log), but in your snapshot, only 32 threads are active. I wonder if this is just to do with the overall change in the execution profile. Do you see a performance improvement with 1.7+479 (with the new patch) over 1.7+473? If not, can you try adding --pools 24,24 to your command line explicitly and see if that helps performance?

  12. changhao fu reporter

    I am so sorry, I uploaded the wrong snapshots.

    The 1.7+479 "--pools -,+" and "no --pools" photos should be swapped with each other.

    I have already fixed it.

    1.7+479 With --pools 24,24

    "avs4x265.exe" --x265-binary "D:\Source code\x265\build\vc14-x86_64\x265.exe" --preset veryslow --crf 17.0 --tu-intra-depth 3 --tu-inter-depth 3 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 5 --weightb --keyint 720 --min-keyint 1 --bframes 9 --aq-mode 1 --aq-strength 1.1 --rd 6 --no-sao --no-open-gop --rc-lookahead 72 --scenecut 40 --max-merge 4 --qcomp 0.78 --no-strong-intra-smoothing --psy-rdoq 14.0 --psy-rd 1.2 --deblock -2:-2 --rdoq-level 2 --qg-size 32 --ctu 32 --max-tu-size 16 --rdpenalty 2 --qpstep 5 --colormatrix bt709 --pmode --pools 24,24 --output-depth 10 --output "test.hevc" "test.avs"

    https://bitbucket.org/repo/49Lnnp/images/2106491959-1.7+479%20--pools%2024,24.png

    1.7+473 With --pools 24,24

    "avs4x265.exe" --x265-binary "D:\Source code\x265\build\vc14-x86_64\x265.exe" --preset veryslow --crf 17.0 --tu-intra-depth 3 --tu-inter-depth 3 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 5 --weightb --keyint 720 --min-keyint 1 --bframes 9 --aq-mode 1 --aq-strength 1.1 --rd 6 --no-sao --no-open-gop --rc-lookahead 72 --scenecut 40 --max-merge 4 --qcomp 0.78 --no-strong-intra-smoothing --psy-rdoq 14.0 --psy-rd 1.2 --deblock -2:-2 --rdoq-level 2 --qg-size 32 --ctu 32 --max-tu-size 16 --rdpenalty 2 --qpstep 5 --colormatrix bt709 --pmode --pools 24,24 --output-depth 10 --output "test.hevc" "test.avs"

    https://bitbucket.org/repo/49Lnnp/images/523583665-1.7+473%20--pools%2024,24.png

    Still, when I use any of the following:

    without the --pools command
    --pools -,+
    --pools +,+
    --pools -,24
    --pools 24,24

    the console log says that all 48/24 threads are launched.

    But only 32/8 threads are actually active. I can't use 12 of the threads in CPU1 in 1.7+479.

  13. changhao fu reporter

    1.7+480 (with both https://patches.videolan.org/patch/10054/ and https://patches.videolan.org/patch/10023/ applied)

    Worked fine with:

    --pools -,+
    --pools +,-
    --pools 24,24

    But something strange happened with:

    --pools +,+
    without the --pools command

    When I use the command "--pools +,+", it says "Thread pool 0 using 48 threads with NUMA node mask 2" and uses only the 24 threads of CPU1. No CPU0 threads are used by x265.

    "avs4x265.exe" --x265-binary "D:\Source code\x265\build\vc14-x86_64\x265.exe" --preset veryslow --crf 17.0 --tu-intra-depth 3 --tu-inter-depth 3 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 5 --weightb --keyint 720 --min-keyint 1 --bframes 9 --aq-mode 1 --aq-strength 1.1 --rd 6 --no-sao --no-open-gop --rc-lookahead 72 --scenecut 40 --max-merge 4 --qcomp 0.78 --no-strong-intra-smoothing --psy-rdoq 14.0 --psy-rd 1.2 --deblock -2:-2 --rdoq-level 2 --qg-size 32 --ctu 32 --max-tu-size 16 --rdpenalty 2 --qpstep 5 --colormatrix bt709 --pmode --pools +,+ --output-depth 10 --output "test.hevc" "test.avs"

    https://bitbucket.org/repo/49Lnnp/images/3753691839-1.7+480%20--pools%20+,+.png

    When I don't use the "--pools" command, it says "Thread pool 0 using 48 threads with NUMA node mask 3" and x265 uses the 24 threads of CPU1 but only part of the threads in CPU0. The speed is also slower than "--pools 24,24" in 1.7+480 or the same "no --pools" in 1.7+473.

    "--pools 24,24": 2.3-2.6 FPS
    "no --pools": 2.0-2.2 FPS

    "avs4x265.exe" --x265-binary "D:\Source code\x265\build\vc14-x86_64\x265.exe" --preset veryslow --crf 17.0 --tu-intra-depth 3 --tu-inter-depth 3 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 5 --weightb --keyint 720 --min-keyint 1 --bframes 9 --aq-mode 1 --aq-strength 1.1 --rd 6 --no-sao --no-open-gop --rc-lookahead 72 --scenecut 40 --max-merge 4 --qcomp 0.78 --no-strong-intra-smoothing --psy-rdoq 14.0 --psy-rd 1.2 --deblock -2:-2 --rdoq-level 2 --qg-size 32 --ctu 32 --max-tu-size 16 --rdpenalty 2 --qpstep 5 --colormatrix bt709 --pmode --output-depth 10 --output "test.hevc" "test.avs"

    https://bitbucket.org/repo/49Lnnp/images/2711194435-1.7+480%20no%20--pools.png

    Some snapshots of the other options that worked fine

    1.7+473 With --pools 24,24

    "avs4x265.exe" --x265-binary "D:\Source code\x265\build\vc14-x86_64\x265.exe" --preset veryslow --crf 17.0 --tu-intra-depth 3 --tu-inter-depth 3 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 5 --weightb --keyint 720 --min-keyint 1 --bframes 9 --aq-mode 1 --aq-strength 1.1 --rd 6 --no-sao --no-open-gop --rc-lookahead 72 --scenecut 40 --max-merge 4 --qcomp 0.78 --no-strong-intra-smoothing --psy-rdoq 14.0 --psy-rd 1.2 --deblock -2:-2 --rdoq-level 2 --qg-size 32 --ctu 32 --max-tu-size 16 --rdpenalty 2 --qpstep 5 --colormatrix bt709 --pmode --pools 24,24 --output-depth 10 --output "test.hevc" "test.avs"

    https://bitbucket.org/repo/49Lnnp/images/523583665-1.7+473%20--pools%2024,24.png

    1.7+480 With --pools 24,24

    "avs4x265.exe" --x265-binary "D:\Source code\x265\build\vc14-x86_64\x265.exe" --preset veryslow --crf 17.0 --tu-intra-depth 3 --tu-inter-depth 3 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 5 --weightb --keyint 720 --min-keyint 1 --bframes 9 --aq-mode 1 --aq-strength 1.1 --rd 6 --no-sao --no-open-gop --rc-lookahead 72 --scenecut 40 --max-merge 4 --qcomp 0.78 --no-strong-intra-smoothing --psy-rdoq 14.0 --psy-rd 1.2 --deblock -2:-2 --rdoq-level 2 --qg-size 32 --ctu 32 --max-tu-size 16 --rdpenalty 2 --qpstep 5 --colormatrix bt709 --pmode --pools 24,24 --output-depth 10 --output "test.hevc" "test.avs"

    https://bitbucket.org/repo/49Lnnp/images/3845561199-1.7+480%20--pools%2024,24.png

    1.7+473 With --pools +,+

    "avs4x265.exe" --x265-binary "D:\Source code\x265\build\vc14-x86_64\x265.exe" --preset veryslow --crf 17.0 --tu-intra-depth 3 --tu-inter-depth 3 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 5 --weightb --keyint 720 --min-keyint 1 --bframes 9 --aq-mode 1 --aq-strength 1.1 --rd 6 --no-sao --no-open-gop --rc-lookahead 72 --scenecut 40 --max-merge 4 --qcomp 0.78 --no-strong-intra-smoothing --psy-rdoq 14.0 --psy-rd 1.2 --deblock -2:-2 --rdoq-level 2 --qg-size 32 --ctu 32 --max-tu-size 16 --rdpenalty 2 --qpstep 5 --colormatrix bt709 --pmode --pools +,+ --output-depth 10 --output "test.hevc" "test.avs"

    https://bitbucket.org/repo/49Lnnp/images/20686848-1.7+473%20--pools%20+,+.png

    1.7+473 Without --pools

    "avs4x265.exe" --x265-binary "D:\Source code\x265\build\vc14-x86_64\x265.exe" --preset veryslow --crf 17.0 --tu-intra-depth 3 --tu-inter-depth 3 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 5 --weightb --keyint 720 --min-keyint 1 --bframes 9 --aq-mode 1 --aq-strength 1.1 --rd 6 --no-sao --no-open-gop --rc-lookahead 72 --scenecut 40 --max-merge 4 --qcomp 0.78 --no-strong-intra-smoothing --psy-rdoq 14.0 --psy-rd 1.2 --deblock -2:-2 --rdoq-level 2 --qg-size 32 --ctu 32 --max-tu-size 16 --rdpenalty 2 --qpstep 5 --colormatrix bt709 --pmode --output-depth 10 --output "test.hevc" "test.avs"

    https://bitbucket.org/repo/49Lnnp/images/2785897438-1.7+473%20no%20--pools.png

  14. Pradeep Ramachandran Account Deactivated

    Another patch to fix the difference between the --pools +,+ and --pools 24,24: https://patches.videolan.org/patch/10059/. With this patch, these two command lines should behave the same. I've also improved the log a bit so that it is clear as to what nodes the various pools are being allocated.

    Now, with regard to --pools 24,24 being faster than not specifying --pools, it isn't all that alarming, albeit a little surprising. We found empirically, by evaluating on multiple Ivy Bridge/Haswell Xeon servers and videos, that using a single big pool with 2X the threads is more beneficial than using multiple pools with half the threads, and hence moved the default to this setting. Your input and command line combination seems to go the other way. I would recommend just explicitly specifying --pools 24,24 on the command line after all these patches are applied, so that you go back to the previous default behavior that seems to work better for your test case.

  15. changhao fu reporter

    version 1.7+481

    The test uses a 5-minute 24 fps 1080p video.

    The claim that "--pools 24,24 is faster than not specifying --pools" seems to hold on my PC.

    with "--pools +,+" or no "--pools"

    "avs4x265.exe" --x265-binary "D:\Source code\x265\build\vc14-x86_64\x265.exe" --preset veryslow --crf 17.0 --tu-intra-depth 3 --tu-inter-depth 3 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 5 --weightb --keyint 720 --min-keyint 1 --bframes 9 --aq-mode 1 --aq-strength 1.1 --rd 6 --no-sao --no-open-gop --rc-lookahead 72 --scenecut 40 --max-merge 4 --qcomp 0.78 --no-strong-intra-smoothing --psy-rdoq 14.0 --psy-rd 1.2 --deblock -2:-2 --rdoq-level 2 --qg-size 32 --ctu 32 --max-tu-size 16 --rdpenalty 2 --qpstep 5 --colormatrix bt709 --pmode --pools +,+ --output-depth 10 --output "test.hevc" "test.avs"

    yuv  [info]: 1920x1080 fps 24000/1001 i420p8 unknown frame count
    raw  [info]: output file: F:\test.hevc
    x265 [info]: HEVC encoder version 1.7+481-046175f0d5ca
    x265 [info]: build info [Windows][MSVC 1900][64 bit] 10bit
    x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX
    x265 [info]: Main 10 profile, Level-5 (Main tier)
    x265 [info]: Thread pool 0 using 48 threads on numa nodes ,0,1
    x265 [info]: frame threads / pool features       : 6 / wpp(34 rows)+pmode
    x265 [info]: Coding QT: max CU size, min CU size : 32 / 8
    x265 [info]: Residual QT: max TU size, max depth : 16 / 3 inter / 3 intra
    x265 [info]: ME / range / subpel / merge         : star / 44 / 5 / 4
    x265 [info]: Keyframe min / max / scenecut       : 1 / 720 / 40
    x265 [info]: Intra 32x32 TU penalty type         : 2
    x265 [info]: Lookahead / bframes / badapt        : 72 / 9 / 2
    x265 [info]: b-pyramid / weightp / weightb       : 1 / 1 / 1
    x265 [info]: References / ref-limit  cu / depth  : 5 / 0 / 0
    x265 [info]: AQ: mode / str / qg-size / cu-tree  : 1 / 1.1 / 32 / 1
    x265 [info]: Rate Control / qCompress            : CRF-17.0 / 0.78
    x265 [info]: tools: rd=6 psy-rd=1.20 rdoq=2 psy-rdoq=14.00 signhide tmvp
    x265 [info]: tools: b-intra deblock(tC=-2:B=-2)
    x265 [info]: frame I:     10, Avg QP:12.96  kb/s: 35506.36
    x265 [info]: frame P:   1716, Avg QP:14.81  kb/s: 13039.52
    x265 [info]: frame B:   5474, Avg QP:20.94  kb/s: 1475.69
    x265 [info]: Weighted P-Frames: Y:4.8% UV:4.2%
    x265 [info]: Weighted B-Frames: Y:3.0% UV:2.2%
    x265 [info]: consecutive B-frames: 14.5% 6.0% 22.1% 18.8% 7.2% 20.8% 2.0% 3.5% 4.0% 1.0%
    
    encoded 7200 frames in 1633.21s (4.41 fps), 4279.00 kb/s, Avg QP:19.46
    

    with "--pools 24,24"

    "avs4x265.exe" --x265-binary "D:\Source code\x265\build\vc14-x86_64\x265.exe" --preset veryslow --crf 17.0 --tu-intra-depth 3 --tu-inter-depth 3 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 5 --weightb --keyint 720 --min-keyint 1 --bframes 9 --aq-mode 1 --aq-strength 1.1 --rd 6 --no-sao --no-open-gop --rc-lookahead 72 --scenecut 40 --max-merge 4 --qcomp 0.78 --no-strong-intra-smoothing --psy-rdoq 14.0 --psy-rd 1.2 --deblock -2:-2 --rdoq-level 2 --qg-size 32 --ctu 32 --max-tu-size 16 --rdpenalty 2 --qpstep 5 --colormatrix bt709 --pmode --pools 24,24 --output-depth 10 --output "test.hevc" "test.avs"

    yuv  [info]: 1920x1080 fps 24000/1001 i420p8 unknown frame count
    raw  [info]: output file: F:\test.hevc
    x265 [info]: HEVC encoder version 1.7+481-046175f0d5ca
    x265 [info]: build info [Windows][MSVC 1900][64 bit] 10bit
    x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX
    x265 [info]: Main 10 profile, Level-5 (Main tier)
    x265 [info]: Thread pool 0 using 24 threads on numa nodes ,0
    x265 [info]: Thread pool 1 using 24 threads on numa nodes ,1
    x265 [info]: frame threads / pool features       : 6 / wpp(34 rows)+pmode
    x265 [info]: Coding QT: max CU size, min CU size : 32 / 8
    x265 [info]: Residual QT: max TU size, max depth : 16 / 3 inter / 3 intra
    x265 [info]: ME / range / subpel / merge         : star / 44 / 5 / 4
    x265 [info]: Keyframe min / max / scenecut       : 1 / 720 / 40
    x265 [info]: Intra 32x32 TU penalty type         : 2
    x265 [info]: Lookahead / bframes / badapt        : 72 / 9 / 2
    x265 [info]: b-pyramid / weightp / weightb       : 1 / 1 / 1
    x265 [info]: References / ref-limit  cu / depth  : 5 / 0 / 0
    x265 [info]: AQ: mode / str / qg-size / cu-tree  : 1 / 1.1 / 32 / 1
    x265 [info]: Rate Control / qCompress            : CRF-17.0 / 0.78
    x265 [info]: tools: rd=6 psy-rd=1.20 rdoq=2 psy-rdoq=14.00 signhide tmvp
    x265 [info]: tools: b-intra deblock(tC=-2:B=-2)
    x265 [info]: frame I:     10, Avg QP:12.96  kb/s: 35506.36
    x265 [info]: frame P:   1716, Avg QP:14.81  kb/s: 13039.52
    x265 [info]: frame B:   5474, Avg QP:20.94  kb/s: 1475.69
    x265 [info]: Weighted P-Frames: Y:4.8% UV:4.2%
    x265 [info]: Weighted B-Frames: Y:3.0% UV:2.2%
    x265 [info]: consecutive B-frames: 14.5% 6.0% 22.1% 18.8% 7.2% 20.8% 2.0% 3.5% 4.0% 1.0%
    
    encoded 7200 frames in 1440.81s (5.00 fps), 4279.00 kb/s, Avg QP:19.46
    

    with "--pools +,+" or no "--pools", without the command "--pmode"

    "avs4x265.exe" --x265-binary "D:\Source code\x265\build\vc14-x86_64\x265.exe" --preset veryslow --crf 17.0 --tu-intra-depth 3 --tu-inter-depth 3 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 5 --weightb --keyint 720 --min-keyint 1 --bframes 9 --aq-mode 1 --aq-strength 1.1 --rd 6 --no-sao --no-open-gop --rc-lookahead 72 --scenecut 40 --max-merge 4 --qcomp 0.78 --no-strong-intra-smoothing --psy-rdoq 14.0 --psy-rd 1.2 --deblock -2:-2 --rdoq-level 2 --qg-size 32 --ctu 32 --max-tu-size 16 --rdpenalty 2 --qpstep 5 --colormatrix bt709 --pools +,+ --output-depth 10 --output "test.hevc" "test.avs"

    yuv  [info]: 1920x1080 fps 24000/1001 i420p8 unknown frame count
    raw  [info]: output file: F:\test.hevc
    x265 [info]: HEVC encoder version 1.7+481-046175f0d5ca
    x265 [info]: build info [Windows][MSVC 1900][64 bit] 10bit
    x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX
    x265 [info]: Main 10 profile, Level-5 (Main tier)
    x265 [info]: Thread pool 0 using 48 threads on numa nodes ,0,1
    x265 [info]: frame threads / pool features       : 6 / wpp(34 rows)
    x265 [info]: Coding QT: max CU size, min CU size : 32 / 8
    x265 [info]: Residual QT: max TU size, max depth : 16 / 3 inter / 3 intra
    x265 [info]: ME / range / subpel / merge         : star / 44 / 5 / 4
    x265 [info]: Keyframe min / max / scenecut       : 1 / 720 / 40
    x265 [info]: Intra 32x32 TU penalty type         : 2
    x265 [info]: Lookahead / bframes / badapt        : 72 / 9 / 2
    x265 [info]: b-pyramid / weightp / weightb       : 1 / 1 / 1
    x265 [info]: References / ref-limit  cu / depth  : 5 / 0 / 0
    x265 [info]: AQ: mode / str / qg-size / cu-tree  : 1 / 1.1 / 32 / 1
    x265 [info]: Rate Control / qCompress            : CRF-17.0 / 0.78
    x265 [info]: tools: rd=6 psy-rd=1.20 rdoq=2 psy-rdoq=14.00 signhide tmvp
    x265 [info]: tools: b-intra deblock(tC=-2:B=-2)
    x265 [info]: frame I:     10, Avg QP:12.96  kb/s: 35506.36
    x265 [info]: frame P:   1716, Avg QP:14.81  kb/s: 13039.52
    x265 [info]: frame B:   5474, Avg QP:20.94  kb/s: 1475.69
    x265 [info]: Weighted P-Frames: Y:4.8% UV:4.2%
    x265 [info]: Weighted B-Frames: Y:3.0% UV:2.2%
    x265 [info]: consecutive B-frames: 14.5% 6.0% 22.1% 18.8% 7.2% 20.8% 2.0% 3.5% 4.0% 1.0%
    
    encoded 7200 frames in 1728.21s (4.17 fps), 4279.00 kb/s, Avg QP:19.46
    

    with "--pools 24,24", without the command "--pmode"

    "avs4x265.exe" --x265-binary "D:\Source code\x265\build\vc14-x86_64\x265.exe" --preset veryslow --crf 17.0 --tu-intra-depth 3 --tu-inter-depth 3 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 5 --weightb --keyint 720 --min-keyint 1 --bframes 9 --aq-mode 1 --aq-strength 1.1 --rd 6 --no-sao --no-open-gop --rc-lookahead 72 --scenecut 40 --max-merge 4 --qcomp 0.78 --no-strong-intra-smoothing --psy-rdoq 14.0 --psy-rd 1.2 --deblock -2:-2 --rdoq-level 2 --qg-size 32 --ctu 32 --max-tu-size 16 --rdpenalty 2 --qpstep 5 --colormatrix bt709 --pools 24,24 --output-depth 10 --output "test.hevc" "test.avs"

    yuv  [info]: 1920x1080 fps 24000/1001 i420p8 unknown frame count
    raw  [info]: output file: F:\test.hevc
    x265 [info]: HEVC encoder version 1.7+481-046175f0d5ca
    x265 [info]: build info [Windows][MSVC 1900][64 bit] 10bit
    x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX
    x265 [info]: Main 10 profile, Level-5 (Main tier)
    x265 [info]: Thread pool 0 using 24 threads on numa nodes ,0
    x265 [info]: Thread pool 1 using 24 threads on numa nodes ,1
    x265 [info]: frame threads / pool features       : 6 / wpp(34 rows)
    x265 [info]: Coding QT: max CU size, min CU size : 32 / 8
    x265 [info]: Residual QT: max TU size, max depth : 16 / 3 inter / 3 intra
    x265 [info]: ME / range / subpel / merge         : star / 44 / 5 / 4
    x265 [info]: Keyframe min / max / scenecut       : 1 / 720 / 40
    x265 [info]: Intra 32x32 TU penalty type         : 2
    x265 [info]: Lookahead / bframes / badapt        : 72 / 9 / 2
    x265 [info]: b-pyramid / weightp / weightb       : 1 / 1 / 1
    x265 [info]: References / ref-limit  cu / depth  : 5 / 0 / 0
    x265 [info]: AQ: mode / str / qg-size / cu-tree  : 1 / 1.1 / 32 / 1
    x265 [info]: Rate Control / qCompress            : CRF-17.0 / 0.78
    x265 [info]: tools: rd=6 psy-rd=1.20 rdoq=2 psy-rdoq=14.00 signhide tmvp
    x265 [info]: tools: b-intra deblock(tC=-2:B=-2)
    x265 [info]: frame I:     10, Avg QP:12.96  kb/s: 35506.36
    x265 [info]: frame P:   1716, Avg QP:14.81  kb/s: 13039.52
    x265 [info]: frame B:   5474, Avg QP:20.94  kb/s: 1475.69
    x265 [info]: Weighted P-Frames: Y:4.8% UV:4.2%
    x265 [info]: Weighted B-Frames: Y:3.0% UV:2.2%
    x265 [info]: consecutive B-frames: 14.5% 6.0% 22.1% 18.8% 7.2% 20.8% 2.0% 3.5% 4.0% 1.0%
    
    encoded 7200 frames in 1439.21s (5.00 fps), 4279.00 kb/s, Avg QP:19.46
    
  16. Deepthi Nandakumar

    This is mostly a function of utilization. With pmode, the utilization is fairly heavy, so you could expect better performance from 2 separate threadpools.
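
The four 1.7+481 runs reported above can be compared directly from their "encoded 7200 frames in ...s" lines. The Python sketch below recomputes the frame rates and the relative gain of two 24-thread pools over one 48-thread pool; the elapsed times are copied from those logs, while the function and variable names are only illustrative.

```python
FRAMES = 7200  # frames encoded in every run of the 1.7+481 test

# Elapsed seconds copied from the "encoded 7200 frames in ...s" log lines.
ELAPSED = {
    ("pmode", "single 48-thread pool"): 1633.21,
    ("pmode", "two 24-thread pools"): 1440.81,
    ("no pmode", "single 48-thread pool"): 1728.21,
    ("no pmode", "two 24-thread pools"): 1439.21,
}


def fps(frames, seconds):
    """Average encoding speed in frames per second."""
    return frames / seconds


def speedup(slow_seconds, fast_seconds):
    """How many times faster the quicker run is than the slower one."""
    return slow_seconds / fast_seconds


if __name__ == "__main__":
    for mode in ("pmode", "no pmode"):
        single = ELAPSED[(mode, "single 48-thread pool")]
        split = ELAPSED[(mode, "two 24-thread pools")]
        print(
            f"{mode}: single pool {fps(FRAMES, single):.2f} fps, "
            f"two pools {fps(FRAMES, split):.2f} fps, "
            f"speedup {speedup(single, split):.2f}x"
        )
```

These figures come from one clip on one machine, so they show which setting was faster in this test rather than a general rule.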

  17. changhao fu reporter

    Tested with 1.7+497 (commit 975352b2c0223b9139aad233b43eaf2113ac8167).

    with "--pools +,+" or no "--pools", without the command "--pmode"

    "avs4x265.exe" --x265-binary "D:\Source code\x265\build\vc14-x86_64\x265.exe" --preset veryslow --crf 17.0 --tu-intra-depth 3 --tu-inter-depth 3 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 5 --weightb --keyint 720 --min-keyint 1 --bframes 9 --aq-mode 1 --aq-strength 1.1 --rd 6 --no-sao --no-open-gop --rc-lookahead 72 --scenecut 40 --max-merge 4 --qcomp 0.78 --no-strong-intra-smoothing --psy-rdoq 14.0 --psy-rd 1.2 --deblock -2:-2 --rdoq-level 2 --qg-size 32 --ctu 32 --max-tu-size 16 --rdpenalty 2 --qpstep 5 --colormatrix bt709 --pools +,+ --output-depth 10 --output "test.hevc" "test.avs"

    yuv  [info]: 1920x1080 fps 24000/1001 i420p8 unknown frame count
    raw  [info]: output file: F:\[philosophy-raws][Phi Brain:Puzzle of God]\test.hevc
    x265 [info]: HEVC encoder version 1.7+497-975352b2c022
    x265 [info]: build info [Windows][MSVC 1900][64 bit] 10bit
    x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX
    x265 [info]: Main 10 profile, Level-5 (Main tier)
    x265 [info]: Thread pool 0 using 48 threads on numa nodes 0,1
    x265 [info]: frame threads / pool features       : 6 / wpp(34 rows)
    x265 [info]: Coding QT: max CU size, min CU size : 32 / 8
    x265 [info]: Residual QT: max TU size, max depth : 16 / 3 inter / 3 intra
    x265 [info]: ME / range / subpel / merge         : star / 44 / 5 / 4
    x265 [info]: Keyframe min / max / scenecut       : 1 / 720 / 40
    x265 [info]: Intra 32x32 TU penalty type         : 2
    x265 [info]: Lookahead / bframes / badapt        : 72 / 9 / 2
    x265 [info]: b-pyramid / weightp / weightb       : 1 / 1 / 1
    x265 [info]: References / ref-limit  cu / depth  : 5 / 0 / 0
    x265 [info]: AQ: mode / str / qg-size / cu-tree  : 1 / 1.1 / 32 / 1
    x265 [info]: Rate Control / qCompress            : CRF-17.0 / 0.78
    x265 [info]: tools: rd=6 psy-rd=1.20 rdoq=2 psy-rdoq=14.00 signhide tmvp
    x265 [info]: tools: b-intra deblock(tC=-2:B=-2)
    x265 [info]: frame I:     10, Avg QP:12.96  kb/s: 35515.63
    x265 [info]: frame P:   1707, Avg QP:14.79  kb/s: 13179.50
    x265 [info]: frame B:   5483, Avg QP:20.94  kb/s: 1470.32
    x265 [info]: Weighted P-Frames: Y:4.9% UV:4.3%
    x265 [info]: Weighted B-Frames: Y:2.9% UV:2.1%
    x265 [info]: consecutive B-frames: 14.4% 5.9% 21.7% 18.8% 7.2% 21.2% 2.2% 3.6% 4.0% 1.0%
    
    encoded 7200 frames in 1736.41s (4.15 fps), 4293.66 kb/s, Avg QP:19.47
    

    With "--pools 24,24", and without "--pmode":

    "avs4x265.exe" --x265-binary "D:\Source code\x265\build\vc14-x86_64\x265.exe" --preset veryslow --crf 17.0 --tu-intra-depth 3 --tu-inter-depth 3 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 5 --weightb --keyint 720 --min-keyint 1 --bframes 9 --aq-mode 1 --aq-strength 1.1 --rd 6 --no-sao --no-open-gop --rc-lookahead 72 --scenecut 40 --max-merge 4 --qcomp 0.78 --no-strong-intra-smoothing --psy-rdoq 14.0 --psy-rd 1.2 --deblock -2:-2 --rdoq-level 2 --qg-size 32 --ctu 32 --max-tu-size 16 --rdpenalty 2 --qpstep 5 --colormatrix bt709 --pools 24,24 --output-depth 10 --output "test.hevc" "test.avs"

    yuv  [info]: 1920x1080 fps 24000/1001 i420p8 unknown frame count
    raw  [info]: output file: F:\[philosophy-raws][Phi Brain:Puzzle of God]\test.hevc
    x265 [info]: HEVC encoder version 1.7+497-975352b2c022
    x265 [info]: build info [Windows][MSVC 1900][64 bit] 10bit
    x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX
    x265 [info]: Main 10 profile, Level-5 (Main tier)
    x265 [info]: Thread pool 0 using 24 threads on numa nodes 0
    x265 [info]: Thread pool 1 using 24 threads on numa nodes 1
    x265 [info]: frame threads / pool features       : 6 / wpp(34 rows)
    x265 [info]: Coding QT: max CU size, min CU size : 32 / 8
    x265 [info]: Residual QT: max TU size, max depth : 16 / 3 inter / 3 intra
    x265 [info]: ME / range / subpel / merge         : star / 44 / 5 / 4
    x265 [info]: Keyframe min / max / scenecut       : 1 / 720 / 40
    x265 [info]: Intra 32x32 TU penalty type         : 2
    x265 [info]: Lookahead / bframes / badapt        : 72 / 9 / 2
    x265 [info]: b-pyramid / weightp / weightb       : 1 / 1 / 1
    x265 [info]: References / ref-limit  cu / depth  : 5 / 0 / 0
    x265 [info]: AQ: mode / str / qg-size / cu-tree  : 1 / 1.1 / 32 / 1
    x265 [info]: Rate Control / qCompress            : CRF-17.0 / 0.78
    x265 [info]: tools: rd=6 psy-rd=1.20 rdoq=2 psy-rdoq=14.00 signhide tmvp
    x265 [info]: tools: b-intra deblock(tC=-2:B=-2)
    x265 [info]: frame I:     10, Avg QP:12.96  kb/s: 35515.63
    x265 [info]: frame P:   1707, Avg QP:14.79  kb/s: 13179.50
    x265 [info]: frame B:   5483, Avg QP:20.94  kb/s: 1470.32
    x265 [info]: Weighted P-Frames: Y:4.9% UV:4.3%
    x265 [info]: Weighted B-Frames: Y:2.9% UV:2.1%
    x265 [info]: consecutive B-frames: 14.4% 5.9% 21.7% 18.8% 7.2% 21.2% 2.2% 3.6% 4.0% 1.0%
    
    encoded 7200 frames in 1468.07s (4.90 fps), 4293.66 kb/s, Avg QP:19.47
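
For readers comparing the two runs above, here is a minimal sketch of how the summary lines can be turned into a speedup figure. It assumes the summary line keeps the exact format shown in these logs; the helper names `parse_summary` and `speedup` are made up for illustration and are not part of x265.

```python
import re

# Matches the x265 summary line as it appears in the logs above, e.g.
# "encoded 7200 frames in 1468.07s (4.90 fps), 4293.66 kb/s, Avg QP:19.47"
SUMMARY_RE = re.compile(
    r"encoded (\d+) frames in ([\d.]+)s \(([\d.]+) fps\), ([\d.]+) kb/s"
)


def parse_summary(line):
    """Return (frames, seconds, fps, kbps) parsed from a summary line."""
    m = SUMMARY_RE.search(line)
    if m is None:
        raise ValueError("not an x265 summary line: %r" % line)
    return int(m.group(1)), float(m.group(2)), float(m.group(3)), float(m.group(4))


def speedup(baseline_line, candidate_line):
    """Baseline wall time divided by candidate wall time (>1 means faster)."""
    _, base_seconds, _, _ = parse_summary(baseline_line)
    _, cand_seconds, _, _ = parse_summary(candidate_line)
    return base_seconds / cand_seconds


# Summary lines copied from the two runs above (both without --pmode).
unified = "encoded 7200 frames in 1736.41s (4.15 fps), 4293.66 kb/s, Avg QP:19.47"
split = "encoded 7200 frames in 1468.07s (4.90 fps), 4293.66 kb/s, Avg QP:19.47"

ratio = speedup(unified, split)
print("split pools vs unified pool: %.2fx" % ratio)
```

On this machine and clip the split-pool run finishes roughly 18% faster at an identical bitrate and average QP, so the difference is purely in throughput, not in the encoder's decisions.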
    
  18. Pradeep Ramachandran Account Deactivated

    Maybe this has something to do with your video or machine. I just tried encoding with and without pmode on a few dual-socket Linux machines that I have access to, and for both 1080p and 4K videos in a variety of ABR settings, I see that using a unified pool (by not specifying --pools 32,32 or the like) is better than using split pools. I am now trying with CRF settings; will respond once I have that data.

    In the meantime, can you share a link to the video that you're using here? Maybe there is something in that video that is causing such a huge difference...

  19. Pradeep Ramachandran Account Deactivated

    I tried 10-bit encoding with the exact command line above but with my 1080p videos. The results are the same as before: the unified pool (not specifying any --pools option) is better than split pools, with pmode both on and off. I would like to try next with your video; can you please send me a pointer?

    For now, you can just add --pools 24,24 to your command line, as it is clearly much faster for you.

  20. Pradeep Ramachandran Account Deactivated

    I would still like to try your video clip out to see if there is an artifact that I'm missing - do you have a public link for the video?

  21. changhao fu reporter

    Changed to a new video.

    Ran a new test with a lower CRF.

    With "--pools +,+" or no "--pools", and without "--pmode":

    "D:\MeGUI_x86\tools\x265\avs4x265.exe" --x265-binary "D:\Source code\x265\build\vc14-x86_64\x265.exe" --preset veryslow --crf 15.5 --tu-intra-depth 3 --tu-inter-depth 3 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 6 --weightb --keyint 720 --min-keyint 1 --bframes 9 --aq-mode 1 --aq-strength 1.1 --rd 6 --no-sao --no-open-gop --rc-lookahead 72 --scenecut 40 --max-merge 4 --qcomp 0.8 --no-strong-intra-smoothing --psy-rdoq 4.0 --psy-rd 0.9 --deblock -2:-2 --rdoq-level 2 --qg-size 32 --qpstep 5 --ctu 32 --max-tu-size 16 --rdpenalty 2 --colormatrix bt709 --pools +,+ --output-depth 10 --output "F:[philosophy-raws][Phi Brain:Puzzle of God]\test.hevc" "F:[philosophy-raws][Phi Brain:Puzzle of God]\test.avs"

    It's 1.85 FPS.

    With "--pools 24,24", and without "--pmode":

    "D:\MeGUI_x86\tools\x265\avs4x265.exe" --x265-binary "D:\Source code\x265\build\vc14-x86_64\x265.exe" --preset veryslow --crf 15.5 --tu-intra-depth 3 --tu-inter-depth 3 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 6 --weightb --keyint 720 --min-keyint 1 --bframes 9 --aq-mode 1 --aq-strength 1.1 --rd 6 --no-sao --no-open-gop --rc-lookahead 72 --scenecut 40 --max-merge 4 --qcomp 0.8 --no-strong-intra-smoothing --psy-rdoq 4.0 --psy-rd 0.9 --deblock -2:-2 --rdoq-level 2 --qg-size 32 --qpstep 5 --ctu 32 --max-tu-size 16 --rdpenalty 2 --colormatrix bt709 --pools 24,24 --output-depth 10 --output "F:[philosophy-raws][Phi Brain:Puzzle of God]\test.hevc" "F:[philosophy-raws][Phi Brain:Puzzle of God]\test.avs"

    In this test it's 2.14 FPS.

    Test clip: https://mega.nz/#!IcwVFQwL!9TscOdHH2Bkhv8DMuu7IWsVojCwfyFZeUEriPY8P3cY

  22. Pradeep Ramachandran Account Deactivated

    Thanks for the clip. I ran the exact command line above on my Linux box that has two sockets of E5-2699v3 each (36 threads per socket) and on a Windows box that has two sockets of E5-2650v2 each (16 threads per socket), and in both cases I see that not specifying the --pools option is marginally better in performance than specifying it.

    So I am bound to say that your best fix is to just include a --pools in your command line. This should ensure forward progress for you at no loss in performance :-).

    However, the curious engineer in me isn't satisfied, so I'd like to dig further! I notice that my FPS numbers are lower than what you see above (~1.3 on the Linux box, and ~0.6 on the Windows box). I'm not sure if that plays into the differences between what I'm seeing and what you're reporting (maybe the balance between compute and memory accesses is very different across the machines). If you can share that information, can you give me your hardware configuration so that I or someone else can chip in to find the problem?

    Pradeep.

  23. changhao fu reporter

    For me, I usually encode with 2 threads, using --pools -,+ and --pools +,-.

    That's the fastest way for me.

    It seems that this issue could be closed.
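
A minimal sketch of the workflow described in the last comment, reading "2 threads" as two concurrent encoder processes, each restricted to one NUMA node (that reading is an assumption). The `--pools` values `+,-` and `-,+` are the ones used in this thread; the executable path, file names, and the helpers `build_cmd` and `run_pair` are hypothetical.

```python
import subprocess


def build_cmd(x265_path, pools, infile, outfile, extra_args=()):
    """Build an x265 argument list restricted to the nodes selected by `pools`."""
    return [x265_path, "--pools", pools, *extra_args, "--output", outfile, infile]


def run_pair(x265_path, jobs, extra_args=()):
    """Start all jobs at once, then wait for each; returns the exit codes."""
    procs = [
        subprocess.Popen(build_cmd(x265_path, pools, src, dst, extra_args))
        for pools, src, dst in jobs
    ]
    return [p.wait() for p in procs]


# Hypothetical inputs: one encode per NUMA node on a two-node machine.
JOBS = [
    ("+,-", "episode1.y4m", "episode1.hevc"),  # all of node 0, none of node 1
    ("-,+", "episode2.y4m", "episode2.hevc"),  # none of node 0, all of node 1
]

if __name__ == "__main__":
    # Print the commands instead of launching them; call run_pair() to execute.
    for pools, src, dst in JOBS:
        print(" ".join(build_cmd("x265", pools, src, dst, ["--preset", "veryslow"])))
```

Running two independent encodes this way keeps each process's threads and memory on a single node, which may be why it is the fastest arrangement on the reporter's machine.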
