12-bit encoding broken in ver. 1.7+470

Issue #180 resolved
Ma0 created an issue

In ver. 1.7+470, 8-bit and 10-bit encoding work, but 12-bit encoding is broken. The output is unwatchable, and it changes substantially when the --no-asm option is added (the output is unwatchable in both cases).

Comments (27)

  1. M CHEN

    Thanks for your report. I can reproduce it at preset slow and above. I have fixed the bug; the root cause is PSYVALUE().
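
    The thread names PSYVALUE() and, later, a patch titled "fix PSYVALUE shift overflow", but it does not show the macro. As a rough sketch of how a shift count derived from the bit depth can go out of range only at 12-bit, assume the HM-style transform shift, 15 - bit depth - log2(transform size), and a hypothetical combined shift count of 2 * shift + 1. The function names and the combined expression are assumptions for illustration, not code from x265:

```python
# Illustrative only: names and the combined shift expression are assumptions,
# not code from x265.
MAX_TR_DYNAMIC_RANGE = 15  # transform coefficient dynamic range in HM-style codecs


def transform_shift(bit_depth: int, log2_tr_size: int) -> int:
    """HM-style transform shift for a square transform of size 2**log2_tr_size."""
    return MAX_TR_DYNAMIC_RANGE - bit_depth - log2_tr_size


def combined_shift(bit_depth: int, log2_tr_size: int) -> int:
    """Hypothetical shift count of the form 2 * transform_shift + 1."""
    return 2 * transform_shift(bit_depth, log2_tr_size) + 1


def negative_shift_cases(bit_depth: int) -> list[int]:
    """log2 transform sizes (4x4 to 32x32) whose combined shift count is negative."""
    return [n for n in range(2, 6) if combined_shift(bit_depth, n) < 0]


for depth in (8, 10, 12):
    print(depth, negative_shift_cases(depth))
```

    Under these assumptions only the 12-bit case yields negative shift counts (for 16x16 and 32x32 transforms). In C and C++ a shift by a negative count is undefined behavior, which would be consistent with different compilers and optimization settings producing different output.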

  2. Ma0 reporter

    On closer inspection it is quite strange: the VS 2015 build with LTO (option -GL) is totally broken at 12-bit. The GCC 5.2 build and the VS 2015 build without LTO are simply bad at 12-bit (the VS 2015 build without LTO has bit-identical output to the GCC 5.2 build). This message, http://forum.doom9.org/showthread.php?p=1738104#post1738104, has an example of 12-bit encoding at preset medium with GCC 5.2 and VS 2015 LTO.

  3. Ma0 reporter

    After additional tests: this problem is not specific to ver. 1.7+470 (it was present earlier); the VS 2015 LTO multilib build is totally broken at 12-bit, while the VS 2015 LTO normal 12-bit build is OK (like the GCC build); the 12-bit --no-asm output differs from the --asm SSE2 output; the --asm SSE2, --asm SSSE3, --asm SSE4.2 and --asm AVX outputs are identical.

  4. Ma0 reporter

    I've checked the last 2 patches: the quality glitches are gone. Thanks! The --no-asm output still differs from the normal output (maybe I should apply another patch).

  5. Ma0 reporter

    I've additionally applied your third patch, "fix PSYVALUE shift overflow, Issue #180 [OUTPUT CHANGE on 12bpp]", and Ramya's patch, "asm: fix sse_pp[32x64] sse2 asm for 12 bit", and the output still differs:

    i:\speed\12b>x265 -D12 --no-asm 720p50_parkrun_ter.y4m w3-n.hevc
    y4m  [info]: 1280x720 fps 50/1 i420p8 sar 1:1 frames 0 - 503 of 504
    raw  [info]: output file: w3-n.hevc
    x265 [info]: HEVC encoder version 1.7+478-365f7ed4d896
    x265 [info]: build info [Windows][MSVC 1900][64 bit] 12bit
    x265 [info]: using cpu capabilities: none!
    x265 [info]: Main 12 profile, Level-4 (Main tier)
    x265 [info]: Thread pool created using 4 threads
    x265 [info]: frame threads / pool features       : 2 / wpp(12 rows)
    x265 [info]: Coding QT: max CU size, min CU size : 64 / 8
    x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra
    x265 [info]: ME / range / subpel / merge         : hex / 57 / 2 / 2
    x265 [info]: Keyframe min / max / scenecut       : 25 / 250 / 40
    x265 [info]: Lookahead / bframes / badapt        : 20 / 4 / 2
    x265 [info]: b-pyramid / weightp / weightb       : 1 / 1 / 0
    x265 [info]: References / ref-limit  cu / depth  : 3 / 0 / 0
    x265 [info]: AQ: mode / str / qg-size / cu-tree  : 1 / 1.0 / 32 / 1
    x265 [info]: Rate Control / qCompress            : CRF-28.0 / 0.60
    x265 [info]: tools: rd=3 psy-rd=0.30 signhide tmvp strong-intra-smoothing
    x265 [info]: tools: deblock sao
    x265 [info]: frame I:      3, Avg QP:30.45  kb/s: 13883.07
    x265 [info]: frame P:    123, Avg QP:35.22  kb/s: 11264.51
    x265 [info]: frame B:    378, Avg QP:38.97  kb/s: 452.37
    x265 [info]: Weighted P-Frames: Y:0.8% UV:0.8%
    x265 [info]: consecutive B-frames: 2.4% 2.4% 9.5% 64.3% 21.4%
    
    encoded 504 frames in 128.44s (3.92 fps), 3171.00 kb/s, Avg QP:38.00
    
    i:\speed\12b>x265 -D12 720p50_parkrun_ter.y4m w3.hevc
    y4m  [info]: 1280x720 fps 50/1 i420p8 sar 1:1 frames 0 - 503 of 504
    raw  [info]: output file: w3.hevc
    x265 [info]: HEVC encoder version 1.7+478-365f7ed4d896
    x265 [info]: build info [Windows][MSVC 1900][64 bit] 12bit
    x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX
    x265 [info]: Main 12 profile, Level-4 (Main tier)
    x265 [info]: Thread pool created using 4 threads
    x265 [info]: frame threads / pool features       : 2 / wpp(12 rows)
    x265 [info]: Coding QT: max CU size, min CU size : 64 / 8
    x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra
    x265 [info]: ME / range / subpel / merge         : hex / 57 / 2 / 2
    x265 [info]: Keyframe min / max / scenecut       : 25 / 250 / 40
    x265 [info]: Lookahead / bframes / badapt        : 20 / 4 / 2
    x265 [info]: b-pyramid / weightp / weightb       : 1 / 1 / 0
    x265 [info]: References / ref-limit  cu / depth  : 3 / 0 / 0
    x265 [info]: AQ: mode / str / qg-size / cu-tree  : 1 / 1.0 / 32 / 1
    x265 [info]: Rate Control / qCompress            : CRF-28.0 / 0.60
    x265 [info]: tools: rd=3 psy-rd=0.30 signhide tmvp strong-intra-smoothing
    x265 [info]: tools: deblock sao
    x265 [info]: frame I:      3, Avg QP:30.45  kb/s: 13882.93
    x265 [info]: frame P:    123, Avg QP:35.22  kb/s: 11265.20
    x265 [info]: frame B:    378, Avg QP:38.97  kb/s: 449.16
    x265 [info]: Weighted P-Frames: Y:0.8% UV:0.8%
    x265 [info]: consecutive B-frames: 2.4% 2.4% 9.5% 64.3% 21.4%
    
    encoded 504 frames in 50.34s (10.01 fps), 3168.75 kb/s, Avg QP:38.00
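
    The two logs above only show that the bitrates differ. A minimal sketch of a direct check for bit-identical output follows; the helper names are hypothetical, and it would be called with the two files written above (w3-n.hevc and w3.hevc):

```python
import hashlib
from pathlib import Path


def first_difference(path_a: str, path_b: str, chunk_size: int = 1 << 20):
    """Return the byte offset of the first difference, or None if identical."""
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                # Locate the first mismatching byte inside this chunk.
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                # No mismatch in the common part: one file is a prefix of the other.
                return offset + min(len(a), len(b))
            if not a:  # both files exhausted without a mismatch
                return None
            offset += len(a)


def sha256_of(path: str) -> str:
    """Hex SHA-256 digest of a file, convenient for quoting in a report."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()
```

    For example, first_difference("w3-n.hevc", "w3.hevc") returns None only when the --no-asm and asm outputs are bit-identical; otherwise the offset gives a rough idea of how early the streams diverge.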
    
  6. M CHEN

    In later frames, more SATD functions need fixing. The workaround is to disable all of the SATD asm code. I am working on fixing these functions one by one and need more time.
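
    For readers unfamiliar with the term, SATD is the sum of absolute values of the Hadamard-transformed difference between two blocks. A minimal reference sketch for a 4x4 block follows. It is not x265's implementation, and leaving the sum unnormalized is an assumption (encoders commonly scale it):

```python
def hadamard4(v):
    """4-point Hadamard transform (unnormalized butterfly)."""
    a0, a1 = v[0] + v[1], v[0] - v[1]
    a2, a3 = v[2] + v[3], v[2] - v[3]
    return [a0 + a2, a1 + a3, a0 - a2, a1 - a3]


def satd4x4(block_a, block_b):
    """Sum of absolute Hadamard-transformed differences of two 4x4 blocks."""
    diff = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(block_a, block_b)]
    rows = [hadamard4(r) for r in diff]                # transform each row
    cols = [hadamard4([rows[y][x] for y in range(4)])  # then each column
            for x in range(4)]
    return sum(abs(c) for col in cols for c in col)


def max_coefficient(bit_depth):
    """Largest magnitude a 4x4 Hadamard coefficient can reach at this bit depth."""
    return 16 * ((1 << bit_depth) - 1)


for depth in (8, 10, 12):
    print(depth, max_coefficient(depth), max_coefficient(depth) <= 32767)
```

    In this sketch the largest coefficient is 16368 at 10 bits but 65520 at 12 bits, which no longer fits a signed 16-bit value. Whether that is what went wrong in the x265 SATD assembly is not stated in the thread.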

  7. Ma0 reporter

    Now it is much better. Thanks! However, at preset slower the outputs still differ (I may have applied the wrong patches; I will confirm later):

    i:\speed\12b>x265 -D12 --preset slower 720p50_parkrun_ter.y4m w.hevc
    y4m  [info]: 1280x720 fps 50/1 i420p8 sar 1:1 frames 0 - 503 of 504
    raw  [info]: output file: w.hevc
    x265 [info]: HEVC encoder version 1.7+478-365f7ed4d896
    x265 [info]: build info [Windows][GCC 5.2.0][64 bit] 12bit
    x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX
    x265 [info]: Main 12 profile, Level-4 (Main tier)
    x265 [info]: Thread pool created using 4 threads
    x265 [info]: frame threads / pool features       : 2 / wpp(12 rows)
    x265 [info]: Coding QT: max CU size, min CU size : 64 / 8
    x265 [info]: Residual QT: max TU size, max depth : 32 / 2 inter / 2 intra
    x265 [info]: ME / range / subpel / merge         : star / 57 / 3 / 3
    x265 [info]: Keyframe min / max / scenecut       : 25 / 250 / 40
    x265 [info]: Lookahead / bframes / badapt        : 30 / 8 / 2
    x265 [info]: b-pyramid / weightp / weightb       : 1 / 1 / 1
    x265 [info]: References / ref-limit  cu / depth  : 3 / 0 / 0
    x265 [info]: AQ: mode / str / qg-size / cu-tree  : 1 / 1.0 / 32 / 1
    x265 [info]: Rate Control / qCompress            : CRF-28.0 / 0.60
    x265 [info]: tools: rect amp rd=6 psy-rd=0.30 rdoq=2 psy-rdoq=1.00 signhide
    x265 [info]: tools: tmvp b-intra strong-intra-smoothing deblock sao
    x265 [info]: frame I:      3, Avg QP:31.06  kb/s: 12825.07
    x265 [info]: frame P:    103, Avg QP:35.15  kb/s: 13088.21
    x265 [info]: frame B:    398, Avg QP:38.28  kb/s: 519.31
    x265 [info]: Weighted P-Frames: Y:1.0% UV:1.0%
    x265 [info]: Weighted B-Frames: Y:0.0% UV:0.0%
    x265 [info]: consecutive B-frames: 2.8% 0.0% 3.8% 50.0% 18.9% 12.3% 7.5% 1.9% 2.8%
    
    encoded 504 frames in 245.25s (2.06 fps), 3161.20 kb/s, Avg QP:37.60
    
    i:\speed\12b>x265 -D12 --preset slower --no-asm 720p50_parkrun_ter.y4m wn.hevc
    y4m  [info]: 1280x720 fps 50/1 i420p8 sar 1:1 frames 0 - 503 of 504
    raw  [info]: output file: wn.hevc
    x265 [info]: HEVC encoder version 1.7+478-365f7ed4d896
    x265 [info]: build info [Windows][GCC 5.2.0][64 bit] 12bit
    x265 [info]: using cpu capabilities: none!
    x265 [info]: Main 12 profile, Level-4 (Main tier)
    x265 [info]: Thread pool created using 4 threads
    x265 [info]: frame threads / pool features       : 2 / wpp(12 rows)
    x265 [info]: Coding QT: max CU size, min CU size : 64 / 8
    x265 [info]: Residual QT: max TU size, max depth : 32 / 2 inter / 2 intra
    x265 [info]: ME / range / subpel / merge         : star / 57 / 3 / 3
    x265 [info]: Keyframe min / max / scenecut       : 25 / 250 / 40
    x265 [info]: Lookahead / bframes / badapt        : 30 / 8 / 2
    x265 [info]: b-pyramid / weightp / weightb       : 1 / 1 / 1
    x265 [info]: References / ref-limit  cu / depth  : 3 / 0 / 0
    x265 [info]: AQ: mode / str / qg-size / cu-tree  : 1 / 1.0 / 32 / 1
    x265 [info]: Rate Control / qCompress            : CRF-28.0 / 0.60
    x265 [info]: tools: rect amp rd=6 psy-rd=0.30 rdoq=2 psy-rdoq=1.00 signhide
    x265 [info]: tools: tmvp b-intra strong-intra-smoothing deblock sao
    x265 [info]: frame I:      3, Avg QP:31.06  kb/s: 12825.07
    x265 [info]: frame P:    103, Avg QP:35.16  kb/s: 13068.81
    x265 [info]: frame B:    398, Avg QP:38.24  kb/s: 516.09
    x265 [info]: Weighted P-Frames: Y:1.0% UV:1.0%
    x265 [info]: Weighted B-Frames: Y:0.0% UV:0.0%
    x265 [info]: consecutive B-frames: 2.8% 0.0% 3.8% 50.0% 18.9% 12.3% 7.5% 1.9% 2.8%
    
    encoded 504 frames in 526.53s (0.96 fps), 3154.69 kb/s, Avg QP:37.57
    

    I applied the wrong patches, but the patch "[x265] [PATCH 1 of 2] fix SSE_PP intermedia result overflow in Main12, (fixes #180)" is the same as the older "[x265] [PATCH 2 of 2] fix SSE_PP intermedia result overflow in Main12, issue #180". So there should still be a problem at preset slower.
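
    The patch titles mention an intermediate result overflow in SSE_PP for Main12 and the 32x64 block size. The arithmetic behind such an overflow can be sketched as below, assuming a 32-bit unsigned accumulator. The thread does not say where exactly the real code overflowed, so the accumulator width and the helper names are assumptions:

```python
def max_sse(bit_depth: int, width: int, height: int) -> int:
    """Worst-case sum of squared errors: every pixel differs by the full range."""
    max_diff = (1 << bit_depth) - 1
    return max_diff * max_diff * width * height


def sse_wrapped(pix_a, pix_b, acc_bits: int = 32) -> int:
    """SSE accumulated in an unsigned register of acc_bits bits (wraps on overflow)."""
    mask = (1 << acc_bits) - 1
    acc = 0
    for a, b in zip(pix_a, pix_b):
        d = a - b
        acc = (acc + d * d) & mask
    return acc


UINT32_MAX = (1 << 32) - 1
for depth in (8, 10, 12):
    worst = max_sse(depth, 32, 64)
    print(depth, worst, worst <= UINT32_MAX)
```

    For a 32x64 block the worst case is 133171200 at 8 bits and 2143291392 at 10 bits, both within 32 bits, but 34342963200 at 12 bits, which is not. That would be consistent with only Main12 being affected.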

  8. Ma0 reporter

    I found quality glitches at preset veryslow:

    i:\speed\12b>x265vs -D12 --preset veryslow --no-asm 720p50_parkrun_ter.y4m vsn.hevc
    y4m  [info]: 1280x720 fps 50/1 i420p8 sar 1:1 frames 0 - 503 of 504
    raw  [info]: output file: vsn.hevc
    x265 [info]: HEVC encoder version 1.7+478-365f7ed4d896
    x265 [info]: build info [Windows][MSVC 1900][64 bit] 12bit
    x265 [info]: using cpu capabilities: none!
    x265 [info]: Main 12 profile, Level-4 (Main tier)
    x265 [info]: Thread pool created using 4 threads
    x265 [info]: frame threads / pool features       : 2 / wpp(12 rows)
    x265 [info]: Coding QT: max CU size, min CU size : 64 / 8
    x265 [info]: Residual QT: max TU size, max depth : 32 / 3 inter / 3 intra
    x265 [info]: ME / range / subpel / merge         : star / 57 / 4 / 4
    x265 [info]: Keyframe min / max / scenecut       : 25 / 250 / 40
    x265 [info]: Lookahead / bframes / badapt        : 40 / 8 / 2
    x265 [info]: b-pyramid / weightp / weightb       : 1 / 1 / 1
    x265 [info]: References / ref-limit  cu / depth  : 5 / 0 / 0
    x265 [info]: AQ: mode / str / qg-size / cu-tree  : 1 / 1.0 / 32 / 1
    x265 [info]: Rate Control / qCompress            : CRF-28.0 / 0.60
    x265 [info]: tools: rect amp rd=6 psy-rd=0.30 rdoq=2 psy-rdoq=1.00 signhide
    x265 [info]: tools: tmvp b-intra strong-intra-smoothing deblock sao
    x265 [info]: frame I:      3, Avg QP:30.70  kb/s: 12827.73
    x265 [info]: frame P:    104, Avg QP:35.15  kb/s: 12856.90
    x265 [info]: frame B:    397, Avg QP:38.20  kb/s: 491.08
    x265 [info]: Weighted P-Frames: Y:1.0% UV:1.0%
    x265 [info]: Weighted B-Frames: Y:0.0% UV:0.0%
    x265 [info]: consecutive B-frames: 2.8% 0.0% 3.7% 52.3% 17.8% 11.2% 8.4% 0.9% 2.8%
    
    encoded 504 frames in 1047.48s (0.48 fps), 3116.19 kb/s, Avg QP:37.52
    

    snapshot_1.jpg snapshot_2.jpg

  9. M CHEN

    Thanks for your report. The asm output mismatch is because there is uncommitted code in my local tree; I will merge it into the patch. The quality issue is the same as before, an intermediate result overflow. I have fixed these bugs; I am verifying the fixes and will send patches once they are confirmed.

  10. Ma0 reporter

    Now the quality is good and the output is the same with the --no-asm option. Thanks! When I apply the oldest patch, "fix PSYVALUE shift overflow, Issue #180 [OUTPUT CHANGE on 12bpp]", to ver. 1.7+478, it displays this message:

    patching file source/common/quant.cpp
    Hunk #1 succeeded at 585 with fuzz 2 (offset -3 lines).
    

    It works, but I'm curious why there is a 3-line difference.
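
    On the message itself: the patch tool reports an offset when the hunk's context is found at a different line than the hunk header states, which happens when lines were added or removed earlier in the file relative to the tree the patch was made against. A simplified sketch of that search follows; it is an illustration with a hypothetical function name, not GNU patch's actual algorithm, and it ignores fuzz:

```python
def locate_hunk(file_lines, context, expected_line, max_offset=100):
    """Return the signed offset at which `context` matches, or None.

    expected_line is 1-based, as in a unified diff hunk header. Offsets are
    tried in the order 0, -1, +1, -2, +2, ... (a simplification).
    """
    n = len(context)
    for distance in range(max_offset + 1):
        for offset in ((0,) if distance == 0 else (-distance, distance)):
            start = expected_line - 1 + offset
            if start >= 0 and file_lines[start:start + n] == context:
                return offset
    return None


# A file the patch was written against, and a version with 3 lines removed
# above the hunk: the same context is now found 3 lines earlier.
original = [f"line {i}" for i in range(1, 21)]
current = original[:2] + original[5:]
context = ["line 10", "line 11", "line 12"]
print(locate_hunk(original, context, 10), locate_hunk(current, context, 10))
```

    Read this way, "offset -3 lines" would mean quant.cpp in 1.7+478 has three fewer lines above the hunk than the tree the patch was generated from. "Fuzz 2" additionally means the tool had to ignore some context lines at the edges of the hunk to find a match.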

  11. M CHEN

    The version tree is controlled by Deepthi; I guess she is on a business trip.

    If the bug is solved, we can close this issue.
