HIGH_BIT_DEPTH apparently reduces quality per bitrate

Issue #55 resolved
John Boyle created an issue

I've found that, for a given video, a given set of parameters to ffmpeg, and a given version of libx265--holding everything constant except for whether x265 is built with HIGH_BIT_DEPTH enabled--the higher bit depth significantly reduces coding efficiency.

This manifests itself in either of two ways: If quality is controlled with a constant rate factor (e.g. CRF-22.0), then the output is perhaps 1.7x as large, and if it is an average bit rate (e.g. ABR-150 kbps), then the output has the same size but is significantly lower quality (the reported global QP was increased by 10 in one case, and visually it looks worse). And in either case the job takes twice as long or more--though I expected that part.

ffmpeg reports that the output files have what might be different bit depths: "hevc (Main), yuv420p(tv)" from 8bpp versus "hevc (Main 10), yuv420p10le(tv)" from 16bpp. Is that the reason for the size increase--a 10-bit output file? Is that supposed to happen? (The input file says "h264 (High), yuv420p", and I don't think it has a bit depth of 10. The terminology of the source code and debugging output, which talks about "Internal bit depth", leads me to expect the output should by default have the same bit depth as the input.) Is there a way to turn off the 10-bit output, and just get the benefits of more precise internal computations? (There is a --recon-depth argument to the x265 executable, but it seems inapplicable to this.)

I am aware of some source comments and issue comments indicating that supporting both 8-bit and 10-bit internal bit depths is a desired-but-not-yet-implemented feature. But this seems a bit different.

Comments (5)

  1. Former user Account Deleted

    then the output has the same size but is significantly lower quality (the reported global QP was increased by 10 in one case, and visually it looks worse)

    Yep, I have the same observations. x265 10-bit encodes have significantly lower SSIM and PSNR for the same bitrate. As for visual quality I have mixed feelings: banding is gone, but detail is slightly poorer than 8-bit. I suppose it's some hidden bug.

    Is that the reason for the size increase--a 10-bit output file? Is that supposed to happen?

    No. 10-bit provides higher-precision calculations and should improve compression, like x264 does.

    global QP was increased by 10 in one case

    Because the QP range extends when raising the bit depth: [0-51] for 8-bit vs [0-63] for 10-bit (see the sketch below). So the QP, and hence the CRF, for high bit depth must be slightly increased for the same bitrate.
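
    As a rough illustration of that extension (assuming the HEVC offset Qp' = Qp + 6 * (bitDepth - 8); the code is illustrative only, not taken from x265):

        #include <cstdio>

        int main()
        {
            // Internal QP range grows by 6 per extra bit of sample depth.
            const int depths[] = { 8, 10 };
            for (int bitDepth : depths)
            {
                int qpBdOffset = 6 * (bitDepth - 8);    // 0 at 8-bit, 12 at 10-bit
                std::printf("%2d-bit: internal QP range [0, %d]\n", bitDepth, 51 + qpBdOffset);
            }
            return 0;
        }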

  2. Ben Waggoner

    I note that x264 doesn't appear to need a higher CRF in 10-bit mode than 8-bit mode, which is very useful. I believe that was implemented as a fix at some point; originally it also needed a higher CRF to get the same bitrate in 10-bit.

    There is a bunch of documentation of this in this thread: http://forum.doom9.org/showthread.php?t=170236

    Testing determined that CRF values give quite different results even when encoding 8-bit HEVC using the 16-bit encoder versus encoding 8-bit HEVC with the 8-bit encoder.

  3. Steve Borho

    HIGH_BIT_DEPTH builds always use an internal bit depth of 10bit pixels (16bits per pixel), and the internal bit depth is the same thing as the output bit depth. There is no way to use an internal bit depth of 10 and an output bit depth of 8, they are directly coupled.

    Recon depth, as you guessed, is unrelated; it is just a debugging feature that lets you output an 8-bit y4m of the recon images from a 10-bit encode, making the recon stream easy to view in VLC (see the example at the end of this comment).

    We have not yet tried to tie the rate-factors of 8 and 10bit encodes together.

    I don't know why the quality per bit is lower at this time. If I had to guess, I would first suspect bugs related to 10-bit costs, perhaps in AQ or cuTree. Second, I would guess that HEVC benefits less from 10-bit encodes than AVC because HEVC has decent subpel filters and thus does not lose as much detail in 8-bit encodes, so there is less of a gain from 10-bit encodes to outweigh the bit cost of the increased sample size (that is an unproven theory, just speculation).
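
    For example (paths and filenames illustrative, assuming a default out-of-tree build): a HIGH_BIT_DEPTH build is configured with cmake -DHIGH_BIT_DEPTH=ON ../source, and something like ./x265 input.y4m output.hevc --recon recon.y4m --recon-depth 8 dumps an 8-bit recon y4m alongside the 10-bit encode.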

  4. Cody Opel

    It makes sense that, when encoding at a fixed bitrate in both 8-bit and 10-bit, the 10-bit encode would require a larger file to achieve perceived transparency with the 8-bit one, given that 8-bit uses 12 bits per pixel whereas 10-bit uses 16 bits per pixel. Also, unless you are encoding from a 10-bit source, the SSIM and PSNR values will be further skewed because you are using values that did not exist in the source content: 8-bit has 256 levels per channel and 10-bit has 1024. In the end it comes down to this: it is not a problem with the encoder, but with the usage described. As for CRF from 8-bit to 10-bit, it would require a skew in the bitrates to compensate for the variation.

    On an unrelated note:

    This is an exaggerated example that shows the benefit of higher bit depths. I have also noticed quality issues related to using CU trees and adaptive quantization, shown below, but the outcome would be different at higher bitrates.

    All at ~3500 kbps (issues become more apparent at lower bitrates):

    With Adaptive Quantization and CU Trees:

    No AQ or CU Trees:

    At first glance the 8-bit no-AQ version looks the best, but looking closer the 10-bit no-AQ version preserves the most detail (especially around the edges of the letters), particularly considering they were encoded at identical bitrates with no compensation for 10-bit. There is always a gain as far as quality is concerned when using 10-bit's additional color range (even if you are using an 8-bit display, which most are), but when optimizing for file size it is a whole different story.

  5. Steve Borho

    constants: adjust lambda tables for 10bit encodes (fixes #55)

    Since samples are 10 bits, where two bits of extra resolution have been added for more granularity, distortion also has two extra bits. A typical resolution for this problem is to down-shift distortion by 2 bits everywhere before adding lambda * bits to calculate RD cost. Instead, we multiply lambda by 4 (essentially shifting it up by two bits) so that distortion and lambda * bits are both at the higher scale.

    lambda2 uses the square of the up-shifted lambda, so it has the doubled up-shift, the same as the squared distortion values used for RDO (see the sketch at the end of this comment).

    Example output change: ./x265 /Volumes/video/sintel_trailer_2k_480p24.y4m o.bin --ssim --no-info

    Main:          195.67 kb/s, SSIM Mean Y: 0.9833338 (17.782 dB)
    Main10 before: 363.49 kb/s, SSIM Mean Y: 0.9888182 (19.515 dB)
    Main10 after:  206.54 kb/s, SSIM Mean Y: 0.9855121 (18.390 dB)

    → <<cset 014a1e0fb58b>>
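
    A minimal sketch of that scaling, assuming SA(T)D-style distortion for the lambda path and SSE-style distortion for the lambda2 path (illustrative code and names, not the actual x265 source):

        #include <cstdint>
        #include <cstdio>

        // Distortion keeps its extra sample bits; lambda is up-shifted to match.
        uint64_t rdCost(uint64_t satdDistortion, uint32_t bits, double lambda, int bitDepth)
        {
            // 10-bit samples carry 2 extra bits, so SA(T)D is ~4x larger than at 8-bit;
            // up-shift lambda by the same 2 bits so both terms stay at one scale.
            double scaledLambda = lambda * (1 << (bitDepth - 8));   // x1 at 8-bit, x4 at 10-bit
            return satdDistortion + (uint64_t)(scaledLambda * bits + 0.5);
        }

        // SSE distortion is squared, so at 10-bit it carries 4 extra bits; lambda2,
        // being the square of the up-shifted lambda, picks up the doubled up-shift.
        uint64_t rdCostSse(uint64_t sseDistortion, uint32_t bits, double lambda, int bitDepth)
        {
            double upShifted = lambda * (1 << (bitDepth - 8));
            double lambda2 = upShifted * upShifted;                 // x1 at 8-bit, x16 at 10-bit
            return sseDistortion + (uint64_t)(lambda2 * bits + 0.5);
        }

        int main()
        {
            // Made-up block costs: the same notional block measured at 8-bit and 10-bit scale.
            std::printf("8-bit  RD cost: %llu\n", (unsigned long long)rdCost(1000, 50, 10.0, 8));
            std::printf("10-bit RD cost: %llu\n", (unsigned long long)rdCost(4000, 50, 10.0, 10));
            return 0;
        }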
