Pipe input to x265 on macOS is extremely slow

Issue #341 new
Former user created an issue

I've tried installing x265 from Homebrew, both the 8-bit version (the default) and the 10-bit version (using --with-16-bit). I also tried building from source following the Linux instructions (running multilib.sh). In all cases, if I use a .y4m file from the file system as the input, encoding is fast and consumes 100% CPU, i.e. most of the time is spent on encoding. If I pipe the input to x265's stdin, only ~30% CPU is consumed and the process is very slow.

Example:

$ x265 out.y4m out.ultrafast.hevc --y4m --output-depth=10 --preset=ultrafast
y4m  [info]: 3840x2160 fps 24000/1001 i420p10 sar 1:1 frames 0 - 49 of 50
raw  [info]: output file: out.ultrafast.hevc
x265 [info]: HEVC encoder version 2.4+2-5bc5e73760cd
x265 [info]: build info [Mac OS X][clang 8.1.0][64 bit] 10bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX AVX2 FMA3 LZCNT BMI2
x265 [info]: Main 10 profile, Level-5 (Main tier)
x265 [info]: Thread pool created using 4 threads
x265 [info]: Slices                              : 1
x265 [info]: frame threads / pool features       : 2 / wpp(68 rows)
x265 [info]: Coding QT: max CU size, min CU size : 32 / 16
x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra
x265 [info]: ME / range / subpel / merge         : dia / 57 / 0 / 2
x265 [info]: Keyframe min / max / scenecut / bias: 23 / 250 / 0 / 5.00
x265 [info]: Lookahead / bframes / badapt        : 5 / 3 / 0
x265 [info]: b-pyramid / weightp / weightb       : 1 / 0 / 0
x265 [info]: References / ref-limit  cu / depth  : 1 / off / off
x265 [info]: AQ: mode / str / qg-size / cu-tree  : 1 / 0.0 / 32 / 1
x265 [info]: Rate Control / qCompress            : CRF-28.0 / 0.60
x265 [info]: tools: rd=2 psy-rd=2.00 early-skip rskip tmvp fast-intra
x265 [info]: tools: strong-intra-smoothing lslices=8 deblock
x265 [info]: frame I:      1, Avg QP:32.91  kb/s: 381.89                        
x265 [info]: frame P:     13, Avg QP:33.00  kb/s: 47.55   
x265 [info]: frame B:     36, Avg QP:35.33  kb/s: 47.49   
x265 [info]: consecutive B-frames: 14.3% 0.0% 0.0% 85.7% 

encoded 50 frames in 6.87s (7.28 fps), 54.19 kb/s, Avg QP:34.68

And the same with pipe input:

$ cat out.y4m | x265 - out.ultrafast.hevc --y4m --output-depth=10 --preset=ultrafast
y4m  [info]: 3840x2160 fps 24000/1001 i420p10 sar 1:1 unknown frame count
raw  [info]: output file: out.ultrafast.hevc
x265 [info]: HEVC encoder version 2.4+2-5bc5e73760cd
x265 [info]: build info [Mac OS X][clang 8.1.0][64 bit] 10bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX AVX2 FMA3 LZCNT BMI2
x265 [info]: Main 10 profile, Level-5 (Main tier)
x265 [info]: Thread pool created using 4 threads
x265 [info]: Slices                              : 1
x265 [info]: frame threads / pool features       : 2 / wpp(68 rows)
x265 [info]: Coding QT: max CU size, min CU size : 32 / 16
x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra
x265 [info]: ME / range / subpel / merge         : dia / 57 / 0 / 2
x265 [info]: Keyframe min / max / scenecut / bias: 23 / 250 / 0 / 5.00
x265 [info]: Lookahead / bframes / badapt        : 5 / 3 / 0
x265 [info]: b-pyramid / weightp / weightb       : 1 / 0 / 0
x265 [info]: References / ref-limit  cu / depth  : 1 / off / off
x265 [info]: AQ: mode / str / qg-size / cu-tree  : 1 / 0.0 / 32 / 1
x265 [info]: Rate Control / qCompress            : CRF-28.0 / 0.60
x265 [info]: tools: rd=2 psy-rd=2.00 early-skip rskip tmvp fast-intra
x265 [info]: tools: strong-intra-smoothing lslices=8 deblock
x265 [info]: frame I:      1, Avg QP:32.91  kb/s: 381.89                        
x265 [info]: frame P:     13, Avg QP:33.00  kb/s: 47.55   
x265 [info]: frame B:     36, Avg QP:35.33  kb/s: 47.49   
x265 [info]: consecutive B-frames: 14.3% 0.0% 0.0% 85.7% 

encoded 50 frames in 66.03s (0.76 fps), 54.19 kb/s, Avg QP:34.68
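
For scale, the raw input data rate implied by the two runs above can be estimated. This is only a rough sketch in Python, assuming the usual Y4M layout for i420p10 (4:2:0 subsampling, two bytes per sample for bit depths above 8) and ignoring the small stream and per-frame headers. The function names are illustrative and not part of x265.

```python
def frame_bytes(width, height, bit_depth):
    """Approximate size of one raw 4:2:0 frame in bytes (headers ignored)."""
    luma = width * height
    # Two chroma planes, each subsampled by 2 horizontally and vertically.
    chroma = 2 * ((width + 1) // 2) * ((height + 1) // 2)
    # Assumption: samples deeper than 8 bits are stored in 2 bytes each.
    bytes_per_sample = 1 if bit_depth <= 8 else 2
    return (luma + chroma) * bytes_per_sample


def input_rate_mb_s(width, height, bit_depth, frames, seconds):
    """Raw input consumed per second, in MB/s (1 MB = 10^6 bytes)."""
    return frame_bytes(width, height, bit_depth) * frames / seconds / 1e6


if __name__ == "__main__":
    # Resolution, frame count and timings are taken from the two logs above.
    file_rate = input_rate_mb_s(3840, 2160, 10, 50, 6.87)
    pipe_rate = input_rate_mb_s(3840, 2160, 10, 50, 66.03)
    print(f"file input: {file_rate:.1f} MB/s")
    print(f"pipe input: {pipe_rate:.1f} MB/s")
    print(f"slowdown:   {66.03 / 6.87:.1f}x")
```

Under these assumptions the file run consumes roughly 180 MB/s of raw video while the pipe run consumes under 20 MB/s, a slowdown of nearly 10x for identical encoder output.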

The --output-depth parameter doesn't seem to change things much.

I've tried to diagnose the problem on my side, but x265 seems to be the only program that reads from a pipe this slowly; piping to other processes (such as the 'tee' or 'cat' system utilities) is very fast.
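
One way to put a number on how fast the pipe itself can deliver data, independent of x265, is to drain it with a trivial reader and time it. The following is a minimal sketch, not a tool from the x265 project; the script name pipe_rate.py, the function names and the 1 MiB chunk size are all arbitrary choices for illustration.

```python
import sys
import time


def drain(stream, chunk_size=1 << 20):
    """Read a binary stream to EOF; return (total_bytes, elapsed_seconds)."""
    total = 0
    start = time.perf_counter()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes object means EOF
            break
        total += len(chunk)
    return total, time.perf_counter() - start


def main():
    total, elapsed = drain(sys.stdin.buffer)
    rate = total / elapsed / 1e6 if elapsed > 0 else float("inf")
    # Report on stderr so stdout stays free for further piping.
    print(f"read {total} bytes in {elapsed:.2f}s ({rate:.1f} MB/s)",
          file=sys.stderr)


# Only measure when stdin is actually a pipe or file, not a terminal.
if __name__ == "__main__" and not sys.stdin.isatty():
    main()
```

Saved as pipe_rate.py, it could be run as `cat out.y4m | python3 pipe_rate.py`. If the reported rate is far above what the piped x265 run achieves, the bottleneck is more likely in how the encoder reads its stdin than in the pipe itself.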

$ x265 --version
x265 [info]: HEVC encoder version 2.4+2-5bc5e73760cd
x265 [info]: build info [Mac OS X][clang 8.1.0][64 bit] 8bit+10bit+12bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX AVX2 FMA3 LZCNT BMI2

My MacBook configuration, in case it is relevant:

 MacBook Pro (Retina, 13-inch, Late 2013)

Hardware Overview:

  Model Name:   MacBook Pro
  Model Identifier: MacBookPro11,1
  Processor Name:   Intel Core i7
  Processor Speed:  2,8 GHz
  Number of Processors: 1
  Total Number of Cores:    2
  L2 Cache (per Core):  256 KB
  L3 Cache: 4 MB
  Memory:   16 GB
  Boot ROM Version: MBP111.0138.B25
  SMC Version (system): 2.16f68

System Software Overview:

  System Version:   macOS 10.12.4 (16E195)
  Kernel Version:   Darwin 16.5.0
  Boot Volume:  SSD
  Boot Mode:    Normal
  User Name:    Igor Malinin (igor)
  Secure Virtual Memory:    Enabled
  System Integrity Protection:  Enabled

$ xcodebuild -version
Xcode 8.3.2
Build version 8E2002

$ cc --version
Apple LLVM version 8.1.0 (clang-802.0.42)
Target: x86_64-apple-darwin16.5.0
Thread model: posix

Comments (5)

  1. Igor Malinin

    Yes, it is a lot faster with the patch. And now it seems even slightly faster with the input from file. Both file and pipe input now seem about equal in speed.

  2. Ma0

    I've sent this (slightly modified) patch to the mailing list. You can test the new patch on a clean source tree (hg update -C) with:

    hg import --no-commit https://patches.videolan.org/patch/16467/raw/
    