Pipe input to x265 on macOS is extremely slow
I've tried installing it from Homebrew, both the 8-bit version (the default) and the 10-bit version (using --with-16-bit). I also tried building from source using the Linux instructions (running multilib.sh). In all cases, if I use a .y4m file from the file system as the input, encoding is fast and consumes 100% CPU, i.e. most of the time is spent on encoding. If I pipe the input to x265's stdin, only ~30% CPU is consumed and the process is very slow.
Example:
$ x265 out.y4m out.ultrafast.hevc --y4m --output-depth=10 --preset=ultrafast
y4m [info]: 3840x2160 fps 24000/1001 i420p10 sar 1:1 frames 0 - 49 of 50
raw [info]: output file: out.ultrafast.hevc
x265 [info]: HEVC encoder version 2.4+2-5bc5e73760cd
x265 [info]: build info [Mac OS X][clang 8.1.0][64 bit] 10bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX AVX2 FMA3 LZCNT BMI2
x265 [info]: Main 10 profile, Level-5 (Main tier)
x265 [info]: Thread pool created using 4 threads
x265 [info]: Slices : 1
x265 [info]: frame threads / pool features : 2 / wpp(68 rows)
x265 [info]: Coding QT: max CU size, min CU size : 32 / 16
x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra
x265 [info]: ME / range / subpel / merge : dia / 57 / 0 / 2
x265 [info]: Keyframe min / max / scenecut / bias: 23 / 250 / 0 / 5.00
x265 [info]: Lookahead / bframes / badapt : 5 / 3 / 0
x265 [info]: b-pyramid / weightp / weightb : 1 / 0 / 0
x265 [info]: References / ref-limit cu / depth : 1 / off / off
x265 [info]: AQ: mode / str / qg-size / cu-tree : 1 / 0.0 / 32 / 1
x265 [info]: Rate Control / qCompress : CRF-28.0 / 0.60
x265 [info]: tools: rd=2 psy-rd=2.00 early-skip rskip tmvp fast-intra
x265 [info]: tools: strong-intra-smoothing lslices=8 deblock
x265 [info]: frame I: 1, Avg QP:32.91 kb/s: 381.89
x265 [info]: frame P: 13, Avg QP:33.00 kb/s: 47.55
x265 [info]: frame B: 36, Avg QP:35.33 kb/s: 47.49
x265 [info]: consecutive B-frames: 14.3% 0.0% 0.0% 85.7%
encoded 50 frames in 6.87s (7.28 fps), 54.19 kb/s, Avg QP:34.68
And the same with pipe input:
$ cat out.y4m | x265 - out.ultrafast.hevc --y4m --output-depth=10 --preset=ultrafast
y4m [info]: 3840x2160 fps 24000/1001 i420p10 sar 1:1 unknown frame count
raw [info]: output file: out.ultrafast.hevc
x265 [info]: HEVC encoder version 2.4+2-5bc5e73760cd
x265 [info]: build info [Mac OS X][clang 8.1.0][64 bit] 10bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX AVX2 FMA3 LZCNT BMI2
x265 [info]: Main 10 profile, Level-5 (Main tier)
x265 [info]: Thread pool created using 4 threads
x265 [info]: Slices : 1
x265 [info]: frame threads / pool features : 2 / wpp(68 rows)
x265 [info]: Coding QT: max CU size, min CU size : 32 / 16
x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra
x265 [info]: ME / range / subpel / merge : dia / 57 / 0 / 2
x265 [info]: Keyframe min / max / scenecut / bias: 23 / 250 / 0 / 5.00
x265 [info]: Lookahead / bframes / badapt : 5 / 3 / 0
x265 [info]: b-pyramid / weightp / weightb : 1 / 0 / 0
x265 [info]: References / ref-limit cu / depth : 1 / off / off
x265 [info]: AQ: mode / str / qg-size / cu-tree : 1 / 0.0 / 32 / 1
x265 [info]: Rate Control / qCompress : CRF-28.0 / 0.60
x265 [info]: tools: rd=2 psy-rd=2.00 early-skip rskip tmvp fast-intra
x265 [info]: tools: strong-intra-smoothing lslices=8 deblock
x265 [info]: frame I: 1, Avg QP:32.91 kb/s: 381.89
x265 [info]: frame P: 13, Avg QP:33.00 kb/s: 47.55
x265 [info]: frame B: 36, Avg QP:35.33 kb/s: 47.49
x265 [info]: consecutive B-frames: 14.3% 0.0% 0.0% 85.7%
encoded 50 frames in 66.03s (0.76 fps), 54.19 kb/s, Avg QP:34.68
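For a sense of scale, the two timings above can be converted into raw input throughput. This is only a rough sketch: it assumes that i420p10 stores each sample in 2 bytes and that 4:2:0 subsampling gives 1.5 samples per pixel, and it ignores the Y4M stream and frame headers. The variable names are made up for illustration.

```shell
#!/bin/sh
# Rough input throughput implied by the two runs above.
# Assumptions: 2 bytes per sample (i420p10), 1.5 samples per pixel (4:2:0),
# Y4M headers ignored. Frame size and count come from the y4m [info] line.
width=3840
height=2160
frames=50
bytes_per_frame=$((width * height * 3 / 2 * 2))
total_bytes=$((bytes_per_frame * frames))

# Elapsed times reported by x265 for file input and for pipe input.
file_secs=6.87
pipe_secs=66.03

# LC_ALL=C keeps "." as the decimal separator regardless of system locale.
file_mbps=$(LC_ALL=C awk -v b="$total_bytes" -v s="$file_secs" 'BEGIN { printf "%.1f", b / s / 1000000 }')
pipe_mbps=$(LC_ALL=C awk -v b="$total_bytes" -v s="$pipe_secs" 'BEGIN { printf "%.1f", b / s / 1000000 }')
slowdown=$(LC_ALL=C awk -v f="$file_secs" -v p="$pipe_secs" 'BEGIN { printf "%.1f", p / f }')

echo "bytes per frame: $bytes_per_frame"
echo "file input: $file_mbps MB/s"
echo "pipe input: $pipe_mbps MB/s"
echo "slowdown:   ${slowdown}x"
```

Under those assumptions the pipe run consumes roughly 19 MB/s of raw video against roughly 181 MB/s from the file, a slowdown of about 9.6x.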
The --output-depth parameter doesn't seem to change things much.
I've tried to diagnose the problem on my side, but x265 seems to be the only program that reads from a pipe this slowly; piping to any other process (such as the 'tee' or 'cat' system utilities) works very fast.
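A baseline of that kind can be reproduced with a trivial consumer in place of x265. The sketch below is one possible way to do it, not the exact commands used for this report: `INPUT` is a hypothetical variable that should point at a real file such as out.y4m, and if it is unset a 64 MiB scratch file of zeros is generated so the script still runs. `wc -c` stands in for x265 and only counts the bytes arriving on stdin.

```shell
#!/bin/sh
# Baseline check: drain a pipe with a trivial consumer and count the bytes.
# INPUT is a hypothetical variable for this sketch, e.g. INPUT=out.y4m.
if [ -n "${INPUT:-}" ]; then
    input=$INPUT
    scratch=""
else
    # No input given: create a 64 MiB scratch file so the sketch is runnable.
    scratch=$(mktemp "${TMPDIR:-/tmp}/pipe-baseline.XXXXXX")
    dd if=/dev/zero of="$scratch" bs=1048576 count=64 2>/dev/null
    input=$scratch
fi

expected=$(wc -c < "$input" | tr -d ' ')

# Same shape as the x265 command line: cat feeds the consumer through a pipe.
# date +%s has one-second resolution, which is enough for multi-gigabyte input.
start=$(date +%s)
drained=$(cat "$input" | wc -c | tr -d ' ')
end=$(date +%s)

echo "drained $drained of $expected bytes in about $((end - start)) s"

# Remove the scratch file if one was created.
if [ -n "$scratch" ]; then
    rm -f "$scratch"
fi
```

Running it as `INPUT=out.y4m sh baseline.sh` (the script name is arbitrary) and comparing the elapsed time with the 66.03 s reported by x265 shows whether the pipe itself or the reader is the bottleneck.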
$ x265 --version
x265 [info]: HEVC encoder version 2.4+2-5bc5e73760cd
x265 [info]: build info [Mac OS X][clang 8.1.0][64 bit] 8bit+10bit+12bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX AVX2 FMA3 LZCNT BMI2
My MacBook configuration, in case it is relevant:
MacBook Pro (Retina, 13-inch, Late 2013)
Hardware Overview:
Model Name: MacBook Pro
Model Identifier: MacBookPro11,1
Processor Name: Intel Core i7
Processor Speed: 2,8 GHz
Number of Processors: 1
Total Number of Cores: 2
L2 Cache (per Core): 256 KB
L3 Cache: 4 MB
Memory: 16 GB
Boot ROM Version: MBP111.0138.B25
SMC Version (system): 2.16f68
System Software Overview:
System Version: macOS 10.12.4 (16E195)
Kernel Version: Darwin 16.5.0
Boot Volume: SSD
Boot Mode: Normal
User Name: Igor Malinin (igor)
Secure Virtual Memory: Enabled
System Integrity Protection: Enabled
$ xcodebuild -version
Xcode 8.3.2
Build version 8E2002
$ cc --version
Apple LLVM version 8.1.0 (clang-802.0.42)
Target: x86_64-apple-darwin16.5.0
Thread model: posix
Comments (5)
- You can try applying this patch, www.msystem.waw.pl/x265/input.patch, and report back whether it is faster or not.
- Yes, it is a lot faster with the patch. It now even seems slightly faster with input from a file. File and pipe input now seem about equal in speed.
- Thanks for the confirmation.
- I've sent a slightly modified version of this patch to the mailing list. You can test the new patch on a clean source tree (hg update -C) with:
  hg import --no-commit https://patches.videolan.org/patch/16467/raw/
- The new patch works well too. Thanks for the prompt reaction.