Segfault on OSX in planecopy_sp_shl (AVX upShift_16)
Build a release configuration with high bit depth and AVX enabled on OSX, then start encoding a 3840x2160 yuv422p10 (or yuv420p10) source. x265 segfaults while reading the 6th frame.
The crash appears to be caused by an over-wide read at the end of the input buffer. Valgrind reports that this buffer (originally allocated in the input/ code) is over-read by just 4 bytes. For the first 5 frames, the buffers are all allocated at once, and that allocation pattern happens to put the buffer end somewhere that does not fault. After the first frame buffer is freed and reallocated, the buffer ends up placed so that reading even 4 bytes past its end triggers a segfault.
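One plausible reading of why placement matters (not diagnosed in this thread): a read past the end of a buffer can only fault if it reaches a page the buffer does not occupy and that page is unmapped. The sketch below assumes 4 KiB pages and checks whether a small over-read leaves the buffer's last page. Notably, a single 3840x2160 plane of 16-bit samples is 16,588,800 bytes, exactly 4050 pages, so if such a buffer starts page-aligned, the 4 bytes after it begin a new page. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumes 4 KiB pages. `end` is one past the buffer's last byte.
 * Returns true when reading `overread` bytes starting at `end` touches a
 * page the buffer itself does not occupy. Whether that page is mapped is
 * up to the allocator, so the same over-read may or may not fault. */
static bool overread_leaves_buffer_pages(uintptr_t end, size_t overread)
{
    const uintptr_t page = 4096;
    if (overread == 0)
        return false;
    uintptr_t last_buf_page  = (end - 1) / page;
    uintptr_t last_read_page = (end + overread - 1) / page;
    return last_read_page != last_buf_page;
}
```

With a page-aligned start, a 16,588,800-byte buffer ends exactly on a page boundary, so a 4-byte over-read crosses into the next page; adding 4 bytes of slack to the allocation keeps the same read inside the buffer's own last page.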
Manually adding 4 extra bytes to the allocated frame size in input/ (either y4m or yuv) avoids the crash, as does commenting out the appropriate planecopy_sp_shl line in copyFromPicture or falling back to SSE for the definition of planecopy_sp_shl.
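The padding workaround amounts to over-allocating each frame buffer by a few bytes so that a 4-byte tail load has somewhere valid to land. A minimal sketch, assuming a hypothetical helper; `PLANE_TAIL_PAD` and `alloc_plane16_padded` are illustrative names, not x265's input/ API.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical slack mirroring the "4 extra bytes" workaround: room for
 * one 4-byte load past the last sample of the plane. */
#define PLANE_TAIL_PAD 4u

/* Allocate a zeroed plane of width x height 16-bit samples plus the tail
 * slack. Returns NULL on allocation failure or size overflow; stores the
 * total byte count in *out_bytes when out_bytes is non-NULL. */
static uint16_t *alloc_plane16_padded(size_t width, size_t height, size_t *out_bytes)
{
    if (width != 0 && height > (SIZE_MAX - PLANE_TAIL_PAD) / sizeof(uint16_t) / width)
        return NULL;
    size_t bytes = width * height * sizeof(uint16_t) + PLANE_TAIL_PAD;
    if (out_bytes)
        *out_bytes = bytes;
    return calloc(1, bytes);
}
```

For a 3840x2160 luma plane this requests 16,588,804 bytes. The slack is never read as sample data; it only hides the over-read, which is why the thread treats it as a workaround rather than a fix.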
Comments (6)
-
-
Account Deleted: Here's an example command line:
ffmpeg -f lavfi -i testsrc=duration=10:size=3840x2160 -pix_fmt yuv422p10le -strict -1 -f yuv4mpegpipe - | ~/src/x265/build/linux/x265 --input - --y4m -o repro.hevc
The crash also reproduces with raw YUV input, so it is not caused by a buggy allocation in one particular input reader.
My OSX systems do have AVX2. I don't have a Linux system with AVX2 handy to test on. I suspect that on Linux the system allocator just happens to place the input buffers in a way that avoids this issue.
-
Thanks for your report. I finally found the cause. To reproduce it, the encoder must be run from the command line; under the Xcode IDE it never reports the memory read issue.
Here is a workaround; could you try it again? I tested it on my MacBook Air.
```diff
From d42b31346524cf635d8268cac990c31785b31d29 Mon Sep 17 00:00:00 2001
From: Min Chen <chenm003@163.com>
Date: Wed, 30 Dec 2015 15:01:47 -0600
Subject: [PATCH 4/4] asm: fix crash on Mac OS X (4 bytes read over bound in upShift_16)

---
 source/common/x86/pixel-a.asm |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/source/common/x86/pixel-a.asm b/source/common/x86/pixel-a.asm
index a810a89..fd0892d 100644
--- a/source/common/x86/pixel-a.asm
+++ b/source/common/x86/pixel-a.asm
@@ -8572,6 +8572,8 @@ cglobal upShift_16, 6,7,4
     jz          .end

 .process1:
+    test        r4d, r4d
+    jle         .end
     movd        m1, [r0]
     psllw       m1, m0
     pand        m1, m3
@@ -8685,6 +8687,8 @@ cglobal upShift_16, 6,7,4
     jz          .end

 .process1:
+    test        r4d, r4d
+    jle         .end
     movd        xm1, [r0]
     psllw       xm1, xm0
     pand        xm1, xm3
-- 
1.7.9.msysgit.0
```
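The effect of the added `test r4d, r4d` / `jle .end` guard can be modeled in scalar C. This is a hypothetical sketch, not the real asm: it assumes r4d holds the number of samples left for the tail, that the main loop consumes 64 samples per iteration, and that each tail step is a 4-byte, two-sample load like `movd`. `read_extent` reports how far into the source row the kernel reads.

```c
#include <stddef.h>

/* Hypothetical scalar model of the upShift_16 tail, not x265 code.
 * Returns one past the highest source byte offset read for a row of
 * `width` 16-bit samples, with `step` samples per main-loop iteration.
 * `guarded` toggles the check the patch adds before .process1. */
static size_t read_extent(int width, int step, int guarded)
{
    int x = (width / step) * step;      /* samples consumed by the main loop */
    int remaining = width - x;          /* stands in for r4d */
    size_t extent = (size_t)x * 2;      /* bytes read so far */

    if (guarded && remaining <= 0)      /* test r4d, r4d / jle .end */
        return extent;

    /* Tail: each step is a 4-byte load (two samples, like movd). Without
     * the guard this runs at least once, even with nothing left to copy. */
    do {
        extent = (size_t)x * 2 + 4;
        x += 2;
        remaining -= 2;
    } while (remaining > 0);
    return extent;
}
```

Under this model, a 3840-sample row (7680 bytes) is read up to byte 7684 without the guard, 4 bytes past the end, and only up to 7680 with it. An odd width such as 3841 still reads 2 bytes past the end even with the guard, so the guard alone would not cover that case in this model.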
BTW, this is only a quick workaround; the root cause is a logic bug in our algorithm. I will rewrite this part of the code and upload it today.
-
Account Deleted: That workaround seems to be effective, thanks.
-
Thanks for verifying. I have also sent the new patch to the mailing list; it is waiting for the team to verify it and push it into the tree.
-
- changed status to resolved
Could you give us the command line?
I reviewed upShift_sse2 and upShift_avx2. The algorithm may read up to 2 bytes (one pixel) past the bound, but a width of 3840 is a multiple of 64, so it should not enter that special case.
Does your CPU have AVX or AVX2?