aarch64 doesn't support NEON
When I compile the source code on the aarch64 CentOS 7.6 platform, the performance is only 20-30% of that on the x86 platform.
How can I get NEON support on the aarch64 CentOS 7.6 platform?
Oh, I'm using the x265_3.1.1.tar.gz release.
Comments (7)
-
reporter: Hi, I compiled x265 for arm32 with NEON. Here are the executable sizes and the performance:
On the master branch,
the arm32 executable is 124K and the aarch64 executable is 151K.
The arm32 executable's video decompression time is 68.3% of the aarch64 executable's.
The performance is still much worse than on x86,
so I have to rewrite the asm code with intrinsics.
-
reporter: Hi, I'm puzzled by the following two functions:
First:
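.macro blockcopy_pp_16xN_neon h
function x265_blockcopy_pp_16x\h\()_neon
.rept \h
vld1.8 {q0}, [r2], r3      @ load 16 bytes from src (r2), then r2 += srcStride (r3)
vst1.8 {q0}, [r0], r1      @ store 16 bytes to dst (r0), then r0 += dstStride (r1)
.endr
bx lr
endfunc
.endm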
blockcopy_pp_16xN_neon 4
blockcopy_pp_16xN_neon 8
blockcopy_pp_16xN_neon 12
blockcopy_pp_16xN_neon 24
Second:
function x265_blockcopy_pp_16x16_neon
.rept 16
vld1.8 {q0}, [r2]          @ load 16 bytes from src (r2), no write-back
vst1.8 {q0}, [r0]          @ store 16 bytes to dst (r0), no write-back
add r2, r2, r3             @ src += srcStride
add r0, r0, r1             @ dst += dstStride
.endr
bx lr
endfunc
The second function is a special case of the first one, so why are there two functions?
There are other similar pairs, e.g. function x265_blockcopy_pp_32x8_neon and function x265_blockcopy_pp_32x\h\()_neon.
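Both routines copy h rows of 16 bytes and advance the source and destination pointers by their own strides after each row; the only difference is whether the pointer update is folded into the load/store (`[r2], r3`) or done with separate `add` instructions. A minimal plain-C reference sketch of that behavior, assuming the register roles r0 = dst, r1 = dst stride, r2 = src, r3 = src stride (the name `blockcopy_pp_16xN_ref` is hypothetical, not an x265 function), shows that one height parameter covers every variant:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical reference for blockcopy_pp_16xN: copies h rows of 16 pixels.
 * dst/dstStride play the role of r0/r1, src/srcStride the role of r2/r3. */
static void blockcopy_pp_16xN_ref(uint8_t *dst, intptr_t dstStride,
                                  const uint8_t *src, intptr_t srcStride,
                                  int h)
{
    for (int y = 0; y < h; y++) {
        memcpy(dst, src, 16);   /* vld1.8 {q0} + vst1.8 {q0}: one 16-byte row */
        src += srcStride;       /* [r2], r3 write-back, or add r2, r2, r3 */
        dst += dstStride;       /* [r0], r1 write-back, or add r0, r0, r1 */
    }
}
```

With this shape, the 16x16 case is simply `h = 16`, and the 4, 8, 12 and 24 variants differ only in the argument.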
-
Thank you for the report. It's my mistake: the functions were assigned to different engineers, and I didn't catch the duplicated code in time.
-
reporter: Oh, I see. I'm rewriting the x265 asm code as C functions using the intrinsics in arm_neon.h. I read the arm32 NEON code as a reference for the algorithms, and I assumed the arm32 code had been written that way for performance reasons.
Thanks for your reply!
-
Hello! Have you solved the "aarch64 doesn't support NEON" problem? My platform is CentOS on aarch64, and I also use the x265_3.1.1.tar.gz release. If you have a solution, could you share it?
-
That's a different ISA; the workaround is to rewrite these functions with intrinsics.
-
The ARM64 NEON ISA is different from ARM32, so our NEON asm can't be executed directly on ARM64 platforms. There are two workarounds:
one is to build x265 as an ARM32 library and executable and run them on ARM64, which is compatible with ARM32;
the other is to rewrite the asm code with intrinsics, which is compatible with both ARM32 and ARM64.
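A minimal sketch of the intrinsics workaround, assuming 8-bit pixels and a block width that is a multiple of 16 (the name `blockcopy_pp_intrin` is hypothetical, not x265's API). `vld1q_u8` and `vst1q_u8` from arm_neon.h have the same signatures on ARMv7 NEON and AArch64, so the same source builds for both; the `__ARM_NEON` / `__ARM_NEON__` guard falls back to `memcpy` when NEON is not enabled (for example on x86, or arm32 built without `-mfpu=neon`):

```c
#include <stdint.h>
#include <string.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical intrinsics port of blockcopy_pp: copies h rows of w bytes.
 * Assumes w is a multiple of 16, so each row is whole 128-bit (q) registers.
 * One function replaces the 16xN, 32x8 and 32xN asm variants. */
static void blockcopy_pp_intrin(uint8_t *dst, intptr_t dstStride,
                                const uint8_t *src, intptr_t srcStride,
                                int w, int h)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x += 16) {
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
            uint8x16_t row = vld1q_u8(src + x);  /* like vld1.8 {q0}, [r2] */
            vst1q_u8(dst + x, row);              /* like vst1.8 {q0}, [r0] */
#else
            memcpy(dst + x, src + x, 16);        /* portable fallback */
#endif
        }
        src += srcStride;  /* next source row */
        dst += dstStride;  /* next destination row */
    }
}
```

For instance, `blockcopy_pp_intrin(dst, dstStride, src, srcStride, 32, 8)` covers the 32x8 case. The compiler chooses registers and addressing modes for each target, which is why this avoids maintaining separate ARM32 and ARM64 asm; whether it matches hand-written asm speed would need to be measured.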