multicoreware / x265_git / issues / #510 - aarch64 doesn't support neon - Bitbucket

Issue #510 new

minmin created an issue 2019-08-25

When I compile the source code on an aarch64 CentOS 7.6 platform, the performance is only 20-30% compared with an x86 platform.

What can I do if I need NEON support on the aarch64 CentOS 7.6 platform?

Oh, I am using the x265_3.1.1.tar.gz version.

Comments (7)

M CHEN
The ARM64 NEON ISA is different from the ARM32 one, so our NEON asm can't be executed directly on ARM64 platforms. There are two workarounds:

One is to build x265 as an ARM32 library and executable; ARM64 is compatible with it.

The second is to rewrite the asm code with intrinsics, which are compatible with both ARM32 and ARM64.
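
For reference, the intrinsics route could look roughly like the sketch below for a 16-pixel-wide block copy. This is a minimal sketch under stated assumptions, not the x265 implementation: the function name `blockcopy_pp_16xN`, the `pixel` typedef (an 8-bit build is assumed), the explicit `height` parameter and the argument order (dst, dstStride, src, srcStride) are illustrative. `vld1q_u8` and `vst1q_u8` from `arm_neon.h` exist on both ARM32 and ARM64; a `memcpy` fallback keeps the file compilable on other targets.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON_INTRINSICS 1
#else
#define HAVE_NEON_INTRINSICS 0
#endif

typedef uint8_t pixel; /* assumption: 8-bit pixels */

/* Hypothetical helper: copy a block that is 16 pixels wide and `height`
 * rows tall. Strides are given in pixels. */
void blockcopy_pp_16xN(pixel *dst, intptr_t dstStride,
                       const pixel *src, intptr_t srcStride, int height)
{
    assert(height > 0);
    for (int y = 0; y < height; y++) {
#if HAVE_NEON_INTRINSICS
        /* One 128-bit load and store per row; the same source compiles
         * for ARM32 and ARM64. */
        vst1q_u8(dst, vld1q_u8(src));
#else
        /* Portable fallback for targets without NEON. */
        memcpy(dst, src, 16);
#endif
        dst += dstStride;
        src += srcStride;
    }
}
```

The loop leaves unrolling to the compiler rather than hard-coding it per height, which is a design choice of this sketch and not something taken from the x265 sources.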
- 2019-08-26T01:45:28+00:00
minmin reporter
Hi, I compiled x265 for ARM32 with NEON. The following are the executable sizes and the performance:

master branch,

The ARM32 executable is 124K, and the aarch64 executable is 151K.

The ARM32 executable's video decompression time is 68.3% of the aarch64 executable's video decompression time.

The performance is still much worse than on x86.

So I have to rewrite the asm code with intrinsics.
- 2019-08-29T08:00:11+00:00
minmin reporter
Hi, I am puzzled by the following two functions:

First:

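@ Copies a 16-pixel-wide block; one function is generated per height.
@ r0 = dst, r1 = dst stride, r2 = src, r3 = src stride
.macro blockcopy_pp_16xN_neon h
function x265_blockcopy_pp_16x\h\()_neon
.rept \h
vld1.8 {q0}, [r2], r3    @ load 16 bytes, then advance src by its stride
vst1.8 {q0}, [r0], r1    @ store 16 bytes, then advance dst by its stride
.endr
bx lr
endfunc
.endm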

blockcopy_pp_16xN_neon 4
blockcopy_pp_16xN_neon 8
blockcopy_pp_16xN_neon 12
blockcopy_pp_16xN_neon 24

Second:

function x265_blockcopy_pp_16x16_neon
.rept 16
vld1.8 {q0}, [r2]
vst1.8 {q0}, [r0]
add r2, r2, r3
add r0, r0, r1
.endr
bx lr
endfunc

The second function is a special case of the first, so why are there two functions?

There are other similar pairs, such as x265_blockcopy_pp_32x8_neon and x265_blockcopy_pp_32x\h\()_neon.
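
To make the duplication concrete, here is a rough C analogue of the two variants. It is only a sketch: the C macro `BLOCKCOPY_PP_16xN`, the name `blockcopy_pp_16x16_handwritten` and the `pixel` typedef (8-bit assumed) are hypothetical and not part of x265. The macro plays the role of the `.macro` with post-incremented addresses, and the hand-written function mirrors the version with separate `add` instructions. Instantiated with 16, the generated function does the same work as the hand-written one.

```c
#include <stdint.h>
#include <string.h>

typedef uint8_t pixel; /* assumption: 8-bit pixels */

/* Analogue of the .macro: generates one copy function per height. */
#define BLOCKCOPY_PP_16xN(h)                                          \
    void blockcopy_pp_16x##h(pixel *dst, intptr_t dstStride,          \
                             const pixel *src, intptr_t srcStride)    \
    {                                                                 \
        for (int y = 0; y < (h); y++) {                               \
            memcpy(dst, src, 16); /* one 16-byte row */               \
            dst += dstStride;                                         \
            src += srcStride;                                         \
        }                                                             \
    }

BLOCKCOPY_PP_16xN(4)
BLOCKCOPY_PP_16xN(8)
BLOCKCOPY_PP_16xN(12)
BLOCKCOPY_PP_16xN(16) /* covers the separately written 16x16 case */
BLOCKCOPY_PP_16xN(24)

/* Analogue of the separate 16x16 function: same copy, with the pointer
 * updates written out as explicit additions. */
void blockcopy_pp_16x16_handwritten(pixel *dst, intptr_t dstStride,
                                    const pixel *src, intptr_t srcStride)
{
    for (int y = 0; y < 16; y++) {
        memcpy(dst, src, 16);
        src = src + srcStride; /* add r2, r2, r3 */
        dst = dst + dstStride; /* add r0, r0, r1 */
    }
}
```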

- 2019-09-16T10:11:10+00:00
M CHEN
Thank you for the report. It is my mistake: the functions were assigned to different engineers, and I did not spot the duplicated code in time.
- 2019-09-16T11:10:36+00:00
minmin reporter
Oh, I got it. I am rewriting the x265 asm code with the C functions (intrinsics) from the arm_neon.h header. I tried reading the ARM32 NEON code to understand the algorithm, and I had assumed the ARM32 code was written that way for performance reasons.

Thanks for your reply!
- 2019-09-17T01:52:08+00:00
Marcus
Hello! Have you solved the problem of “aarch64 doesn't support neon”? My platform is CentOS on aarch64, and I also use the x265_3.1.1.tar.gz version. If you have a solution, could you share it?
- 2019-10-11T02:37:00+00:00
M CHEN
That’s a different ISA; the workaround is to rewrite these functions with intrinsics.
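
As a rough illustration of why one intrinsics source can serve both ISAs, the sketch below picks a code path at compile time from the compiler's predefined macros. The enum and the function names `select_backend` and `backend_name` are hypothetical and not part of x265; `__aarch64__`, `__arm__` and `__ARM_NEON` are the standard predefined macros in GCC and Clang.

```c
/* Hypothetical labels for the code path a build would use. */
typedef enum {
    BACKEND_C,            /* plain C, no NEON available */
    BACKEND_NEON_ARM32,   /* NEON intrinsics on 32-bit ARM */
    BACKEND_NEON_AARCH64  /* NEON intrinsics on 64-bit ARM */
} simd_backend;

/* Decide at compile time which path this translation unit supports. */
simd_backend select_backend(void)
{
#if defined(__aarch64__) && (defined(__ARM_NEON) || defined(__ARM_NEON__))
    return BACKEND_NEON_AARCH64;
#elif defined(__arm__) && (defined(__ARM_NEON) || defined(__ARM_NEON__))
    return BACKEND_NEON_ARM32;
#else
    return BACKEND_C;
#endif
}

/* Human-readable name, e.g. for a startup log line. */
const char *backend_name(simd_backend b)
{
    switch (b) {
    case BACKEND_NEON_AARCH64: return "neon-intrinsics (aarch64)";
    case BACKEND_NEON_ARM32:   return "neon-intrinsics (arm32)";
    default:                   return "c-fallback";
    }
}
```

Hand-written ARM32 asm would need a separate AArch64 port, whereas both NEON branches above could call the same intrinsics code.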

- 2019-10-11T03:21:42+00:00

Assignee: –

Type: enhancement

Priority: major

Status: new

Component: –

Version: –

Votes: 1

Watchers: 4