aarch64 doesn't support NEON

Issue #510 new
minmin created an issue

When I compile the source code on an aarch64 CentOS 7.6 platform, the performance is only 20-30% of what I get on an x86 platform.

What do I need to do to get NEON support on the aarch64 CentOS 7.6 platform?

Oh, and I am using the x265_3.1.1.tar.gz version.

Comments (7)

  1. M CHEN

    The ARM64 NEON ISA is different from the ARM32 one, so our NEON asm can’t be executed directly on ARM64 platforms. There are two workarounds:

    one is to build x265 as an ARM32 library and executable, since ARM64 is compatible with ARM32 binaries;

    the other is to rewrite the asm code with intrinsics, which are compatible with both ARM32 and ARM64.
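A minimal sketch of how C code can tell at compile time which of the two targets it is being built for, which is what lets a single intrinsics source serve both. It assumes the compiler defines the usual ACLE/GCC predefined macros (`__aarch64__`, `__arm__`, `__ARM_NEON`); the function name `simd_path` and the returned labels are hypothetical and are not part of x265.

```c
/* Returns a label for the SIMD path this file was compiled for.
   The labels are illustrative only. */
static const char *simd_path(void)
{
#if defined(__aarch64__)
    /* 64-bit ARM: Advanced SIMD (NEON) is available through arm_neon.h. */
    return "aarch64";
#elif defined(__arm__) && (defined(__ARM_NEON) || defined(__ARM_NEON__))
    /* 32-bit ARM built with NEON enabled. */
    return "arm32-neon";
#else
    /* Anything else, e.g. x86 or ARM32 without NEON. */
    return "generic-c";
#endif
}
```

Printing `simd_path()` from a test program built with the same compiler flags as the library is a quick way to check which branch a given build actually takes.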

  2. minmin reporter

    Hi, I compiled x265 for ARM32 with NEON. Here are the executable sizes and the performance:

    master branch,

    the ARM32 executable is 124K and the aarch64 executable is 151K.

    The ARM32 executable’s video decompression time is 68.3% of the aarch64 executable’s video decompression time.

    The performance is still much worse than on x86,

    so I have to rewrite the asm code with intrinsics.

  3. minmin reporter

    Hi, I am puzzled by the following two functions:

    first:

    .macro blockcopy_pp_16xN_neon h
    function x265_blockcopy_pp_16x\h\()_neon
    .rept \h
    vld1.8 {q0}, [r2], r3
    vst1.8 {q0}, [r0], r1
    .endr
    bx lr
    endfunc
    .endm

    blockcopy_pp_16xN_neon 4
    blockcopy_pp_16xN_neon 8
    blockcopy_pp_16xN_neon 12
    blockcopy_pp_16xN_neon 24

    second:

    function x265_blockcopy_pp_16x16_neon
    .rept 16
    vld1.8 {q0}, [r2]
    vst1.8 {q0}, [r0]
    add r2, r2, r3
    add r0, r0, r1
    .endr
    bx lr
    endfunc

    The second function is a special case of the first one, so why are there two functions?

    There are other pairs like this, for example function x265_blockcopy_pp_32x8_neon and function x265_blockcopy_pp_32x\h\()_neon.
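For comparison, here is a hedged sketch of the same 16-pixel-wide copy written with `arm_neon.h` intrinsics, with the height as a parameter so that 16x16 is just one call among others. It assumes 8-bit pixels and strides given in bytes, and the argument order mirrors the register use in the asm above (r0 = dst, r1 = dst stride, r2 = src, r3 = src stride). The name `blockcopy_pp_16xN` and the `HAVE_NEON` macro are hypothetical, not x265 identifiers; the `memcpy` branch is there only so the file also builds on non-ARM hosts.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Copies `height` rows of 16 pixels from src to dst. */
static void blockcopy_pp_16xN(uint8_t *dst, ptrdiff_t dst_stride,
                              const uint8_t *src, ptrdiff_t src_stride,
                              int height)
{
    for (int y = 0; y < height; y++) {
#if HAVE_NEON
        /* One 128-bit load and store, like vld1.8/vst1.8 on q0. */
        vst1q_u8(dst, vld1q_u8(src));
#else
        /* Portable fallback for builds without NEON. */
        memcpy(dst, src, 16);
#endif
        /* Advancing the pointers here has the same effect as either the
           post-increment form ([r2], r3) or the explicit add in the asm. */
        src += src_stride;
        dst += dst_stride;
    }
}
```

Calling it with a height of 4, 8, 12, 16 or 24 covers the sizes generated by the macro as well as the separate 16x16 function. The same source should build for both ARM32 (with NEON enabled) and aarch64, since `vld1q_u8` and `vst1q_u8` exist on both.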

  4. M CHEN

    Thank you for the report. It is my mistake: the functions were assigned to different engineers and I didn’t find the duplicated code in time.

  5. minmin reporter

    Oh, I got it. I am rewriting the x265 asm code as C functions using the intrinsics in arm_neon.h. I had been reading the ARM32 NEON code as a reference for the algorithm, and I thought the ARM32 code was written that way for performance reasons.

    Thanks for your reply!

  6. Marcus

    Hello! Have you solved the problem of “aarch64 doesn't support NEON”? My platform is CentOS on aarch64, and I am also using the x265_3.1.1.tar.gz version. If you have a solution, could you share it with me?
