Compilation for Xavier arm64

Hardware Platform: [DRIVE AGX Pegasus™ Developer Kit]
Software Version: [DRIVE Software 10]
Host Machine Version:[native Ubuntu 18.04]
SDK Manager Version: [1.0.2.6738]

Hi,

I was wondering if there are any optimization flags I can use with “aarch64-linux-gnu-g++” to fully optimize my application's performance for the specific ARM CPU (Carmel) in the Xavier.

For example, is there something like “-march=armv8-a” or “-mcpu=cortex-a57” that needs to be used for the Carmel architecture? Also, do I need to specify any “use NEON” flag?

Not relevant for G++, but starting with Clang 11 you can pass the flag -mcpu=carmel. I contributed that.

Phoronix article
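
On the g++ side, one way to see what a given set of flags actually enables is to look at the macros the compiler predefines. The sketch below is only an illustration: the function name `target_feature_summary` is hypothetical, and the macro names are the ones defined by the Arm C Language Extensions (ACLE). It assumes that Advanced SIMD (NEON) is part of the default AArch64 baseline in GCC, so no separate "use NEON" flag should be needed, and that Carmel is an ARMv8.2-A design, which makes `-march=armv8.2-a` a candidate worth trying rather than a confirmed recommendation.

```cpp
#include <string>

// Hypothetical helper: summarizes which SIMD-related ACLE macros the
// compiler predefined for the current target and flags.
std::string target_feature_summary() {
    std::string s;
#if defined(__aarch64__)
    s += "aarch64";
#else
    s += "not-aarch64";
#endif
#if defined(__ARM_NEON)
    s += " neon";         // Advanced SIMD is available
#endif
#if defined(__ARM_FEATURE_FMA)
    s += " fma";          // fused multiply-add is available
#endif
#if defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
    s += " fp16-vector";  // only set when the target enables half-precision vector math
#endif
    return s;
}
```

Printing the returned string from a build made with and without a candidate `-march` value shows whether the flag changed anything. The same macros can also be dumped without writing code, for example with `aarch64-linux-gnu-g++ -dM -E -x c++ /dev/null`.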

Dear @yotam.nachmias,
Did you check with -O3 whether vectorized instructions appear in the assembly? Note that the code needs to be written in a way that allows auto-vectorization.
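
As a rough illustration of what "written in a way that allows auto-vectorization" can mean, here is a minimal sketch of a loop that compilers usually vectorize at -O3: a simple counted loop over contiguous data with no aliasing between input and output. The function names `saxpy` and `saxpy_copy` are hypothetical, and `__restrict` is a GCC/Clang extension, not standard C++.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// y[i] = a * x[i] + y[i] over contiguous arrays.
// __restrict promises the compiler that x and y do not overlap, which
// removes one common obstacle to auto-vectorization.
void saxpy(float a, const float* __restrict x, float* __restrict y, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

// Hypothetical convenience wrapper: takes y by value and returns the result.
std::vector<float> saxpy_copy(float a, const std::vector<float>& x, std::vector<float> y) {
    assert(x.size() == y.size());
    saxpy(a, x.data(), y.data(), x.size());
    return y;
}
```

To check the result, the file can be compiled to assembly with something like `aarch64-linux-gnu-g++ -O3 -S` and searched for instructions that operate on `v` registers (for example `fmla` with a `.4s` suffix). GCC's `-fopt-info-vec` option also reports which loops were vectorized.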

Hi,
There are no compilation issues with -O3, but there is no difference in performance either.

Dear @yotam.nachmias,
-O3 should enable auto-vectorization, which lets the compiler emit NEON instructions. You can check the assembly code to verify that vector instructions are present.
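
For comparison, NEON intrinsics are the explicit route: the programmer writes the vector operations by hand instead of relying on the compiler to find them. The sketch below is only an assumption about how such a kernel might look. The function name `saxpy_neon` is hypothetical, the intrinsics come from `<arm_neon.h>`, and a scalar loop handles the tail as well as non-ARM builds.

```cpp
#include <cstddef>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// y[i] += a * x[i], using explicit NEON intrinsics on AArch64 and a
// plain scalar loop everywhere else.
void saxpy_neon(float a, const float* x, float* y, std::size_t n) {
    std::size_t i = 0;
#if defined(__ARM_NEON) && defined(__aarch64__)
    const float32x4_t va = vdupq_n_f32(a);      // broadcast a into 4 lanes
    for (; i + 4 <= n; i += 4) {
        float32x4_t vx = vld1q_f32(x + i);      // load 4 floats from x
        float32x4_t vy = vld1q_f32(y + i);      // load 4 floats from y
        vy = vfmaq_f32(vy, va, vx);             // vy + va * vx, fused
        vst1q_f32(y + i, vy);                   // store 4 results to y
    }
#endif
    for (; i < n; ++i) {                        // remaining elements
        y[i] += a * x[i];
    }
}
```

If the auto-vectorized assembly already contains equivalent vector instructions, hand-written intrinsics like these may not make a measurable difference.
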
BTW, is it not possible to push your computation to the GPU to speed it up?

Hi SivaRamaKrishnaNV,

I do see vector instructions present when using the -O3 flag. In my case it does not improve performance, but it is good to know.
Of course, I’m using the GPU as much as possible.