Compilation for Xavier arm64

yotam.nachmias · August 9, 2020, 6:01am

Hardware Platform: [DRIVE AGX Pegasus™ Developer Kit]
Software Version: [DRIVE Software 10]
Host Machine Version: [native Ubuntu 18.04]
SDK Manager Version: [1.0.2.6738]

Hi,

I was wondering if there are any optimization flags I can use with “aarch64-linux-gnu-g++” to fully optimize my application's performance for the specific ARM CPU (Carmel) in the Xavier.

For example, is there something like “-march=armv8-a” or “-mcpu=cortex-a57” that needs to be used for the Carmel architecture? Also, do I need to specify any “use NEON” flag?

raul.tambre · August 9, 2020, 9:34am

Not relevant for G++, but starting with Clang 11 you can pass the flag -mcpu=carmel. I contributed that.

Phoronix article

SivaRamaKrishnaNV · August 10, 2020, 11:23am

Dear @yotam.nachmias,
Did you check with -O3 whether vectorized instructions appear in the assembly? Make sure the code is written in a way that allows auto-vectorization.
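
As a rough illustration of what "written in a way that allows auto-vectorization" can mean, here is a minimal sketch. The file name saxpy.cpp and the function saxpy are made up for this example and are not from the original application. It assumes a GCC-based cross toolchain; -fopt-info-vec is a GCC option that reports which loops were vectorized.

```cpp
// saxpy.cpp (hypothetical file name): a loop shaped for auto-vectorization.
// Generate assembly and look for instructions on vector registers (v0..v31),
// for example fmla or ld1:
//   aarch64-linux-gnu-g++ -O3 -S saxpy.cpp -o saxpy.s
// Ask GCC to report the loops it vectorized:
//   aarch64-linux-gnu-g++ -O3 -fopt-info-vec -c saxpy.cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Computes y[i] += a * x[i] for i in [0, n).
// __restrict is a GCC/Clang extension that promises x and y do not overlap,
// so the compiler does not need a runtime overlap check before vectorizing.
void saxpy(float a, const float* __restrict x, float* __restrict y,
           std::size_t n) {
    // Simple counted loop, unit stride, no branches, no function calls:
    // the shape auto-vectorizers handle best.
    for (std::size_t i = 0; i < n; ++i) {
        y[i] += a * x[i];
    }
}

// Convenience wrapper for std::vector inputs of equal length.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    saxpy(a, x.data(), y.data(), x.size());
}
```

Loops with data-dependent branches, pointer chasing, or calls into non-inlined functions are much less likely to be vectorized, so it can help to isolate hot loops into this kind of form before inspecting the assembly.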

yotam.nachmias · August 16, 2020, 7:00am

Hi,
There are no compilation issues when using -O3; however, there is no difference in performance.

SivaRamaKrishnaNV · August 17, 2020, 4:20am

Dear @yotam.nachmias,
-O3 should enable auto-vectorization, which makes the compiler emit NEON instructions. You can check the assembly code to verify that vector instructions are present.
BTW, is it not possible to push your computation to the GPU to speed it up?
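
On the NEON point: auto-vectorization and NEON intrinsics are two different routes to the same instructions. With intrinsics the vector code is written by hand through arm_neon.h. The sketch below is only an illustration with a made-up function name (dot); it assumes an AArch64 target, where the compiler defines __ARM_NEON without any extra flag, and falls back to a scalar loop on other targets.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>  // NEON intrinsics (ACLE)
#endif

// Dot product of two equal-length float vectors.
// On AArch64 with NEON, four lanes are processed per iteration by hand;
// elsewhere only the scalar loop at the end runs.
float dot(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    std::size_t i = 0;
    float sum = 0.0f;

#if defined(__aarch64__) && defined(__ARM_NEON)
    float32x4_t acc = vdupq_n_f32(0.0f);          // four partial sums, all zero
    for (; i + 4 <= n; i += 4) {
        const float32x4_t va = vld1q_f32(a.data() + i);
        const float32x4_t vb = vld1q_f32(b.data() + i);
        acc = vfmaq_f32(acc, va, vb);             // acc += va * vb (fused)
    }
    sum = vaddvq_f32(acc);                        // horizontal add of the lanes
#endif

    // Scalar tail (or the whole loop when NEON is not available).
    for (; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Whether hand-written intrinsics beat what -O3 already generates depends on the loop; if the work is memory-bound, neither route is likely to change the run time much.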

yotam.nachmias · August 17, 2020, 9:09am

Hi SivaRamaKrishnaNV,

I do see vector instructions present when using the -O3 flag. In my case it does not improve performance, but it is good to know.
Of course, I’m using the GPU as much as possible.