GCC -mcpu / -mtune for "Carmel" CPU?

Dear Experts,

Does anyone know what the appropriate setting to pass to gcc’s -mcpu or -mtune option is for the Carmel CPUs in the NX?

There doesn’t seem to be specific support for this CPU in gcc. From what I know about the microarchitecture, it’s a rather unusual design, and it’s not obvious which of the Cortex options it is most similar to.

Thanks, Phil.

I think you can use the system tool taskset to assign your process to the CPU cores you want.

That’s not what gcc -mcpu and -mtune do.

https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html

The Carmel cores emulate the ARM architecture version 8.2, executing both 64-bit AArch64 code and 32-bit AArch32 code.

Further input:
The most important things are the -march flags corresponding to the processor capabilities (something like -march=armv8.2-a+fp16+simd+crypto+predres), turning on appropriate levels of optimization (-O2 in general, -O3 for very hot loops), using -ffast-math where possible, and having the latest Jetpack release. We also see better performance with newer compiler revisions and recompiling some base libraries to better use the available processor features (especially v8.1 LSE atomics).

In terms of scheduling (-mtune), cortex-a75 and cortex-a76 should both be good starting points, as should -mtune=generic-armv8.2-a. The processor dynamic code optimization can compensate for some shortcomings in the scheduling and selection of instructions, so we believe this should be secondary or tertiary in most cases.

If you have questions about the performance of code sequences that you can share, we would be happy to provide some further help or analysis.


Thanks.

@Bibek @nvidiadev1 hi,

I’m getting an error compiling openfst with -mcpu=cortex-a76:

cc1plus: error: unknown value 'cortex-a76' for -mcpu
cc1plus: note: valid arguments are: cortex-a35 cortex-a53 cortex-a57 cortex-a72 cortex-a73 thunderx thunderxt88p1 thunderxt88 thunderxt81 thunderxt83 xgene1 falkor qdf24xx exynos-m1 thunderx2t99p1 vulcan thunderx2t99 cortex-a57.cortex-a53 cortex-a72.cortex-a53 cortex-a73.cortex-a35 cortex-a73.cortex-a53 generic; did you mean 'cortex-a72'?

So which option should be used there?

P.S. Of the options suggested in the post above, only -march=armv8.2-a works for me. The other options are rejected.
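Since the set of accepted -march / -mtune values depends on the installed compiler version, one way to avoid the error above is to probe the compiler before building. The following is only a minimal sketch: the helper names accepts_flags and first_accepted are hypothetical, and it assumes the compiler is given in $CC (defaulting to gcc) and exits non-zero when it rejects an option.

```shell
# Succeed if the compiler in $CC (default: gcc) accepts all given flags.
# It compiles a trivial program from stdin with -fsyntax-only, so no output
# file is produced. (Helper names here are illustrative, not part of gcc.)
accepts_flags() {
    echo 'int main(void) { return 0; }' |
        "${CC:-gcc}" "$@" -x c -fsyntax-only - >/dev/null 2>&1
}

# Print the first candidate flag that the compiler accepts.
# Returns non-zero (and prints nothing) if every candidate is rejected.
first_accepted() {
    for flag in "$@"; do
        if accepts_flags "$flag"; then
            printf '%s\n' "$flag"
            return 0
        fi
    done
    return 1
}

# Example usage (candidates taken from the posts above, most specific first):
#   MARCH=$(first_accepted -march=armv8.2-a+fp16+simd+crypto+predres -march=armv8.2-a)
#   MTUNE=$(first_accepted -mtune=cortex-a76 -mtune=cortex-a75 -mtune=generic)
#   g++ -O2 $MARCH $MTUNE -c myfile.cpp
```

Listing candidates from most to least specific means a newer compiler picks up the extra features while an older one falls back to a value it knows.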

UPDATE

It seems the official image has gcc 7.5 installed, which knows nothing about this CPU. I upgraded gcc to version 10 and was able to compile the required libraries.

sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt update
sudo apt upgrade
sudo apt-get install gcc-10 g++-10
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-10 10 --slave /usr/bin/g++ g++ /usr/bin/g++-10 --slave /usr/bin/gcov gcov /usr/bin/gcov-10
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-7 7 --slave /usr/bin/g++ g++ /usr/bin/g++-7 --slave /usr/bin/gcov gcov /usr/bin/gcov-7
sudo update-alternatives --config gcc

UPDATE 2

It seems CUDA 10.2 doesn’t work with gcc 9+. So if you need CUDA support in your libraries, you have to install and configure gcc-8 the same way as in the first update.
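To catch this before a long build fails, a build script could check the active gcc version first. This is just a rough sketch: the function name gcc_ok_for_cuda102 is hypothetical, and the limit of major version 8 is taken from the observation above rather than from any official compatibility table.

```shell
# Succeed if the given gcc version string (e.g. "7.5.0" or "8") has a major
# version of at most 8, the limit observed above for CUDA 10.2.
gcc_ok_for_cuda102() {
    major=${1%%.*}          # strip everything from the first dot onward
    [ "$major" -le 8 ]
}

# Example usage:
#   gcc_ok_for_cuda102 "$(gcc -dumpversion)" || echo "switch to gcc-8 first"
```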

The -march=armv8.2-a parameter should be enough. The -mtune parameter is not mandatory.