GCC options for TX2

Is there a best combination of GCC options for TX2?

Usually, with the right machine-dependent options, it is possible to get better CPU performance. For example, these are typically specified when building code for TK1:

armhf / c++11 support / neon-vfpv4 / cortex-a15
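For reference, those TK1 settings presumably correspond to the GCC flags -mfloat-abi=hard, -std=c++11, -mfpu=neon-vfpv4 and -mcpu=cortex-a15. One way to confirm that such flags actually took effect is to test the compiler's predefined ACLE feature macros. This is a minimal sketch assuming GCC or Clang; the function name armv7_feature_mask is hypothetical.

```c
/* Reports which ARMv7 target features the compiler enabled for this
 * translation unit, using GCC/Clang predefined ACLE macros.
 * Bit 0: hard-float calling convention (armhf, -mfloat-abi=hard)
 * Bit 1: NEON SIMD (-mfpu=neon-*)
 * Bit 2: fused multiply-add, part of VFPv4 (-mfpu=neon-vfpv4)
 * On targets that do not define these macros the bits stay clear. */
int armv7_feature_mask(void)
{
    int mask = 0;
#if defined(__ARM_PCS_VFP)
    mask |= 1;
#endif
#if defined(__ARM_NEON)
    mask |= 2;
#endif
#if defined(__ARM_FEATURE_FMA)
    mask |= 4;
#endif
    return mask;
}
```

Called from a TK1 build using the flags above, it should return 7; other targets return 0 or a partial mask.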

What options should be passed to GCC to optimize for TX2? In particular, what is the best combination for the Denver cores?

Hi,

For A57, here are the suggestions:

  • Use the latest GCC toolchain (7.2).
  • Use Clang (the LLVM front end) as an alternative to GCC.
  • Use -march=armv8-a+crypto+simd, which enables the SIMD, crypto and floating-point instruction sets and may help.
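To check that -march=armv8-a+crypto (or +crypto+simd) was honored, code can test the ACLE macros GCC and Clang predefine. This is a sketch under that assumption: __ARM_FEATURE_CRYPTO is the macro older GCC releases define for +crypto, and newer ones also define __ARM_FEATURE_AES. The function name aarch64_crypto_simd_enabled is hypothetical.

```c
/* Returns 1 when this file was compiled for AArch64 with Advanced SIMD
 * (NEON) and the crypto extension enabled, e.g. via
 * -march=armv8-a+crypto; returns 0 on any other target or configuration. */
int aarch64_crypto_simd_enabled(void)
{
#if defined(__aarch64__) && defined(__ARM_NEON) && \
    (defined(__ARM_FEATURE_CRYPTO) || defined(__ARM_FEATURE_AES))
    return 1;
#else
    return 0;
#endif
}
```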

For Denver, we are checking internally and will share an update later.

Thanks and Happy New Year! :)

Here is our update:

1. For AArch64 mode: -O3 -ffast-math -flto -march=armv8-a+crypto -mcpu=cortex-a57+crypto
2. Other options such as -funroll-loops and -fvect-cost-model=unlimited may help, but the benefit is application-dependent.
3. Some AArch32 options, such as -mfloat-abi=hard, may not be applicable, depending on the system libraries available.
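A floating-point reduction is a typical case where -O3 -ffast-math matters: GCC generally vectorizes such a loop (for example into NEON instructions on AArch64) only when it may reassociate the additions, which -ffast-math allows. The function below is a hypothetical illustration, not code from this thread.

```c
#include <stddef.h>

/* Single-precision dot product. Without -ffast-math (or
 * -fassociative-math) GCC must preserve the order of the additions, which
 * typically keeps this reduction scalar; with it, the loop can be
 * vectorized, so results may differ in the last bits between builds. */
float dot(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}
```

Assuming it is saved as dot.c, compiling with gcc -O3 -ffast-math -march=armv8-a+crypto -mcpu=cortex-a57+crypto -fopt-info-vec -c dot.c makes GCC report which loops it vectorized, which is a quick way to compare flag combinations such as adding -funroll-loops.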

Thanks.

To my knowledge there isn't a way to specify which core an application runs on (other than disabling the Denver cores, which isn't desirable). So, assuming all six cores are running, do the suggestions in comment #2 still apply, or only those in comment #3?

Comment #3 should be enough.

On R28.2-DP, using -flto gives these messages:

/usr/bin/ar: CMakeFiles/cuda_compile.dir/kernel/scan_by_key/cuda_compile_generated_scan_by_key_impl_af_add_t.cu.o: plugin needed to handle lto object
/usr/bin/ranlib: cuda_compile_generated_scan_by_key_impl_af_add_t.cu.o: plugin needed to handle lto object

Is a custom build of binutils with plugin support required?

Check which version of CMake you are using. CMake apparently did not support link-time optimization until a very recent version (3.9). See https://stackoverflow.com/questions/31355692/cmake-support-for-gccs-link-time-optimization-lto

Yes, it seems related to my CMake files. I'm not sure how these flags should be passed for a .cu file.
I can use -flto for normal .cpp compilation.

Hi Honey_Patouceul,

We are checking this internally and will update you later.

Hi,

You can try passing the options to the host compiler with nvcc's -Xcompiler option:

nvcc test.cu -o test -Xcompiler "-O3 -ffast-math -flto -march=armv8-a+crypto -mcpu=cortex-a57+crypto"
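Options given through -Xcompiler reach only the host compiler, so they affect host code in the .cu file but not device code. To confirm they arrived, host code can check GCC's predefined macros: __OPTIMIZE__ is defined at any -O level above -O0, and __FAST_MATH__ when -ffast-math is in effect. This is a sketch assuming GCC as the host compiler; the function name host_flags_mask is hypothetical.

```c
/* Host-side check of which optimization flags the host compiler saw.
 * Bit 0: optimizing (__OPTIMIZE__, set by -O1 and above)
 * Bit 1: fast-math (__FAST_MATH__, set by -ffast-math) */
int host_flags_mask(void)
{
    int mask = 0;
#ifdef __OPTIMIZE__
    mask |= 1;
#endif
#ifdef __FAST_MATH__
    mask |= 2;
#endif
    return mask;
}
```

If bit 1 is set when this is called from host code in test.cu built with the nvcc command above, -ffast-math reached the host compiler.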

Thanks.

Earlier you said you were checking internally on Denver. Do you have any update on the optimal compile options?
Thanks and good luck!

Hi,

These options apply to both A57 and Denver:

  • -O3 -ffast-math
  • -flto
  • -march=armv8-a+crypto
  • -mcpu=cortex-a57+crypto

Thanks.