How to get nvcc to pass optimization flags to g++ without getting in the way

Hi,

I’ve now managed to optimize my g++ output to be pretty much as fast as the nvc++ output for general C++ code (non-GPU). But I don’t seem to be able to do the same with nvcc.

The result is that nvcc output is now running much slower than the g++ output (3982ms vs 2579ms).

Please see my (abridged) test output:

=======================================
Testing nvcc
=======================================
nvcc -O3 --extended-lambda -I /opt/nvidia/hpc_sdk/Linux_x86_64/cuda/11.0/include -o main main.cpp

Running nvcc on main.cpp

100 iterations of 100000 samples with 100 inner loop iterations.
Elapsed time in nanoseconds : 3982115443 ns
Elapsed time in microseconds : 3982115 µs
Elapsed time in milliseconds : 3982 ms
Elapsed time in seconds : 3 sec

=======================================
Testing g++
=======================================
g++ -Ofast -march=native -std=c++17 -Wall -Wextra -pedantic -o main_no_policy main.cpp

100 iterations of 100000 samples with 100 inner loop iterations.
Elapsed time in nanoseconds : 2579204732 ns
Elapsed time in microseconds : 2579204 µs
Elapsed time in milliseconds : 2579 ms
Elapsed time in seconds : 2 sec

The key here for g++ was the options -Ofast -march=native . I don’t seem to be able to specify the same for nvcc. As I understand it nvcc is using g++ to compile the host code and it’s deeply frustrating not to be able to pass those options through.

Also, the descriptions of the optimization levels seem to be missing from both the nvcc documentation and the output of nvcc --help. How do I specify the machine architecture to nvcc, and what are the options? In this case I need the native architecture of the machine doing the compilation (Intel Coffee Lake, which supports AVX2, as every Intel generation since Haswell does).

Using my two command lines …

g++ -Ofast -march=native
nvcc -O3 --extended-lambda

How do I combine the g++ options into the nvcc command and get it to work?

Thanks,

Leigh

You can pass options to the host compiler (g++ in this case) via the “-Xcompiler” flag. See: https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#options-for-passing-specific-phase-options
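As a rough sketch of what that looks like with the two command lines from the question: the host-only flags go after -Xcompiler as a single comma-separated list, and nvcc hands them to g++. The variable names HOST_FLAGS and CMD are just for illustration, and the -O3 is left off here so that the host compiler only sees one optimization level. Add the -I include path from the original command if your setup needs it.

```shell
#!/bin/sh
# Host-compiler flags taken from the g++ command line in the question.
# nvcc splits the comma-separated list and forwards each item to g++.
HOST_FLAGS="-Ofast,-march=native"

# Combined command: nvcc's own options plus the forwarded host flags.
CMD="nvcc --extended-lambda -Xcompiler $HOST_FLAGS -o main main.cpp"

# Always show the command that would be run.
echo "$CMD"

# Only run it when nvcc and the source file are actually present.
if command -v nvcc >/dev/null 2>&1 && [ -f main.cpp ]; then
    $CMD
fi
```

The long form of the same option is --compiler-options, which may read better in a Makefile.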

Hi Mat,

Thanks for that, very helpful.

Best regards,

Leigh.

Hi Mat,

I tried -Xcompiler -Ofast,-march=native and that did the trick. I also had to remove the -O3 flag that was already there.

I did find that nvcc was passing the -O3 through to g++, replacing -Ofast.
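In case it helps anyone checking the same thing: one way to see which optimization flags end up on the host compiler command line is nvcc's --dry-run option, which lists the compilation sub-commands without executing them. This is only a sketch; the helper name show_host_opt_flags is made up for illustration, and the exact layout of the --dry-run output can differ between CUDA versions, so the pattern below is a loose match rather than a precise parser.

```shell
#!/bin/sh
# Hypothetical helper: print every -O<level> token found on stdin,
# one per line (e.g. -O3, -Ofast). Lowercase -o (output file) is ignored.
show_host_opt_flags() {
    grep -oE -- '-O[0-9a-z]+'
}

# Only query nvcc when it is installed and the source file exists.
if command -v nvcc >/dev/null 2>&1 && [ -f main.cpp ]; then
    # --dry-run prints the sub-commands instead of running them;
    # stderr is merged in so the listing is captured either way.
    nvcc --dry-run -O3 --extended-lambda -Xcompiler -Ofast,-march=native \
        -o main main.cpp 2>&1 | show_host_opt_flags || true
fi
```

If both an -O<number> and -Ofast show up in the g++ line, the order in which they appear tells you which one g++ will honour, since the last optimization level on the command line wins.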

nvcc won’t accept -Ofast directly as it expects a number.

Now my nvcc code is running as fast as g++ output which is what I was aiming for.

Thanks for your help,

Leigh.