I’ve now managed to get my g++-compiled output running pretty much as fast as nvc++ output for general C++ (non-GPU) code, but I don’t seem to be able to do the same with nvcc.
The result is that the nvcc build now runs much slower than the g++ build (3982 ms vs 2579 ms).
Please see my (abridged) test output
======================================= Testing nvcc =======================================
nvcc -O3 --extended-lambda -I /opt/nvidia/hpc_sdk/Linux_x86_64/cuda/11.0/include -o main main.cpp
Running nvcc on main.cpp
100 iterations of 100000 samples with 100 inner loop iterations.
Elapsed time in nanoseconds : 3982115443 ns
Elapsed time in microseconds : 3982115 µs
Elapsed time in milliseconds : 3982 ms
Elapsed time in seconds : 3 sec
======================================= Testing g++ =======================================
g++ -Ofast -march=native -std=c++17 -Wall -Wextra -pedantic -o main_no_policy main.cpp
100 iterations of 100000 samples with 100 inner loop iterations.
Elapsed time in nanoseconds : 2579204732 ns
Elapsed time in microseconds : 2579204 µs
Elapsed time in milliseconds : 2579 ms
Elapsed time in seconds : 2 sec
The key here for g++ was the options -Ofast -march=native. I don’t seem to be able to specify the same options for nvcc. As I understand it, nvcc uses g++ to compile the host code, so it’s deeply frustrating not to be able to pass those options through to it.
Also, the descriptions of the optimization levels seem to be missing from both the nvcc documentation and the output of nvcc --help. How do I specify the host machine architecture to nvcc, and what are the valid options? In this case I need the native architecture of the machine doing the compilation (an Intel Coffee Lake, which supports AVX2, as every Intel core since Haswell does).
Using my two command lines …
g++ -Ofast -march=native
nvcc -O3 --extended-lambda
How do I combine the g++ options into the nvcc command and get it to work?
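My best guess so far is something along the following lines, using nvcc's -Xcompiler option to forward flags to the host compiler; I'm not certain this is the right mechanism (or that all the g++ flags are safe to forward this way), which is exactly what I'm asking:

```shell
# Hypothetical combined invocation (untested): nvcc compiles the device
# code itself and forwards the quoted options to the host g++ via
# -Xcompiler. -std=c++17 is given to nvcc directly, since nvcc has its
# own -std option and also needs to know the dialect.
nvcc -O3 -std=c++17 --extended-lambda \
     -Xcompiler "-Ofast -march=native -Wall -Wextra -pedantic" \
     -I /opt/nvidia/hpc_sdk/Linux_x86_64/cuda/11.0/include \
     -o main main.cpp
```

If -Xcompiler is indeed the intended pass-through, it would also be good to know whether options like -Ofast and -march=native affect only the host code or interact with the device-side compilation in any way.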