NVIDIA Developer Forums

optimize host code

xrismf June 16, 2011, 1:20am 1

Hello,

I’m using

--optimize 3

flag to optimize the host code in my CUDA program. It does improve performance, but the host code still executes 30% slower than in a separate pure C++ project. Is there anything else I could try?

Kind Regards

njuffa June 16, 2011, 2:21am 3

Host code in .cu files is preprocessed by nvcc before it is passed to the host compiler, so the code the host compiler sees tends to differ from the same host code presented to it directly. In my experience, any performance differences resulting from these source differences are minor, but I assume more pronounced differences are not impossible.

You could simply move the affected host code from the .cu file to a separate .cpp file that is compiled directly with the host compiler. You can also try passing additional optimization flags to the host compiler via the nvcc command line. For example, if the host compiler is g++, you could try something like this:

-Xcompiler -O3 -Xcompiler -march=core2 -Xcompiler -mtune=core2 -Xcompiler -msse2

Which additional host compiler optimization flags make sense will depend on your code, the target platform, and the host toolchain.
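As a rough sketch of how that flag list can be assembled, the snippet below prefixes each host flag with -Xcompiler and prints the resulting nvcc command instead of running it. The helper name wrap_host_flags and the file name app.cu are made up for illustration, and the g++ flags are just the ones from the example above, not a recommendation.

```shell
#!/bin/sh
# Prefix every host-compiler flag with -Xcompiler so nvcc forwards it unchanged.
# wrap_host_flags is a hypothetical helper name, not part of the CUDA toolkit.
wrap_host_flags() {
    out=""
    for flag in "$@"; do
        # Add a separating space only when something is already accumulated.
        out="${out:+$out }-Xcompiler $flag"
    done
    printf '%s\n' "$out"
}

# Dry run: print the nvcc command instead of executing it.
# app.cu is a placeholder; swap in flags that suit your CPU and toolchain.
echo nvcc $(wrap_host_flags -O3 -march=core2 -mtune=core2 -msse2) -o app app.cu
```

nvcc also accepts a comma-separated list after a single -Xcompiler (for example -Xcompiler -O3,-msse2), which is shorter to type by hand.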

[later:]

I assume you have already verified that the flags passed to the host compiler are identical, or at least essentially the same, between the nvcc build and the separate host compiler build? Adding the -v switch to the nvcc command line will cause it to show exactly how each underlying tool is invoked.
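One possible way to do that comparison is to save the -v output to a file and keep only the lines that mention the host compiler. The helper name host_invocations and the log file name are invented for this sketch, and the name to search for depends on your setup (nvcc may drive g++ through gcc on Linux, or cl on Windows).

```shell
#!/bin/sh
# Keep only the lines of a saved `nvcc -v` log that mention the host compiler.
# host_invocations is a hypothetical helper; pass the compiler name to look for.
host_invocations() {
    # -F treats the name as a literal string, so "g++" needs no escaping.
    grep -F -- "$1"
}

# Typical use (commented out so the snippet runs without a CUDA install):
#   nvcc -v -Xcompiler -O3 -o app app.cu > nvcc_verbose.log 2>&1
#   host_invocations gcc < nvcc_verbose.log
```

The lines that come out can then be compared by eye, or with diff, against the command line used in the pure C++ project.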
