SOLVED? nvcc optimization options problem

Hi all,
I have
Built on Fri_Feb_19_18:18:31_PST_2010
Cuda compilation tools, release 3.0, V0.2.1221

gcc (Ubuntu 4.3.3-5ubuntu4) 4.3.3

I need to use both nvcc and gcc with optimization flag -O3 and compare the performance of a “host only” code with the corresponding parallel “host + device” code.
The host only code is compiled with gcc -O3 …etc mycode.c
The host + device code is compiled with the makefile provided by CUDA SDK.
I tried setting both


to -O3

but there does not seem to be any improvement in performance, while the host code was boosted by 20%.
What am I doing wrong?
I read in [topic=“159185”]this topic[/topic] that there could be some problem between gcc 4.3.4 and cuda toolkit 3.0beta.

Any help or advice would be really appreciated.

Compiler optimizations for device code are already turned on by default unless you explicitly disable them with -O0. So it’s not too surprising to me that -O3 doesn’t buy you anything.

As for knowing why your device code isn’t as fast as you expect it to be, it’s pretty much impossible to say without knowing what your code is trying to do. Posting the actual kernel(s) would be helpful.

gcc 4.3.4 does work with the 3.0 final version. With CUDA 3.1, gcc 4.4 is supported as well.


First of all, thank you so much for your answer.

I guess I was not very clear… The problem is that if I compile the same “host” code with the Makefile provided by the CUDA SDK and with “gcc -O3”, I get very different performance…

You’re right, but I guess my problem here is with how the host code gets compiled.

Thank you in advance,


In CUDA SDK/C/common/, I tried to set both,





CFLAGS := -O3

by setting optimization flags individually.

The execution time for my program is in both cases around 60s.

Since I did not get any improvement by using the -O3 flag, I tried to compile the .c and .cu files separately

[codebox]nvcc -O3 -I/usr/local/cuda/include -I/home/martina/software/CUDA_SDK/C/common/inc -L/usr/local/cuda/lib -L/ -o myfileDevice.o -c

g++ -O3 -I/usr/local/cuda/include -I/home/martina/software/CUDA_SDK/C/common/inc -o myfileHost.o -c myfileHost.c[/codebox]

Then I linked them together:

[codebox]g++ -O3 -fPIC -o finalExe myfileHost.o myfileDevice.o -L/usr/local/cuda/lib -L/home/martina/software/CUDA_SDK/C/lib -lcuda -lcudart -lm -lcutil -lcufft[/codebox]


In this case my execution time dropped from 60 s to 34 s!

This is a huge difference… Am I missing some fundamental flags in the makefile (

Thank you all in advance,


Try adding the following to NVCCFLAGS:

--compiler-options "-O3"

This should pass through the -O3 to the host compiler.
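
As a rough sketch, the makefile change could look like the fragment below. It assumes your makefile hands NVCCFLAGS to every nvcc call; the rule and the file names (kernel.cu, kernel.o) are placeholders for illustration, not taken from the SDK makefile.

```make
# Sketch only: assumes NVCCFLAGS is passed to each nvcc invocation.
# --compiler-options (short form: -Xcompiler) forwards its argument
# to the host compiler that nvcc drives (gcc/g++ here).
NVCCFLAGS += --compiler-options "-O3"

# Hypothetical rule showing where the flag ends up; names are placeholders.
kernel.o: kernel.cu
	nvcc $(NVCCFLAGS) -c kernel.cu -o kernel.o
```

If the build prints its command lines, you can check that the option actually shows up in the nvcc invocation.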

Sorry for the late reply.

Thank you Cliff, now I get almost the same execution times!