Hi all,
I have nvcc:
Built on Fri_Feb_19_18:18:31_PST_2010
Cuda compilation tools, release 3.0, V0.2.1221
and gcc (Ubuntu 4.3.3-5ubuntu4) 4.3.3
I need to use both nvcc and gcc with optimization flag -O3 and compare the performance of a “host only” code with the corresponding parallel “host + device” code.
The host only code is compiled with gcc -O3 …etc mycode.c
The host + device code is compiled with the makefile provided by the CUDA SDK.
I tried setting both NVCCFLAGS and COMMONFLAGS/CFLAGS to -O3, but there does not seem to be any improvement in performance, while the host code was boosted by 20%.
What am I doing wrong?
I read in this topic (topic 159185) that there could be some problem between gcc 4.3.4 and CUDA toolkit 3.0 beta.
Any help or advice would be really appreciated.
Martina
Compiler optimizations for device code are already turned on by default unless you explicitly disable them with -O0. So it’s not too surprising to me that -O3 doesn’t buy you anything.
As for knowing why your device code isn’t as fast as you expect it to be, it’s pretty much impossible to say without knowing what your code is trying to do. Posting the actual kernel(s) would be helpful.
gcc 4.3.4 does work with the 3.0 final version. With CUDA 3.1, gcc 4.4 is supported as well.
I guess I was not very clear… The problem is that if I compile the same “host” code with the Makefile provided by the CUDA SDK and with “gcc -O3”, I get very different performance…
You’re right, but I guess my problem here is in how the host code gets compiled.
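One way to check whether the host code really is compiled with optimization in both builds is to have the code report it. This is a minimal sketch, assuming the host compiler is GCC, which predefines `__OPTIMIZE__` when optimizing and also `__OPTIMIZE_SIZE__` under `-Os`; other compilers may not define these macros. The function names `host_opt_state` and `report_host_build` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Describes how this translation unit was optimized, based on
   macros predefined by GCC (an assumption about the host compiler). */
static const char *host_opt_state(void)
{
#if defined(__OPTIMIZE_SIZE__)
    return "optimized for size";
#elif defined(__OPTIMIZE__)
    return "optimized";
#else
    return "not optimized";
#endif
}

/* Prints the host compiler version (when available) and the optimization state. */
static void report_host_build(FILE *out)
{
    assert(out != NULL);
#ifdef __VERSION__
    fprintf(out, "host compiler: %s\n", __VERSION__);
#endif
    fprintf(out, "host code is %s\n", host_opt_state());
}
```

Calling `report_host_build(stdout)` at the start of `main` in both the plain gcc build and the SDK Makefile build shows whether `-O3` actually reached the host compiler in each case. If the Makefile build reports "not optimized", the flag is probably being dropped or overridden somewhere between the Makefile variables and the gcc invocation.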