I can confirm what you are seeing but I can’t explain why. The optimization flags are making no difference to the PTX the compiler is generating, which is really weird…
Yes, it’s really strange… Maybe it’s due to gcc…
I tried unrolling the first iteration of the loop by hand, and then the problem goes away, but if the loop is more complicated this workaround doesn’t help :(
Can nobody explain to us what’s happening?
You will need to include cuda_runtime.h.
You cannot use the <<< >>> syntax in a .c file, so you will need to move all the kernel launches into the .cu file or use the driver API.
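A minimal sketch of that arrangement (the kernel name `scale` and the wrapper `launch_scale` are made up for illustration): the kernel and its launch go in the .cu file compiled by nvcc, and the .c file only calls a plain C wrapper.

```cuda
// kernel.cu -- compiled by nvcc; the <<< >>> launch syntax lives here
#include <cuda_runtime.h>

__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

// extern "C" so the symbol links cleanly against code gcc compiled as C
extern "C" void launch_scale(float *d_data, float factor, int n)
{
    int threads = 256;
    int blocks  = (n + threads - 1) / threads;
    scale<<<blocks, threads>>>(d_data, factor, n);
}
```

From the .c side you would then just declare `void launch_scale(float *, float, int);` and call it like any other function.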
Ok, now it works, but… I can’t make this kind of change in my real program!!
If it works now, is the problem in nvcc? Hmmm, the program is now quite different. Maybe the compiler does a different optimization… I’m using a wrapper function with the kernel launch in the .cu file.
I confirm the problem on Debian x64 with gcc 4.3.4 and CUDA Toolkit 3.0 beta.
When the kernel is launched for the first time, cudaSetupArgument ends up being called with some ridiculous value in the ‘offset’ argument (0x7fffffffe1d8), and the launch fails with the error “invalid argument” (please check the return values!).
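To actually see failures like that “invalid argument”, every runtime call and kernel launch needs its return status checked. A common pattern (the macro name `CUDA_CHECK` is my own, not from the toolkit) looks like this:

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Abort with file/line and the CUDA error string if a runtime call fails
#define CUDA_CHECK(call)                                           \
    do {                                                           \
        cudaError_t err = (call);                                  \
        if (err != cudaSuccess) {                                  \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                    cudaGetErrorString(err));                      \
            exit(EXIT_FAILURE);                                    \
        }                                                          \
    } while (0)

// A <<< >>> launch returns void, so query the error state afterwards:
//     my_kernel<<<blocks, threads>>>(args);
//     CUDA_CHECK(cudaGetLastError());
```

With this in place, a failed launch prints the error immediately instead of silently producing garbage later.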
Looks like a problem between some recent gcc optimizations and the CUDA runtime. Maybe an ABI mismatch, or just a gcc bug?