I just updated to CUDA 4.1.28, updating the following products/drivers:
CUDA toolkit 4.1.28
dev driver for Windows 7 64-bit, 286.16
GPU Computing SDK 4.1.28
Parallel Nsight 2.1
Since updating, one of my kernels keeps failing with garbage output; it does not display any errors, and the CUDA runtime does not report any either. The rest of my kernels run fine; it is just this one that is causing an issue, and it had run flawlessly in CUDA 4.0.17.
I did some testing and was able to get the kernel to produce the correct result by enabling the device debug flag, -G0.
The code is currently of a sensitive nature. I will try to narrow down the problem and post some code if I can. In the meantime, does anyone have a suggestion?
Compiling your kernel for SM 1.2 or 1.3 will still use the older Open64-based compiler, whereas SM 2.0 and 2.1 will use the new LLVM-based compiler (starting with CUDA 4.1).
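To compare the two back ends, you can build the same source for both target architectures; these are standard nvcc options (the file name kernel.cu is just a placeholder):

```shell
# CUDA 4.1: SM 1.x targets go through the older Open64-based back end
nvcc -arch=sm_13 -o app_sm13 kernel.cu

# CUDA 4.1: SM 2.x targets go through the new LLVM-based back end
nvcc -arch=sm_20 -o app_sm20 kernel.cu

# Device-debug build (disables device code optimisation), useful for
# checking whether an optimisation pass is involved in the miscompile
nvcc -G -arch=sm_20 -o app_debug kernel.cu
```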
I wonder if NVIDIA provides any option to also use Open64 for SM 2.x?
They are entirely separate front ends. I would suggest filing a bug against the compiler, attaching a small self-contained repro case. Thank you for your help.
I have now actually fixed the problem, and it works with both compilers.
Previously I was doing the following, which worked with Open64 but not LLVM:
....
if (threadIdx.x == 0)
    result = someFunction(data, size, buffer); // result and data are stored in global memory
else
    someFunction(data, size, buffer);
....
__device__ float someFunction(float *data, int size, float *buffer)
{
    // big calculation using the global data input
    for (int i = threadIdx.x; i < size; i += blockDim.x)
        buffer[i] = data[i] * ...
    __syncthreads();

    // finalise the result in one thread
    if (threadIdx.x == 0)
    {
        float result = ...
        return result; // threads return from different branches here
    }
    else
    {
        return 0;
    }
}
After reviewing the code I realised it was ugly and not the correct approach, so I changed it to the following, which now works with both the LLVM and Open64 compilers:
....
result = someFunction(data, size, buffer); // result and data are stored in global memory
....

__device__ float someFunction(float *data, int size, float *buffer)
{
    // big calculation using the global data input
    for (int i = threadIdx.x; i < size; i += blockDim.x)
        buffer[i] = data[i] * ...
    __syncthreads();

    __shared__ float result;
    // finalise the result in one thread
    if (threadIdx.x == 0)
    {
        result = ...
    }
    __syncthreads(); // every thread reaches this barrier before reading result
    return result;
}
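For anyone hitting the same issue, here is a minimal self-contained sketch of the corrected pattern; the actual calculation (a multiply and a block-wide sum) and all host-side names are placeholders, not the original code:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__device__ float someFunction(float *data, int size, float *buffer)
{
    // All threads of the block cooperate on the element-wise work.
    for (int i = threadIdx.x; i < size; i += blockDim.x)
        buffer[i] = data[i] * 2.0f;   // placeholder for the real calculation
    __syncthreads();

    __shared__ float result;
    // One thread finalises the result...
    if (threadIdx.x == 0)
    {
        float sum = 0.0f;
        for (int i = 0; i < size; ++i)
            sum += buffer[i];
        result = sum;
    }
    __syncthreads();   // ...and every thread waits before reading it
    return result;     // uniform return value: no divergent control flow
                       // surrounds either barrier
}

__global__ void kernel(float *data, int size, float *buffer, float *out)
{
    float r = someFunction(data, size, buffer); // every thread makes the call
    if (threadIdx.x == 0)
        *out = r;
}

int main()
{
    const int n = 256;
    float h[n], *d_data, *d_buf, *d_out, h_out;
    for (int i = 0; i < n; ++i) h[i] = 1.0f;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMalloc(&d_buf,  n * sizeof(float));
    cudaMalloc(&d_out,  sizeof(float));
    cudaMemcpy(d_data, h, n * sizeof(float), cudaMemcpyHostToDevice);
    kernel<<<1, 128>>>(d_data, n, d_buf, d_out);
    cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    printf("%f\n", h_out);
    return 0;
}
```

The key point is that `__syncthreads()` is only defined when every thread of the block reaches the same barrier, which the original divergent call site did not guarantee to the compiler.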
Thanks for the help, guys. I think the problem was my dodgy coding style all along; this was one of the first kernels I wrote, and it was a bit of a hack-and-slash effort.