I’ve got the following GPU kernel.
__shared__ int fd;

__global__ void bug()
{
    fd = -1;
    __syncthreads();
    if (!threadIdx.x) {
        fd = 0;
        /*x*/ if (fd < 0) { return; }
    }
    __syncthreads();
    if (fd < 0) asm("trap");
}
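Obviously, the trap should never be executed: thread 0 sets fd to 0 before the second __syncthreads(), so every thread should read fd == 0 afterwards. However, when the kernel is compiled with --device-debug=3, the trap is hit. If I remove the line marked /*x*/, the kernel works correctly.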
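For anyone who wants to try this, a minimal host-side driver might look like the sketch below. It is only an illustration under assumptions: the launch configuration (one block of 32 threads) is a guess, not taken from the report, and the inline PTX is written as "trap;" with a trailing semicolon, which differs slightly from the kernel as posted. The kernel is repeated so the file compiles on its own.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel from the report, repeated here so this file is self-contained.
__shared__ int fd;

__global__ void bug()
{
    fd = -1;
    __syncthreads();
    if (!threadIdx.x) {
        fd = 0;
        /*x*/ if (fd < 0) { return; }
    }
    __syncthreads();
    // Assumption: trailing ';' added, the usual form for inline PTX statements.
    if (fd < 0) asm("trap;");
}

int main()
{
    // Assumed launch configuration: 1 block of 32 threads.
    bug<<<1, 32>>>();

    // Check for launch errors first, then wait for the kernel so that
    // execution errors (such as a trap) are reported.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();
    }

    if (err != cudaSuccess) {
        std::printf("kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("kernel completed without reporting an error\n");
    return 0;
}
```

If the trap is executed, the error should surface at cudaDeviceSynchronize() and the program exits with status 1. Building once with and once without the device-debug option should show whether the behaviour differs on a given toolkit.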
I’m using the latest compiler (version output below). The beta release seemed to have this bug even without the --device-debug option.
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2011 NVIDIA Corporation
Built on Thu_Jan_12_14:41:45_PST_2012
Cuda compilation tools, release 4.1, V0.2.1221