Potential bug: discrepancy in control flow between debug and non-debug builds

I’ve got the following GPU kernel.

__shared__ int fd;

__global__ void bug()
{
        fd=-1;
        __syncthreads();
        if (!threadIdx.x ) {
                fd=0;
/*x*/           if (fd<0) { return;}
        }
        __syncthreads();
        if (fd<0) asm("trap");
}

Obviously, the trap should never execute. However, when the kernel is compiled with --device-debug=3, it does. If I remove the line marked /*x*/, the kernel behaves correctly.

I’m using the latest compiler (version output below). The beta release seemed to have this bug even without the --device-debug option.

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2011 NVIDIA Corporation
Built on Thu_Jan_12_14:41:45_PST_2012
Cuda compilation tools, release 4.1, V0.2.1221

Have a look at my signature.

Not sure it’s applicable in this case: __syncthreads() is called outside the conditional statement, and the return statement should never be reached in any valid execution of the code.
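For context, the CUDA programming guide allows __syncthreads() in conditional code only when the condition evaluates identically across the whole thread block. The sketch below shows the pattern that satisfies that rule, with both barriers kept outside the threadIdx-dependent branch, as in the kernel above. The kernel name, the output buffer, and the block size of 64 are assumptions made for illustration and are not part of the original post.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread reaches both barriers because neither
// barrier sits inside the branch that depends on threadIdx.x.
__global__ void barrier_outside_branch(int *out)
{
    __shared__ int flag;

    flag = -1;                 // every thread stores the same value
    __syncthreads();           // barrier 1: reached by all threads

    if (threadIdx.x == 0) {
        flag = 0;              // only thread 0 writes; no barrier in here
    }

    __syncthreads();           // barrier 2: reached by all threads
    out[threadIdx.x] = flag;   // each thread records the value it observes
}

int main()
{
    const int n = 64;          // assumed block size
    int h_out[n];
    int *d_out = NULL;

    if (cudaMalloc((void **)&d_out, n * sizeof(int)) != cudaSuccess) {
        printf("cudaMalloc failed\n");
        return 1;
    }

    barrier_outside_branch<<<1, n>>>(d_out);

    // The blocking copy also reports any error raised by the kernel.
    cudaError_t err = cudaMemcpy(h_out, d_out, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Count threads that still saw the initial value after barrier 2.
    int stale = 0;
    for (int i = 0; i < n; ++i) {
        if (h_out[i] != 0) ++stale;
    }
    printf("%d of %d threads saw a stale flag\n", stale, n);
    return stale != 0;
}
```

Placing a __syncthreads() inside the `if (threadIdx.x == 0)` body would violate the rule, since only one thread would reach it. The kernel in the original post does not do that, which is the point being made here.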

Could you please file a bug and/or provide a full repro?
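A full repro could look roughly like the following. This is a minimal sketch and not the original poster's actual test program: the host code, the launch configuration of one block with 64 threads, and the error reporting are all assumptions. The kernel is copied from the post, except that the inline PTX is written as "trap;" with a terminating semicolon.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__shared__ int fd;

// Kernel from the original post.
__global__ void bug()
{
    fd = -1;
    __syncthreads();

    if (!threadIdx.x) {
        fd = 0;
/*x*/   if (fd < 0) { return; }   // should never be taken
    }

    __syncthreads();

    // Assumption: semicolon added so the string is a complete PTX statement.
    if (fd < 0) asm("trap;");
}

int main()
{
    // Assumed launch configuration; the post does not state one.
    bug<<<1, 64>>>();

    // Catch launch errors first, then wait for the kernel so that a trap
    // surfaces as an error code on the host.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();
    }

    if (err != cudaSuccess) {
        printf("kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("kernel completed without trapping\n");
    return 0;
}
```

Building this once with and once without the --device-debug option from the post, and comparing the two results, would show whether the discrepancy reproduces on a given toolkit version.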

Oops, sorry, yes, you are right. I’ve just become so used to the compiler working this way that I forgot it is against the specification. As Massimiliano wrote, please file a bug report. That way we have a chance of either the behavior being fixed or the documentation getting updated.