Vanishing loop? It was there, then it wasn't

The code below works fine in the emulator but breaks in device mode. The code should copy samples from one buffer to another.

In device mode, if the #if'ed section is disabled, the loop copies the first element of the source buffer across the output buffer (which is correct in that case). If the #if'ed section is enabled, the whole loop appears not to execute: the contents of the output buffer are unchanged.

I realize that this is a short code excerpt, but does anyone have an idea? This is one of those problems that would take a lot of effort to extract into a standalone case.



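    Resolution samplex = 0;
    Resolution sampley = 0;

    // Each thread strides over the region in x and y by the block dimensions.
    for (Size pixely = region[2] + threadIdx.y; pixely < region[3]; pixely += blockDim.y) {
        for (Size pixelx = region[0] + threadIdx.x; pixelx < region[1]; pixelx += blockDim.x) {

            Sample s;

#if 1
            // Map the output pixel to its source sample.
            samplex = ((pixelx - region[0]) * pixel_size) / sample_size_x;
            sampley = ((pixely - region[2]) * pixel_size) / sample_size_y;
#endif

            // With the section above disabled, samplex and sampley stay 0,
            // so every pixel receives the first sample.
            s = framelet_buffer[samplex + (sampley * w)];
            frame_buffer[pixelx + (pixely * width)] = s;
        }
    }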




What happens before the loops? Is there shared memory involved? Did you forget a __syncthreads() after loading data into shared memory? (It is only needed if threads read shared memory written by other threads.)
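
To make the shared memory question concrete, here is a minimal sketch of the pattern being asked about. It is not taken from the original kernel: reverse_tile, reverse_tiles and TILE are hypothetical names, and the sketch assumes the block size equals TILE and the element count is a multiple of TILE. Each thread reads a shared memory element that a different thread wrote, which is exactly the case where the barrier is required.

```cuda
#include <cuda_runtime.h>

#define TILE 256  // hypothetical tile size; must equal blockDim.x

// Each block reverses one tile of TILE floats through shared memory.
__global__ void reverse_tile(const float *in, float *out)
{
    __shared__ float tile[TILE];
    const unsigned int base = blockIdx.x * TILE;

    // Every thread writes one element of the tile.
    tile[threadIdx.x] = in[base + threadIdx.x];

    // Barrier: the read below touches an element written by another thread,
    // so all writes in the block must be complete before any thread reads.
    __syncthreads();

    out[base + threadIdx.x] = tile[TILE - 1 - threadIdx.x];
}

// Host wrapper (hypothetical). n must be a multiple of TILE.
cudaError_t reverse_tiles(const float *h_in, float *h_out, int n)
{
    float *d_in = NULL, *d_out = NULL;
    const size_t bytes = n * sizeof(float);

    cudaError_t err = cudaMalloc(&d_in, bytes);
    if (err != cudaSuccess) return err;
    err = cudaMalloc(&d_out, bytes);
    if (err != cudaSuccess) { cudaFree(d_in); return err; }

    err = cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        reverse_tile<<<n / TILE, TILE>>>(d_in, d_out);
        err = cudaGetLastError();  // report a failed launch
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    return err;
}
```

If each thread only ever reads the shared memory element it wrote itself, the barrier is not needed.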


I was able to eliminate this behavior by running with fewer threads, so the extra two lines were likely causing the kernel to run out of registers or a similar per-block resource. I'm not sure why I didn't see a launch failure error, however.
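
One possible reason a failure like this goes unnoticed is that a kernel launch does not return an error code itself; the error has to be queried afterwards. Below is a minimal sketch, assuming the current CUDA runtime API, of how the launch could be checked and how the kernel's register usage could be inspected. copy_kernel and launch_checked are hypothetical stand-ins, not the kernel from this thread.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the copy kernel discussed above.
__global__ void copy_kernel(const float *src, float *dst, int n)
{
    for (int i = threadIdx.x; i < n; i += blockDim.x)
        dst[i] = src[i];
}

// Launch one block of the given shape and return the first error seen.
cudaError_t launch_checked(const float *d_src, float *d_dst, int n, dim3 block)
{
    // Ask the runtime how many registers each thread of this kernel uses
    // and how many threads per block the kernel can be launched with.
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, copy_kernel);
    if (err != cudaSuccess) return err;
    printf("registers per thread: %d, max threads per block: %d\n",
           attr.numRegs, attr.maxThreadsPerBlock);

    copy_kernel<<<1, block>>>(d_src, d_dst, n);

    // Errors in the launch itself (bad configuration, too many resources
    // requested) only show up when the error state is queried.
    err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));
        return err;
    }

    // Errors raised while the kernel runs show up at the next synchronization.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
    return err;
}
```

If the requested block has more threads than the reported maximum for the kernel, the launch is rejected and the kernel never runs, which would leave the output buffer unchanged.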