break-statement in kernels

Here's a kernel that behaves very strangely.

If I execute the kernel pasted below, the resulting values in `out` are (0,0), (2,0), (4,0), … (0,2) …, so everything is fine.

If I uncomment the line `pos = nextpos;`, the y components of the resulting values in `out` are set to zero.

If I replace the `break;` statement with `iter = 1;`, everything is fine again.

Have I missed something essential here?

```cuda
__global__ void kernelConverge(short2* out, short width)
{
    short2 pos;
    pos.x = (threadIdx.x % (width/2)) * 2;
    pos.y = (threadIdx.x / (width/2)) * 2;

    short2 nextpos = pos;
    unsigned char iter = 1;

    while (iter != 0)
    {
        if ((nextpos.y == pos.y) && (nextpos.x == pos.x))
        {
            break;
        }
        //pos = nextpos;
        iter--;
    }

    out[threadIdx.x] = pos;
}
```
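For comparison, here is a sketch of the variant described above in which `break;` is replaced by `iter = 1;`. It assumes the reported working case had `pos = nextpos;` uncommented, and the kernel name `kernelConvergeNoBreak` is hypothetical. Because `iter` is set back to 1 and then immediately decremented, the loop still runs exactly once and `pos` is left unchanged, so its output should be the same (0,0), (2,0), … pattern as the original kernel.

```cuda
#include <cuda_runtime.h>

// Hypothetical variant: `break;` replaced by `iter = 1;`, with `pos = nextpos;` enabled.
__global__ void kernelConvergeNoBreak(short2* out, short width)
{
    short2 pos;
    pos.x = (threadIdx.x % (width / 2)) * 2;
    pos.y = (threadIdx.x / (width / 2)) * 2;

    short2 nextpos = pos;
    unsigned char iter = 1;

    while (iter != 0)
    {
        if ((nextpos.y == pos.y) && (nextpos.x == pos.x))
        {
            iter = 1;   // instead of break; the decrement below brings iter to 0
        }
        pos = nextpos;  // no-op here, because nextpos still equals pos
        iter--;         // loop ends after exactly one pass
    }

    out[threadIdx.x] = pos;
}
```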

Would you be willing to post the support code that you are using to drive the kernel? In other words, a self-contained, compilable example?

Sorry for the delay! Here is the full example.

The kernel should always leave the loop immediately through the `break` statement, because `nextpos` was initialized to `pos`. The result is correct for the code pasted below, but if you uncomment the line `pos = nextpos;`, the y components of the resulting `short` values are set to zero.

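```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void kernelConverge(short2* out, short width)
{
    short2 pos;
    pos.x = (threadIdx.x % (width/2)) * 2;
    pos.y = (threadIdx.x / (width/2)) * 2;

    short2 nextpos = pos;
    unsigned char iter = 1;

    while (iter != 0)
    {
        // nextpos == pos, so this should always break on the first pass
        if ((nextpos.y == pos.y) && (nextpos.x == pos.x))
        {
            break;
        }
        //pos = nextpos;   // with this line enabled, the y components come back as zero
        iter--;
    }

    out[threadIdx.x] = pos;
}

int main(int argc, char* argv[])
{
    const int SeekerCount = 144;
    const int Width = SeekerCount / 16;  // 9, so width/2 == 4 inside the kernel
    const int ByteSize = sizeof(short2) * SeekerCount;

    short2* data = (short2*)malloc(ByteSize);
    short2* dataDev;
    if (cudaMalloc((void**)&dataDev, ByteSize) != cudaSuccess)
        exit(-1);

    dim3 blocks(1, 1, 1);
    dim3 threads(SeekerCount, 1, 1);
    kernelConverge<<<blocks, threads>>>(dataDev, Width);

    // Catch launch errors instead of silently reading back uninitialized memory
    if (cudaGetLastError() != cudaSuccess)
        exit(-1);

    if (cudaMemcpy(data, dataDev, ByteSize, cudaMemcpyDeviceToHost) != cudaSuccess)
        exit(-1);

    // Print every seeker position so the (x,y) pattern can be inspected
    for (int i = 0; i < SeekerCount; ++i)
        printf("%d: (%d,%d)\n", i, data[i].x, data[i].y);

    cudaFree(dataDev);
    free(data);
    return 0;
}
```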
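Because `nextpos` starts out equal to `pos`, the loop should exit on its first pass with `pos` unchanged, so the expected contents of `out` follow from the index arithmetic alone. With `Width = 9`, `width/2` is 4 under integer division, so thread `t` should produce `((t % 4) * 2, (t / 4) * 2)`; for example, thread 4 gives (0,2) and thread 143 gives (6,70). The sketch below puts that arithmetic in one `__host__ __device__` function so the host can check the buffer copied back by `cudaMemcpy` against exactly the formula the kernel uses. The helper names `seekerPosition` and `countMismatches` are hypothetical and not part of the original code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: start position of seeker `tid`, using the same
// arithmetic as kernelConverge. Callable from both host and device code.
__host__ __device__ inline short2 seekerPosition(unsigned int tid, short width)
{
    short half = width / 2;  // 9 / 2 == 4 under integer division
    return make_short2((short)((tid % half) * 2), (short)((tid / half) * 2));
}

// Hypothetical host-side check: counts entries of `data` that differ from the
// expected start position and prints the first few mismatches.
int countMismatches(const short2* data, int count, short width)
{
    int mismatches = 0;
    for (int i = 0; i < count; ++i)
    {
        short2 expected = seekerPosition(i, width);
        if (data[i].x != expected.x || data[i].y != expected.y)
        {
            if (mismatches < 5)
                printf("out[%d] = (%d,%d), expected (%d,%d)\n",
                       i, data[i].x, data[i].y, expected.x, expected.y);
            ++mismatches;
        }
    }
    return mismatches;
}
```

Calling `countMismatches(data, SeekerCount, Width)` right after the `cudaMemcpy` in `main` should return 0 for the working version; with `pos = nextpos;` uncommented, the behavior reported above would show up as a nonzero count, with printed entries whose y value is 0.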