I’m working on a simple program and have run into the following situation. The program runs correctly when the kernel contains:
    unsigned int td = d[tid];
    for (unsigned int j = 0; j < 16; ++j) {
        atomicMax (&(hit[2]), tid);
        atomicMin (&(hit[0]), tid);
        __syncthreads ();
        td <<= 2;
        td += (td_next >> 30);
        td_next <<= 2;
    }
But if the two atomic functions are placed inside an if statement, I get “unspecified launch failure” from a CUT_CHECK_ERROR() placed immediately after the kernel launch:
    unsigned int td = d[tid];
    for (unsigned int j = 0; j < 16; ++j) {
        if (td == 0) {
            atomicMax (&(hit[2]), tid);
            atomicMin (&(hit[0]), tid);
        }
        __syncthreads ();
        td <<= 2;
        td += (td_next >> 30);
        td_next <<= 2;
    }
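For anyone trying to reproduce this, here is a minimal standalone sketch of the failing variant with the post-launch error check written out explicitly. Much of it is assumed, because the post does not show it: the kernel name hitKernel, the d_next input array used to initialise td_next, the bounds guard on n, the initial values of hit, the 256-thread block size and the helper runAndCheck are all hypothetical. CUT_CHECK_ERROR() comes from the old SDK cutil library; the sketch substitutes the runtime calls cudaGetLastError() (launch errors) and cudaDeviceSynchronize() (faults raised while the kernel runs, which is where an “unspecified launch failure” shows up).

```cuda
#include <cstdio>
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical repro of the failing kernel. d_next and n are assumptions:
// the original post does not show how td_next is initialised or sized.
__global__ void hitKernel(const unsigned int *d, const unsigned int *d_next,
                          unsigned int *hit, unsigned int n)
{
    unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;
    // Assumed bounds guard: out-of-range threads skip the loads and the
    // atomics but do not return early, so every thread still reaches
    // __syncthreads() below.
    bool active = tid < n;
    unsigned int td = active ? d[tid] : 1u;
    unsigned int td_next = active ? d_next[tid] : 0u;
    for (unsigned int j = 0; j < 16; ++j) {
        if (active && td == 0) {
            atomicMax(&hit[2], tid);   // largest tid that saw td == 0
            atomicMin(&hit[0], tid);   // smallest tid that saw td == 0
        }
        __syncthreads();
        td <<= 2;
        td += (td_next >> 30);
        td_next <<= 2;
    }
}

// Hypothetical host helper: copies inputs to the device, launches the
// kernel, checks for errors and copies hit[] back into h_hit.
cudaError_t runAndCheck(const unsigned int *h_d, const unsigned int *h_next,
                        unsigned int n, unsigned int h_hit[3])
{
    unsigned int *d_d = nullptr, *d_next = nullptr, *d_hit = nullptr;
    size_t bytes = n * sizeof(unsigned int);
    cudaMalloc(&d_d, bytes);
    cudaMalloc(&d_next, bytes);
    cudaMalloc(&d_hit, 3 * sizeof(unsigned int));
    cudaMemcpy(d_d, h_d, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_next, h_next, bytes, cudaMemcpyHostToDevice);

    // hit[0] accumulates a minimum and hit[2] a maximum, so seed them
    // with the identity values for min and max respectively.
    unsigned int init[3] = {0xFFFFFFFFu, 0u, 0u};
    cudaMemcpy(d_hit, init, sizeof(init), cudaMemcpyHostToDevice);

    unsigned int threads = 256;
    unsigned int blocks = (n + threads - 1) / threads;
    hitKernel<<<blocks, threads>>>(d_d, d_next, d_hit, n);

    cudaError_t err = cudaGetLastError();       // bad launch configuration
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();          // faults during execution
    if (err == cudaSuccess)
        err = cudaMemcpy(h_hit, d_hit, 3 * sizeof(unsigned int),
                         cudaMemcpyDeviceToHost);
    if (err != cudaSuccess)
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));

    cudaFree(d_d);
    cudaFree(d_next);
    cudaFree(d_hit);
    return err;
}
```

Calling runAndCheck() with d = {5, 0, 7, 0} and all-zero td_next inputs should leave hit[0] == 1 and hit[2] == 3, since only threads 1 and 3 start with td == 0; if the kernel faults, the error string is printed and returned instead.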
Any idea what might be going wrong, or where I should start looking?