CUDA 7.5 RC debug bug -- shared mem atomicAdd()

Here’s a bizarre one…

Shared memory atomicAdd() calls that don’t assign the return value to a variable and are the very last statement in a function body appear to be skipped in a Debug build.

Single-stepping in Nsight shows no update occurring.

Assigning (and then ignoring) the return value resolves the problem while debugging.
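
For anyone who wants to try this, here is a minimal sketch of the pattern and the workaround. It is not my original code: the names (bump_discard, bump_assign, count_threads, run_count) are made up for illustration, and it assumes the shared counter is handed to a __device__ function by pointer.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: the atomicAdd() result is discarded and the call is
// the last statement in the function body (the pattern described above).
__device__ void bump_discard(int* counter)
{
    atomicAdd(counter, 1);
}

// Workaround variant: the return value is assigned and then ignored.
__device__ void bump_assign(int* counter)
{
    int old = atomicAdd(counter, 1);
    (void)old;
}

// Every thread in the block increments one shared counter once.
template <bool kAssign>
__global__ void count_threads(int* out)
{
    __shared__ int counter;
    if (threadIdx.x == 0) counter = 0;
    __syncthreads();

    if (kAssign) bump_assign(&counter);
    else         bump_discard(&counter);
    __syncthreads();

    // Thread 0 copies the final shared value out to global memory.
    if (threadIdx.x == 0) *out = counter;
}

// Launches a single block of `threads` threads and returns the counter.
// Error checking is omitted to keep the sketch short.
template <bool kAssign>
int run_count(int threads)
{
    int* d_out = nullptr;
    int h_out = -1;
    cudaMalloc(&d_out, sizeof(int));
    count_threads<kAssign><<<1, threads>>>(d_out);
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}
```

Calling run_count<false>(64) and run_count<true>(64) from main() and printing both results gives a quick comparison. With a toolchain that handles this correctly both should equal the thread count; whether the discarded variant actually misbehaves will depend on the toolkit, the build configuration (-G) and the target.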

My environment is CUDA 7.5RC + VS2013 + Debug/Win7x64/353.45. I’m targeting compute_50/compute_50.

It took a couple of hours to find this.

Is the shared memory location local to the function, or is a pointer to it passed into the function?

Only in the latter case would I begin to consider it a bug.