atomicCAS: Warp Invalid Address Space error

I am trying to use atomicCAS to implement atomicAdd for double values. I am getting the error CUDA_EXCEPTION_7, "Warp Invalid Address Space". According to the atomicCAS documentation, the first parameter (the address of the value to be compared and swapped) can point to either shared or global memory.
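A common way to build a double-precision atomic add from atomicCAS is a compare-and-swap retry loop, along the lines of the one shown in the CUDA C Programming Guide. This is a sketch, not the poster's code: it assumes the target value lives in global memory, and the names atomicAddDouble, sumKernel, and deviceSum are hypothetical, chosen so they do not clash with the built-in atomicAdd(double*, double) available on compute capability 6.0 and later.

```cuda
#include <cuda_runtime.h>

// Hypothetical name, to avoid clashing with the built-in
// atomicAdd(double*, double) on compute capability 6.0+.
__device__ double atomicAddDouble(double* address, double val)
{
    // atomicCAS works on 64-bit integers, so reinterpret the double's bits.
    unsigned long long int* address_as_ull =
        reinterpret_cast<unsigned long long int*>(address);
    unsigned long long int old = *address_as_ull;
    unsigned long long int assumed;
    do {
        assumed = old;
        // Try to store (assumed + val). atomicCAS returns the value it actually
        // found; if another thread changed it first, old != assumed and we retry.
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val + __longlong_as_double(assumed)));
    } while (assumed != old);  // integer compare: terminates even if the value is NaN
    return __longlong_as_double(old);
}

// Assumption: `result` points to global memory and was zeroed by the host.
__global__ void sumKernel(const double* in, int n, double* result)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        atomicAddDouble(result, in[i]);
    }
}

// Hypothetical host helper: sums hostIn[0..n) on the device (n > 0).
double deviceSum(const double* hostIn, int n)
{
    double *dIn = nullptr, *dResult = nullptr;
    double zero = 0.0, sum = 0.0;
    cudaMalloc(&dIn, n * sizeof(double));
    cudaMalloc(&dResult, sizeof(double));
    cudaMemcpy(dIn, hostIn, n * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(dResult, &zero, sizeof(double), cudaMemcpyHostToDevice);
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    sumKernel<<<blocks, threads>>>(dIn, n, dResult);
    cudaMemcpy(&sum, dResult, sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dResult);
    return sum;
}
```

Because the loop compares bit patterns as integers rather than as doubles, it cannot spin forever on a NaN, which a floating-point comparison could.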

Any help in understanding this would be appreciated.

Are you using atomicCAS() on shared memory? In that case, it is important that the compiler can deduce this, and that all (active) threads use shared memory.

I am using shared memory in the kernel but not in the atomicCAS call.

One other question: your footer says "Do not use __syncthreads() in conditional code…". I am trying to implement a sum reduction algorithm. Most, if not all, of these algorithms have a section that combines the results from the individual blocks. Is this problematic, or am I missing something? For example:

    if (i < s) {
        array[i] += array[i + s];
        __syncthreads();
    }

Just as the footer says: if i and s have identical values for all threads in the block, or if some other invariant guarantees that i < s evaluates the same way for every thread in the block, you are safe. Otherwise, move the __syncthreads() out of the conditional section.
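One minimal sketch of a shared-memory tree reduction that follows this advice keeps the condition around the addition only, so every thread in the block reaches each __syncthreads(). It assumes a power-of-two block size fixed at the hypothetical constant THREADS; blockSum and reduceSum are illustrative names, and the final combination of the per-block partial sums is done on the host here (a second kernel launch would also work).

```cuda
#include <cuda_runtime.h>

// Assumption: the kernel is launched with blockDim.x == THREADS (a power of two).
#define THREADS 256

// Each block reduces up to THREADS elements of `in` into partial[blockIdx.x].
__global__ void blockSum(const double* in, int n, double* partial)
{
    __shared__ double array[THREADS];
    unsigned int i = threadIdx.x;
    int g = blockIdx.x * blockDim.x + threadIdx.x;
    array[i] = (g < n) ? in[g] : 0.0;  // pad out-of-range slots with zero
    __syncthreads();                   // all loads done before anyone reads a neighbour

    for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (i < s) {
            array[i] += array[i + s];  // only the lower half of the threads adds...
        }
        __syncthreads();               // ...but every thread reaches the barrier
    }

    if (i == 0) {
        partial[blockIdx.x] = array[0];
    }
}

// Hypothetical host helper: sums hostIn[0..n) (n > 0).
double reduceSum(const double* hostIn, int n)
{
    int blocks = (n + THREADS - 1) / THREADS;
    double *dIn = nullptr, *dPartial = nullptr;
    cudaMalloc(&dIn, n * sizeof(double));
    cudaMalloc(&dPartial, blocks * sizeof(double));
    cudaMemcpy(dIn, hostIn, n * sizeof(double), cudaMemcpyHostToDevice);
    blockSum<<<blocks, THREADS>>>(dIn, n, dPartial);

    double* hostPartial = new double[blocks];
    cudaMemcpy(hostPartial, dPartial, blocks * sizeof(double), cudaMemcpyDeviceToHost);
    double sum = 0.0;
    for (int b = 0; b < blocks; ++b) {
        sum += hostPartial[b];  // combine the per-block results
    }
    delete[] hostPartial;
    cudaFree(dIn);
    cudaFree(dPartial);
    return sum;
}
```

The loop variable s depends only on blockDim.x, so it takes the same values in every thread and the barrier inside the for loop is safe; the thread-dependent test i < s guards only the addition.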