atomicCAS writes are not visible immediately to other threads

Hi all,

I have a kernel that performs an atomicCAS on pointers to dynamic memory in global memory (allocated using malloc in kernel).
It seems that atomicCAS writes are not immediately visible to other threads, even though I call __threadfence() right after the atomicCAS.

I’m using Fermi architecture and CUDA 3.2.

I read in this forum that atomic operations force writes to global memory and bypass the caches. Is that true? How reliable is this?
What about the read? For example, in atomicCAS, is the read that compares the value at the first argument with the second argument done directly from global memory?

When I launch only 1 thread per thread block, I see very few failures of the kind that could be caused by this visibility issue.

How can I solve this visibility issue in Fermi?

Thanks much.

I’d suspect the problem is on the read side (on Fermi the per-SM L1 caches are not coherent with each other). Have you declared the variable as volatile?

No, it is not declared as volatile right now. I’m not sure I can use volatile variables with atomicCAS, because of compiler errors.

How can I use volatile variables with atomicCAS?

To be clear, is using volatile variables going to solve the read problem?

Thanks for the fast reply!

The solution is considered ugly by some, but it is what you need to do.
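
For anyone reading later, here is a minimal sketch of how the volatile approach could look, assuming the "ugly" part means casting the volatile qualifier away just for the atomicCAS call (atomicCAS has no overload that accepts a volatile pointer, which is the likely source of the compiler errors). The names publish_kernel and run_publish_demo, the u64 typedef standing in for a 64-bit pointer, and the launch sizes are all made up for illustration. It also uses cudaDeviceSynchronize(), which is newer than CUDA 3.2.

```cuda
#include <cuda_runtime.h>

typedef unsigned long long u64;  // assumption: stand-in for a 64-bit device pointer

// Every thread tries to install its own non-zero value into an empty (0) slot,
// then re-reads the slot through the volatile pointer.
__global__ void publish_kernel(volatile u64 *slot, int *winners, int *mismatches)
{
    u64 mine = (u64)blockIdx.x * blockDim.x + threadIdx.x + 1;

    // atomicCAS takes a non-volatile pointer, so the qualifier is cast away
    // for this one call only.
    u64 old = atomicCAS((u64 *)slot, 0ULL, mine);
    __threadfence();

    // The slot is written exactly once: by this thread if old == 0,
    // otherwise by the thread whose value came back in `old`.
    u64 expected = (old == 0ULL) ? mine : old;
    if (old == 0ULL) atomicAdd(winners, 1);

    // Volatile read: the compiler must issue a real load here instead of
    // reusing a value it already holds in a register.
    if (*slot != expected) atomicAdd(mismatches, 1);
}

// Hypothetical host helper: runs the kernel and reports the two counters.
bool run_publish_demo(int *winners, int *mismatches)
{
    u64 *d_slot = 0;
    int *d_counts = 0;  // d_counts[0] = winners, d_counts[1] = mismatches
    if (cudaMalloc((void **)&d_slot, sizeof(u64)) != cudaSuccess) return false;
    if (cudaMalloc((void **)&d_counts, 2 * sizeof(int)) != cudaSuccess) {
        cudaFree(d_slot);
        return false;
    }
    cudaMemset(d_slot, 0, sizeof(u64));
    cudaMemset(d_counts, 0, 2 * sizeof(int));

    publish_kernel<<<64, 128>>>(d_slot, d_counts, d_counts + 1);
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);

    int h_counts[2] = {0, 0};
    ok = ok && (cudaMemcpy(h_counts, d_counts, sizeof(h_counts),
                           cudaMemcpyDeviceToHost) == cudaSuccess);
    *winners = h_counts[0];
    *mismatches = h_counts[1];

    cudaFree(d_slot);
    cudaFree(d_counts);
    return ok;
}
```

Calling run_publish_demo(&w, &m) from main should report exactly one winner, and no mismatches if the volatile reads see the value installed by the atomicCAS. For real pointers from in-kernel malloc the same cast pattern applies, with the pointer reinterpreted as a 64-bit integer on a 64-bit build.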

Another option that brings strong guarantees with it would be to also do the read with an atomic instruction (e.g. atomicCAS(…, 0, 0)). However, that would be complete overkill, as the expensive write part of the atomic operation isn’t needed.
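
A rough sketch of that atomic-read idea, in case it helps. The helper name atomic_load, the kernel read_kernel and the host wrapper run_atomic_read are hypothetical, and u64 again stands in for a 64-bit pointer. Because the compare and swap values are both 0, the stored value never changes: a 0 is replaced by 0, and anything else is left alone.

```cuda
#include <cuda_runtime.h>

typedef unsigned long long u64;  // assumption: stand-in for a 64-bit device pointer

// Atomic "load": compare == val == 0, so memory is never modified,
// and the current value is returned by the atomic operation itself.
__device__ u64 atomic_load(u64 *addr)
{
    return atomicCAS(addr, 0ULL, 0ULL);
}

__global__ void read_kernel(u64 *slot, u64 *out)
{
    *out = atomic_load(slot);
}

// Hypothetical host helper: stores `value` in a slot, reads it back on the
// device through atomic_load, and checks that the slot was left unchanged.
bool run_atomic_read(u64 value, u64 *result)
{
    u64 *d_buf = 0;  // d_buf[0] = slot, d_buf[1] = out
    if (cudaMalloc((void **)&d_buf, 2 * sizeof(u64)) != cudaSuccess) return false;

    u64 h_buf[2] = { value, ~0ULL };
    cudaMemcpy(d_buf, h_buf, sizeof(h_buf), cudaMemcpyHostToDevice);

    read_kernel<<<1, 1>>>(d_buf, d_buf + 1);

    // This cudaMemcpy waits for the kernel on the default stream to finish.
    bool ok = (cudaMemcpy(h_buf, d_buf, sizeof(h_buf),
                          cudaMemcpyDeviceToHost) == cudaSuccess);
    cudaFree(d_buf);

    *result = h_buf[1];
    return ok && h_buf[0] == value;
}
```

As noted above, this pays for a full atomic operation on every read, so the volatile read is usually the cheaper choice.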