Volatile - when to use? (Regarding registers)

I’m aware of the ‘volatile trick’ to reduce register usage in certain situations.

(See: Topic 1, Topic 2 )

However, I’m slightly confused as to when using the volatile keyword will increase performance, and when using it will decrease performance. (Assume we have 100% occupancy already).

My understanding is the volatile keyword forces the variable to be stored in a register and then every time the variable is used in the code it is fetched from the register. If a variable isn’t declared volatile then it may or may not be inlined or stored in a register. Is this correct?

Basically, I’ve seen people saying that using volatile on every variable will result in a performance hit. How do I know which variables will benefit from being volatile?

Is it the ones that are used the most? If so, how many times does the variable have to be used to gain any benefit from being volatile?

Or will variables that are computed from a global read (i.e. float tmp = A[i]; where A is a global array) always benefit from being volatile?

Lots of questions, I know ;) There just doesn’t seem to be much concrete info on this…

Hmm, that's not how I read it. I think volatile tells the compiler not to optimise variables (especially shared memory) by placing them in registers, because another thread may update the variable. (The update would be ignored if the register was used instead.) I ended up using volatile on all shared memory because I could never be sure that replacing it with a register was safe (after all, I was using shared memory in order to communicate between threads). In my view, any small performance gain from risking the compiler doing the wrong thing is not worth the debugging effort. Use volatile on all pointers to shared memory.
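A minimal sketch of the situation described above, where one thread polls a shared-memory value that a thread in another warp writes. The kernel name handoff, the helper run_handoff and the value 42 are made up for illustration, and the launch assumes a single block of 64 threads (two warps):

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: thread 0 (warp 0) produces a value,
// thread 32 (warp 1) spins until the value is flagged as ready.
__global__ void handoff(int *out)
{
    // volatile: every access really goes to shared memory,
    // so the compiler cannot keep a stale copy in a register.
    __shared__ volatile int flag;
    __shared__ volatile int payload;

    if (threadIdx.x == 0) flag = 0;
    __syncthreads();

    if (threadIdx.x == 0) {
        payload = 42;
        __threadfence_block();   // order the payload write before the flag write
        flag = 1;
    } else if (threadIdx.x == 32) {
        // Without volatile, the compiler may load flag once and
        // spin on the register copy forever.
        while (flag == 0) { }
        out[0] = payload;
    }
}

// Hypothetical host helper: launches one block of 64 threads
// and returns the value that thread 32 observed.
int run_handoff()
{
    int *d_out = nullptr;
    int h_out = -1;
    cudaMalloc(&d_out, sizeof(int));
    handoff<<<1, 64>>>(d_out);
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}
```

Calling run_handoff() from main should return 42. The two threads are deliberately placed in different warps so that the spinning thread cannot block the writer.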

Suppose I am implementing a global worklist: I add to the worklist at the tail of the list and read (delete) from it at the head.

None of my threads re-read any particular location of the list; however, I do re-read the tail and head values repeatedly. Should I define these global variables (tail and head) as volatile?

In that case, on the CPU code side, how do I use cudaMemcpy or atomics on them? By type casting?


volatile should be used when the data can be changed outside the current thread without memory fences (for writes) or synchronization (for reads and writes). Otherwise the compiler is free to optimize the reads/writes to the variable by caching the data in a local register. In particular, if you access your shared memory data only after __syncthreads, there is no need to use volatile.
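As a rough illustration of the __syncthreads case, here is a sketch in which shared memory is written, the block synchronizes, and only then is the data read by other threads, so no volatile is used. The names reverse_block, reversed_first_element and TILE are invented for this example:

```cuda
#include <cuda_runtime.h>

#define TILE 64  // assumed block size for this sketch

// Reverses up to TILE elements within one block.
__global__ void reverse_block(const int *in, int *out, int n)
{
    __shared__ int tile[TILE];   // plain shared memory, no volatile

    int t = threadIdx.x;
    if (t < n) tile[t] = in[t];

    // Barrier: all writes to tile[] above are visible to every
    // thread in the block once they pass this point.
    __syncthreads();

    if (t < n) out[t] = tile[n - 1 - t];
}

// Hypothetical host helper: fills the input with 0..TILE-1,
// reverses it on the GPU and returns the first output element.
int reversed_first_element()
{
    int h_in[TILE], h_out[TILE];
    for (int i = 0; i < TILE; ++i) h_in[i] = i;

    int *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_out, sizeof(h_out));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    reverse_block<<<1, TILE>>>(d_in, d_out, TILE);

    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out[0];
}
```

With the input 0..63, reversed_first_element() should return 63, since each thread reads the element written by its mirror thread only after the barrier.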

In your example, if you synchronize the accesses to the list via some global atomic variable, and then change the tail/head from different threads without __syncthreads, you MUST declare head/tail as volatile, since otherwise the updates will not be properly read by other threads.
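On the type casting question, one possible arrangement (an assumption on my part, not the only option) is to keep head/tail in ordinary cudaMalloc'ed memory, pass plain pointers to cudaMemcpy/cudaMemset and to the atomics, and add the volatile qualifier with a cast only where a kernel does a plain read. The names push_items, peek_tail and push_and_peek below are hypothetical, and the sketch only shows the casts; it does not poll the tail concurrently:

```cuda
#include <cuda_runtime.h>

// Each thread appends its index to the worklist.
// atomicAdd takes a plain (non-volatile) pointer.
__global__ void push_items(int *items, unsigned int *tail, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        unsigned int slot = atomicAdd(tail, 1u);  // reserve a slot
        items[slot] = i;
    }
}

// Plain read of the tail through a volatile-qualified pointer,
// so the load is not cached in a register by the compiler.
__global__ void peek_tail(unsigned int *tail, unsigned int *out)
{
    *out = *(volatile unsigned int *)tail;
}

// Hypothetical host helper: the host side needs no cast at all,
// because cudaMemset/cudaMemcpy work on plain device pointers.
unsigned int push_and_peek(int n)
{
    int *d_items = nullptr;
    unsigned int *d_tail = nullptr, *d_out = nullptr;
    unsigned int h_out = 0;

    cudaMalloc(&d_items, n * sizeof(int));
    cudaMalloc(&d_tail, sizeof(unsigned int));
    cudaMalloc(&d_out, sizeof(unsigned int));
    cudaMemset(d_tail, 0, sizeof(unsigned int));

    push_items<<<(n + 255) / 256, 256>>>(d_items, d_tail, n);
    peek_tail<<<1, 1>>>(d_tail, d_out);   // runs after push_items on the same stream

    cudaMemcpy(&h_out, d_out, sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaFree(d_items);
    cudaFree(d_tail);
    cudaFree(d_out);
    return h_out;
}
```

For example, push_and_peek(1000) should return 1000, because every thread with i < n increments the tail exactly once before the second kernel reads it.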

If your worklist is on the GPU and you modify the items from the CPU, then it’s a whole different story: volatile is insufficient, and you have to use PTX assembly for reads that bypass the cache.