volatile is useful for any intermediate result that you want assigned to a register immediately, and also for constant values that appear several times in the computations that follow.
CUDA often wastes registers by computing the same thing multiple times. Say you use the array index i*5+y several times in some code (for example inside a tight loop).
CUDA often inlines this computation into the PTX assembly and computes i*5+y multiple times into different target registers, which can be a waste. Instead, compute it once up front:
volatile int index = i*5+y;
With the above code you would force CUDA to compute it and store it in a register before you enter your computation loop. Then you will use [index] inside the loop. That of course implies that i and y have to be constant within the loop ;)
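A minimal sketch of the index trick in context. The function name scale_accumulate, its parameters and the kernel wrapper are made up for illustration and are not from any real code base. The small macro guard is only there so the same file also builds with a plain C++ compiler for a quick host-side check. Whether the volatile actually saves a register depends on your compiler version, so compare the register counts reported by nvcc --ptxas-options=-v with and without it.

```cpp
// Builds with nvcc as CUDA, or with a plain C++ compiler for a host-side check.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical helper: repeatedly updates one element of `data`,
// reusing the same index i*5 + y in every loop iteration.
__host__ __device__ inline void scale_accumulate(float* data,
                                                 const float* weights,
                                                 int i, int y, int n)
{
    // Computed once, before the loop. volatile asks the compiler to
    // materialize the value here instead of recomputing i*5 + y at each use.
    volatile int index = i * 5 + y;

    for (int k = 0; k < n; ++k) {
        // i and y do not change inside the loop, so index stays valid.
        data[index] += weights[k] * data[index];
    }
}

#ifdef __CUDACC__
// Hypothetical kernel wrapper: one thread per value of i.
// Assumes 0 <= y < 5 and that data holds at least 5 floats per thread.
__global__ void scale_accumulate_kernel(float* data, const float* weights,
                                        int y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    scale_accumulate(data, weights, i, y, n);
}
#endif
```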
Here is another good one. Constants can also be put into a volatile variable, because otherwise CUDA likes to load the same constant over and over into new registers, even if it is the very same constant.
Say you have some code like
foo = 1.0f + sin(x); bar = 1.0f - cos(x);
Instead, use this:
volatile float one = 1.0f;
foo = one + sin(x); bar = one - cos(x);
The above saves you one register inside the PTX, which often translates to one saved register in the .cubin as well.
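The constant trick can be sketched as a complete function like this. The name sin_cos_offsets and the output pointers are assumptions for illustration only; sinf and cosf are the single-precision math functions available in both CUDA device code and the C math library. As before, the macro guard just lets the file build without nvcc, and any register saving is something to confirm on your own compiler rather than take for granted.

```cpp
#include <math.h>

// Builds with nvcc as CUDA, or with a plain C++ compiler for a host-side check.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical helper: computes 1 + sin(x) and 1 - cos(x),
// sharing a single copy of the constant 1.0f between both expressions.
__host__ __device__ inline void sin_cos_offsets(float x, float* foo, float* bar)
{
    // Note the f suffix: a bare 1.0 would be a double literal.
    volatile float one = 1.0f;

    *foo = one + sinf(x);
    *bar = one - cosf(x);
}
```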
In some cases the tricks I outlined above are enough to push you across the threshold to a better occupancy on the GPU, especially if you are only a few registers short.