Hmm, those guys weren’t sure what volatile does when it is placed in front of locals… or maybe they were, but I now have enough reason to doubt it, at least for locals.
The test kernel I wrote isn’t compute-intensive and probably isn’t register-intensive either, so the volatiles had no noticeable performance effect there.
The PTX code did become longer with all these volatiles, though… that’s probably not so good.
The register usage did go up, which could confirm my hypothesis that the compiler starts using more registers when locals are volatile. This goes against what somebody else wrote.
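For anyone who wants to repeat this kind of comparison, here is a minimal sketch. It is not the test kernel from this post; the kernel names `plain_locals` and `volatile_locals` and the arithmetic are made up for illustration, and it assumes a GPU that supports managed memory.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: ordinary locals, the compiler may fold them freely.
__global__ void plain_locals(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float a = in[i];
        float b = a * 2.0f;
        out[i] = a + b;
    }
}

// Same arithmetic, but the locals are declared volatile.
__global__ void volatile_locals(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        volatile float a = in[i];
        volatile float b = a * 2.0f;
        out[i] = a + b;
    }
}

int main()
{
    const int n = 256;
    float *in = nullptr, *out_plain = nullptr, *out_volatile = nullptr;

    // Managed memory keeps the host code short (assumption: the device supports it).
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out_plain, n * sizeof(float));
    cudaMallocManaged(&out_volatile, n * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = (float)i;

    plain_locals<<<1, n>>>(in, out_plain, n);
    volatile_locals<<<1, n>>>(in, out_volatile, n);
    if (cudaDeviceSynchronize() != cudaSuccess) {
        printf("CUDA error\n");
        return 1;
    }

    // Both kernels should compute the same values; only the generated code differs.
    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (out_plain[i] != out_volatile[i]) ++mismatches;
    printf("mismatches: %d\n", mismatches);

    cudaFree(in);
    cudaFree(out_plain);
    cudaFree(out_volatile);
    return mismatches != 0;
}
```

Saved as, say, `volatile_test.cu` (hypothetical file name), `nvcc -ptx volatile_test.cu` writes the PTX so the two kernels can be compared side by side, and `nvcc -Xptxas -v volatile_test.cu` prints the register count per kernel. Whether the counts actually differ will depend on the compiler version and target architecture.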
Somebody else wrote that volatile might always read/write directly from/to memory… but that’s probably not possible with CUDA, since CUDA needs a register to load values into and to store values from. I am not completely sure about that last statement, though… maybe there are other instructions which can read/write global memory without going through a register.
For now I am sticking with what the PTX manual says about it: “volatile prohibits cache operations”, which I take to mean that the access bypasses the cache.
Any other effect is probably a side effect and shouldn’t be relied on.
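To look at the documented effect rather than the side effects, a small sketch with a volatile pointer into global memory might help. The kernel names `read_twice_plain` and `read_twice_volatile` are hypothetical, and the expectation that the volatile version shows up as two `ld.volatile.global` instructions in the PTX is an assumption to check against your own nvcc output.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: two reads of the same address through a plain pointer.
__global__ void read_twice_plain(const int *p, int *out)
{
    int a = *p;
    int b = *p;   // the compiler is allowed to reuse the first load here
    *out = a + b;
}

// Same reads through a volatile pointer.
__global__ void read_twice_volatile(const volatile int *p, int *out)
{
    int a = *p;
    int b = *p;   // should stay a second, separate load
    *out = a + b;
}

int main()
{
    int *value = nullptr, *out = nullptr;

    // Managed memory keeps the host code short (assumption: the device supports it).
    cudaMallocManaged(&value, sizeof(int));
    cudaMallocManaged(&out, 2 * sizeof(int));
    *value = 21;
    out[0] = out[1] = 0;

    read_twice_plain<<<1, 1>>>(value, &out[0]);
    read_twice_volatile<<<1, 1>>>(value, &out[1]);
    if (cudaDeviceSynchronize() != cudaSuccess) {
        printf("CUDA error\n");
        return 1;
    }

    // Nothing else writes to *value, so both kernels see the same data.
    printf("plain: %d, volatile: %d\n", out[0], out[1]);

    cudaFree(value);
    cudaFree(out);
    return 0;
}
```

The results are the same either way; the interesting part is the PTX from `nvcc -ptx`, where the loads in the two kernels can be compared. Note that both versions still load into a register first, so this says nothing about registers being treated as cache.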
The question is: what counts as “cache”? Is a register considered cache too?
I don’t think so… registers are a thing by themselves and are not considered cache?!
One other guy was nagging about the C language giving volatile a different meaning… hmmmm… well, I am not a true C programmer, so I am not sure about that, and for me it doesn’t matter.
Whatever the CUDA doc, the PTX doc and the PTX output say goes with me! :)