The CUDA C we generate is identical between the two examples, with the exception of the volatile attribute (see the output from -Mcuda=keepgpu). There may be a problem with the NVIDIA back-end compiler or an issue with your code, but we would need to see a full reproducing example to determine which. Can you put one together?
Thanks Tuan. It turns out that this is our error, but it only occurs when optimisation is applied. When I went back and recompiled your small example with “-O2”, I saw the bug.
The problem is that the compiler is performing constant propagation and replaces the “ryr_ct” variable with the constant value “1010”. When “volatile” is used, the reference to “ryr_ct” in “compPdt_dev(ii,1) = ryr_ct + 10” is correctly preserved; however, the assignment “ryr_ct = 1000” is still removed. I have sent a report to our engineers (TPR#18530) to have this fixed.
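To make the effect concrete, here is a rough sketch in plain C rather than CUDA Fortran. It is a hand-written emulation of the behaviour described above, not actual compiler output. The function names and the initial value of 0 are assumptions made for illustration; only “ryr_ct”, “compPdt_dev”, and the constants 1000 and 10 come from the example.

```c
/* Hypothetical C stand-ins for the variables in this thread. */
static int ryr_ct = 0;        /* stands in for the device variable ryr_ct */
static int compPdt_dev[4];    /* stands in for compPdt_dev(:,1)           */

/* What the source code asks for: store 1000, then read it back. */
void kernel_as_written(int ii)
{
    ryr_ct = 1000;
    compPdt_dev[ii] = ryr_ct + 10;
}

/* Emulation of the -O2 result without "volatile" (assumed shape):
   the read is folded into the constant 1010 and the store is dropped,
   so ryr_ct itself is never updated. */
void kernel_folded(int ii)
{
    compPdt_dev[ii] = 1010;
}

/* Emulation of the -O2 result with "volatile" (assumed shape):
   the read of ryr_ct is preserved, but the store is still dropped,
   so the read sees whatever value ryr_ct held before. */
void kernel_volatile_store_dropped(int ii)
{
    compPdt_dev[ii] = ryr_ct + 10;
}
```

If “ryr_ct” starts at 0, the first version leaves it at 1000 and writes 1010, the second writes 1010 but leaves “ryr_ct” at 0, and the third writes 10. This would explain why adding “volatile” changes the result you see without giving the correct one.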