I get the same (correct) result either way when compiling for a cc2.0 device. What OS are you using? What is your compile command? What is your CUDA version?
My guess is that you are on Windows and are compiling for a cc1.0 through cc1.2 architecture, which produces this message at compile time:
warning : Double is not supported. Demoting to float.
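In that case, the two formulations do give different results, because every double variable and operation in your device code is actually computed in single precision (float).
If you compile for a cc1.3 or newer architecture, you will not get that message, and you will see no numerical difference between the two cases.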
The linked question shows how to modify the code generation target. (Don't use the method described in the question itself; it is wrong, which is why the question was asked. Use the method given in RoBiK's answer.)
Change that setting to “compute_30,sm_30”.
If you also want to get rid of the demotion warning, remove any other entries in that setting (in particular any cc1.0 through cc1.2 targets such as “compute_10,sm_10”), if there are any.