Different results in Global and Constant memory

With identical host and kernel code, I get slightly different output when I declare a variable (actually an array) in global memory and pass it to the kernel as an argument, compared with declaring it in constant memory (where no argument is needed).

The difference looks like ordinary floating-point error, although it grows fairly large because the kernel performs many calculations. It affects less than 1% of the output (for example, around 100 elements in an array of 630,000).

Any ideas?

Of course, I tested both versions in release mode. Everything is the same except where the array is declared. Initialization is also not the issue (no garbage values), because I copy the values from the host to the device after the declaration.

Suggestions about any possible source of the error are welcome.

It's possible that the compiler generates assembly with a different order of operations for the global-memory and constant-memory cases. Mathematically (that is, with infinite precision), the order of operations wouldn't matter, but with finite precision it can affect the result.

It would be great if you could post a small repro case, reducing it to a minimal amount of source code.