With identical host and kernel code, I get slightly different output depending on where I declare a variable (actually an array). In one version it lives in global memory and I pass it to the kernel as an argument. In the other it is declared in constant memory, so no argument is needed.
The difference looks like ordinary floating-point error. The error does grow larger, but the kernel performs a lot of calculations, so that alone is not surprising. Only a small part of the output is affected, under 1% of the elements (for example, around 100 elements in an array of size 630000).
Any ideas?
Of course, I tested both versions in release mode. Everything is the same except where the array is declared. Initialization is not the issue either (no garbage values), because I copy the values from the host to the device right after the declaration.
Any possible sources of error are welcome.