I’ve just seen some strange behaviour with a simple arithmetic operation and would like to know whether anyone else has run into this issue, or whether I’m doing something wrong here.
The line in the code looks like this:
long long int n = blockIdx.x + (blockIdx.y*gridDim.x) + (n_iter*gridDim.x*gridDim.y) + 1;
Maybe that is too many operations for one line, although I thought the compiler should handle this. The line works perfectly in Emulation Mode (which is why it took me a long time to find the bug :glare: ). On the device it seems like the + 1 is not executed. If I rewrite it like
long long int n = blockIdx.x + (blockIdx.y*gridDim.x) + (n_iter*gridDim.x*gridDim.y); n++;
it also works on the device.
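A minimal host-side sketch of the two variants, for comparing device output against a reference. This is not the original kernel. `Dim2`, `index_single_expr` and `index_split` are hypothetical names, and `n_iter` is assumed to be an `unsigned int` because its type is not shown above.

```cpp
// Hypothetical stand-in for the x/y components of CUDA's built-in
// blockIdx and gridDim, which are unsigned integers.
struct Dim2 {
    unsigned int x;
    unsigned int y;
};

// Variant 1: the "+ 1" is part of the same expression.
// All operands are unsigned int, so the sum (including the + 1) is
// computed in 32 bits and only then converted to long long.
long long index_single_expr(Dim2 block, Dim2 grid, unsigned int n_iter) {
    long long n = block.x + (block.y * grid.x)
                + (n_iter * grid.x * grid.y) + 1;
    return n;
}

// Variant 2 (the workaround): the increment is a separate statement,
// so it is applied to the 64-bit variable.
long long index_split(Dim2 block, Dim2 grid, unsigned int n_iter) {
    long long n = block.x + (block.y * grid.x)
                + (n_iter * grid.x * grid.y);
    n++;
    return n;
}
```

Under these assumptions, both functions return the same value on the host for ordinary inputs, for example 224 for block (3, 2), grid (10, 5) and `n_iter = 4`. They differ only when the 32-bit sum before the increment is exactly 4294967295, which does not look like the case described here.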
My hardware is a Tesla C870 and I’m currently using CUDA 2.0 beta2. I cannot post the whole code here and would rather not spend time extracting a small piece of code that shows this behaviour. Since I’ve found a workaround, the issue is not that important any more, but maybe someone else has seen this before and can give an explanation.