simple addition bug +1 fails


I’ve just seen some strange behaviour with a simple arithmetic task and would like to know whether anyone else has run into this issue, or whether I’m doing something wrong here.

The line in the code looks like this:

long long int n = blockIdx.x + (blockIdx.y*gridDim.x) + (n_iter*gridDim.x*gridDim.y) + 1;

Maybe it’s too many operations in one line, although I thought the compiler should handle this. This line works perfectly in Emulation Mode (which is why it took me a lot of time to find the bug :glare: ). On the device it seems like the + 1 is not executed. If I rewrite it like

long long int n = blockIdx.x + (blockIdx.y*gridDim.x) + (n_iter*gridDim.x*gridDim.y);


it also works on the device.

My hardware is a Tesla C870 and I’m currently using CUDA 2.0 beta2. I cannot post the whole code here and would rather not spend time extracting a piece of code that shows this behaviour. Since I’ve found a workaround, the issue is not that important any more, but maybe someone else has seen this before and can give an explanation.



The problem is very likely the fact that you’re using 64-bit integer math.
Does this work if you make n an unsigned long instead?

There is GPU support for long long, but it is only vaguely and indirectly mentioned in the programming guide. It requires compute capability 1.3 hardware (i.e. a GTX 260 or GTX 280).

Thanks for the quick reply, I expected something like that…

I do need the 64-bit integers for the grid dimensions, and n_iter might grow really large for certain parameters. I’ve just stumbled across another issue that might be connected to the 64-bit arithmetic, although it was more likely bad coding style. I will try to find another way to calculate the equations needed (it is an address calculation and n is only needed as an intermediate step), but for now I seem to have found a “stable” and predictable version of my code.

Well, let’s hope the next Tesla won’t take too long to be available ;)
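
For anyone reading along, here is a minimal host-side sketch of the address calculation discussed above. It only illustrates why the intermediate value n may need 64 bits. It does not reproduce or explain the device bug. The function names are hypothetical, and the parameter types are assumptions: the thread does not say how n_iter is declared, and blockIdx/gridDim components are modelled here as 32-bit unsigned values. The first function widens to 64 bits before multiplying; the second keeps everything in 32-bit unsigned arithmetic, which wraps modulo 2^32.

```cpp
#include <cstdint>

// Hypothetical helper (not from the original post): flattens block x, block y
// and the iteration counter into one 1-based index. The grid dimensions are
// widened to 64 bits first, so every product is computed in 64-bit arithmetic.
inline std::int64_t linear_index_64(std::uint32_t block_x, std::uint32_t block_y,
                                    std::uint32_t grid_x, std::uint32_t grid_y,
                                    std::int64_t n_iter)
{
    const std::int64_t gx = grid_x;
    const std::int64_t gy = grid_y;
    return static_cast<std::int64_t>(block_x) + block_y * gx + n_iter * gx * gy + 1;
}

// Same expression using only 32-bit unsigned operands (assumed types).
// Unsigned arithmetic wraps modulo 2^32, so large grids or iteration counts
// silently produce a truncated index.
inline std::uint32_t linear_index_32(std::uint32_t block_x, std::uint32_t block_y,
                                     std::uint32_t grid_x, std::uint32_t grid_y,
                                     std::uint32_t n_iter)
{
    return block_x + block_y * grid_x + n_iter * grid_x * grid_y + 1u;
}
```

For small inputs both versions agree, e.g. block (3, 2) in a 10 x 10 grid with n_iter = 1 gives 124 in either case. With a 65535 x 65535 grid and n_iter = 2, the 64-bit version gives 8589672451, while the 32-bit version wraps to 4294705155, which is why a 64-bit intermediate is hard to avoid for this kind of calculation.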