Arithmetic bug in CUDA 2.1, most probably related to code optimization

The attached code has a single kernel that contains three nested loops. The loops enclose an update in memory. The address is computed from the loop indices and a set of other variables. To optimize the kernel, I replaced the direct computation of the memory address with a set of simpler arithmetic operations placed around the iterations. The two computations should give the same value for the final address to be updated. The two numbers are identical in emulation mode, but they are not in real mode.
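For readers without the attachment, the kind of rewrite described above can be sketched on the host side as follows. This is only an illustration under assumed names and an assumed layout (a dense `nx * ny * nz` array with `i` varying fastest); `direct_offset` and `count_mismatches` are hypothetical helpers, not the code from the attached kernel. The direct form recomputes the full offset in the innermost loop, while the incremental form keeps running `plane` and `row` offsets that are updated around the inner iterations.

```cpp
#include <cassert>
#include <cstddef>

// Assumed layout (hypothetical, not taken from the attached kernel):
// a dense nx * ny * nz array with i varying fastest.

// Direct form: the full offset is recomputed from the loop indices.
std::size_t direct_offset(std::size_t i, std::size_t j, std::size_t k,
                          std::size_t nx, std::size_t ny) {
    return (k * ny + j) * nx + i;
}

// Incremental form: running offsets are updated around the inner loops.
// Returns how many (i, j, k) triples give different offsets in the two forms.
std::size_t count_mismatches(std::size_t nx, std::size_t ny, std::size_t nz) {
    std::size_t mismatches = 0;
    std::size_t plane = 0;                 // equals k * ny * nx
    for (std::size_t k = 0; k < nz; ++k) {
        std::size_t row = plane;           // equals plane + j * nx
        for (std::size_t j = 0; j < ny; ++j) {
            for (std::size_t i = 0; i < nx; ++i) {
                std::size_t incremental = row + i;
                if (incremental != direct_offset(i, j, k, nx, ny)) {
                    ++mismatches;
                }
            }
            row += nx;                     // advance to the next row
        }
        plane += nx * ny;                  // advance to the next plane
    }
    return mismatches;
}
```

On the host, `count_mismatches(4, 3, 2)` returns 0, since both forms expand to `k * ny * nx + j * nx + i`. Writing both offsets to an output buffer from the kernel and comparing them against a host reference like this one may help narrow down where the device result diverges.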


Ubuntu 8.04
CUDA 2.1
bug_report.tar.gz (128 KB)

Try with CUDA 2.1 final and the 180.22 driver before you report this; there are a lot of fixes in 2.1 final.

Yes, I should have tried that. I tried it now anyway, and the bug is still there. Thank you for your prompt response.

I forgot to say that I also tried it with CUDA 2.1 final and the new driver.