The attached code demonstrates a bug in the pointer-arithmetic handling of CUDA 1.1: depending on whether an intermediate result is written to a global memory location or not, the final result of the pointer computation differs. This seems to be related to the optimizer, which doesn't scale by the size of the data type correctly (if "char" is used, the bug is not evident, since sizeof(char) == 1).
The purpose of this piece of code is to align a dynamically computed pointer for efficient access to global memory. Is there a better way to do this?
Thanks & kind regards,
P.S.: system information:
Linux 126.96.36.199-0.7-default #1 SMP Tue Oct 2 17:21:08 UTC 2007 x86_64 GNU/Linux
Intel® Core™2 CPU 6400 @ 2.13GHz, 2GB RAM
(II) NVIDIA(0): NVIDIA GPU GeForce 8800 GTX (G80) at PCI:1:0:0 (GPU-0)
nvcc: NVIDIA ® Cuda compiler driver
Copyright © 2005-2006 NVIDIA Corporation
Built on Fri_Nov_30_09:44:36_PST_2007
Cuda compilation tools, release 1.1, V0.2.1221
CUDA SDK 1.1
gcc (GCC) 4.1.2 20061115 (prerelease) (SUSE Linux)
pointer_bug_demo.tar.gz (649 Bytes)