Pointer arithmetic bug in CUDA-1.1: dynamic pointer alignment fails


The attached code demonstrates a bug in the pointer arithmetic handling in CUDA-1.1. The final result of the pointer computation differs depending on whether or not an intermediate result is written to a global memory location. This seems to be related to the optimizer, which apparently doesn’t handle the size of the data type correctly (if “char” is used, the bug is not evident, since sizeof(char)==1).

The purpose of this piece of code is to align a dynamically computed pointer for efficient access to global memory. Is there a better way to do this?
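For reference, one common way to align a pointer is to do the rounding on an integer address rather than on a typed pointer, so that no scaling by the element size enters the computation. The following is only a minimal sketch, not the code from the attachment: the names `align_up` and `ALIGN_HD` are made up for illustration, the alignment is assumed to be a power of two, and a toolchain providing `<cstdint>` is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper macro: lets the same function compile as plain C++
// and, under nvcc, as a host/device function.
#ifdef __CUDACC__
#define ALIGN_HD __host__ __device__
#else
#define ALIGN_HD
#endif

// Round p up to the next address that is a multiple of `alignment`.
// Assumption: `alignment` is a nonzero power of two.
// The arithmetic is done on an integer byte address, so sizeof(T)
// never takes part in the computation.
template <typename T>
ALIGN_HD T* align_up(T* p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
    return reinterpret_cast<T*>((addr + mask) & ~mask);
}
```

A call such as `int* q = align_up(p, 64);` would then return the first 64-byte boundary at or after `p`. Whether this form avoids the behaviour described above on CUDA-1.1 has not been checked.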

Thanks & kind regards,

P.S.: system information:

Linux #1 SMP Tue Oct 2 17:21:08 UTC 2007 x86_64 GNU/Linux
Intel® Core™2 CPU 6400 @ 2.13GHz, 2GB RAM
(II) NVIDIA(0): NVIDIA GPU GeForce 8800 GTX (G80) at PCI:1:0:0 (GPU-0)

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2006 NVIDIA Corporation
Built on Fri_Nov_30_09:44:36_PST_2007
Cuda compilation tools, release 1.1, V0.2.1221


gcc (GCC) 4.1.2 20061115 (prerelease) (SUSE Linux)
pointer_bug_demo.tar.gz (649 Bytes)

What are the actual results (the bug) versus the expected results?

On my particular installation these are:


0x11000e00 0x11000e00


0x11000e00 0x4400380

The numbers may vary, of course, but the two pointers are expected to be identical. Instead, in the second line the second value is the first one divided by 4 (= sizeof(int)).
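The factor of 4 can be checked directly from the two values quoted above. This is just a small sanity check of the arithmetic; the element size is hard-coded to 4 (the sizeof(int) stated above), and the names `expected`, `observed`, `elem_size` and `scaled_down_by` are made up for illustration. It says nothing about where in the compiler the scaling happens.

```cpp
#include <cstdint>

// Values copied from the second line of output quoted above.
constexpr std::uint32_t expected = 0x11000e00u;
constexpr std::uint32_t observed = 0x4400380u;

// Assumption: sizeof(int) == 4, as stated in the post.
constexpr std::uint32_t elem_size = 4;

// True if `small` is exactly `big` divided by `size`.
constexpr bool scaled_down_by(std::uint32_t big, std::uint32_t small,
                              std::uint32_t size) {
    return small * size == big && big % size == 0;
}

// The observed pointer is the expected byte address divided by 4,
// which is the same as a right shift by 2 bits.
static_assert(scaled_down_by(expected, observed, elem_size),
              "observed == expected / 4");
static_assert((expected >> 2) == observed, "observed == expected >> 2");
```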

Kind regards,