Potential bug with 3D memcpy in Linux 64-bit Ubuntu (CUDA 2.1)

Hi all,

I have run into a strange 3D memcpy problem on 64-bit Ubuntu (CUDA 2.1) while running well-tested code that works on OS X 10.5 i386 (CUDA 2.0).

Platform details:
uname -ims && cat /etc/*release
Linux x86_64 unknown

03:00.0 VGA compatible controller: nVidia Corporation G80 [GeForce 8800 GTX] (rev a2)

I copy an NxNxN 3D floating point volume from GPU memory after writing a constant value (+1.0) to each voxel of the volume in a kernel.

The host volume shows regions that were never written: rectangular blocks inside the volume contain zeros. (The host volume is memset to zero at allocation.)
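For anyone trying to reproduce this, it may help to pin down exactly which region comes back unwritten. Below is a minimal host-side C++ sketch that scans the copied-back volume and reports the bounding box of every voxel that differs from the expected constant. The names `BadRegion` and `find_unwritten` are hypothetical, and the x-fastest layout (index = x + N * (y + N * z)) is an assumption, since the post does not state the storage order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Inclusive bounding box of all voxels that differ from the expected value.
struct BadRegion {
    bool found = false;
    std::size_t lo[3] = {0, 0, 0};  // minimum x, y, z of a bad voxel
    std::size_t hi[3] = {0, 0, 0};  // maximum x, y, z of a bad voxel
    std::size_t count = 0;          // number of bad voxels
};

// Hypothetical helper. Assumes an n*n*n volume stored x-fastest:
// index = x + n * (y + n * z). The original post does not specify this.
BadRegion find_unwritten(const std::vector<float>& vol, std::size_t n, float expected) {
    assert(vol.size() == n * n * n);
    BadRegion r;
    for (std::size_t z = 0; z < n; ++z) {
        for (std::size_t y = 0; y < n; ++y) {
            for (std::size_t x = 0; x < n; ++x) {
                if (vol[x + n * (y + n * z)] == expected) continue;
                const std::size_t p[3] = {x, y, z};
                for (int a = 0; a < 3; ++a) {
                    // First bad voxel initializes the box; later ones grow it.
                    r.lo[a] = r.found ? std::min(r.lo[a], p[a]) : p[a];
                    r.hi[a] = r.found ? std::max(r.hi[a], p[a]) : p[a];
                }
                r.found = true;
                ++r.count;
            }
        }
    }
    return r;
}
```

If the reported box lines up with multiples of the kernel's block dimensions, that would point at the kernel launch; if it lines up with row or slice boundaries, the copy parameters become the more likely suspect. That is only a heuristic, not a diagnosis.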

I’ve tested my functions and the simple CUDA kernel (which writes the constant value) thoroughly, and they work fine on my OS X machine, which has an NVIDIA 8600M GT card.

Of course the compute capabilities of the GPUs are different and also the CUDA versions, but should that make any difference?
I’m attaching a very small sample program that reproduces the bug. The attachments contain exactly the same code for both platforms (each with its respective PTX and other intermediate files). The resulting volume is written to a file “raw.out”. The desired output is an array of -1.0 on the border of the volume and a value of -1.0 inside, but the Ubuntu executable produces a defective output with rectangular subvolumes of 0.0 inside the volume.

Any help with this will be highly appreciated.

volwrite_bug_ubuntu8.04.tar.gz (434 KB)
volwrite_bug_mscosx10.5.tar.gz (375 KB)

I should add that the results are correct in emulation mode.

N.B.: I understand that this post should have been in the Linux section, but it is perhaps a little late to move it and I don’t want to post twice.


Is anyone else facing this problem on 64-bit Linux? I guess this is not GPU specific.


Yes, I have also encountered inconsistent behavior with cudaMemset3D using CUDA 2.1 and Ubuntu 8.10. It filled most of the elements of the 3D array with the correct value, but left a large slab at the end with incorrect values. My workaround was to use the 1D cudaMemset function instead.
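To make the 1D workaround concrete, here is a rough host-side C++ sketch of the byte arithmetic involved, assuming the 3D array lives in a pitched allocation (rows padded to `pitch` bytes, as the `pitch` field of a `cudaPitchedPtr` reports). The struct `PitchedLayout` and the functions `total_bytes`, `voxel_offset` and `linear_clear` are hypothetical names; `linear_clear` uses `std::memset` on a host buffer only as a stand-in for a single 1D `cudaMemset` over the device pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Assumed layout of a pitched 3D allocation: each row occupies `pitch`
// bytes, of which only the first `width_bytes` hold voxel data.
struct PitchedLayout {
    std::size_t pitch;        // bytes per padded row
    std::size_t width_bytes;  // bytes of real data per row
    std::size_t height;       // rows per slice
    std::size_t depth;        // number of slices
};

// Size of the whole allocation, padding included. This is the count a
// single 1D memset has to cover to reach every voxel.
std::size_t total_bytes(const PitchedLayout& l) {
    assert(l.pitch >= l.width_bytes);
    return l.pitch * l.height * l.depth;
}

// Byte offset of the float voxel (x, y, z) inside the allocation.
std::size_t voxel_offset(const PitchedLayout& l, std::size_t x, std::size_t y, std::size_t z) {
    assert(x * sizeof(float) < l.width_bytes && y < l.height && z < l.depth);
    return (z * l.height + y) * l.pitch + x * sizeof(float);
}

// Host stand-in for the 1D workaround: one linear memset over the
// padded allocation instead of a 3D memset.
void linear_clear(std::vector<unsigned char>& buf, const PitchedLayout& l, int byte_value) {
    assert(buf.size() >= total_bytes(l));
    std::memset(buf.data(), byte_value, total_bytes(l));
}
```

Note that a memset writes a repeated byte value, so this approach only produces a meaningful float when every byte of the target value is the same (0 gives 0.0f); filling with an arbitrary constant such as 1.0f would still need a kernel.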