I've run into a strange 3D memcpy problem on 64-bit Ubuntu (CUDA 2.1) while running well-tested code that works on OS X 10.5 i386 (CUDA 2.0).
uname -ims && cat /etc/*release
Linux x86_64 unknown
03:00.0 VGA compatible controller: nVidia Corporation G80 [GeForce 8800 GTX] (rev a2)
I copy an NxNxN 3D floating point volume from GPU memory after writing a constant value (+1.0) to each voxel of the volume in a kernel.
The host volume shows regions that were never written to: rectangular blocks of zeros inside the volume (the host volume is memset to zero at allocation).
I’ve tested my functions and the simple CUDA kernel (which writes the constant value) quite thoroughly, and they work fine on my OS X machine, which has an NVIDIA 8600M GT card.
Of course the compute capabilities of the GPUs are different and also the CUDA versions, but should that make any difference?
I’m attaching a very small sample program that reproduces the bug. The attachments contain exactly the same code for both platforms, each with its respective PTX and other intermediate files. The resulting volume is written to a file “raw.out”. Although the desired output is an array of -1.0 on the border of the volume and a value of -1.0 inside, the Ubuntu executable produces a defective output with 0.0 in rectangular subvolumes inside the volume.
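To pin down where the zero blocks sit, a host-side scan of the copied volume can report the bounding box of every voxel that still holds the memset value. This is only a minimal sketch and not part of the attached program: the names `Box` and `findUnwritten` are hypothetical, and it assumes the host volume is a dense N*N*N float array stored x-fastest, which may not match the real layout.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Inclusive bounding box of all voxels that still hold the memset value.
struct Box {
    bool found = false;        // false if every voxel was written
    int lo[3] = {0, 0, 0};     // minimum x, y, z of the unwritten voxels
    int hi[3] = {0, 0, 0};     // maximum x, y, z of the unwritten voxels
    std::size_t count = 0;     // number of unwritten voxels
};

// Assumption: vol is an n*n*n volume with index = (z*n + y)*n + x.
Box findUnwritten(const std::vector<float>& vol, int n, float unwritten = 0.0f) {
    assert(vol.size() == static_cast<std::size_t>(n) * n * n);
    Box box;
    for (int z = 0; z < n; ++z) {
        for (int y = 0; y < n; ++y) {
            for (int x = 0; x < n; ++x) {
                const std::size_t i = (static_cast<std::size_t>(z) * n + y) * n + x;
                if (vol[i] != unwritten) continue;  // voxel was written by the kernel
                const int p[3] = {x, y, z};
                for (int a = 0; a < 3; ++a) {
                    // First hit initialises the box, later hits grow it.
                    box.lo[a] = box.found ? std::min(box.lo[a], p[a]) : p[a];
                    box.hi[a] = box.found ? std::max(box.hi[a], p[a]) : p[a];
                }
                box.found = true;
                ++box.count;
            }
        }
    }
    return box;
}
```

Calling `findUnwritten(hostVolume, N)` right after the device-to-host copy gives the extent of the untouched region. If the box edges happen to line up with multiples of the thread-block dimensions or of the row pitch, that may hint at whether the kernel launch or the copy itself is at fault, though it proves neither.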
Any help with this will be highly appreciated.