On my current system (AMD Opteron 250, 8800GTX, SUSE 10.2, CUDA 0.9) I’ve noticed that the device-to-device memory transfers reported by the bandwidthTest sample program are only around 48-49 GB/s. The FAQ, and I think one of mfatica’s or mark’s posts, suggested that it should be around 70 GB/s (as one would expect given the memory bandwidth of the card). It seems that this bandwidth (unlike H2D and D2H) should be independent of the system it is running on.
Any ideas as to what could be causing this bandwidth to be lower than expected? Is this just what everyone gets with v0.9 right now?
64-bit, with the recommended driver for v0.9 (100.14.10). I’m a bit baffled as to how the D2D bandwidth could vary from machine to machine; I would expect it to depend only on the card.
What size transfer are you using? The default for the quick test is ~33MB, I think; making it larger doesn’t seem to have any effect.
The bandwidth test measures the total bandwidth seen at the memory interface of the GPU. A device-to-device memory copy involves one read from and one write to the memory interface, so every byte copied crosses the interface twice. The copy bandwidth, measured in bytes moved, is therefore half the total bandwidth.
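To make the factor of two concrete, here is a small Python sketch. It is not taken from bandwidthTest: the function name `d2d_bandwidths` and the elapsed time are made up for illustration, and GB is assumed to mean 10^9 bytes. It turns a copy size and a measured time into both figures, the bytes copied per second and the read-plus-write traffic at the memory interface.

```python
def d2d_bandwidths(bytes_copied, elapsed_s):
    """Return (copy_gb_s, interface_gb_s) for one device-to-device copy.

    copy_gb_s      -- bytes moved from source to destination per second
    interface_gb_s -- traffic at the memory interface: each byte is read
                      once and written once, so it counts twice
    GB is assumed to be 10**9 bytes here.
    """
    copy_gb_s = bytes_copied / elapsed_s / 1e9
    interface_gb_s = 2 * copy_gb_s
    return copy_gb_s, interface_gb_s


# Transfer size mentioned in this thread; the elapsed time is hypothetical.
size_bytes = 67186688
elapsed_s = 0.002

copy_gb_s, interface_gb_s = d2d_bandwidths(size_bytes, elapsed_s)
print(f"copy bandwidth:      {copy_gb_s:.2f} GB/s")
print(f"interface bandwidth: {interface_gb_s:.2f} GB/s")
```

Plug in the time from your own run in place of the hypothetical `elapsed_s` to see which of the two figures a reported number corresponds to.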
It seems so to me. I have a dual-boot system: 32-bit WinXP and 64-bit Linux. Using the CUDA 0.9 drivers, I get ~70 GB/s on Windows for the 67186688-byte memory size test and only 46932 MB/s under 64-bit Linux (driver version 100.14.10).