D2D transfers slow? D2D slower than reported in FAQ

On my current system (AMD Opteron 250, 8800 GTX, SUSE 10.2, CUDA 0.9) I’ve noticed that device-to-device memory transfers, as reported by the bandwidthTest sample program, only reach around 48-49 GB/sec. The FAQ, and I think one of mfatica’s or mark’s posts, suggested that it should be around 70 GB/sec (as one would expect given the memory bandwidth of the card). Unlike H2D and D2H, this bandwidth should be independent of the system it is running on.
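As a rough check on "the memory bandwidth of the card": theoretical peak DRAM bandwidth is the bus width in bytes times the effective transfer rate. This is a minimal sketch that assumes the commonly published 8800 GTX memory configuration (384-bit bus, GDDR3 at 900 MHz, i.e. 1800 MT/s effective); those specs are not stated in this thread, and the function name is hypothetical.

```c
/* Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes).
 * bus_width_bits: width of the memory interface in bits.
 * effective_mts:  effective data rate in mega-transfers per second
 *                 (for DDR-type memory, twice the memory clock in MHz). */
static double peak_bandwidth_gbs(int bus_width_bits, double effective_mts)
{
    double bytes_per_transfer = bus_width_bits / 8.0;
    return bytes_per_transfer * effective_mts * 1e6 / 1e9;
}
```

With the assumed figures, `peak_bandwidth_gbs(384, 1800.0)` evaluates to 86.4, so a measured figure around 70 GB/sec is a plausible fraction of peak while 48-49 GB/sec looks low.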

Any ideas as to what could be causing this bandwidth to be lower than expected? Is this just what everyone gets with v0.9 right now?

32 or 64 bit?
Which version of the driver?

I just checked on one machine and, under both Windows and Linux, we got around 71 GB/sec with an 8800 GTX.

As I suspected, there is a spurious multiplication of the bandwidth by 2 in the DD test (in the 0.9 SDK):

   bandwidthInMBs = 2.0f * (1e3 * memSize * (float)MEMCOPY_ITERATIONS) /
                    (elapsedTimeInMs * (float)(1 << 20));

Just wanted to clear the air on this one.
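To make the units in that SDK line explicit, here is a minimal C restatement of it; the function and parameter names are hypothetical, not from the SDK, and it uses double rather than float. `1e3` converts milliseconds to seconds, `1 << 20` gives MB in the 2^20-byte sense, and `count_both` toggles the leading factor of 2, i.e. whether each copied byte is counted once or as one read plus one write.

```c
#include <assert.h>

/* Restates the SDK 0.9 bandwidthTest D2D calculation.
 * mem_size:   bytes per copy (memSize in the SDK)
 * iterations: number of copies timed (MEMCOPY_ITERATIONS in the SDK)
 * elapsed_ms: total elapsed time in milliseconds
 * count_both: nonzero reproduces the SDK's factor of 2 (read + write);
 *             zero counts bytes copied only. */
static double d2d_bandwidth_mbs(double mem_size, int iterations,
                                double elapsed_ms, int count_both)
{
    assert(elapsed_ms > 0.0);               /* avoid division by zero */
    double factor = count_both ? 2.0 : 1.0;
    /* 1e3 converts ms to s; 1 << 20 bytes per MB. */
    return factor * (1e3 * mem_size * (double)iterations) /
           (elapsed_ms * (double)(1 << 20));
}
```

For identical timings the two conventions differ by exactly 2x, so it matters which one a published number uses before comparing it with your own.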


64-bit, with the recommended driver for v0.9 (100.14.10). I’m a bit baffled as to how D2D bandwidth could vary from machine to machine; I would expect it to depend only on the card.

What transfer size are you using? The default for the quick test is ~33 MB, I think; making it larger doesn’t seem to have any effect.

D2D should be only dependent on the card.

It may be a performance bug in the new 64-bit driver. We will check.

The bandwidth test measures the total bandwidth seen at the GPU's memory interface. A device-to-device memory copy involves one read from and one write to the memory interface for every byte copied, so the copy bandwidth, measured as bytes moved, is half the total bandwidth.
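In symbols, with N bytes per copy, I timed iterations and total elapsed time t, the reported figure counts each byte twice (once read, once written), so the rate at which data is actually copied is half of it. Applied to the ~71 GB/sec quoted earlier in the thread for an 8800 GTX, this is a worked example rather than a new measurement:

$$B_{\text{reported}} = \frac{2\,N I}{t}, \qquad B_{\text{copy}} = \frac{N I}{t} = \frac{B_{\text{reported}}}{2} \approx \frac{71\ \text{GB/s}}{2} \approx 35.5\ \text{GB/s}$$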

It seems so to me. I have a dual-boot system: 32-bit Windows XP and 64-bit Linux. Using the CUDA 0.9 drivers, I get ~70 GB/s on Windows for the 67186688-byte test and only 46932 MB/s under 64-bit Linux (driver version 100.14.10).

Yes, it was a performance bug; it has been fixed.
This is why we did a pre-release…