On my current system (AMD Opteron 250, 8800 GTX, SUSE 10.2, CUDA 0.9) I’ve noticed that the device-to-device memory transfers reported by the bandwidthTest sample program are only around 48-49 GB/s. The FAQ, and I think one of mfatica’s or mark’s posts, suggested that it should be around 70 GB/s (as one would expect given the memory bandwidth of the card). It seems that this bandwidth, unlike H2D and D2H, should be independent of the host system, since the copy never leaves the card.
Any ideas as to what could be causing this bandwidth to be lower than expected? Is this just what everyone sees with v0.9 right now?
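For reference, here is a rough sketch of the peak figure I am comparing against. It assumes the published 8800 GTX memory specs (384-bit bus, 900 MHz GDDR3, two transfers per clock), which are my assumption and not something bandwidthTest reports. The function names are made up for illustration. With those numbers the theoretical peak works out to 86.4 GB/s, so both the measured and the expected figures can be read as a fraction of that.

```python
# Back-of-the-envelope peak memory bandwidth, assuming published 8800 GTX specs.
# These inputs are assumptions for illustration, not values read from the device.

def peak_bandwidth_gb_s(bus_width_bits, mem_clock_mhz, transfers_per_clock=2):
    """Theoretical peak in GB/s (1 GB = 1e9 bytes) for a DDR-style memory bus."""
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = mem_clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9

def fraction_of_peak(measured_gb_s, peak_gb_s):
    """How much of the theoretical peak a measured figure represents."""
    return measured_gb_s / peak_gb_s

if __name__ == "__main__":
    peak = peak_bandwidth_gb_s(bus_width_bits=384, mem_clock_mhz=900)
    print(f"theoretical peak: {peak:.1f} GB/s")
    # 48.5 is the midpoint of what I measured; 70 is the figure quoted from the FAQ.
    for label, value in (("measured", 48.5), ("expected", 70.0)):
        print(f"{label}: {value} GB/s = {fraction_of_peak(value, peak):.0%} of peak")
```

No real copy reaches the theoretical peak, so the useful comparison is between the two fractions rather than against 100%.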