I’m getting surprisingly slow device-to-device memory bandwidth on Fermi-based hardware.
Below are the results of bandwidthTest.
Summary: bandwidthTest reports faster device-to-device memory transfers on the GTX 280 than on either the GTX 460 or the GTX 480. That can’t be right. The GTX 460 (1GB version) reaches roughly half of its theoretical maximum and the GTX 480 falls almost 60 GB/s short of its theoretical maximum, while the GTX 280 comes close to its theoretical maximum.
Driver details:
GTX 480 is using 256.44 release driver
GTX 460 didn’t seem to work with 256.44, so I installed 256.40 development driver
GTX 280 is using 195.36.15 development driver
I also noticed the somewhat slower PCIe bandwidth on the GTX 460, but that’s not a concern at the moment; the slow device memory is.
Device 0: GeForce GTX 460
Quick Mode
Host to Device Bandwidth, 1 Device(s), Paged memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 4081.2
Device to Host Bandwidth, 1 Device(s), Paged memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 4657.4
Device to Device Bandwidth, 1 Device(s)
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 59033.4
Device 0: GeForce GTX 480
Quick Mode
Host to Device Bandwidth, 1 Device(s), Paged memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 5284.3
Device to Host Bandwidth, 1 Device(s), Paged memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 5001.9
Device to Device Bandwidth, 1 Device(s)
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 118115.3
Device 0: GeForce GTX 280
Quick Mode
Host to Device Bandwidth, 1 Device(s), Paged memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 5290.2
Device to Host Bandwidth, 1 Device(s), Paged memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 4224.9
Device to Device Bandwidth, 1 Device(s)
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 122501.0
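For reference, the comparison in the summary can be reproduced with a few lines of arithmetic. This is only a rough sketch: the effective memory data rates and bus widths below are spec-sheet values I am assuming for each card (they are not part of the bandwidthTest output), and I am treating the reported MB/s as 1000 MB per GB, which may not match the unit convention bandwidthTest uses internally. The names `CARDS`, `peak_gb_s` and `summarize` are just for illustration.

```python
# Assumed spec-sheet values (NOT from the bandwidthTest output above):
# effective memory data rate in MT/s and memory bus width in bits.
# The measured numbers are the device-to-device results posted above.
CARDS = {
    "GTX 460 (1GB)": {"rate_mts": 3600, "bus_bits": 256, "measured_mb_s": 59033.4},
    "GTX 480":       {"rate_mts": 3696, "bus_bits": 384, "measured_mb_s": 118115.3},
    "GTX 280":       {"rate_mts": 2214, "bus_bits": 512, "measured_mb_s": 122501.0},
}


def peak_gb_s(rate_mts, bus_bits):
    """Theoretical peak in GB/s: transfers per second times bytes per transfer."""
    return rate_mts * 1e6 * (bus_bits / 8) / 1e9


def summarize(cards):
    """Return (name, peak GB/s, measured GB/s, measured/peak) for each card."""
    rows = []
    for name, c in cards.items():
        peak = peak_gb_s(c["rate_mts"], c["bus_bits"])
        # Assumption: 1 GB/s = 1000 MB/s for the reported figures.
        measured = c["measured_mb_s"] / 1000.0
        rows.append((name, peak, measured, measured / peak))
    return rows


for name, peak, measured, frac in summarize(CARDS):
    print(f"{name:14s} peak {peak:6.1f} GB/s   "
          f"measured {measured:6.1f} GB/s   ({frac:.0%} of peak)")
```

Under these assumptions the GTX 460 lands at about half of its peak, the GTX 480 at about two thirds, and the GTX 280 well above both, which is the pattern described in the summary.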