I have a dual GTX 590 SLI setup (4 GPUs in total), but my bandwidthTest results look very strange:
bandwidthTest --memory=pageable --device=1
[bandwidthTest] starting...
C/bin/linux/release/bandwidthTest Starting...
Running on...
Device 1: GeForce GTX 590
Quick Mode
Host to Device Bandwidth, 1 Device(s), Paged memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 3077.7
Device to Host Bandwidth, 1 Device(s), Paged memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 3107.7
Device to Device Bandwidth, 1 Device(s)
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 132716.7
bandwidthTest --memory=pinned --device=1
[bandwidthTest] starting...
C/bin/linux/release/bandwidthTest Starting...
Running on...
Device 1: GeForce GTX 590
Quick Mode
Host to Device Bandwidth, 1 Device(s), Pinned memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 3168.0
Device to Host Bandwidth, 1 Device(s), Pinned memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 3198.8
Device to Device Bandwidth, 1 Device(s)
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 132642.7
[bandwidthTest] test results...
PASSED
> exiting in 3 seconds: 3...2...1...done!
Host-to-device and device-to-host bandwidth is the same for all GPUs I tried; only the device-to-device bandwidth for device 0 is slower, maybe because it is used for graphics. I have PCIe x16 v2.0.
Into which slots did you install the cards and how did you configure the PCIe x16 lane switch?
Your mainboard has only one PCIe x16 slot with full bandwidth. With two cards you have the choice of either two PCIe x8 slots, or two PCIe x16 slots behind an NF200 switch that lets the two cards share one PCIe x16 port. The latter option would give each card the full bandwidth of a PCIe x16 port, but not both at the same time.
I’d guess that you have the two cards installed in the two PCIe x8 slots, which would explain your findings.
So are the cards in slots 3 and 5 (PCIE_X16_2 and PCIE_X16_4)? If they are in slots 1 and 3 (or 1 and 5), you will only get x8 bandwidth for the first card.
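To put numbers on the x8 guess, here is a rough back-of-the-envelope sketch in Python. It assumes the standard PCIe 2.0 figures (5 GT/s per lane, 8b/10b encoding) and treats the MB/s reported by bandwidthTest as 10^6 bytes/s; if the tool actually divides by 2^20, the ratios shift by about 5%. The function name and the `measured` table are just illustrative, with the numbers copied from the pinned-memory run above.

```python
# Theoretical PCIe 2.0 peak bandwidth per direction, before packet/protocol overhead.
PCIE2_TRANSFERS_PER_S = 5_000_000_000  # 5 GT/s per lane (PCIe 2.0)

def pcie2_peak_mb_per_s(lanes):
    """Peak payload rate in MB/s (10^6 bytes/s) for a PCIe 2.0 link with `lanes` lanes."""
    # 8b/10b line coding: only 8 of every 10 transferred bits carry data.
    data_bits_per_s = PCIE2_TRANSFERS_PER_S * lanes * 8 // 10
    return data_bits_per_s / 8 / 1e6

# Pinned-memory results from the bandwidthTest log above (MB/s).
measured = {"host to device": 3168.0, "device to host": 3198.8}

for lanes in (8, 16):
    peak = pcie2_peak_mb_per_s(lanes)
    for name, mb_per_s in measured.items():
        share = 100.0 * mb_per_s / peak
        print(f"x{lanes}: {name} {mb_per_s:.1f} MB/s is {share:.0f}% of {peak:.0f} MB/s peak")
```

Roughly 3.2 GB/s is a plausible fraction of a 4 GB/s x8 link but well under half of an 8 GB/s x16 link, which fits the idea that the cards are running at x8.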