Why is my pinned memory slow?

I have two GTX 590 cards in SLI (4 GPUs in total), but my bandwidthTest results look very strange:

bandwidthTest --memory=pageable --device=1

[bandwidthTest] starting...

C/bin/linux/release/bandwidthTest Starting...

Running on...

Device 1: GeForce GTX 590

 Quick Mode

Host to Device Bandwidth, 1 Device(s), Paged memory

   Transfer Size (Bytes)	Bandwidth(MB/s)

   33554432			3077.7

Device to Host Bandwidth, 1 Device(s), Paged memory

   Transfer Size (Bytes)	Bandwidth(MB/s)

   33554432			3107.7

Device to Device Bandwidth, 1 Device(s)

   Transfer Size (Bytes)	Bandwidth(MB/s)

   33554432			132716.7

bandwidthTest --memory=pinned --device=1

[bandwidthTest] starting...

C/bin/linux/release/bandwidthTest Starting...

Running on...

Device 1: GeForce GTX 590

 Quick Mode

Host to Device Bandwidth, 1 Device(s), Pinned memory

   Transfer Size (Bytes)	Bandwidth(MB/s)

   33554432			3168.0

Device to Host Bandwidth, 1 Device(s), Pinned memory

   Transfer Size (Bytes)	Bandwidth(MB/s)

   33554432			3198.8

Device to Device Bandwidth, 1 Device(s)

   Transfer Size (Bytes)	Bandwidth(MB/s)

   33554432			132642.7

[bandwidthTest] test results...

PASSED

> exiting in 3 seconds: 3...2...1...done!

Host-to-device and device-to-host bandwidth was the same for all GPUs when I tried them; only device-to-device on device 0 is slower, maybe because that GPU is used for graphics. I have PCIe x16 v2.0.

Does SLI slow things down?

Thanks

What mainboard do you use?

product: Maximus IV Extreme
vendor: ASUSTeK Computer INC.

I attached the lshw and lspci as well
lspci.txt (76.6 KB)
lshw.txt (32.5 KB)

I tried disabling SLI and it is still the same. Is there anything strange with my hardware?

Into which slots did you install the cards and how did you configure the PCIe x16 lane switch?

Your mainboard has only one PCIe x16 slot with full bandwidth. With two cards you have the choice of either two PCIe x8 slots, or two PCIe x16 slots behind an NF200 switch that lets the two cards share one PCIe x16 port. The latter option would give each card the full bandwidth of a PCIe x16 port, but not both at the same time.

I’d guess that you have the two cards installed in the two PCIe x8 slots, which would explain your findings.
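As a rough sanity check on that guess, here is a small sketch of the theoretical PCIe 2.0 numbers. It assumes the usual PCIe 2.0 figures of 5 GT/s per lane with 8b/10b encoding, which works out to about 500 MB/s of usable bandwidth per lane per direction. The function name pcie2_bandwidth_mbs is made up for illustration.

```shell
#!/bin/sh
# Theoretical PCIe 2.0 bandwidth per direction, in MB/s.
# Assumption: 5 GT/s per lane with 8b/10b encoding = 500 MB/s usable per lane.
pcie2_bandwidth_mbs() {
    lanes=$1
    echo $((lanes * 500))
}

# Print the ceiling for the two link widths discussed above.
for lanes in 8 16; do
    echo "x$lanes: $(pcie2_bandwidth_mbs "$lanes") MB/s theoretical"
done
```

The roughly 3100 to 3200 MB/s you measured sits below the 4000 MB/s ceiling of an x8 link and far from the 8000 MB/s of an x16 link, so the numbers are at least consistent with x8. Real transfers never reach the theoretical figure because of protocol overhead.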

Thanks tera,
I checked in the BIOS and also looked at the slots on the board where the two cards are installed, and they are actually in two x16 positions.

So are the cards in slots 3 and 5 (PCIE_X16_2 and PCIE_X16_4)? If they are in slots 1 and 3 (or 1 and 5), you will only get x8 bandwidth for the first card.
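One way to check what the slots actually negotiated, rather than what the manual says, is to look at the link capability and status lines in the lspci -vv output. This is a minimal sketch, assuming a Linux system with pciutils installed and that the cards show up under NVIDIA's PCI vendor ID 10de. The helper name pcie_link_lines is hypothetical.

```shell
#!/bin/sh
# Keep only the device header lines and the PCIe link lines from `lspci -vv` output.
# LnkCap = what the device supports, LnkSta = what was actually negotiated.
pcie_link_lines() {
    grep -E '^[0-9a-f]{2}:[0-9a-f]{2}\.[0-7] |LnkCap:|LnkSta:'
}

# Usage (root is normally needed to read the capability registers):
#   sudo lspci -vv -d 10de: | pcie_link_lines
```

If LnkSta reports Width x8 while LnkCap says x16, the card is running on 8 lanes. As far as I know the link speed can also drop while the GPU is idle, so it may be worth checking while a transfer is running.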

Yes, I checked the manual as well; they are in x16_2 and x16_4. I tried removing the card in x16_2 and running only the one in x16_4, and the performance is still the same.

Does the graphics output affect the performance result? I ask because one of the two cards is connected to my monitor.