Concurrent bandwidth with multiple GPUs


I’m not getting full bandwidth on a system with 2 GTX 480s. I ran the concurrent bandwidth test from, and my own bandwidth test too, and these are the results I’m getting:

While the H-to-D bandwidth with 2 GPUs is 1.4x the bandwidth with 1 GPU, the D-to-H bandwidth is essentially the same. Why?


Intel Core i7 930 (quad-core) with an X58 IOH (supports 2x PCIe 2.0 x16), 6 GB RAM, 2x GTX 480 cards. CUDA 4.1 RC2 on CentOS 6 Linux.
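For context, here is a rough sketch of what two PCIe 2.0 x16 slots can ask for in theory. It assumes the standard PCIe 2.0 figures (5 GT/s per lane, 8b/10b line coding); the function name is just for illustration, and real transfers land below these peaks because of packet overhead.

```python
# Theoretical PCIe 2.0 peak bandwidth per direction (a sketch, not a measurement).
GT_PER_S_PER_LANE = 5_000_000_000  # PCIe 2.0: 5 GT/s per lane, 1 bit per transfer


def pcie2_peak_gb_s(lanes: int) -> float:
    """Peak one-direction data bandwidth in GB/s for a PCIe 2.0 link."""
    raw_bits = lanes * GT_PER_S_PER_LANE  # raw bits on the wire per second
    data_bits = raw_bits * 8 // 10        # 8b/10b: 8 data bits per 10 line bits
    return data_bits / 8 / 1e9            # bits -> bytes -> GB


per_card = pcie2_peak_gb_s(16)
print(f"one x16 slot : {per_card:.1f} GB/s per direction")
print(f"two x16 slots: {2 * per_card:.1f} GB/s per direction")
```

So two x16 cards together can request about 16 GB/s in one direction, which is more than the QPI link figures discussed in the replies can carry.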


The maximum unidirectional bandwidth of the QPI link between your CPU and the X58 chipset is about 9.6 GB/sec. The HtoD case is therefore coming in at 85% of that maximum, which I think points to some kind of driver overhead. I don’t understand why DtoH shows no improvement, however. That seems very strange.
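The measured numbers did not survive in this thread, so this sketch only inverts the "85% of 9.6 GB/sec" statement to show roughly what aggregate HtoD rate that implies. The helper names are hypothetical, and the 9.6 GB/sec ceiling is taken as given from the reply.

```python
QPI_PEAK_GB_S = 9.6  # per-direction QPI ceiling quoted in the reply


def fraction_of_peak(measured_gb_s: float, peak_gb_s: float = QPI_PEAK_GB_S) -> float:
    """Share of the link ceiling that a measured transfer rate reaches."""
    return measured_gb_s / peak_gb_s


def implied_rate(fraction: float, peak_gb_s: float = QPI_PEAK_GB_S) -> float:
    """Transfer rate (GB/s) that corresponds to a given share of the ceiling."""
    return fraction * peak_gb_s


# 85% of the QPI ceiling is roughly the aggregate HtoD rate across both GPUs.
print(f"{implied_rate(0.85):.2f} GB/s")
```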

Thanks for the answer. Can you tell me how you arrived at 9.6 GB/sec? The Wikipedia page for X58 says the max bandwidth is 12.8 GB/sec (

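I think that bandwidth requires 6.4 gigatransfers/second, and the QPI link on the CPU model you listed is specified at 4.8 gigatransfers/second. Each QPI transfer carries 2 bytes of data in each direction, so 4.8 GT/s gives 9.6 GB/sec per direction, versus 12.8 GB/sec at 6.4 GT/s.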
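The conversion from QPI transfer rate to bandwidth is simple arithmetic; a minimal sketch, assuming the standard QPI figure of 16 data bits (2 bytes) per transfer in each direction. The function name is illustrative only.

```python
DATA_BYTES_PER_TRANSFER = 2  # QPI: 16 data bits per transfer, per direction


def qpi_gb_s_per_direction(gt_per_s: float) -> float:
    """Per-direction QPI data bandwidth in GB/s for a given rate in GT/s."""
    return gt_per_s * DATA_BYTES_PER_TRANSFER


# 4.8 GT/s is the i7 930's QPI rate; 6.4 GT/s gives the 12.8 GB/sec figure.
for rate in (4.8, 6.4):
    print(f"{rate} GT/s -> {qpi_gb_s_per_direction(rate):.1f} GB/s per direction")
```

Doubling either number gives the combined two-direction figure (19.2 or 25.6 GB/s), which is why sources sometimes quote QPI bandwidth at twice these values.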

Thanks. After looking up QPI on Wikipedia it became clear :)
Still, the Device-to-Host bandwidth is a mystery to me.