Low host<->device bandwidth for one of two cards

Hi,

I have a server with two Tesla M2050 cards, and one of the cards seems to have a problem. This is what the bandwidthTest program from the SDK samples reports:

Device 0: Tesla M2050
 Quick Mode

 Host to Device Bandwidth, 1 Device(s), Paged memory
   Transfer Size (Bytes)      Bandwidth(MB/s)
   33554432                   2829.0

 Device to Host Bandwidth, 1 Device(s), Paged memory
   Transfer Size (Bytes)      Bandwidth(MB/s)
   33554432                   2225.8

 Device to Device Bandwidth, 1 Device(s)
   Transfer Size (Bytes)      Bandwidth(MB/s)
   33554432                   85649.8

Device 1: Tesla M2050
 Quick Mode

 Host to Device Bandwidth, 1 Device(s), Paged memory
   Transfer Size (Bytes)      Bandwidth(MB/s)
   33554432                   386.6

 Device to Host Bandwidth, 1 Device(s), Paged memory
   Transfer Size (Bytes)      Bandwidth(MB/s)
   33554432                   396.9

 Device to Device Bandwidth, 1 Device(s)
   Transfer Size (Bytes)      Bandwidth(MB/s)
   33554432                   85610.2

As you can see, the host<->device bandwidth for the second card is significantly lower than for the first. I tried using nvidia-settings to check which PCIe slots the cards are connected to, but couldn't, since the server doesn't run an X server (and the display is connected to a separate Matrox card anyway). Both should be on x16 PCIe 2.0 slots, though, since the server uses the Intel 5500/5520 chipset. The server runs 64-bit Arch Linux with the 260.19.29 NVIDIA driver and CUDA 3.2.16. Both GPUs are set to exclusive compute mode.
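Since nvidia-settings needs X, the negotiated link width should also be visible in the output of lspci -vv (the LnkSta line for each GPU's PCI address). To rule out pageable-memory overhead, here is a minimal pinned-memory timing sketch I can run on the slow card. Device index 1 and the 32 MiB transfer size are assumptions matching the quick-mode output above; on a healthy x16 PCIe 2.0 slot this should report several thousand MB/s:

// Minimal sketch: time host-to-device copies from pinned (page-locked)
// memory, removing the extra staging copy that bandwidthTest's default
// "Paged memory" mode includes. Assumes device 1 is the slow card.
#include <stdio.h>
#include <cuda_runtime.h>

int main(void) {
    const size_t bytes = 32 << 20;   // 32 MiB, same size as the quick-mode test
    const int reps = 10;
    cudaSetDevice(1);                // the card reporting ~390 MB/s

    void *h_buf, *d_buf;
    cudaMallocHost(&h_buf, bytes);   // pinned host allocation
    cudaMalloc(&d_buf, bytes);

    // One untimed warm-up copy so driver setup doesn't skew the result.
    cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    for (int i = 0; i < reps; ++i)
        cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("H2D pinned bandwidth: %.1f MB/s\n",
           ((double)reps * bytes / (1 << 20)) / (ms / 1000.0));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFreeHost(h_buf);
    cudaFree(d_buf);
    return 0;
}

If this pinned number is also an order of magnitude below device 0's, that would point at the slot or chipset rather than at how the memory is allocated.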

If anyone can offer any help in this matter I would be very grateful. The server is used for research, and consistent results are quite important.

Have you tried swapping the two cards to see whether the problem is in the system or in the card itself? Also, what do you mean by the two GPUs being set to exclusive mode?