Slow Paged Memory Transfer with M2090

I have a problem where paged memory transfers with the Tesla M2090 are very slow:

[font="Courier New"]Device 0: Tesla M2090
Quick Mode

Host to Device Bandwidth, 1 Device(s), Paged memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 1832.4

Device to Host Bandwidth, 1 Device(s), Paged memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 1597.6
[/font]

But with pinned memory:

[font="Courier New"]Device 0: Tesla M2090
Quick Mode

Host to Device Bandwidth, 1 Device(s), Pinned memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 5735.8

Device to Host Bandwidth, 1 Device(s), Pinned memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 5534.9

[/font]
Any ideas what could be causing this, or whether there is any way to speed up paged memory transfers? I'm running CentOS 6.2 with NVIDIA driver 295.49. The machine has 36 GB of RAM and two CPUs (Xeon E5630). The system is a Supermicro 1026GT-TF-FM209:
http://www.supermicr…F.cfm?GPU=FM209

The odd thing is that even though the M2090 is "top of the line", its paged memory transfers are slower than on every other GPU board I have (Tesla C1060, GTX 280M, GTX 460, GTX 480, GTX 580). This is frustrating because we bought six systems, each with two M2090s, and all of them seem to suffer from the same slow paged memory transfers.
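For reference, this is roughly the comparison I'm measuring, a minimal sketch using the CUDA runtime API (not the actual bandwidthTest source; the buffer size and device choice are just illustrative):

[font="Courier New"]
// Minimal sketch of the pageable vs. pinned comparison (not the bandwidthTest
// source): time the same 32 MB host-to-device copy from a malloc'd (pageable)
// buffer and from a cudaMallocHost'd (pinned) buffer.
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

static float timedCopyMs(void *dst, const void *src, size_t bytes)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    cudaMemcpy(dst, src, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main(void)
{
    const size_t bytes = 33554432;    // 32 MB, the size bandwidthTest reports

    void *d_buf = NULL;
    cudaMalloc(&d_buf, bytes);

    void *pageable = malloc(bytes);   // ordinary pageable host memory
    void *pinned = NULL;
    cudaMallocHost(&pinned, bytes);   // page-locked (pinned) host memory

    // Warm-up copy so context creation does not skew the first measurement
    cudaMemcpy(d_buf, pinned, bytes, cudaMemcpyHostToDevice);

    float msPageable = timedCopyMs(d_buf, pageable, bytes);
    float msPinned   = timedCopyMs(d_buf, pinned, bytes);

    // Report bandwidth in MB/s (MB = 10^6 bytes here)
    printf("Pageable H2D: %.1f MB/s\n", (bytes / 1.0e6) / (msPageable / 1.0e3));
    printf("Pinned   H2D: %.1f MB/s\n", (bytes / 1.0e6) / (msPinned / 1.0e3));

    free(pageable);
    cudaFreeHost(pinned);
    cudaFree(d_buf);
    return 0;
}
[/font]

The pinned path keeps up with the numbers above; it's only the pageable path that collapses on these boxes.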

Thanks!

Hi,

It might be due to an unwanted NUMA effect. Try playing around with numactl. For example, on my own machine with two M2090s, here is what I get:

[font="Courier New"]$ numactl -m 0 bandwidthTest
[bandwidthTest] starting...

Running on...

Device 0: Tesla M2090
 Quick Mode

Host to Device Bandwidth, 1 Device(s), Paged memory
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			4281.1

Device to Host Bandwidth, 1 Device(s), Paged memory
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			3482.0

Device to Device Bandwidth, 1 Device(s)
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			120670.9

$ numactl -m 1 bandwidthTest
[bandwidthTest] starting...

Running on...

Device 0: Tesla M2090
 Quick Mode

Host to Device Bandwidth, 1 Device(s), Paged memory
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			2891.6

Device to Host Bandwidth, 1 Device(s), Paged memory
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			2402.2

Device to Device Bandwidth, 1 Device(s)
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			120476.1

$ numactl -i all bandwidthTest
[bandwidthTest] starting...

Running on...

Device 0: Tesla M2090
 Quick Mode

Host to Device Bandwidth, 1 Device(s), Paged memory
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			3468.4

Device to Host Bandwidth, 1 Device(s), Paged memory
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			2828.7

Device to Device Bandwidth, 1 Device(s)
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			120663.6
[/font]

My worst-case scenario is still much better than yours, but a NUMA effect is still a possibility…
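If it helps to figure out which node to bind to, here is a small sketch (my own quick hack, not from the SDK) that asks the runtime for the GPU's PCI address and reads the corresponding numa_node entry in sysfs; combine it with numactl --hardware to see how your memory is laid out:

[font="Courier New"]
// Sketch: report which NUMA node a CUDA device is attached to by reading the
// numa_node entry under its PCI address in sysfs (Linux only). You can then
// bind with, e.g., numactl --cpunodebind=N --membind=N ./bandwidthTest
#include <stdio.h>
#include <ctype.h>
#include <cuda_runtime.h>

int main(void)
{
    char busId[32] = {0};
    if (cudaDeviceGetPCIBusId(busId, (int)sizeof(busId), 0) != cudaSuccess) {
        fprintf(stderr, "cudaDeviceGetPCIBusId failed\n");
        return 1;
    }

    // sysfs uses lowercase hex in the PCI address, the runtime may not
    for (char *p = busId; *p; ++p)
        *p = (char)tolower((unsigned char)*p);

    char path[128];
    snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/numa_node", busId);

    int node = -1;
    FILE *f = fopen(path, "r");
    if (f) {
        if (fscanf(f, "%d", &node) != 1)
            node = -1;
        fclose(f);
    }

    // -1 means the kernel/BIOS did not report a node for this device
    printf("Device 0 (%s) is on NUMA node %d\n", busId, node);
    return 0;
}
[/font]

On a dual-socket board the GPUs usually hang off one socket's I/O hub, so running the test with memory bound to the "wrong" node costs noticeable pageable bandwidth, which matches the gap between -m 0 and -m 1 above.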

Thanks for the info. I didn't know about [font="Courier New"]numactl[/font] or the NUMA issues involved. I was able to speed up the transfers by trying various settings, but I still don't reach the speeds you're seeing. May I ask what motherboard/CPU you're using?

Thanks again!

The nodes are Supermicro twins with Intel Xeon X5560 CPUs, connected to a NextIO vCORE Extreme cabinet that exposes two M2090s per node.
Not an advertisement or anything, but it works remarkably well.