Slow Paged Memory Transfer with M2090

I have a problem where paged memory transfers with the Tesla M2090 are very slow:

[font="Courier New"]Device 0: Tesla M2090
Quick Mode

Host to Device Bandwidth, 1 Device(s), Paged memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 1832.4

Device to Host Bandwidth, 1 Device(s), Paged memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 1597.6
[/font]

But with pinned memory:

[font="Courier New"]Device 0: Tesla M2090
Quick Mode

Host to Device Bandwidth, 1 Device(s), Pinned memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 5735.8

Device to Host Bandwidth, 1 Device(s), Pinned memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 5534.9

[/font]
Any ideas what could be causing this, or whether there is any way to speed up paged memory transfers? I'm running CentOS 6.2 with NVIDIA driver 295.49. The machine has 36 GB of RAM and two CPUs (Xeon E5630). The system is a Supermicro 1026GT-TF-FM209:
http://www.supermicr…F.cfm?GPU=FM209

The odd thing is that even though the M2090 is "top of the line", its paged memory transfers are slower than on every other GPU board I have (Tesla C1060, GTX 280M, GTX 460, GTX 480, GTX 580). This is frustrating because we bought six systems, each with two M2090s, and all of them seem to suffer from the same slow paged memory transfers.
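For reference, this is roughly the comparison I'm measuring, a minimal sketch using the CUDA runtime API (not the actual bandwidthTest source; the buffer size and device choice are just illustrative):

[font="Courier New"]
// Minimal sketch of the pageable vs. pinned comparison (not the bandwidthTest
// source): time the same 32 MB host-to-device copy from a malloc'd (pageable)
// buffer and from a cudaMallocHost'd (pinned) buffer.
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

static float timedCopyMs(void *dst, const void *src, size_t bytes)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    cudaMemcpy(dst, src, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main(void)
{
    const size_t bytes = 33554432;    // 32 MB, the size bandwidthTest reports

    void *d_buf = NULL;
    cudaMalloc(&d_buf, bytes);

    void *pageable = malloc(bytes);   // ordinary pageable host memory
    void *pinned = NULL;
    cudaMallocHost(&pinned, bytes);   // page-locked (pinned) host memory

    // Warm-up copy so context creation does not skew the first measurement
    cudaMemcpy(d_buf, pinned, bytes, cudaMemcpyHostToDevice);

    float msPageable = timedCopyMs(d_buf, pageable, bytes);
    float msPinned   = timedCopyMs(d_buf, pinned, bytes);

    // Report bandwidth in MB/s (MB = 10^6 bytes here)
    printf("Pageable H2D: %.1f MB/s\n", (bytes / 1.0e6) / (msPageable / 1.0e3));
    printf("Pinned   H2D: %.1f MB/s\n", (bytes / 1.0e6) / (msPinned / 1.0e3));

    free(pageable);
    cudaFreeHost(pinned);
    cudaFree(d_buf);
    return 0;
}
[/font]

The pinned path keeps up with the numbers above; it's only the pageable path that collapses on these boxes.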

Thanks!

Hi,

It might be due to an unwanted NUMA effect. Try playing around with numactl. For example, on my own machine with two M2090s, here is what I get:

[font="Courier New"]$ numactl -m 0 bandwidthTest
[bandwidthTest] starting...

Running on...

Device 0: Tesla M2090
 Quick Mode

Host to Device Bandwidth, 1 Device(s), Paged memory
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			4281.1

Device to Host Bandwidth, 1 Device(s), Paged memory
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			3482.0

Device to Device Bandwidth, 1 Device(s)
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			120670.9

$ numactl -m 1 bandwidthTest
[bandwidthTest] starting...

Running on...

Device 0: Tesla M2090
 Quick Mode

Host to Device Bandwidth, 1 Device(s), Paged memory
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			2891.6

Device to Host Bandwidth, 1 Device(s), Paged memory
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			2402.2

Device to Device Bandwidth, 1 Device(s)
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			120476.1

$ numactl -i all bandwidthTest
[bandwidthTest] starting...

Running on...

Device 0: Tesla M2090
 Quick Mode

Host to Device Bandwidth, 1 Device(s), Paged memory
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			3468.4

Device to Host Bandwidth, 1 Device(s), Paged memory
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			2828.7

Device to Device Bandwidth, 1 Device(s)
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			120663.6
[/font]

My worst-case scenario is still much better than yours, but a NUMA effect is still a possibility…
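If it helps to figure out which node to bind to, here is a small sketch (my own quick hack, not from the SDK) that asks the runtime for the GPU's PCI address and reads the corresponding numa_node entry in sysfs; combine it with numactl --hardware to see how your memory is laid out:

[font="Courier New"]
// Sketch: report which NUMA node a CUDA device is attached to by reading the
// numa_node entry under its PCI address in sysfs (Linux only). You can then
// bind with, e.g., numactl --cpunodebind=N --membind=N ./bandwidthTest
#include <stdio.h>
#include <ctype.h>
#include <cuda_runtime.h>

int main(void)
{
    char busId[32] = {0};
    if (cudaDeviceGetPCIBusId(busId, (int)sizeof(busId), 0) != cudaSuccess) {
        fprintf(stderr, "cudaDeviceGetPCIBusId failed\n");
        return 1;
    }

    // sysfs uses lowercase hex in the PCI address, the runtime may not
    for (char *p = busId; *p; ++p)
        *p = (char)tolower((unsigned char)*p);

    char path[128];
    snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/numa_node", busId);

    int node = -1;
    FILE *f = fopen(path, "r");
    if (f) {
        if (fscanf(f, "%d", &node) != 1)
            node = -1;
        fclose(f);
    }

    // -1 means the kernel/BIOS did not report a node for this device
    printf("Device 0 (%s) is on NUMA node %d\n", busId, node);
    return 0;
}
[/font]

On a dual-socket board the GPUs usually hang off one socket's I/O hub, so running the test with memory bound to the "wrong" node costs noticeable pageable bandwidth, which matches the gap between -m 0 and -m 1 above.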

Thanks for the info. I didn't know about [font="Courier New"]numactl[/font] or the NUMA issues involved. I was able to speed up the transfers by trying various settings, but I still don't reach the speeds you're seeing. May I ask what motherboard/CPU you're using?

Thanks again!

The nodes are Supermicro twins with Intel Xeon X5560 CPUs, connected to a NextIO vCORE Extreme cabinet that exposes two M2090s per node.
Not an advertisement or anything, but it works remarkably well.