Pinning memory and DMA (and other I/O speedups?)

Hello -

In a previous thread I posted about some I/O issues I was having with my Tesla
card. (I fixed most of those, so I felt it was better to start a new thread
than to bump an old one.) Among other things, I mentioned that the transfer
time to and from the card is a bottleneck for my program (as it is for any program, obviously),
and someone suggested pinning my memory and having the card DMA it.

So, I wanted to ask: what can I do to speed up some of the memcpys? (I imagine the cudaMallocs
are not as costly, and nothing can really be done to speed them up anyway.)
Aside from sending less data, obviously, are pinning and DMA a good way to speed things up?

More importantly, I wouldn’t even know where to start on that, as I have no experience
with setting up any form of DMA. Any help would be appreciated.

Thanks!

Yup… pinning (page-locking) host memory will generally improve transfer perf, but it doesn’t come for free! ;)
If you pin a region of host memory, the total amount of pageable memory left to the host is reduced. If that reduction is significant, it can severely affect the overall performance of the system!
See the CUDA Programming Guide version 2.2, section 3.2.5, for more details on this technique.
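
For what it's worth, here is a minimal sketch of what pinning can look like with the runtime API, assuming a single device and a buffer of floats. There is no DMA to set up by hand: you allocate the host buffer with cudaMallocHost instead of malloc, and the ordinary cudaMemcpy calls stay the same. The names ok and pinned_round_trip are made up for this example, and the buffer size is arbitrary.

```cuda
#include <cassert>
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper: print the CUDA error (if any) and report success.
static bool ok(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Hypothetical example function: copies `bytes` to the device and back
// through a pinned host buffer. Returns 0 on success, 1 on failure.
int pinned_round_trip(size_t bytes) {
    assert(bytes % sizeof(float) == 0);
    size_t n = bytes / sizeof(float);
    float *h_buf = NULL;  // pinned (page-locked) host memory
    float *d_buf = NULL;  // device memory
    int status = 1;

    // cudaMallocHost replaces malloc; the memory it returns is page-locked.
    if (!ok(cudaMallocHost((void **)&h_buf, bytes), "cudaMallocHost")) return 1;

    if (ok(cudaMalloc((void **)&d_buf, bytes), "cudaMalloc")) {
        for (size_t i = 0; i < n; ++i) h_buf[i] = (float)i;

        // Same cudaMemcpy call as with pageable memory.
        if (ok(cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice), "H2D copy")) {
            memset(h_buf, 0, bytes);  // clear so the copy back is really checked
            if (ok(cudaMemcpy(h_buf, d_buf, bytes, cudaMemcpyDeviceToHost), "D2H copy")) {
                status = 0;
                for (size_t i = 0; i < n; ++i) {
                    if (h_buf[i] != (float)i) { status = 1; break; }
                }
            }
        }
        cudaFree(d_buf);
    }

    // Pinned memory must be released with cudaFreeHost, not free.
    cudaFreeHost(h_buf);
    return status;
}
```

Calling pinned_round_trip(16 << 20) from main would push 16 MB each way. Keep pinned buffers to what you actually transfer, for the reason given above: every pinned byte is taken away from the pageable pool.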

Excellent, thanks!