Hello -
In a previous thread I posted about some I/O issues I was having with my Tesla
card. (I fixed most of those, so I felt it was better to start a new thread
than to bump the old one.) In that thread, among other things, I mentioned that the
transfer time to and from the card was a bottleneck for my program (as it is for most
programs, obviously), and someone suggested pinning my memory and having the card DMA it.
So I wanted to ask: what can I do to speed up some of the memcpys? (I imagine the
cudaMallocs are not as costly, and not much can be done to speed them up anyway.)
Aside from sending less data, obviously, is pinning the memory and using DMA a good way
to speed up the transfers?
More importantly, I wouldn’t even know where to start on that, as I have no experience
setting up any form of DMA. Any help would be appreciated.
Thanks!