Bi-directional cudaMemcpy on Fermi

BlahCuda · May 27, 2010, 11:46pm

Does anyone have speedup numbers for bi-directional cudaMemcpyAsync (i.e. concurrent H2D and D2H copies) on Fermi GPUs?

tmurray · May 28, 2010, 12:03am

Bidirectional memcpys are only supported on GF100 Teslas.
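For anyone wanting to check a particular card, here is a minimal sketch that reads the copy-engine count from the runtime. It assumes a toolkit recent enough to expose `cudaDeviceProp::asyncEngineCount` (CUDA 4.0 or later, so newer than the post above); the helper names `copy_engine_summary` and `report_copy_engines` are made up for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: turns cudaDeviceProp::asyncEngineCount into a short description.
const char* copy_engine_summary(int async_engine_count) {
    switch (async_engine_count) {
        case 0:  return "no copy/compute overlap";
        case 1:  return "one copy engine: one direction at a time";
        default: return "two or more copy engines: H2D and D2H can overlap";
    }
}

// Hypothetical helper: prints the copy-engine count of every visible device.
// Returns the number of devices, or -1 on a CUDA error.
int report_copy_engines() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return -1;
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) return -1;
        std::printf("%d: %s, asyncEngineCount=%d (%s)\n", dev, prop.name,
                    prop.asyncEngineCount,
                    copy_engine_summary(prop.asyncEngineCount));
    }
    return count;
}
```

Calling `report_copy_engines()` from `main` should list each device; a value of 2 indicates that copies in both directions can run concurrently.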

RoofTopG · October 24, 2011, 12:53pm

Let’s say that we have a Tesla GPU with 2 DMA engines connected over a PCIe 2.0 bus.

Approximately what is the maximum memcpy bandwidth that could be achieved per direction in each of these cases?

a) unidirectional transfers with no overlap, e.g. only H2D memcpys

b) bi-directional, fully overlapping transfers, e.g. concurrent H2D and D2H memcpys
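As a rough upper bound, and assuming an x16 link (the post does not state the link width): PCIe 2.0 signals at 5 GT/s per lane with 8b/10b encoding, which gives 500 MB/s per lane per direction, so 8 GB/s per direction for x16. The link is full duplex, so in principle that figure applies to each direction in case b) as well. Packet overhead means measured numbers will be lower, so the practical answer is best obtained by timing it. Below is a minimal sketch of such a measurement; `to_gbps` and `measure_gbps` are hypothetical names, and the buffer size and repetition count are left to the caller.

```cuda
#include <chrono>
#include <cstddef>
#include <cuda_runtime.h>

// Converts bytes moved in `seconds` to GB/s (1 GB = 1e9 bytes).
double to_gbps(double bytes, double seconds) {
    return bytes / seconds / 1.0e9;
}

// Issues `reps` pinned-memory copies of `bytes` each in the selected direction(s).
// H2D and D2H go into separate streams so that a second copy engine can overlap them.
// Returns GB/s per direction, or a negative value on a CUDA error.
// At least one of h2d / d2h should be true.
double measure_gbps(bool h2d, bool d2h, size_t bytes, int reps) {
    void *h_src = nullptr, *h_dst = nullptr, *d_dst = nullptr, *d_src = nullptr;
    cudaStream_t up = nullptr, down = nullptr;
    bool ok = cudaMallocHost(&h_src, bytes) == cudaSuccess &&
              cudaMallocHost(&h_dst, bytes) == cudaSuccess &&
              cudaMalloc(&d_dst, bytes) == cudaSuccess &&
              cudaMalloc(&d_src, bytes) == cudaSuccess &&
              cudaStreamCreate(&up) == cudaSuccess &&
              cudaStreamCreate(&down) == cudaSuccess;
    double result = -1.0;
    if (ok) {
        cudaDeviceSynchronize();  // make sure nothing else is in flight
        auto t0 = std::chrono::steady_clock::now();
        for (int i = 0; i < reps; ++i) {
            if (h2d) cudaMemcpyAsync(d_dst, h_src, bytes, cudaMemcpyHostToDevice, up);
            if (d2h) cudaMemcpyAsync(h_dst, d_src, bytes, cudaMemcpyDeviceToHost, down);
        }
        ok = cudaDeviceSynchronize() == cudaSuccess;  // wait for both streams
        auto t1 = std::chrono::steady_clock::now();
        if (ok) {
            double seconds = std::chrono::duration<double>(t1 - t0).count();
            result = to_gbps(static_cast<double>(bytes) * reps, seconds);
        }
    }
    // Release whatever was successfully allocated.
    if (up) cudaStreamDestroy(up);
    if (down) cudaStreamDestroy(down);
    if (d_src) cudaFree(d_src);
    if (d_dst) cudaFree(d_dst);
    if (h_dst) cudaFreeHost(h_dst);
    if (h_src) cudaFreeHost(h_src);
    return result;
}
```

Comparing `measure_gbps(true, false, 64 << 20, 20)` with `measure_gbps(true, true, 64 << 20, 20)` gives the per-direction figures for cases a) and b). On a card with a single copy engine the second call should report roughly half the per-direction rate of the first, since the two directions are serialized.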