Could someone help me achieve overlap between computation and transfers on a GTX Titan card?

Hi,
I cannot effectively overlap kernel execution and memory transfers on a GTX Titan card. It's really a shame: it seems every memory operation is executed in order (isn't this card based on the GK110 chip?).

Anyway, I'm looking for a good strategy to get the best overlap in a typical scenario like:

stream1 -> memcpy_HtoD1; kernel_exec1; memcpy_DtoH1; …
stream2 -> memcpy_HtoD2; kernel_exec2; memcpy_DtoH2; …
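For reference, the stream1/stream2 pattern above corresponds roughly to the following "depth-first" issue order: every stream gets its HtoD copy, kernel and DtoH copy before the next stream is touched. This is a minimal sketch, not the poster's code: the kernel `process`, the helper `run_depth_first`, the `CHECK` macro and the chunk sizes are all hypothetical stand-ins, and the host buffer is assumed to be pinned.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical stand-in for kernel_exec1 / kernel_exec2: doubles each element.
__global__ void process(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Issues HtoD -> kernel -> DtoH for each stream in turn ("depth-first"),
// then checks that every element came back doubled.
bool run_depth_first(int num_streams, int chunk) {
    size_t total = (size_t)num_streams * chunk;
    size_t bytes = (size_t)chunk * sizeof(float);
    float *h = nullptr, *d = nullptr;
    // Page-locked host memory: cudaMemcpyAsync can only overlap with
    // kernels when the host buffer is pinned.
    CHECK(cudaMallocHost((void **)&h, total * sizeof(float)));
    CHECK(cudaMalloc((void **)&d, total * sizeof(float)));
    for (size_t i = 0; i < total; ++i) h[i] = 1.0f;

    std::vector<cudaStream_t> streams(num_streams);
    for (auto &s : streams) CHECK(cudaStreamCreate(&s));

    for (int s = 0; s < num_streams; ++s) {
        float *hs = h + (size_t)s * chunk;
        float *ds = d + (size_t)s * chunk;
        CHECK(cudaMemcpyAsync(ds, hs, bytes, cudaMemcpyHostToDevice, streams[s]));
        process<<<(chunk + 255) / 256, 256, 0, streams[s]>>>(ds, chunk);
        CHECK(cudaGetLastError());
        CHECK(cudaMemcpyAsync(hs, ds, bytes, cudaMemcpyDeviceToHost, streams[s]));
    }
    CHECK(cudaDeviceSynchronize());

    bool ok = true;
    for (size_t i = 0; i < total; ++i) ok = ok && (h[i] == 2.0f);

    for (auto &s : streams) CHECK(cudaStreamDestroy(s));
    CHECK(cudaFree(d));
    CHECK(cudaFreeHost(h));
    return ok;
}

int main() {
    printf("depth-first: %s\n", run_depth_first(2, 1 << 20) ? "ok" : "mismatch");
    return 0;
}
```

With a single copy engine, copies are likely to be processed in the order they were issued, so memcpy_DtoH1 (which must wait for kernel_exec1) can hold up memcpy_HtoD2 queued behind it. That is one reason the issue order across streams matters on such a card.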

I have tried many different approaches, but none of them succeeded.

Link to the Stack Overflow question:
http://stackoverflow.com/questions/17564791/what-is-the-best-strategy-to-overlap-kernel-execution-and-data-transfers-in-a-gt

I think your Titan can only overlap memcopies with kernels.
Try using one stream for transfers and another for kernels?
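A minimal sketch of the "one stream for transfers, another for kernels" idea, assuming pinned host memory and a hypothetical `process` kernel and `run_split_streams` helper. Events carry the per-chunk dependencies between the two streams. Downloads are issued with a lag of one chunk (upload chunk c, then download chunk c-1), so the upload of chunk c can run on the single copy engine while the kernel for chunk c-1 executes, and the download of chunk c-1 can run while the kernel for chunk c executes.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical kernel: doubles each element.
__global__ void process(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

bool run_split_streams(int chunks, int chunk) {
    size_t total = (size_t)chunks * chunk;
    size_t bytes = (size_t)chunk * sizeof(float);
    float *h = nullptr, *d = nullptr;
    CHECK(cudaMallocHost((void **)&h, total * sizeof(float)));  // pinned
    CHECK(cudaMalloc((void **)&d, total * sizeof(float)));
    for (size_t i = 0; i < total; ++i) h[i] = 1.0f;

    cudaStream_t copy_s, compute_s;  // all copies / all kernels
    CHECK(cudaStreamCreate(&copy_s));
    CHECK(cudaStreamCreate(&compute_s));
    std::vector<cudaEvent_t> uploaded(chunks), computed(chunks);
    for (int c = 0; c < chunks; ++c) {
        CHECK(cudaEventCreateWithFlags(&uploaded[c], cudaEventDisableTiming));
        CHECK(cudaEventCreateWithFlags(&computed[c], cudaEventDisableTiming));
    }

    for (int c = 0; c <= chunks; ++c) {
        if (c < chunks) {  // upload chunk c, then let the compute stream process it
            float *hc = h + (size_t)c * chunk, *dc = d + (size_t)c * chunk;
            CHECK(cudaMemcpyAsync(dc, hc, bytes, cudaMemcpyHostToDevice, copy_s));
            CHECK(cudaEventRecord(uploaded[c], copy_s));
            CHECK(cudaStreamWaitEvent(compute_s, uploaded[c], 0));
            process<<<(chunk + 255) / 256, 256, 0, compute_s>>>(dc, chunk);
            CHECK(cudaGetLastError());
            CHECK(cudaEventRecord(computed[c], compute_s));
        }
        if (c > 0) {       // download chunk c-1 once its kernel has finished
            int p = c - 1;
            float *hp = h + (size_t)p * chunk, *dp = d + (size_t)p * chunk;
            CHECK(cudaStreamWaitEvent(copy_s, computed[p], 0));
            CHECK(cudaMemcpyAsync(hp, dp, bytes, cudaMemcpyDeviceToHost, copy_s));
        }
    }
    CHECK(cudaDeviceSynchronize());

    bool ok = true;
    for (size_t i = 0; i < total; ++i) ok = ok && (h[i] == 2.0f);

    for (int c = 0; c < chunks; ++c) {
        CHECK(cudaEventDestroy(uploaded[c]));
        CHECK(cudaEventDestroy(computed[c]));
    }
    CHECK(cudaStreamDestroy(copy_s));
    CHECK(cudaStreamDestroy(compute_s));
    CHECK(cudaFree(d));
    CHECK(cudaFreeHost(h));
    return ok;
}

int main() {
    printf("split streams: %s\n", run_split_streams(8, 1 << 18) ? "ok" : "mismatch");
    return 0;
}
```

The design choice here is that the copy stream's order matches what a single copy engine can actually do, rather than relying on the hardware to reorder copies from different streams.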

Hello,

When I run the deviceQuery sample from the SDK on my Titan, I get this:

Concurrent copy and kernel execution:          Yes with 1 copy engine(s)

My guess is that this means you cannot overlap two copies with each other (for example, a host-to-device and a device-to-host transfer).
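The copy-engine count that deviceQuery prints comes from the `asyncEngineCount` field of `cudaDeviceProp`, so it can also be checked from your own code. A small sketch (the helper name `copy_engine_count` is hypothetical): a value of 1 means one copy at a time can overlap a kernel, while 2 means a host-to-device and a device-to-host copy can also overlap each other.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Returns the number of copy engines (asyncEngineCount) of `dev`, or -1 on error.
int copy_engine_count(int dev) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) return -1;
    return prop.asyncEngineCount;
}

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("No CUDA device found\n");
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) continue;
        int engines = copy_engine_count(dev);
        printf("Device %d: %s (compute capability %d.%d)\n",
               dev, prop.name, prop.major, prop.minor);
        printf("  copy engines: %d -> %s\n", engines,
               engines >= 2   ? "HtoD and DtoH can overlap each other and a kernel"
               : engines == 1 ? "one copy at a time can overlap a kernel"
                              : "no copy/kernel overlap");
        printf("  concurrent kernels: %s\n", prop.concurrentKernels ? "yes" : "no");
        printf("  can map host memory: %s\n", prop.canMapHostMemory ? "yes" : "no");
    }
    return 0;
}
```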

Since your CPU memory is already pinned, you could try doing the copies explicitly inside your kernels (reading and writing the pinned host memory directly from device code) and stop using host-initiated memcpys. This assumes your kernels can overlap each other.
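One way to read that suggestion is zero-copy access: allocate the host buffers as mapped pinned memory and let the kernel read and write them directly through device pointers, so no cudaMemcpy is issued at all. This is a minimal sketch; the kernel `scale_zero_copy` and helper `run_zero_copy` are hypothetical. Every access goes over PCIe, so it tends to pay off only when each element is touched once and accesses are coalesced.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical kernel: reads its input from, and writes its output to,
// mapped pinned host memory.
__global__ void scale_zero_copy(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.0f * in[i];
}

bool run_zero_copy(int n) {
    // Mapped pinned memory has to be requested before the CUDA context exists.
    // If a context was already created this call may fail; clear that
    // non-sticky error and continue (the calls below report any real problem).
    if (cudaSetDeviceFlags(cudaDeviceMapHost) != cudaSuccess) cudaGetLastError();

    size_t bytes = (size_t)n * sizeof(float);
    float *h_in = nullptr, *h_out = nullptr;
    CHECK(cudaHostAlloc((void **)&h_in, bytes, cudaHostAllocMapped));
    CHECK(cudaHostAlloc((void **)&h_out, bytes, cudaHostAllocMapped));
    for (int i = 0; i < n; ++i) h_in[i] = (float)i;

    // Device-side aliases of the host buffers.
    float *d_in = nullptr, *d_out = nullptr;
    CHECK(cudaHostGetDevicePointer((void **)&d_in, h_in, 0));
    CHECK(cudaHostGetDevicePointer((void **)&d_out, h_out, 0));

    scale_zero_copy<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());  // results are visible in h_out after this

    bool ok = true;
    for (int i = 0; i < n; ++i) ok = ok && (h_out[i] == 2.0f * (float)i);

    CHECK(cudaFreeHost(h_in));
    CHECK(cudaFreeHost(h_out));
    return ok;
}

int main() {
    printf("zero-copy: %s\n", run_zero_copy(1 << 20) ? "ok" : "mismatch");
    return 0;
}
```

Because the transfer now happens inside the kernel, overlap comes from running several such kernels concurrently in different streams, which is why this approach depends on your kernels being able to overlap.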