Is it possible to overlap computation with a device-to-device memcopy?

mroberts · January 6, 2010, 12:25am

Hello fellow CUDA programmers,

I have a quick question for anyone who is currently overlapping computation and memcopies. Is it possible to overlap the execution of a kernel with a device-to-device memcopy? The programming guide makes it clear that this IS possible for device-to-host memcopies, as well as host-to-device memcopies.

However, it is not clear to me what will happen if I pass the cudaMemcpyDeviceToDevice constant to the memcopy call being overlapped.

The reason I care about this is because I want to copy a large amount of data from global memory into a 3D texture and I have plenty of computations that do not depend on the resulting 3D texture. If I could overlap the memcopy with these computations I would certainly see a performance benefit.

Has anyone tried this? :)

Cheers,
Mike

mroberts · January 6, 2010, 12:43am

Actually never mind. The programming guide is quite clear that this is not possible.

Sarnath · January 6, 2010, 6:30am

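Overlapping kernel execution with memory copies is card dependent. Examine the deviceOverlap field of the device properties (cudaGetDeviceProperties) to find out whether your card supports it.
Also look at the *Async calls (cudaMemcpyAsync and friends) and pinned (page-locked) host memory, which are what you need to overlap CPU work with memory copies.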
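
For anyone landing here later, a minimal sketch of the check and the async pattern described above. It is not from the original posters. It assumes device 0, uses a made-up kernel (scale) and made-up buffer names, and reads asyncEngineCount, which newer toolkits provide in place of the older deviceOverlap flag. The copy shown is host-to-device, one of the directions the thread says the programming guide documents as overlappable; it does not demonstrate anything about device-to-device copies.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Bail out of main() with a message if a runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            return 1;                                                 \
        }                                                             \
    } while (0)

// Hypothetical kernel standing in for "computation that does not
// depend on the data being copied".
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    // Step 1: ask the card whether it can overlap copies with kernels.
    cudaDeviceProp prop;
    CHECK(cudaGetDeviceProperties(&prop, 0));  // assumption: device 0
    printf("%s: asyncEngineCount = %d\n", prop.name, prop.asyncEngineCount);
    if (prop.asyncEngineCount == 0) {
        printf("This device cannot overlap copies with kernel execution.\n");
        return 0;
    }

    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Step 2: pinned host memory is required for a truly asynchronous copy.
    float *hostBuf = NULL, *devCopy = NULL, *devWork = NULL;
    CHECK(cudaMallocHost((void **)&hostBuf, bytes));
    CHECK(cudaMalloc((void **)&devCopy, bytes));
    CHECK(cudaMalloc((void **)&devWork, bytes));
    CHECK(cudaMemset(devWork, 0, bytes));
    for (int i = 0; i < n; ++i) hostBuf[i] = 1.0f;

    // Step 3: issue the copy and the kernel in different non-default
    // streams so the hardware is allowed to run them concurrently.
    cudaStream_t copyStream, computeStream;
    CHECK(cudaStreamCreate(&copyStream));
    CHECK(cudaStreamCreate(&computeStream));

    CHECK(cudaMemcpyAsync(devCopy, hostBuf, bytes,
                          cudaMemcpyHostToDevice, copyStream));
    scale<<<(n + 255) / 256, 256, 0, computeStream>>>(devWork, n, 2.0f);
    CHECK(cudaGetLastError());

    // Both calls above return immediately; the CPU is free to do other
    // work here before waiting for the GPU.
    CHECK(cudaDeviceSynchronize());

    CHECK(cudaStreamDestroy(copyStream));
    CHECK(cudaStreamDestroy(computeStream));
    CHECK(cudaFree(devCopy));
    CHECK(cudaFree(devWork));
    CHECK(cudaFreeHost(hostBuf));
    return 0;
}
```

Whether the copy and the kernel actually run at the same time depends on the card and the driver; a profiler timeline is the reliable way to confirm it.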