memory copy overlap

Hi everyone,

Does CUDA allow us to overlap a memory copy from host to device with host code? To clarify: can we transfer a large amount of data to the GPU while executing code on either the CPU or the GPU?

Thanks,

You can overlap memory transfers with operations on the CPU by using the Async versions of the memcpy calls. You can overlap a memory copy with a kernel execution by using the streams API (see the programming guide), but only compute 1.1 hardware (G92 and newer) can do the overlap. Compute 1.0 hardware will serialize the operations.
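
As a rough sketch of both patterns together (the scale kernel, sizes, and names below are made up for illustration): the host buffer must be pinned (cudaMallocHost), cudaMemcpyAsync returns control to the CPU immediately, and the copy and the kernel go into different streams so they can overlap on hardware that supports it:

    #include <cuda_runtime.h>

    // Illustrative kernel; any independent work would do.
    __global__ void scale(float *d, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) d[i] *= 2.0f;
    }

    int main()
    {
        const int N = 1 << 20;
        float *h_a, *d_a, *d_b;
        cudaMallocHost((void **)&h_a, N * sizeof(float)); // pinned host memory, needed for async copies
        cudaMalloc((void **)&d_a, N * sizeof(float));
        cudaMalloc((void **)&d_b, N * sizeof(float));     // stands in for data already on the device

        cudaStream_t copyStream, execStream;
        cudaStreamCreate(&copyStream);
        cudaStreamCreate(&execStream);

        // Copy in one stream, kernel in another. Both calls return
        // immediately, so any host code below overlaps with them too.
        cudaMemcpyAsync(d_a, h_a, N * sizeof(float), cudaMemcpyHostToDevice, copyStream);
        scale<<<(N + 255) / 256, 256, 0, execStream>>>(d_b, N);

        // ... CPU work can go here ...

        cudaThreadSynchronize(); // wait for both streams before using results
        return 0;
    }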

Is this correct?

“compute 1.1 hardware (G92 and newer)”

I have an 8600 GTS; it reports major=1, minor=1, and it is a G84. I've tried both Async and Streams on it, and they both work.
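
For what it's worth, you don't have to go by chip name: the runtime reports whether a device can overlap copies with kernels. Something like this (querying device 0) prints the compute capability next to the deviceOverlap flag:

    #include <stdio.h>
    #include <cuda_runtime.h>

    int main()
    {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0); // query device 0
        printf("%s: compute %d.%d, deviceOverlap = %d\n",
               prop.name, prop.major, prop.minor, prop.deviceOverlap);
        return 0;
    }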

I think the G84 came out later than the G92.

Nope, the G84 has been out for a year or so. I believe it's just a typo in the doc.

Where in the documentation does it say that compute 1.1 hardware is necessary to overlap asynchronous host-to-device memory transfers and kernel execution?

Section 4.5.1.5 says:

This post from NVIDIA mentions that only compute 1.1 devices have this capability: http://forums.nvidia.com/index.php?showtop…ndpost&p=292323

CPU/GPU concurrency via cuMemcpy*Async is an artifact of the GPU and CPU being separate devices, so it is available on all CUDA-capable hardware.

The “async memcpy” capability that is available only on compute 1.1 devices is the ability to overlap host<->device memcpy with kernel execution. This is a separate level of concurrency, but it requires very similar synchronization primitives, so the same APIs are used to access both. (In any case, the expectation was that anyone who wanted memcpy/kernel concurrency would also want the API calls to be asynchronous, i.e. CPU/GPU concurrency.)
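
To make the two levels concrete, here is a rough sketch of how they are typically used together: split the input into chunks and issue each chunk's copy and kernel into alternating streams. Every call returns immediately (CPU/GPU concurrency), and on compute 1.1 parts the copy for one chunk can run underneath the kernel for the previous one. The process kernel and the chunk sizes are placeholders:

    #include <cuda_runtime.h>

    __global__ void process(float *d, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) d[i] += 1.0f; // placeholder work
    }

    // h is assumed to point at pinned memory (cudaMallocHost);
    // pageable memory defeats the overlap.
    void pipelined(float *h, float *d, int nChunks, int chunk)
    {
        cudaStream_t streams[2];
        cudaStreamCreate(&streams[0]);
        cudaStreamCreate(&streams[1]);

        for (int i = 0; i < nChunks; ++i) {
            cudaStream_t s = streams[i % 2];
            // Within a stream the copy and kernel stay ordered; across the
            // two streams, copy(i+1) can overlap kernel(i) on 1.1 hardware.
            cudaMemcpyAsync(d + i * chunk, h + i * chunk, chunk * sizeof(float),
                            cudaMemcpyHostToDevice, s);
            process<<<(chunk + 255) / 256, 256, 0, s>>>(d + i * chunk, chunk);
        }
        // Everything above was asynchronous; the CPU was free until here.
        cudaThreadSynchronize();

        cudaStreamDestroy(streams[0]);
        cudaStreamDestroy(streams[1]);
    }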