Overlapping CPU<->GPU transfer and kernel computation only for pinned memory

HannesF99 · March 28, 2011, 7:53am

Is there a technical reason why overlapping CPU <-> GPU transfers with kernel computation works only for buffers that were allocated as ‘pinned’ (non-pageable) CPU memory? The problem is that I often have buffers that were allocated by some third-party library as normal pageable memory.

hyqneuron · March 28, 2011, 2:59pm

I also want to know the reason. If they can copy normal memory from device to host without non-pageable memory, why can’t they do the same for overlapping copies? Is it a problem with the PCI bus?

tmurray · March 28, 2011, 7:07pm

Pageable memcpys are staged through a pinned buffer using CPU-side memcpys, whereas pinned memcpys are performed entirely via DMA. So in theory we could do it, if we had a background thread doing the CPU-side memcpys and synchronizing with the GPU, etc.
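To make the staging idea concrete, here is a rough sketch of copying from pageable memory through a pinned staging buffer by hand. It only illustrates the mechanism described above and is not the driver's actual implementation; the helper names `staged_copy_to_device` and `staged_copy_roundtrip_ok` and the 256 KiB chunk size are assumptions made up for this example.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <cstdio>
#include <cstring>
#include <vector>

// Hypothetical helper: copy `bytes` from pageable host memory to the device
// in chunks, going through one pinned staging buffer.
bool staged_copy_to_device(void* dst, const void* src, size_t bytes,
                           size_t chunk, cudaStream_t stream) {
    void* staging = nullptr;
    if (cudaMallocHost(&staging, chunk) != cudaSuccess) return false;

    const char* in = static_cast<const char*>(src);
    char* out = static_cast<char*>(dst);
    bool ok = true;
    for (size_t off = 0; off < bytes && ok; off += chunk) {
        size_t n = std::min(chunk, bytes - off);
        std::memcpy(staging, in + off, n);  // CPU-side copy: the host thread is busy here
        ok = cudaMemcpyAsync(out + off, staging, n,
                             cudaMemcpyHostToDevice, stream) == cudaSuccess
             // Wait for the DMA before the staging buffer is overwritten again.
             && cudaStreamSynchronize(stream) == cudaSuccess;
    }
    cudaFreeHost(staging);
    return ok;
}

// Round-trip check: std::vector storage is ordinary pageable memory.
bool staged_copy_roundtrip_ok() {
    const size_t n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    std::vector<float> src(n, 3.0f), back(n, 0.0f);

    float* dev = nullptr;
    cudaStream_t stream;
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return false;
    if (cudaStreamCreate(&stream) != cudaSuccess) { cudaFree(dev); return false; }

    bool ok = staged_copy_to_device(dev, src.data(), bytes, 256 * 1024, stream);
    ok = ok && cudaMemcpy(back.data(), dev, bytes,
                          cudaMemcpyDeviceToHost) == cudaSuccess;
    ok = ok && (back == src);

    cudaStreamDestroy(stream);
    cudaFree(dev);
    return ok;
}

int main() {
    bool ok = staged_copy_roundtrip_ok();
    std::printf("staged copy %s\n", ok ? "succeeded" : "failed");
    return ok ? 0 : 1;
}
```

The point of the sketch is that the calling thread has to do the `memcpy` and wait on each chunk itself, which is why such a copy cannot simply run in the background the way a pure DMA transfer from pinned memory can.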

(Really, you should try using cudaHostRegister in 4.0; this is why it’s there.)
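A minimal sketch of that suggestion, assuming a buffer that came from plain `malloc` (standing in for memory owned by a third-party library): pin it in place with `cudaHostRegister`, then issue the copy and an unrelated kernel in separate streams so they are allowed to overlap. The kernel `scale`, the function `overlap_with_registered_buffer`, and the sizes are hypothetical names and values for illustration. Whether the copy and the kernel really run concurrently depends on the device, and cleanup on the error paths is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Bail out of the enclosing bool function on any CUDA error (sketch-level handling).
#define CHECK(call)                                                        \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            std::fprintf(stderr, "%s failed: %s\n", #call,                 \
                         cudaGetErrorString(err_));                        \
            return false;                                                  \
        }                                                                  \
    } while (0)

// Hypothetical kernel that gives the GPU something to do during the copy.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

bool overlap_with_registered_buffer(int n) {
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);

    // Pageable buffer, as a third-party library might hand it to us.
    float* host = static_cast<float*>(std::malloc(bytes));
    if (!host) return false;
    for (int i = 0; i < n; ++i) host[i] = 1.0f;

    float *d_in = nullptr, *d_work = nullptr;
    cudaStream_t copy_stream, compute_stream;
    CHECK(cudaMalloc(&d_in, bytes));
    CHECK(cudaMalloc(&d_work, bytes));
    CHECK(cudaMemset(d_work, 0, bytes));
    CHECK(cudaStreamCreate(&copy_stream));
    CHECK(cudaStreamCreate(&compute_stream));

    // Page-lock the existing allocation in place; no reallocation or extra copy.
    CHECK(cudaHostRegister(host, bytes, cudaHostRegisterDefault));

    // The copy and the kernel touch different device buffers and sit in
    // different streams, so the runtime is free to overlap them.
    CHECK(cudaMemcpyAsync(d_in, host, bytes, cudaMemcpyHostToDevice, copy_stream));
    scale<<<(n + 255) / 256, 256, 0, compute_stream>>>(d_work, n, 2.0f);
    CHECK(cudaGetLastError());
    CHECK(cudaStreamSynchronize(copy_stream));
    CHECK(cudaStreamSynchronize(compute_stream));

    // Unpin before the owner of the buffer frees it.
    CHECK(cudaHostUnregister(host));

    // Verify that the data arrived on the device.
    std::vector<float> back(n, 0.0f);
    CHECK(cudaMemcpy(back.data(), d_in, bytes, cudaMemcpyDeviceToHost));
    bool ok = true;
    for (int i = 0; i < n; ++i) ok = ok && (back[i] == 1.0f);

    CHECK(cudaStreamDestroy(copy_stream));
    CHECK(cudaStreamDestroy(compute_stream));
    CHECK(cudaFree(d_in));
    CHECK(cudaFree(d_work));
    std::free(host);
    return ok;
}

int main() {
    bool ok = overlap_with_registered_buffer(1 << 20);
    std::printf("registered-buffer copy %s\n", ok ? "succeeded" : "failed");
    return ok ? 0 : 1;
}
```

A profiler timeline is the usual way to confirm that the transfer and the kernel actually overlapped on a given GPU.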

HannesF99 · March 29, 2011, 9:12am

@tmurray → Thanks for the tip. It seems that ‘cudaHostRegister’ is what I need. Nice …
http://developer.download.nvidia.com/compute/cuda/4_0/CUDA_Toolkit_4.0_Overview.pdf (page 6).
