Can someone explain to me what the difference is between
[*]page-locked memory that is copied using cudaMemcpy() and
[*]zero-copy page-locked memory?
I do understand that the first variant uses DMA to transfer the memory without a staging buffer, which is faster than copying from ordinary pageable host memory (plain malloc()) with cudaMemcpy().
And zero-copy memory is accessed through a device pointer, which does not involve an explicit copy at all.
I have only tried the second variant so far, but it seems about 3 times slower, at least when my kernels do their calculations directly from zero-copy page-locked memory.
Is the slowdown caused by going over the PCIe bus on every access with zero-copy memory? PCIe bandwidth is about 16 GB/s, while peak device memory bandwidth, depending on the card, is about 85 GB/s.
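For reference, this is roughly what I mean by the two variants (a minimal sketch with illustrative names and sizes, error checking omitted, not my actual code):

[code]
// Minimal sketch of the two variants (illustrative sizes, error checking omitted).
#include <cuda_runtime.h>

__global__ void scale(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main()
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Needed for variant 2; must be set before the CUDA context is created.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    // Variant 1: page-locked host buffer + explicit DMA copy to/from device memory.
    float *h_pinned, *d_buf;
    cudaHostAlloc((void**)&h_pinned, bytes, cudaHostAllocDefault);
    cudaMalloc((void**)&d_buf, bytes);
    cudaMemcpy(d_buf, h_pinned, bytes, cudaMemcpyHostToDevice);
    scale<<<(n + 255) / 256, 256>>>(d_buf, n);        // kernel works on fast device memory
    cudaMemcpy(h_pinned, d_buf, bytes, cudaMemcpyDeviceToHost);

    // Variant 2: zero-copy, the kernel dereferences the pinned host buffer over PCIe.
    float *h_mapped, *d_mapped;
    cudaHostAlloc((void**)&h_mapped, bytes, cudaHostAllocMapped);
    cudaHostGetDevicePointer((void**)&d_mapped, h_mapped, 0);
    scale<<<(n + 255) / 256, 256>>>(d_mapped, n);     // every access crosses the bus
    cudaDeviceSynchronize();

    cudaFree(d_buf);
    cudaFreeHost(h_pinned);
    cudaFreeHost(h_mapped);
    return 0;
}
[/code]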
In case anyone wondered, the answer to the slowdown is that pinned (page-locked) host memory is not cached on the GPU.
So repeated reads/writes from a kernel are slower.
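To illustrate with a hypothetical kernel (not my real code): each element of in below is loaded by three different threads, so with a zero-copy pointer the same data crosses the PCIe bus three times, while from device memory the re-reads are served by GPU DRAM.

[code]
// Illustrative kernel only: each element of 'in' is loaded by three different
// threads. From device memory those re-reads are served by GPU DRAM (or caches);
// from a mapped zero-copy pointer every single load crosses the PCIe bus again,
// because the pinned host memory is not cached on the GPU.
__global__ void stencil3(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i <= 0 || i >= n - 1) return;

    out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0f;
}
[/code]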
You’re conflating two concepts: pinned memory and zero-copy access to pinned memory. Pinned memory is faster for CPU-GPU copies with cudaMemcpy() because the GPU can DMA to/from the memory directly, whereas pageable memory requires an intermediate CPU-side memcpy (which is also why you can’t do async memcpys with pageable memory). Zero-copy means using pinned memory directly from a kernel, which is slower in latency and bandwidth than accessing GPU memory, but since you don’t have to do a memcpy to or from the buffer around the kernel launch, it may make your overall application faster if the buffer is only used as a one-time input or output.
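As a rough sketch of that one-time-output case (hypothetical names, error checking omitted): the kernel writes each element exactly once, straight into a mapped pinned buffer, so no separate device-to-host cudaMemcpy is needed afterwards. If the kernel instead re-read or re-wrote the data, you’d want it in device memory and pay the one explicit copy.

[code]
// Sketch: zero-copy as a write-once output buffer. The kernel streams its results
// directly into mapped pinned host memory, so no cudaMemcpy(DeviceToHost) is needed.
#include <cuda_runtime.h>

__global__ void produce(float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i * 0.5f;                 // each element written exactly once
}

int main()
{
    const int n = 1 << 20;

    cudaSetDeviceFlags(cudaDeviceMapHost);        // enable mapped pinned allocations

    float *h_out, *d_out;
    cudaHostAlloc((void**)&h_out, n * sizeof(float), cudaHostAllocMapped);
    cudaHostGetDevicePointer((void**)&d_out, h_out, 0);

    produce<<<(n + 255) / 256, 256>>>(d_out, n);
    cudaDeviceSynchronize();                      // results are already sitting in h_out

    cudaFreeHost(h_out);
    return 0;
}
[/code]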