Pinned Memory zero copy No-copy pinning of system memory

thanasio · November 30, 2011, 12:23pm

Hi again,

In the CUDA 4.0 description there is a “No-copy pinning of system memory, a faster alternative to cudaMallocHost()”…
Is there any difference or improvement there compared with previous versions?
I can barely find any info on that…

Cheers,
Than

tera · November 30, 2011, 5:52pm

The improvement is that you can use it on memory that wasn’t allocated by yourself. It’s not faster than cudaMallocHost(), but it is faster than cudaMallocHost() plus copying all data into the newly allocated memory.
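As a rough illustration of that point, here is a minimal sketch of pinning a buffer in place with cudaHostRegister() before copying it to the GPU. The helper name `copy_existing_buffer_to_device` is made up for this example, and the buffer is assumed to come from plain malloc()/new or from a library you do not control. Older toolkits reportedly required the pointer and size to be page-aligned, so treat this as a sketch under that caveat rather than a drop-in routine.

```cuda
#include <cassert>
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): pin an already existing host buffer
// in place, copy it to device memory, then unpin it again.
// Returns the first CUDA error encountered, or cudaSuccess.
cudaError_t copy_existing_buffer_to_device(float *host, float *dev, size_t n)
{
    assert(host != nullptr && dev != nullptr);
    const size_t bytes = n * sizeof(float);

    // Page-lock the memory where it already lives; no second host buffer
    // and no host-side memcpy() is needed.
    cudaError_t err = cudaHostRegister(host, bytes, cudaHostRegisterDefault);
    if (err != cudaSuccess) return err;

    // The data still has to cross to the GPU; pinning only lets the copy
    // read directly from the registered pages.
    err = cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);

    // Always undo the registration, even if the copy failed.
    cudaError_t unreg = cudaHostUnregister(host);
    return (err != cudaSuccess) ? err : unreg;
}
```

Registering and unregistering has its own cost, so in practice it probably only pays off if the buffer is registered once and reused for several transfers.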

thanasio · December 1, 2011, 9:52am

Hmm, that all sounds reasonable. Have you tried it? I have, and it apparently slowed the application down 4x. From what I have read, this technique only performs well on integrated GPUs. If the GPU is not integrated, it does not improve performance, and if memory access is not sequential, performance is seriously affected.

However, I have a feeling I might be missing something here. Has anybody else tried it?

Best,
Than

tera · December 1, 2011, 12:00pm

“No-copy pinning of system memory” of course does not avoid copying the data to the GPU. It only avoids copying data on the CPU a second time if it happens to be in unpinned memory.

cudaHostRegister() should not slow down your application, though. malloc()+cudaHostRegister() should be faster than cudaHostAlloc()+memcpy().
However, I suspect you are comparing runtimes for zero-copy against transferring the data to GPU memory via cudaMemcpy() first. With zero-copy, the kernel processes data directly from CPU memory without copying it to GPU memory beforehand, but except on integrated GPUs every access still has to go through PCIe. Zero-copy can therefore be slower, because of the high latency of the PCIe bus and because the same data may be transferred through PCIe multiple times.
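To make the two cases concrete, here is a minimal sketch of both paths for a trivial kernel. The names `scale`, `scale_zero_copy` and `scale_explicit_copy` are invented for this example. It assumes a device that can map host memory; on older setups you may also need cudaSetDeviceFlags(cudaDeviceMapHost) before the context is created.

```cuda
#include <cassert>
#include <cstddef>
#include <cuda_runtime.h>

// Trivial kernel used by both paths: out[i] = k * in[i].
__global__ void scale(const float *in, float *out, size_t n, float k)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) out[i] = k * in[i];
}

// Launch helper shared by both paths (hypothetical, for illustration only).
static cudaError_t launch_scale(const float *in, float *out, size_t n, float k)
{
    assert(n > 0);
    const unsigned threads = 256;
    const unsigned blocks = (unsigned)((n + threads - 1) / threads);
    scale<<<blocks, threads>>>(in, out, n, k);
    cudaError_t err = cudaGetLastError();   // launch configuration errors
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();         // errors raised while running
}

// Path A: zero-copy. The kernel reads host memory through a mapped pointer,
// so on a discrete GPU every read goes over PCIe.
cudaError_t scale_zero_copy(float *host_in, float *dev_out, size_t n, float k)
{
    cudaError_t err = cudaHostRegister(host_in, n * sizeof(float),
                                       cudaHostRegisterMapped);
    if (err != cudaSuccess) return err;
    float *mapped = nullptr;
    err = cudaHostGetDevicePointer((void **)&mapped, host_in, 0);
    if (err == cudaSuccess) err = launch_scale(mapped, dev_out, n, k);
    cudaHostUnregister(host_in);
    return err;
}

// Path B: explicit copy. The data crosses PCIe once, then the kernel reads
// it from device memory.
cudaError_t scale_explicit_copy(const float *host_in, float *dev_in,
                                float *dev_out, size_t n, float k)
{
    cudaError_t err = cudaMemcpy(dev_in, host_in, n * sizeof(float),
                                 cudaMemcpyHostToDevice);
    if (err != cudaSuccess) return err;
    return launch_scale(dev_in, dev_out, n, k);
}
```

Timing the two functions on the same input (for example with cudaEvent timers) should show which case you were actually measuring. A kernel that reads each element only once and with coalesced accesses is the friendliest case for path A; anything that re-reads data or accesses it non-sequentially is likely to favor path B.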
