Find out if host buffer is pinned (page-locked)

sramxtr · March 4, 2015, 5:31pm

A short description of my problem is as follows:

I developed a function that calls a CUDA kernel. My function receives a pointer to the host data buffers (input and output of kernel), and has no control over the allocation of these buffers.

→ It is possible that the host data was allocated with either of malloc or cudaHostAlloc. My function is not specifically told which allocation method was used.

The question is: what is a feasible way for my function to figure out whether the host buffers are pinned/page-locked (cudaHostAlloc) or not (regular malloc)?

The reason I am asking is that if they are not page-locked, I would like to use cudaHostRegister() to make them (the buffers) so, which renders them amenable for streams.

I have tried three ways which have failed:
1- Always apply cudaHostRegister(): this way breaks if the host buffers are already pinned
2- Run cudaPointerGetAttributes(), and if the return error is cudaSuccess, then the buffers are already pinned, nothing to do; else if cudaErrorInvalidValue, apply cudaHostRegister : for some reason this way results in the kernel execution returning an error
3- Run cudaHostGetFlags(), and if return is not a success, then apply cudaHostRegister : same behavior as 2-.

In the case of 2- and 3-, the error is “invalid argumentn”

Note that my code currently is not using streams, rather always calls cudaMemcpy() for the entire host buffers. If I do not use any of the three above ways, my code runs to completion, regardless of whether the host buffer is pinned or not.

Any advice? Many thanks in advance.

njuffa · March 4, 2015, 6:13pm

I am not quite sure but I seem to recall that cudaMemcpyAsync() automatically falls back to cudaMemcpy() if the host buffer is not pinned. If confirmed, wouldn’t this provide sufficient robustness for code that uses streams in conjunction with cudaMemcpyAsync()?

sramxtr · March 4, 2015, 7:07pm

In principle, as far as I can tell, yes – cudaMemcpuAsync() would provide enough robustness. Will give it a shot; thanks!

One caveat is that without streams, I am getting a performance boost out of cudaHostRegister() (~1.5X speedup). In essence, it seems like using cudaHostRegister() does provide some benefits.

Note that reverting to method 3- above, the program runs to completion, and I get the expected results back. To do so, I am simply not checking the return value of cudaGetLastError() after the kernel call. Any advice on this matter too would help.

Thanks again.

Robert_Crovella · March 4, 2015, 7:41pm

transfers with pinned memory are often faster than with non-pinned memory. counteracting this, the registration time usually outweighs any time benefit from a single transfer. But if the buffer is used repeatedly, registering it will give a net improvement in overall execution time for these calls (alloc/registration + cudaMemcpy).

this question was cross-posted here:
[url]c++ - CUDA: Find out if host buffer is pinned (page-locked) - Stack Overflow
and an explanation is given there for the error behavior.

njuffa · March 4, 2015, 9:37pm

Looking at just the time for the transfers themselves, shouldn’t transfers from pinned memory always be faster?

With pinned memory, the GPU’s DMA engine can pull the data straight from the host’s system memory based on the known physical address, for as large a block size as has been pinned. Without pinning the host buffer, the contents of the host memory is first copied (at system memory transfer rates) to a pinned buffer of limited size allocated by the driver, and DMAed from there. In addition to the additional copy operation in system memory, large transfers may be broken into multiple smaller ones to account for the limited buffer size.

Topic		Replies	Views
cudaHostRegister/Unregister vs Host Memcpy to Pagelocked CUDA Programming and Performance	3	2682	November 26, 2012
cudaHostRegister returns cudaErrorInvalidValue CUDA Programming and Performance	14	3025	January 28, 2021
transfer from pageable host memory to page-locked host memory? CUDA Programming and Performance	3	1162	June 1, 2012
cudaHostRegister alternative for already pinned memory CUDA Programming and Performance cuda	0	394	September 5, 2022
Problems with cudaHostAlloc and cudaMemcpyAsync CUDA Programming and Performance	5	4631	February 8, 2010
cudaMemcpy to non-pinned memory CUDA Programming and Performance	5	1881	October 12, 2021
Is it possible to use pinned memory? Outside of CUDA CUDA Programming and Performance	14	6618	January 22, 2025
Can I create a pinned memory buffer to support overlapping compute/copy without cudaMallocHost overhead CUDA Programming and Performance cuda	13	1055	November 3, 2020
Already locked memory: making it known to CUDA - how ? CUDA Programming and Performance	2	907	September 6, 2016
cudaHostRegister breaks data CUDA Programming and Performance	0	476	April 8, 2019

Find out if host buffer is pinned (page-locked)

Related topics