Does cudaHostRegister/cudaHostUnregister serialise asynchronous code?

Hi there

I’m debugging some asynchronous code and I’m seeing cudaHostUnregister calls deferred until running kernels have finished executing. My aim is to pin some memory, transfer it asynchronously, and then unpin it, all while another kernel is executing.

Is it the case that cudaHostRegister/cudaHostUnregister cause asynchronous code to serialise (similar to cudaMalloc/cudaFree)?


That is to be expected. Anything that modifies the address map of the GPU is generally deferred until that GPU is not executing any kernels.
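If you want to check this on your own setup, one rough way is to time the host-side call while a long kernel is in flight. The sketch below is only an illustration: `spin_kernel` and `time_unregister_during_kernel` are made-up names, the buffer size and spin length are arbitrary, and it assumes a single GPU and a recent CUDA toolkit. If the unregister is deferred, the measured time should be roughly the remaining kernel run time rather than a few microseconds.

```cuda
#include <chrono>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper kernel: busy-waits on the device for roughly
// `cycles` clock ticks so that something is still running on the GPU.
__global__ void spin_kernel(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

// Hypothetical helper: returns the host wall time in milliseconds spent
// inside cudaHostUnregister while spin_kernel is in flight, or a
// negative value if any CUDA call fails.
double time_unregister_during_kernel(size_t bytes, long long spin_cycles) {
    void* buf = malloc(bytes);
    if (!buf) return -1.0;

    // Pin the ordinary malloc'd buffer.
    if (cudaHostRegister(buf, bytes, cudaHostRegisterDefault) != cudaSuccess) {
        free(buf);
        return -1.0;
    }

    cudaStream_t stream;
    if (cudaStreamCreate(&stream) != cudaSuccess) {
        cudaHostUnregister(buf);
        free(buf);
        return -1.0;
    }

    // The launch itself is asynchronous and returns to the host at once.
    spin_kernel<<<1, 1, 0, stream>>>(spin_cycles);

    // Time only the unpin call issued while the kernel is running.
    auto t0 = std::chrono::steady_clock::now();
    cudaError_t err = cudaHostUnregister(buf);
    auto t1 = std::chrono::steady_clock::now();

    cudaStreamSynchronize(stream);
    cudaStreamDestroy(stream);
    free(buf);

    if (err != cudaSuccess) return -1.0;
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Calling `time_unregister_during_kernel(1 << 20, 1000000000LL)` from `main` and comparing the result against a run with a very small spin count should make the difference visible, although the exact numbers depend on the GPU clock and driver.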

Pinning and unpinning are “expensive” operations. If possible, do them once, up front at the start of your application, and reuse the allocations rather than pinning and unpinning repeatedly in a time-critical processing loop.
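A minimal sketch of that pattern, under some assumptions: the buffer is pinned once before the loop, reused for every batch, and unpinned only after all GPU work is done. The names `scale_kernel` and `process_batches` are invented for illustration, and the per-batch "work" is just a placeholder that doubles each element. For simplicity it uses one stream and synchronises each iteration; it is meant to show where the pin/unpin calls go, not how to overlap batches.

```cuda
#include <cstdlib>
#include <cuda_runtime.h>

// Placeholder workload: multiply every element by `factor`.
__global__ void scale_kernel(float* data, size_t n, float factor) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: processes `num_batches` batches of `n` floats,
// pinning the host buffer once. Returns 0 on success, 1 on failure.
int process_batches(int num_batches, size_t n) {
    if (n == 0) return 1;
    const size_t bytes = n * sizeof(float);
    const unsigned threads = 256;
    const unsigned blocks = (unsigned)((n + threads - 1) / threads);

    float* host = (float*)malloc(bytes);
    float* dev = nullptr;
    cudaStream_t stream = nullptr;

    // One-time setup: pin the host buffer and allocate device memory
    // before any time-critical work starts.
    bool ok = host != nullptr;
    ok = ok && cudaHostRegister(host, bytes, cudaHostRegisterDefault) == cudaSuccess;
    const bool pinned = ok;
    ok = ok && cudaMalloc((void**)&dev, bytes) == cudaSuccess;
    ok = ok && cudaStreamCreate(&stream) == cudaSuccess;

    // Processing loop: no register/unregister or malloc/free calls in here.
    for (int b = 0; ok && b < num_batches; ++b) {
        for (size_t i = 0; i < n; ++i) host[i] = (float)b;  // refill the same pinned buffer
        ok = ok && cudaMemcpyAsync(dev, host, bytes, cudaMemcpyHostToDevice, stream) == cudaSuccess;
        if (ok) scale_kernel<<<blocks, threads, 0, stream>>>(dev, n, 2.0f);
        ok = ok && cudaMemcpyAsync(host, dev, bytes, cudaMemcpyDeviceToHost, stream) == cudaSuccess;
        // Wait before the host touches the buffer again.
        ok = ok && cudaStreamSynchronize(stream) == cudaSuccess;
        ok = ok && host[0] == 2.0f * (float)b;  // sanity check on the result
    }

    // One-time teardown, after all GPU work has finished.
    if (stream) cudaStreamDestroy(stream);
    if (dev) cudaFree(dev);
    if (pinned) cudaHostUnregister(host);
    free(host);
    return ok ? 0 : 1;
}
```

Calling `process_batches(4, 1 << 20)` from `main` runs four batches against the same pinned buffer. Keeping the register and unregister calls outside the loop means any deferral they cause happens at start-up and shutdown rather than in the middle of the pipeline.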