cudaHostRegister and cudaHostRegisterPortable, cudaHostRegisterDefault

I’m trying to understand the function [i]cudaHostRegister/i.

In the reference manual, cudaHostRegisterPortable can be used for the third parameter of [i]cudaHostRegister/i (see related discussion here). But, I can’t figure out how to get cudaHostRegisterPortable pointers to work. If I pass an aligned host pointer from [i]malloc/i, then register the pointer with cudaHostRegister(host_ptr, …, cudaHostRegisterPortable), that “works”, but cudaHostGetDevicePointer(…, host_ptr, 0) fails. If I try to pass the host pointer directly to kernel code after registering using cudaHostRegister(host_ptr, …, cudaHostRegisterPortable), that fails, as I would expect. If I first allocate a device pointer by cudaMalloc(&dev_ptr, …), then pass the device pointer to cudaHostRegister(dev_ptr, …, cudaHostRegisterPortable), that fails, as I would expect. An example using cudaHostRegisterMapped, based on the simpleZeroCopy.cu in the CUDA GPU Toolkit examples, works fine, but that offers no help in understanding how cudaHostRegisterPortable pointers are allocated and used. And, the previous discussion does not show an example, nor can I find any examples online or in the GPU Toolkit samples.

The manual says: “cudaHostRegisterPortable: The memory returned by this call will be considered as pinned memory by all CUDA contexts, not just the one that performed the allocation.” This doesn’t quite make sense because there are no call-by-reference parameters to the function. Further, the symbol cudaHostRegisterDefault is defined in the CUDA header file driver_types.h, but how it is used is not documented for [i]cudaHostRegister/i. How does it work?

Ken

I’m trying to understand the function [i]cudaHostRegister/i.

In the reference manual, cudaHostRegisterPortable can be used for the third parameter of [i]cudaHostRegister/i (see related discussion here). But, I can’t figure out how to get cudaHostRegisterPortable pointers to work. If I pass an aligned host pointer from [i]malloc/i, then register the pointer with cudaHostRegister(host_ptr, …, cudaHostRegisterPortable), that “works”, but cudaHostGetDevicePointer(…, host_ptr, 0) fails. If I try to pass the host pointer directly to kernel code after registering using cudaHostRegister(host_ptr, …, cudaHostRegisterPortable), that fails, as I would expect. If I first allocate a device pointer by cudaMalloc(&dev_ptr, …), then pass the device pointer to cudaHostRegister(dev_ptr, …, cudaHostRegisterPortable), that fails, as I would expect. An example using cudaHostRegisterMapped, based on the simpleZeroCopy.cu in the CUDA GPU Toolkit examples, works fine, but that offers no help in understanding how cudaHostRegisterPortable pointers are allocated and used. And, the previous discussion does not show an example, nor can I find any examples online or in the GPU Toolkit samples.

The manual says: “cudaHostRegisterPortable: The memory returned by this call will be considered as pinned memory by all CUDA contexts, not just the one that performed the allocation.” This doesn’t quite make sense because there are no call-by-reference parameters to the function. Further, the symbol cudaHostRegisterDefault is defined in the CUDA header file driver_types.h, but how it is used is not documented for [i]cudaHostRegister/i. How does it work?

Ken

What happens if you use [font=“Courier New”]cudaHostRegisterMapped|cudaHostRegisterPortable[/font] for flags? You need [font=“Courier New”]cudaHostRegisterMapped[/font] in order to be able to map the memory into the device address space.

What happens if you use [font=“Courier New”]cudaHostRegisterMapped|cudaHostRegisterPortable[/font] for flags? You need [font=“Courier New”]cudaHostRegisterMapped[/font] in order to be able to map the memory into the device address space.

OK, thanks.

cudaHostRegister(…, …, cudaHostRegisterMapped|cudaHostRegisterPortable) seems to work. I will have to test this further. It would be nice to say in the doc that cudaHostRegisterPortable must also be used with cudaHostRegisterMapped.

What does cudaHostRegisterDefault do? Why is it even defined?

Ken

OK, thanks.

cudaHostRegister(…, …, cudaHostRegisterMapped|cudaHostRegisterPortable) seems to work. I will have to test this further. It would be nice to say in the doc that cudaHostRegisterPortable must also be used with cudaHostRegisterMapped.

What does cudaHostRegisterDefault do? Why is it even defined?

Ken

is this in a non-UVA environment?

is this in a non-UVA environment?