Consequences of not page-aligning buffers for cudaHostRegister()?

Earlier CUDA versions documented such a requirement:

http://horacio9573.no-ip.org/cuda/group__CUDART__MEMORY_g36b9fe28f547f28d23742e8c7cd18141.html#g36b9fe28f547f28d23742e8c7cd18141

→ The pointer ptr and size size must be aligned to the host page size (4 KB).

See also the forum thread "Using cudaHostRegister() in CUDA 4.0", post #16 by mfatica:

→ FYI in CUDA 4.1 the restrictions on alignment and size are gone. You can host register a generic pointer.

See also cuda-samples/Samples/0_Introduction/simpleZeroCopy/simpleZeroCopy.cu in the NVIDIA/cuda-samples repository on GitHub (master), line 165:

// We need to ensure memory is aligned to 4K (so we will need to padd memory
// accordingly)

I think the worst that can still happen without aligning on a page boundary is that more memory than necessary gets page-locked (up to one page more per call). So it is not a problem as long as you do not have too many such memory areas.
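To make the "up to one page more" estimate concrete, here is a small host-only sketch. It assumes that pinning happens at whole-page granularity and that the page size is 4 KB; both are assumptions, not something the CUDA documentation guarantees. The helper name `pages_spanned` is hypothetical and not part of any CUDA API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not a CUDA API): number of host pages touched by the
// byte range [addr, addr + size), assuming page-granular pinning.
inline std::size_t pages_spanned(std::uintptr_t addr, std::size_t size,
                                 std::size_t page = 4096) {
    // The page size must be a power of two for the mask arithmetic below.
    assert(page != 0 && (page & (page - 1)) == 0);
    const std::uintptr_t mask = static_cast<std::uintptr_t>(page) - 1;
    const std::uintptr_t first = addr & ~mask;                // round start down
    const std::uintptr_t last = (addr + size + mask) & ~mask; // round end up
    return static_cast<std::size_t>((last - first) / page);
}
```

Under these assumptions, a 4096-byte buffer starting at a page boundary touches one page, while the same buffer starting 16 bytes later touches two. So the unaligned buffer costs one extra locked page compared with an aligned buffer of the same size.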

More importantly, when using vector<> and other dynamic data structures, make sure you do not have to register memory too often, as the registration would cost more time than you save by using cudaMemcpyAsync() instead of a regular cudaMemcpy(). So reserve a large enough size up front if you plan to change their contents.
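The effect of reserving up front can be checked on the host side alone. The sketch below counts how often a `std::vector`'s storage moves while it grows. Each move would leave an earlier `cudaHostRegister()` pointing at the old buffer, so you would need `cudaHostUnregister()` and a fresh registration. The helper `count_reallocations` is hypothetical, and the CUDA calls appear only in comments so the example runs without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: grow v to n elements and count how many times its
// underlying buffer changed address along the way.
inline std::size_t count_reallocations(std::vector<float>& v, std::size_t n) {
    assert(v.size() <= n);  // this sketch only grows the vector
    std::size_t moves = 0;
    const float* last = v.data();
    for (std::size_t i = v.size(); i < n; ++i) {
        v.push_back(static_cast<float>(i));
        if (v.data() != last) {
            // In real code this is where the old registration becomes stale:
            // cudaHostUnregister(old pointer), then cudaHostRegister(v.data(), ...).
            ++moves;
            last = v.data();
        }
    }
    return moves;
}
```

With `v.reserve(1000)` called first, growing to 1000 elements never moves the buffer, so a single registration of `v.data()` stays valid. Without the reserve, the buffer moves at least once, and how often depends on the standard library's growth policy.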
