Registering Virtually Contiguous Memory Chunks Larger than 4MB Results in Invalid Parameter Error

Hi,

My task is to allocate a large chunk of Linux kernel memory for my proprietary PCIe device, map it to user space with mmap/mmap64, and then register it with CUDA via cudaHostRegister as a virtually contiguous memory block.

Changing cudaHostRegisterPortable to cudaHostRegisterIoMemory resolved my previously reported problem (for more information, please refer to https://devtalk.nvidia.com/default/topic/1014391/registering-mapped-linux-character-device-memory-with-cudahostregister-results-in-invalid-argument/).
However, I now observe the following behavior:

If I allocate Linux kernel memory chunks of size <= 4MB, map them to user space (using mmap/mmap64), and register them with the CUDA device via cudaHostRegister, everything works just fine. However, if I allocate chunks larger than 4MB, cudaHostRegister fails with the same “Invalid Argument” error. Moreover, if I map a memory chunk larger than 4MB but pass a size of <= 4MB to cudaHostRegister, the registration still succeeds.
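
To pin down the exact size at which registration starts failing, a probe along the following lines can be used. This is only a sketch: the callback type try_register_fn, the helper largest_registrable and the stand-in fake_register_4mib_limit are hypothetical names introduced here for illustration, and the stand-in merely mimics the behavior described above without calling CUDA.

```c
#include <assert.h>
#include <stddef.h>

#define MiB ((size_t)1 << 20)

/* Hypothetical callback type: returns 0 on success, nonzero on failure.
 * In a real run it would wrap cudaHostRegister(ptr, size,
 * cudaHostRegisterIoMemory) followed by cudaHostUnregister(ptr). */
typedef int (*try_register_fn)(void *ptr, size_t size);

/* Try sizes step, 2*step, ... up to total and return the largest size
 * for which registration succeeded (0 if none did). */
size_t largest_registrable(void *ptr, size_t total, size_t step,
                           try_register_fn try_register)
{
    assert(step > 0 && try_register != NULL);
    size_t best = 0;
    for (size_t size = step; size <= total; size += step) {
        if (try_register(ptr, size) == 0)
            best = size;
    }
    return best;
}

/* Stand-in that mimics the reported behavior: any size above 4 MiB is
 * rejected. It does not touch CUDA or the mapped memory. */
int fake_register_4mib_limit(void *ptr, size_t size)
{
    (void)ptr;
    return size <= 4 * MiB ? 0 : 1;
}
```

In a real run, ptr would be the address returned by mmap on the character device, and the callback would call cudaHostRegister with the cudaHostRegisterIoMemory flag, unregister with cudaHostUnregister on success, and return nonzero for any error code other than cudaSuccess.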

CUDA Runtime API claims the following:

To the best of my knowledge, physically contiguous memory chunks in the Linux kernel can be at most 4MB. However, because these chunks are remapped, user space is given contiguous virtual memory, so from cudaHostRegister's perspective this memory is contiguous.
Moreover, to rule out the “non-contiguous” memory issue, I changed the kernel MAX_ORDER parameter to 9, which made 4MB memory chunks physically non-contiguous (made up of two pfns) while still being remapped into a virtually contiguous array from the user-space perspective. Even then, cudaHostRegister did not return the “Invalid Argument” error and “did the job” just fine.
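
For reference, the 4MB limit on physically contiguous chunks mentioned above can be reproduced with a small calculation. This is a sketch under stated assumptions: a 4096-byte page size, a default MAX_ORDER of 11, and the older kernel convention in which the largest allocation order is MAX_ORDER - 1. The helper name max_contiguous_bytes is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Largest physically contiguous block the buddy allocator can return,
 * assuming the largest order is max_order - 1 (older kernel convention). */
size_t max_contiguous_bytes(unsigned max_order, size_t page_size)
{
    assert(max_order >= 1 && max_order <= 20);
    return page_size << (max_order - 1);
}
```

With max_order = 11 and page_size = 4096 this gives 1024 pages of 4096 bytes, i.e. 4194304 bytes (4MB).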

From the above, I suspect cudaHostRegister returns this error in some “hardcoded” way, but I hope you can shed more light on it.
I have attached the strace logs, which I believe might help.
As can be seen from these log files, the two runs diverge after the ioctl(fd, 0xc0304627, arg_ptr) call, where fd is the file descriptor for /dev/nvidia0. In the successful flow (FourMBFrame.log), the next call is ioctl(ffd, 0x21, argg_ptr), while the failing path (FiveMBFrame.log) reports an “Invalid Parameter” error.

Please advise.

Thanks,
Yoel.
FiveMBFrame.log (48 KB)
FourMBFrame.log (47.1 KB)