Make large memory-mapped IO accessible to the GPU

Hi, I want to memory-map a large file (>100 GB) and make the mapping accessible to the GPU, but I'm not sure how best to do that. Here is what I've already tried.

cudaHostRegister: I tried cudaHostRegister with the cudaHostRegisterMapped and cudaHostRegisterIoMemory flags. cudaHostRegisterMapped only works for small allocations, and cudaHostRegisterIoMemory returns an invalid-argument error regardless of the allocation size. I think cudaHostRegisterIoMemory could be a solution, but I don't understand why that flag always fails with an invalid-argument error.
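For reference, here is a minimal sketch of what the cudaHostRegister attempt looks like on my end (the file path is a placeholder, and the program just prints the status of each call rather than asserting success, since the register step is exactly what fails for large mappings):

```cuda
#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cuda_runtime.h>

int main(int argc, char **argv) {
    const char *path = argc > 1 ? argv[1] : "data.bin";  // placeholder file
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }
    size_t len = (size_t)st.st_size;

    // mmap returns a page-aligned pointer, which cudaHostRegister requires.
    void *host = mmap(nullptr, len, PROT_READ, MAP_SHARED, fd, 0);
    if (host == MAP_FAILED) { perror("mmap"); return 1; }

    // Pin the file-backed mapping and ask for a device-accessible alias.
    cudaError_t err = cudaHostRegister(host, len, cudaHostRegisterMapped);
    printf("cudaHostRegister: %s\n", cudaGetErrorString(err));

    if (err == cudaSuccess) {
        void *dev = nullptr;
        err = cudaHostGetDevicePointer(&dev, host, 0);
        printf("cudaHostGetDevicePointer: %s\n", cudaGetErrorString(err));
        cudaHostUnregister(host);
    }

    munmap(host, len);
    close(fd);
    return 0;
}
```

With small files this prints `cudaHostRegister: no error`; with the >100 GB file it is where the failure shows up.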

GPUDirect Storage: I also considered GPUDirect Storage but didn't find a way to perform memory-mapped IO with it.
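As far as I can tell, the cuFile API that GDS provides is an explicit read/write interface rather than a mapping, which is why I couldn't find an mmap-style path. A sketch of that model, assuming cuFile is installed (file path and transfer size are placeholders):

```cuda
#define _GNU_SOURCE
#include <cstdio>
#include <fcntl.h>
#include <unistd.h>
#include <cuda_runtime.h>
#include <cufile.h>

int main() {
    const char *path = "data.bin";    // placeholder file
    const size_t chunk = 16UL << 20;  // transfer 16 MiB per call

    cuFileDriverOpen();

    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    // Register the POSIX file descriptor with cuFile.
    CUfileDescr_t descr = {};
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    CUfileHandle_t fh;
    cuFileHandleRegister(&fh, &descr);

    void *dev = nullptr;
    cudaMalloc(&dev, chunk);

    // GDS copies an explicit file range into GPU memory; there is no
    // demand-paged mapping, so every region must be read before use.
    ssize_t n = cuFileRead(fh, dev, chunk, /*file_offset=*/0, /*devPtr_offset=*/0);
    printf("cuFileRead returned %zd\n", n);

    cudaFree(dev);
    cuFileHandleDeregister(fh);
    close(fd);
    cuFileDriverClose();
    return 0;
}
```

So to cover the whole >100 GB file I would have to manage chunked reads myself instead of letting page faults pull data in.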

HMM: I think HMM (Heterogeneous Memory Management) would work for what I'm planning to do, but our GPUs don't support it, so I would prefer another method.
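To be concrete about why HMM would fit: on an HMM-capable system (the `pageableMemoryAccess` device attribute is 1), an ordinary host allocation or mmap pointer can be passed straight to a kernel, which is exactly the behavior I want for the file mapping. A small sketch that probes for the capability and falls back gracefully when it is absent, as on our GPUs:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void increment(int *p, size_t n) {
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) p[i] += 1;
}

int main() {
    // pageableMemoryAccess == 1 means the GPU can access ordinary
    // system allocations (malloc/mmap) directly, i.e. HMM is available.
    int pageable = 0;
    cudaDeviceGetAttribute(&pageable, cudaDevAttrPageableMemoryAccess, 0);
    printf("pageableMemoryAccess = %d\n", pageable);
    if (!pageable) return 0;  // our GPUs land here

    const size_t n = 1 << 20;
    int *p = (int *)calloc(n, sizeof(int));  // plain pageable host memory

    // The host pointer is handed to the kernel with no cudaMalloc/copy.
    increment<<<(unsigned)((n + 255) / 256), 256>>>(p, n);
    cudaDeviceSynchronize();

    printf("p[0] = %d\n", p[0]);
    free(p);
    return 0;
}
```

The same pattern with an mmap'd file pointer in place of `calloc` is what I'd ideally use, if the hardware supported it.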

Thank you.