Hi, I want to use memory-mapped I/O to map a large file (>100 GB) and then make its contents accessible to the GPU. However, I'm not sure how to do that. I've already tried a couple of things.
cudaHostRegister: I tried to use cudaHostRegister with the cudaHostRegisterMapped and cudaHostRegisterIoMemory flags. However, cudaHostRegisterMapped only works for small allocations, and cudaHostRegisterIoMemory always returns an invalid argument error regardless of the allocation size. I think cudaHostRegisterIoMemory could be a solution, but I don't understand why using the flag always results in an invalid argument error.
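For reference, this is roughly what I'm doing (the file path and error handling are placeholders; the real file is >100 GB):

```cpp
#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cuda_runtime.h>

int main() {
    // Hypothetical path standing in for the real >100 GB file.
    const char *path = "/data/large_file.bin";
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }
    size_t len = (size_t)st.st_size;

    // Map the whole file read-only into the process address space.
    void *host = mmap(nullptr, len, PROT_READ, MAP_SHARED, fd, 0);
    if (host == MAP_FAILED) { perror("mmap"); return 1; }

    // Attempt 1: pin the mapping and make it device-accessible.
    cudaError_t err = cudaHostRegister(host, len, cudaHostRegisterMapped);
    printf("cudaHostRegisterMapped: %s\n", cudaGetErrorString(err));

    if (err == cudaSuccess) {
        void *dev = nullptr;
        cudaHostGetDevicePointer(&dev, host, 0);  // device-visible alias
        // ... launch kernels that read through dev ...
        cudaHostUnregister(host);
    } else {
        // Attempt 2: register as I/O memory. This is the call that
        // always fails with "invalid argument" for me.
        err = cudaHostRegister(host, len, cudaHostRegisterIoMemory);
        printf("cudaHostRegisterIoMemory: %s\n", cudaGetErrorString(err));
    }

    munmap(host, len);
    close(fd);
    return 0;
}
```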
GPUDirect Storage: I also considered using direct storage but didn’t find a way to use it to perform memory-mapped IO.
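This is the GDS path I looked at, sketched with the cuFile API (path and chunk size are placeholders). As far as I can tell it only offers explicit reads into device memory, not a mapping:

```cpp
#include <cstdio>
#include <fcntl.h>
#include <unistd.h>
#include <cuda_runtime.h>
#include <cufile.h>

int main() {
    cuFileDriverOpen();

    // GDS requires O_DIRECT; the path is a placeholder.
    int fd = open("/data/large_file.bin", O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    // Register the file descriptor with the cuFile driver.
    CUfileDescr_t descr = {};
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    CUfileHandle_t handle;
    cuFileHandleRegister(&handle, &descr);

    // cuFileRead copies a byte range of the file straight into
    // device memory -- an explicit read, not mmap-style access.
    const size_t chunk = 1 << 20;  // 1 MiB, arbitrary for the sketch
    void *devBuf = nullptr;
    cudaMalloc(&devBuf, chunk);
    ssize_t n = cuFileRead(handle, devBuf, chunk,
                           /*file_offset=*/0, /*devPtr_offset=*/0);
    printf("cuFileRead: %zd bytes\n", n);

    cudaFree(devBuf);
    cuFileHandleDeregister(handle);
    close(fd);
    cuFileDriverClose();
    return 0;
}
```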
HMM: I think HMM would work for what I’m planning to do, but our GPUs don’t support it, so I would prefer another method.
Thank you.