My understanding of CPU-to-GPU transfer is as follows: if the data is in pageable memory and is not actually resident in RAM, the OS creates a copy of the data in the pinned region, which is then transferred to device memory. I have three questions:
- If the pageable memory buffer is present in RAM, does the OS lock the page in place?
- If the pageable memory buffer is in secondary storage, why can’t the transfer use GPUDirect and skip a copy?
- How does performance differ when allocating a pageable memory buffer vs. a pinned memory buffer?
Thanks!
The proper mental model here is that CUDA always copies from pageable to pinned memory before transferring.
CUDA has no knowledge of whether a given pageable address is actually paged out or not.
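You can see the staging cost directly. Here is a minimal sketch (my example, not from the thread) that times the same host-to-device copy from a pageable (`malloc`) buffer and a pinned (`cudaMallocHost`) buffer; the buffer size is an arbitrary choice:

```c
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    const size_t N = 256 << 20;  // 256 MiB
    void *d_buf, *h_pageable, *h_pinned;
    cudaMalloc(&d_buf, N);
    h_pageable = malloc(N);        // pageable: driver stages through a pinned buffer
    cudaMallocHost(&h_pinned, N);  // pinned: DMA'd directly

    cudaMemcpy(d_buf, h_pinned, N, cudaMemcpyHostToDevice);  // warm-up

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    float ms;

    cudaEventRecord(start);
    cudaMemcpy(d_buf, h_pageable, N, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms, start, stop);
    printf("pageable: %.2f GB/s\n", (N / 1e9) / (ms / 1e3));

    cudaEventRecord(start);
    cudaMemcpy(d_buf, h_pinned, N, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms, start, stop);
    printf("pinned:   %.2f GB/s\n", (N / 1e9) / (ms / 1e3));

    free(h_pageable);
    cudaFreeHost(h_pinned);
    cudaFree(d_buf);
    return 0;
}
```

On typical systems you should see noticeably higher bandwidth on the pinned path, because the driver can DMA from it directly instead of funneling the data through the intermediate staging copy.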
First of all, GDS expects a filesystem interface (to wit: cuFile); a chunk of data paged out to disk by the host OS into an opaque paging buffer is nothing like that. Second, GDS has specific requirements for the storage software stack, plus system topology requirements, none of which are satisfied in the general case where a cudaMemcpy call may take place.
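For contrast, GDS transfers are expressed against a file handle through the cuFile API, not against an arbitrary host virtual address. A minimal sketch, assuming a GDS-capable filesystem and driver stack, libcufile installed, and an existing file named `data.bin` (a hypothetical name; error handling omitted):

```c
#define _GNU_SOURCE  // for O_DIRECT
#include <cufile.h>
#include <cuda_runtime.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

int main(void) {
    const size_t N = 1 << 20;  // 1 MiB
    cuFileDriverOpen();

    int fd = open("data.bin", O_RDONLY | O_DIRECT);  // GDS wants O_DIRECT
    CUfileDescr_t descr = {0};
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    CUfileHandle_t fh;
    cuFileHandleRegister(&fh, &descr);

    void *d_buf;
    cudaMalloc(&d_buf, N);
    cuFileBufRegister(d_buf, N, 0);

    // DMA straight from storage into device memory. Note this operates
    // on a *file*, not on a page the OS happened to swap out.
    ssize_t n = cuFileRead(fh, d_buf, N, /*file_offset=*/0, /*devPtr_offset=*/0);
    printf("read %zd bytes\n", n);

    cuFileBufDeregister(d_buf);
    cuFileHandleDeregister(fh);
    close(fd);
    cudaFree(d_buf);
    cuFileDriverClose();
    return 0;
}
```

The whole flow presumes you can name the file and offset you want; an opaque OS paging buffer offers no such handle, which is why cudaMemcpy cannot fall through to this path.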