CPU-to-GPU data transfer query

My understanding of CPU-to-GPU transfer is as follows: if the data is in pageable memory and is not actually resident in RAM, the OS creates a copy of the data in a pinned region, which is then transferred to device memory. I have three questions:

  1. If the pageable memory buffer is already resident in RAM, does the OS simply lock the page in place?
  2. If the pageable memory buffer has been paged out to secondary storage, why can't the transfer use GPUDirect and skip a copy?
  3. How does performance differ when allocating a pageable memory buffer versus a pinned memory buffer?

Thanks!

The proper mental model here is that CUDA always copies pageable memory into a pinned staging buffer before transferring to the device.

CUDA has no knowledge of whether a given pageable address is actually paged out or not.
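One place the staging is visible in practice: `cudaMemcpyAsync` from pageable memory cannot return until the runtime has copied the data into its internal pinned staging buffer, whereas from pinned memory the DMA engine reads the buffer directly and the call returns immediately. A minimal sketch of the two cases (sizes arbitrary, error checking omitted):

```cpp
#include <cstdlib>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 64 << 20;  // 64 MiB, arbitrary

    float *pageable = (float *)malloc(bytes);  // ordinary pageable allocation
    float *pinned = nullptr;
    cudaMallocHost(&pinned, bytes);            // page-locked (pinned) allocation

    float *dev = nullptr;
    cudaMalloc(&dev, bytes);

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // From pageable memory: the runtime stages through an internal pinned
    // buffer, so this "async" call does not return until the data has been
    // copied out of the pageable buffer.
    cudaMemcpyAsync(dev, pageable, bytes, cudaMemcpyHostToDevice, stream);

    // From pinned memory: the DMA engine can read the buffer directly, so
    // the call returns immediately and the copy can overlap host work.
    cudaMemcpyAsync(dev, pinned, bytes, cudaMemcpyHostToDevice, stream);

    cudaStreamSynchronize(stream);

    cudaFree(dev);
    cudaFreeHost(pinned);
    free(pageable);
    cudaStreamDestroy(stream);
    return 0;
}
```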

  1. No. Locking a page in place for a one-off transfer is typically slower than copying it, so the runtime stages through its pinned buffer instead. (CUDA does let you pin an existing allocation in place with cudaHostRegister, which pays off when the buffer is reused; see the sketch after this list.)
  2. Guessing here: there would be no use case for it. Anyone using GPUDirect has a workstation or an embedded GPU and typically runs a highly optimized custom application. Having too little RAM and relying on the operating system to swap would be very atypical. In those situations the application developer would rather choose a solution with more control, e.g. have the application manage the swapping instead of the operating system.
  3. Are you asking about the performance of allocating, or of copying after having allocated? Copying pinned memory is faster. Allocation cost depends: has the application allocated a pool beforehand? Does the operating system have to free or swap something out to satisfy the new pinned allocation? Either way, allocation should be moved outside performance-critical parts of the application, e.g. outside of loops. The sketch below times the copy side of this.
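To make items 1 and 3 concrete, here is a rough timing sketch. It pins an existing malloc'd buffer in place with cudaHostRegister (the "lock the page" case from item 1) and compares host-to-device copy times from pageable, registered, and cudaMallocHost memory using CUDA events. The helper name `copyMs` and the buffer size are mine, chosen just for illustration:

```cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Time one host-to-device copy of `bytes` from `src` using CUDA events.
static float copyMs(void *dev, const void *src, size_t bytes) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    cudaMemcpy(dev, src, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main() {
    const size_t bytes = 256 << 20;  // 256 MiB, arbitrary

    void *dev = nullptr;
    cudaMalloc(&dev, bytes);

    void *pageable = malloc(bytes);
    memset(pageable, 1, bytes);  // fault the pages in before timing
    printf("pageable copy:            %.2f ms\n", copyMs(dev, pageable, bytes));

    // Item 1: CUDA can lock an existing allocation in place, but the
    // registration itself is expensive, so it only pays off when the
    // buffer is reused across many transfers.
    cudaHostRegister(pageable, bytes, cudaHostRegisterDefault);
    printf("pinned-in-place copy:     %.2f ms\n", copyMs(dev, pageable, bytes));
    cudaHostUnregister(pageable);
    free(pageable);

    // Item 3: a buffer that was allocated pinned from the start.
    void *pinned = nullptr;
    cudaMallocHost(&pinned, bytes);
    printf("cudaMallocHost copy:      %.2f ms\n", copyMs(dev, pinned, bytes));
    cudaFreeHost(pinned);

    cudaFree(dev);
    return 0;
}
```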

First of all, GDS expects a filesystem interface (to wit: cuFile); a chunk of data paged out to disk by the host OS into an opaque paging buffer is nothing like that. Second, GDS has specific requirements on the storage software stack, plus system topology requirements, none of which are satisfied in the general case where a cudaMemcpy call may take place.
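For reference, the cuFile path looks roughly like this: the application opens a file on a GDS-supported filesystem and asks cuFile to read it straight into device memory. A hedged sketch (the file path is a placeholder and error handling is omitted); note how everything revolves around an explicit file handle and offsets, which is exactly the interface an OS paging buffer does not provide:

```cpp
#define _GNU_SOURCE 1  // for O_DIRECT on Linux
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>
#include <cuda_runtime.h>
#include <cufile.h>

int main() {
    const size_t bytes = 1 << 20;  // 1 MiB, arbitrary

    cuFileDriverOpen();  // initialize the GDS driver

    // The file must live on a filesystem supported by GDS;
    // "/mnt/nvme/data.bin" is a placeholder path.
    int fd = open("/mnt/nvme/data.bin", O_RDONLY | O_DIRECT);

    CUfileDescr_t descr = {};
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;

    CUfileHandle_t handle;
    cuFileHandleRegister(&handle, &descr);

    void *dev = nullptr;
    cudaMalloc(&dev, bytes);
    cuFileBufRegister(dev, bytes, 0);  // optional: register the GPU buffer

    // DMA directly from storage into device memory, bypassing any host
    // bounce buffer -- a file I/O call, not something cudaMemcpy on a
    // pageable address could fall back to.
    ssize_t n = cuFileRead(handle, dev, bytes, /*file_offset=*/0,
                           /*devPtr_offset=*/0);
    printf("read %zd bytes\n", n);

    cuFileBufDeregister(dev);
    cudaFree(dev);
    cuFileHandleDeregister(handle);
    close(fd);
    cuFileDriverClose();
    return 0;
}
```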