Does pageable memory have higher memory consumption than pinned memory?

As articles say, a transfer from pageable memory incurs two copies: 1) from pageable memory to pinned memory, 2) from pinned memory to the device.
Since using pageable memory means pinned pages are involved as well (as an intermediary), does that mean using pageable memory consumes double what a pinned allocation alone would consume? Thanks!

When transferring data to or from pageable memory, a relatively small, fixed-size pinned-memory buffer allocated by the CUDA driver is used to facilitate DMA transfers. Larger transfers are therefore broken up into multiple chunks by the driver. It is reasonable to expect that the size of this buffer can and does change with driver version. The last time I tried to determine its size through microbenchmarking, several years ago, I think I found that it was 4 MB, but my memory is vague.
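To make the chunking idea concrete, here is a minimal sketch of a staged copy done by hand. It is only a model of the behavior described above, not the driver's actual implementation, which is not public and may well use multiple buffers or overlap the two copy stages. The helper names `numChunks` and `copyViaStaging` are made up for this example, and the 4 MB staging size is just the vaguely remembered figure from above, not a documented value.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <vector>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                            \
    do {                                                       \
        cudaError_t err_ = (call);                             \
        if (err_ != cudaSuccess) {                             \
            std::fprintf(stderr, "%s failed: %s\n", #call,     \
                         cudaGetErrorString(err_));            \
            std::exit(EXIT_FAILURE);                           \
        }                                                      \
    } while (0)

// Hypothetical helper: chunks needed to move `bytes` through a
// staging buffer of `chunk` bytes (rounds up).
size_t numChunks(size_t bytes, size_t chunk)
{
    return (bytes + chunk - 1) / chunk;
}

// Hypothetical helper: copy pageable host memory to the device through
// one fixed-size pinned buffer, one chunk at a time.
void copyViaStaging(void* dDst, const void* hSrc, size_t bytes,
                    void* staging, size_t stagingBytes)
{
    const char* src = static_cast<const char*>(hSrc);
    char* dst = static_cast<char*>(dDst);
    for (size_t off = 0; off < bytes; off += stagingBytes) {
        size_t n = std::min(stagingBytes, bytes - off);
        std::memcpy(staging, src + off, n);           // pageable -> pinned (CPU copy)
        CHECK(cudaMemcpy(dst + off, staging, n,       // pinned -> device (DMA)
                         cudaMemcpyHostToDevice));
    }
}

int main()
{
    const size_t kStagingBytes = 4u << 20;   // ASSUMPTION: 4 MB staging buffer
    const size_t kBytes        = 64u << 20;  // 64 MB payload in pageable memory

    std::vector<unsigned char> src(kBytes), back(kBytes);
    for (size_t i = 0; i < kBytes; ++i)
        src[i] = static_cast<unsigned char>(i * 31u);

    void* staging = nullptr;
    void* dBuf = nullptr;
    CHECK(cudaMallocHost(&staging, kStagingBytes));  // the only pinned memory used
    CHECK(cudaMalloc(&dBuf, kBytes));

    copyViaStaging(dBuf, src.data(), kBytes, staging, kStagingBytes);

    // Read the data back and verify that the staged copy was complete.
    CHECK(cudaMemcpy(back.data(), dBuf, kBytes, cudaMemcpyDeviceToHost));
    bool ok = std::memcmp(src.data(), back.data(), kBytes) == 0;

    std::printf("chunks: %zu, pinned bytes used: %zu, data %s\n",
                numChunks(kBytes, kStagingBytes), kStagingBytes,
                ok ? "matches" : "MISMATCH");

    CHECK(cudaFree(dBuf));
    CHECK(cudaFreeHost(staging));
    return ok ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

The point of the sketch is that the pinned footprint is the size of the staging buffer, independent of the size of the transfer, so a pageable transfer does not double host memory consumption.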


Is it just one pinned buffer that is allocated and deallocated only once? Or does the driver need to allocate as many pinned pages as there are host pages to copy?

As far as I know, it is a single buffer, allocated once. But I haven’t tested an exhaustive list of scenarios. Maybe it is one buffer per GPU, or two buffers per GPU, one per transfer direction (since PCIe is a full-duplex interconnect). In any event, it is a small, fixed amount of buffer space that is allocated by the driver and used for the lifetime of the driver, until it gets unloaded.

Use cases differ, but as long as the host system has a sufficient amount of memory bandwidth, the performance benefit from using pinned memory instead of pageable memory is quite moderate. This was different in the past, when some system memories could barely keep up with a single PCIe gen3 x16 connection operating at full speed.

Since modern operating systems are built around paging, pinning allocations is somewhat anathema to them, so my approach is to use simple pageable memory unless the use of pinned memory is essential to meet performance objectives.
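If you want to check whether pinned memory is worth it on your own system, a quick measurement is more reliable than a rule of thumb. Below is a minimal sketch of such a microbenchmark, comparing host-to-device throughput from a pageable and a pinned buffer. The helper names `gbPerSec` and `timeH2D`, the 256 MB transfer size, and the repetition count are arbitrary choices for this example; results will vary with the system, the driver version, and the transfer size.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                            \
    do {                                                       \
        cudaError_t err_ = (call);                             \
        if (err_ != cudaSuccess) {                             \
            std::fprintf(stderr, "%s failed: %s\n", #call,     \
                         cudaGetErrorString(err_));            \
            std::exit(EXIT_FAILURE);                           \
        }                                                      \
    } while (0)

// Hypothetical helper: convert bytes and elapsed milliseconds to GB/s
// (1 GB = 1e9 bytes).
double gbPerSec(size_t bytes, float ms)
{
    return (bytes / 1e9) / (ms / 1e3);
}

// Hypothetical helper: time `reps` host-to-device copies of `bytes`
// from `hSrc` and return the average milliseconds per copy.
float timeH2D(void* dDst, const void* hSrc, size_t bytes, int reps)
{
    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    // Untimed warm-up copy so one-time setup costs are excluded.
    CHECK(cudaMemcpy(dDst, hSrc, bytes, cudaMemcpyHostToDevice));

    CHECK(cudaEventRecord(start));
    for (int i = 0; i < reps; ++i)
        CHECK(cudaMemcpy(dDst, hSrc, bytes, cudaMemcpyHostToDevice));
    CHECK(cudaEventRecord(stop));
    CHECK(cudaEventSynchronize(stop));

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    return ms / reps;
}

int main()
{
    const size_t kBytes = 256u << 20;  // 256 MB per transfer (arbitrary)
    const int    kReps  = 10;          // timed repetitions (arbitrary)

    // Pageable buffer: ordinary malloc, touched so the pages are resident.
    void* hPageable = std::malloc(kBytes);
    if (!hPageable) { std::fprintf(stderr, "malloc failed\n"); return EXIT_FAILURE; }
    std::memset(hPageable, 1, kBytes);

    // Pinned buffer: page-locked allocation made through the CUDA runtime.
    void* hPinned = nullptr;
    CHECK(cudaMallocHost(&hPinned, kBytes));
    std::memset(hPinned, 1, kBytes);

    void* dBuf = nullptr;
    CHECK(cudaMalloc(&dBuf, kBytes));

    float msPageable = timeH2D(dBuf, hPageable, kBytes, kReps);
    float msPinned   = timeH2D(dBuf, hPinned, kBytes, kReps);

    std::printf("pageable: %.2f GB/s\n", gbPerSec(kBytes, msPageable));
    std::printf("pinned:   %.2f GB/s\n", gbPerSec(kBytes, msPinned));

    CHECK(cudaFree(dBuf));
    CHECK(cudaFreeHost(hPinned));
    std::free(hPageable);
    return EXIT_SUCCESS;
}
```

If the two numbers are close for the transfer sizes your application actually uses, staying with pageable memory is the simpler choice. Sweeping `kBytes` over a range of sizes would also be one way to look for the effect of the driver's staging buffer, though I would not expect a clean answer from that.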