Pinned memory

When a copy between host and device is requested, the CUDA driver first checks whether the host memory range is page-locked (pinned) or not, and then chooses a code path accordingly.

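Locked (pinned) memory is guaranteed to stay resident in physical memory (RAM) at a fixed location, so the device can access it directly through DMA (direct memory access), without help from the CPU. That is the key property of pinned memory, and it is also what makes truly asynchronous copies possible: since the CPU is not needed during the transfer, the host can continue with other work while the copy runs.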
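The check-then-dispatch step can be sketched in host-only C++. This is a simulation, not driver code: PinnedRegistry and choose_copy_path are hypothetical names, and the registry merely stands in for whatever bookkeeping the driver keeps about memory pinned through real CUDA calls such as cudaMallocHost or cudaHostRegister.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical registry of page-locked host ranges (a stand-in for the
// driver's own tracking of pinned allocations).
class PinnedRegistry {
public:
    void add(const void* base, std::size_t size) {
        assert(size > 0);  // empty ranges are not tracked
        ranges_[reinterpret_cast<std::uintptr_t>(base)] = size;
    }

    // True if [ptr, ptr + size) lies entirely inside one registered range.
    bool contains(const void* ptr, std::size_t size) const {
        auto p = reinterpret_cast<std::uintptr_t>(ptr);
        auto it = ranges_.upper_bound(p);      // first range starting after p
        if (it == ranges_.begin()) return false;
        --it;                                  // last range starting at or before p
        return p + size <= it->first + it->second;
    }

private:
    std::map<std::uintptr_t, std::size_t> ranges_;  // base address -> length
};

// Picks the copy path for a host-to-device transfer (labels are illustrative).
std::string choose_copy_path(const PinnedRegistry& reg, const void* src, std::size_t size) {
    return reg.contains(src, size) ? "direct DMA" : "staged through pinned buffer";
}
```

A copy whose range lies wholly inside a registered allocation takes the direct path; anything else, including a range that only partly overlaps pinned memory, falls back to staging.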

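Non-locked (pageable) memory, on the other hand, can trigger a page fault when accessed: its pages are not guaranteed to be in RAM and may have been swapped out to secondary storage. The device therefore cannot DMA from it directly. Instead, the driver has to access each page of the pageable range (faulting it back in if necessary), copy it into a pinned buffer, and hand that buffer to the DMA engine, working through the range piece by piece. The transfer therefore proceeds synchronously, one piece at a time.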
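A minimal simulation of this staged path, under loud assumptions: the "device" is an ordinary byte array, the "DMA" step is std::memcpy, and the staging buffer is a plain std::vector standing in for pinned memory. staged_copy and StagedCopyStats are hypothetical names, and the 4096-byte staging size used below is arbitrary. A real driver may also overlap the CPU copy with the DMA (for example, with more than one staging buffer); this sketch does each step in turn.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

struct StagedCopyStats {
    std::size_t chunks = 0;         // number of staging round trips
    std::size_t staging_bytes = 0;  // size of the single, reused staging buffer
};

// Simulated host-to-device copy of pageable memory through a pinned staging buffer.
StagedCopyStats staged_copy(const unsigned char* pageable_src,
                            unsigned char* device_dst,
                            std::size_t size,
                            std::size_t staging_size) {
    assert(staging_size > 0);
    std::vector<unsigned char> staging(staging_size);  // stand-in for pinned memory
    StagedCopyStats stats;
    stats.staging_bytes = staging.size();

    for (std::size_t offset = 0; offset < size; offset += staging_size) {
        std::size_t n = std::min(staging_size, size - offset);
        // 1. The CPU reads this slice of pageable memory (on a real OS this
        //    may fault pages back in) and copies it into the staging buffer.
        std::memcpy(staging.data(), pageable_src + offset, n);
        // 2. The "DMA engine" moves the staged slice to the device; the buffer
        //    is only reused after this step has finished.
        std::memcpy(device_dst + offset, staging.data(), n);
        ++stats.chunks;
    }
    return stats;
}
```

For a 10000-byte source and a 4096-byte staging buffer, the loop runs three times and never needs more than 4096 bytes of staging memory.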

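So, to clarify, a pinned buffer is simply a region of pinned memory; in the non-locked case it is one that the driver allocates and manages for staging, rather than one your program allocates. As for memory usage: if, for example, you need to copy 1 GB of non-locked data, all 1 GB has to pass through pinned memory on its way to the device. That does not necessarily require an additional 1 GB of pinned memory at the same time, however. Because the data is staged piece by piece, the driver can reuse a pinned buffer that is typically much smaller than the transfer, so peak usage is roughly 1 GB plus the size of that buffer. Usage would only approach 2 GB if the entire transfer were staged in one go.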
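The memory arithmetic can be made explicit with a few hypothetical helper functions. They compare three cases for a transfer of data_bytes: a pinned source (no staging), staging the whole transfer in one pinned buffer, and reusing a fixed-size staging buffer. The 64 MiB staging size in the checks is an illustrative assumption, not a documented driver value.

```cpp
#include <algorithm>
#include <cstdint>

constexpr std::uint64_t KiB = 1024, MiB = 1024 * KiB, GiB = 1024 * MiB;

// Pinned source: the device DMAs from it directly, so no extra host memory.
constexpr std::uint64_t peak_pinned(std::uint64_t data_bytes) {
    return data_bytes;
}

// Hypothetical policy: stage the entire transfer in one pinned buffer.
constexpr std::uint64_t peak_whole_buffer(std::uint64_t data_bytes) {
    return data_bytes + data_bytes;  // pageable source + equally large pinned copy
}

// Hypothetical policy: reuse a fixed-size pinned staging buffer chunk by chunk.
constexpr std::uint64_t peak_chunked(std::uint64_t data_bytes, std::uint64_t staging_bytes) {
    return data_bytes + std::min(data_bytes, staging_bytes);
}

// The 1 GB example from the text, checked at compile time.
static_assert(peak_pinned(1 * GiB) == 1 * GiB, "no staging for pinned memory");
static_assert(peak_whole_buffer(1 * GiB) == 2 * GiB, "whole-transfer staging doubles usage");
static_assert(peak_chunked(1 * GiB, 64 * MiB) == 1 * GiB + 64 * MiB, "chunked staging adds one buffer");
```

Only the whole-transfer policy reaches 2 GB; with chunked staging the overhead is bounded by the staging buffer, whatever the transfer size.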