Pinned memory

Hi,
A few questions related to pinned memory:

  • When allocating pinned memory (possibly 256 MB - 1 GB), it seems some memory is being allocated in the RAM of the currently selected GPU. Why exactly? Is there some logic to the size that gets allocated? I need to know because I’m pre-allocating most of the GPU RAM, and the pinned allocations sometimes fail because of (what looks like) insufficient space in GPU RAM.
  • When copying a non-pinned buffer to the GPU, does the driver allocate chunks of pinned memory and perform intermediate copies from my pageable memory through the pinned memory to the device?
    If so, what are the sizes of those intermediate buffers?
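One way to investigate the first question empirically is to sample `cudaMemGetInfo` immediately before and after the pinned allocation. This is only a measurement sketch, not documented driver behavior; the observed overhead (if any) will vary by driver version and platform:

```
#include <cstdio>
#include <cuda_runtime.h>

// Probe: how much device memory (if any) does a pinned host
// allocation consume? Sample free device memory before and after.
int main() {
    size_t freeBefore = 0, freeAfter = 0, total = 0;
    cudaFree(0);                         // force context creation first
    cudaMemGetInfo(&freeBefore, &total);

    void* h = nullptr;
    const size_t bytes = 256ull << 20;   // 256 MB pinned allocation
    if (cudaHostAlloc(&h, bytes, cudaHostAllocDefault) != cudaSuccess) {
        printf("cudaHostAlloc failed\n");
        return 1;
    }
    cudaMemGetInfo(&freeAfter, &total);

    printf("device memory consumed by pinned alloc: %zu bytes\n",
           freeBefore - freeAfter);
    cudaFreeHost(h);
    return 0;
}
```

Running this at a few allocation sizes would show whether the device-side cost scales with the pinned allocation or is a fixed per-allocation amount.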

thanks
Eyal

For host->device copies, if the amount of data is small, it will be sent as part of the command stream. Larger copies use the DMA mechanism, for which the driver allocates a pinned buffer through which the data is copied in chunks.

It used to be that “small” ~= up to tens of kilobytes, and pinned buffer size in driver ~= single-digit megabytes, large enough to achieve good throughput for the DMA transfers. The sizes are not documented since they are implementation artifacts that could change between driver versions. With a bit of clever benchmarking you could probably reverse engineer what the sizes are for any given driver version, but this is not really something CUDA programmers should worry about (at least I have never had a need to do so, and using that kind of information tends to make one’s code brittle).
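The “clever benchmarking” mentioned above could look something like the sketch below: time pageable host-to-device copies at increasing sizes and watch for discontinuities in effective bandwidth, which hint at where the driver switches from embedding data in the command stream to staged DMA. The thresholds found this way are driver-specific implementation details, not anything to code against:

```
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Time a pageable host->device copy at each power-of-two size and
// report effective bandwidth. Jumps in the curve suggest internal
// driver thresholds (command-stream vs. staged DMA).
int main() {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    for (size_t bytes = 1 << 10; bytes <= (64u << 20); bytes <<= 1) {
        char* hostBuf = (char*)malloc(bytes);   // pageable memory
        char* devBuf  = nullptr;
        cudaMalloc(&devBuf, bytes);

        cudaMemcpy(devBuf, hostBuf, bytes, cudaMemcpyHostToDevice); // warm up
        cudaEventRecord(start);
        cudaMemcpy(devBuf, hostBuf, bytes, cudaMemcpyHostToDevice);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        printf("%10zu bytes: %8.2f MB/s\n", bytes,
               (bytes / (1024.0 * 1024.0)) / (ms / 1000.0));

        cudaFree(devBuf);
        free(hostBuf);
    }
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return 0;
}
```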

Thanks for the reply :)
Are the pinned buffers allocated by the driver cached? i.e., across multiple copy calls, will the driver reuse the previously allocated pinned buffers rather than reallocating them over and over?

What would you suggest for big (1-2 GB) non-pinned buffers - let the driver handle those copies through the DMA mechanism you’ve described, or try some custom solution of my own?

Also, when allocating 100 MB - 1 GB pinned buffers, why is there overhead in device memory, and is there a formula to estimate how much device/GPU memory will be required?

thanks
Eyal

I can’t see inside the driver, but it is a reasonable assumption that the pinned “transfer buffer” created by the driver is allocated once, and reused as often as necessary until the driver is unloaded.
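If a custom solution for the large pageable buffers turns out to be worthwhile, one common pattern is to replicate the driver’s staging mechanism yourself: a single pinned buffer, allocated once and reused, split into two halves so the CPU `memcpy` into one half overlaps the DMA out of the other. This is only a sketch; the staging size (`STAGE_BYTES` here) is an assumption to tune, not a documented optimum, and error checking is omitted for brevity:

```
#include <cstring>
#include <cuda_runtime.h>

// Do-it-yourself staged host->device copy for large pageable buffers:
// one reusable pinned staging buffer, split into two halves, with two
// streams so the memcpy into one half overlaps the DMA of the other.
const size_t STAGE_BYTES = 8u << 20;   // 2 x 4 MB halves (tunable)

void stagedCopyToDevice(void* dst, const void* src, size_t bytes) {
    static char* stage = nullptr;
    static cudaStream_t streams[2];
    if (!stage) {                      // allocate once, reuse forever
        cudaHostAlloc(&stage, STAGE_BYTES, cudaHostAllocDefault);
        cudaStreamCreate(&streams[0]);
        cudaStreamCreate(&streams[1]);
    }
    const size_t half = STAGE_BYTES / 2;
    size_t offset = 0;
    int slot = 0;
    while (offset < bytes) {
        size_t chunk = (bytes - offset < half) ? bytes - offset : half;
        // Wait until the previous DMA using this half has finished.
        cudaStreamSynchronize(streams[slot]);
        memcpy(stage + slot * half, (const char*)src + offset, chunk);
        cudaMemcpyAsync((char*)dst + offset, stage + slot * half,
                        chunk, cudaMemcpyHostToDevice, streams[slot]);
        offset += chunk;
        slot ^= 1;                     // alternate halves
    }
    cudaStreamSynchronize(streams[0]);
    cudaStreamSynchronize(streams[1]);
}
```

Whether this beats letting the driver do the staging is something to measure on the actual system; the main advantage is control over when and where the pinned memory is allocated.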

Hi,
Any insights as to why there is this “overhead” in device memory for my own pinned memory allocations, and how to estimate it?

thanks
Eyal