Is Unified Memory pageable? Can it be swapped out to disk?

I understand that for memory allocated with cudaMallocManaged in Unified Memory (UM), CUDA has supported oversubscription since CUDA 8.0. In this model, when GPU memory runs out, pages are evicted to CPU memory, which acts as backing storage. I'm familiar with this mechanism.

My question concerns what happens when the Unified Memory region is first initialized on the CPU (e.g., by setting all values to 1), and later accessed extensively by the GPU. Specifically, if system-wide memory pressure causes the CPU-resident portion of the Unified Memory to be swapped out to disk by the operating system, how does CUDA handle eviction in such a case?
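To make the scenario concrete, here is roughly the pattern I have in mind (the size and kernel are arbitrary, purely for illustration):

```cpp
// Sketch of the scenario: managed allocation, first touched on the CPU,
// then accessed extensively by the GPU. Size and kernel are arbitrary.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float *data, size_t n)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;              // GPU access triggers on-demand migration
}

int main()
{
    const size_t n = 1 << 26;                // ~256 MB of floats
    float *data = nullptr;
    cudaMallocManaged(&data, n * sizeof(float));

    for (size_t i = 0; i < n; ++i) data[i] = 1.0f;   // initialized on the CPU first

    scale<<<(n + 255) / 256, 256>>>(data, n);        // later accessed heavily by the GPU
    cudaDeviceSynchronize();

    printf("data[0] = %f\n", data[0]);
    cudaFree(data);
    return 0;
}
```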

In oversubscription scenarios, if the GPU needs to evict a page and the corresponding CPU memory has been swapped out to disk, does the CUDA driver allocate a new region in CPU memory for the evicted page? Or does it wait for the OS to page the memory back in?

In summary, I would like to ask:
Is it possible for the CPU-resident portion of Unified Memory to be swapped out to disk under system memory pressure?
And if so, how is this handled by the Unified Memory driver during GPU evictions?

Personally, I would expect that if swap-out were generally allowed, the overhead during GPU evictions would be unacceptably high.

So intuitively, I would assume that CUDA tries to prevent Unified Memory pages from being swapped out, similar to how cudaMallocHost provides pinned (page-locked) memory to avoid paging in the traditional memcpy-based programming model (see the sketch below). Is that not the case?
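For reference, this is the kind of pinned-memory pattern I am comparing against (standard cudaMallocHost usage; the size is arbitrary):

```cpp
// Conventional pinned-memory pattern for comparison. cudaMallocHost memory
// is page-locked, so the OS will not swap it out, and cudaMemcpyAsync can
// DMA directly from it. Size is arbitrary.
#include <cuda_runtime.h>

int main()
{
    const size_t bytes = 1 << 28;            // 256 MB, arbitrary
    float *h_buf = nullptr, *d_buf = nullptr;
    cudaMallocHost(&h_buf, bytes);           // page-locked host allocation
    cudaMalloc(&d_buf, bytes);
    cudaMemcpyAsync(d_buf, h_buf, bytes, cudaMemcpyHostToDevice, 0);
    cudaStreamSynchronize(0);
    cudaFree(d_buf);
    cudaFreeHost(h_buf);
    return 0;
}
```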

AFAIK the UM system pages cannot be swapped out to disk. One indicator of this is that the upper bound for GPU oversubscribed allocations is the system memory size (you can test this, as in the sketch below; I'm not suggesting it's documented, nor is it a complete proof of the claim).
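For example, a rough probe along those lines could look like this (just a sketch; the 1.5x factor is my own arbitrary choice, and success here is an observation, not a documented guarantee):

```cpp
// Rough oversubscription probe: allocate more managed memory than the GPU
// has, touch all of it on the GPU, and see whether it succeeds while system
// memory can still back it. Illustration only, not a documented guarantee.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void touch(char *p, size_t n)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) p[i] = 1;
}

int main()
{
    size_t free_b = 0, total_b = 0;
    cudaMemGetInfo(&free_b, &total_b);

    size_t n = total_b + total_b / 2;        // 1.5x GPU memory: oversubscribed
    char *p = nullptr;
    cudaError_t err = cudaMallocManaged(&p, n);
    if (err != cudaSuccess) { printf("alloc failed: %s\n", cudaGetErrorString(err)); return 1; }

    touch<<<(unsigned)((n + 255) / 256), 256>>>(p, n);
    err = cudaDeviceSynchronize();
    printf("GPU touch of %zu bytes: %s\n", n, cudaGetErrorString(err));
    cudaFree(p);
    return 0;
}
```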

In any event, from what I can see here, this is all implementation detail. What I mean by that is that I don’t believe NVIDIA documents or specifies UM behavior to this level of detail.


Thank you for the quick response.

I have a follow-up question:
In a situation where system memory is not under pressure, does the Unified Memory region typically remain resident (i.e., not swapped out) and behave like pinned memory?
Or is this also an implementation-dependent detail?

I don't really understand the question. I just indicated that I believe the UM system pages cannot be swapped out to disk. So when you ask whether the region typically remains resident even without memory pressure, I would say the same thing: AFAIK, UM pages are never swapped out to disk.

The general UM behavior on Linux for Pascal or newer GPUs is one of migration. That is, pages migrate to the processor that wishes to access them (see the sketch below).
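As a concrete sketch of that (assuming a single Pascal-or-newer GPU on Linux, device 0; explicit prefetch just triggers the same movement that demand faults would):

```cpp
// Sketch of the migration model: pages move to whichever processor touches
// them; cudaMemPrefetchAsync moves them explicitly. Device 0 and the size
// are arbitrary assumptions.
#include <cuda_runtime.h>

int main()
{
    const size_t bytes = 1 << 26;                        // 64 MB, arbitrary
    char *p = nullptr;
    cudaMallocManaged(&p, bytes);

    p[0] = 1;                                            // CPU touch: pages live in system memory
    cudaMemPrefetchAsync(p, bytes, 0, 0);                // migrate the range to GPU 0
    cudaMemPrefetchAsync(p, bytes, cudaCpuDeviceId, 0);  // migrate it back to the CPU
    cudaStreamSynchronize(0);

    cudaFree(p);
    return 0;
}
```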

Under certain circumstances, the UM system can decide to not migrate a page, but instead convert it into a host-resident mapped page, which means that it has effectively become like host-pinned memory. This is not the default behavior; it must be arrived at via UM system heuristics. One of the heuristics driving this could be UM usage in a multi-GPU system where peer mappings are not possible.
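For what it's worth, there is also an explicit way to request host-resident, GPU-mapped placement via memory hints. The sketch below is that explicit analogue, not the heuristic path itself (device 0 and the size are arbitrary assumptions):

```cpp
// Explicitly requesting host-resident, GPU-mapped behavior with cudaMemAdvise.
// This is the explicit analogue of the heuristic described above, not the
// heuristic itself. Device 0 and the size are arbitrary assumptions.
#include <cuda_runtime.h>

int main()
{
    const size_t bytes = 1 << 26;
    char *p = nullptr;
    cudaMallocManaged(&p, bytes);

    // Keep the pages in system memory and map them into GPU 0's address
    // space, so GPU accesses go over the bus instead of migrating pages.
    cudaMemAdvise(p, bytes, cudaMemAdviseSetPreferredLocation, cudaCpuDeviceId);
    cudaMemAdvise(p, bytes, cudaMemAdviseSetAccessedBy, 0);

    cudaFree(p);
    return 0;
}
```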

Some other items that may be of interest:

  1. Differences between UM and pinned memory
  2. “File-backed” UM usage (note that this only applies to systems with “full” UM support, which means either HMM or ATS is in effect.)

I don’t think the “File-backed” case is “typical” currently, but if that constitutes your definition of “swapped out to disk” then I would amend all my previous comments to say “in the case where HMM or ATS is not in effect, or otherwise in the case of an ordinary, not file-backed, UM allocation…”


No worries at all; I apologize for asking the question so vaguely that it caused confusion.
Thanks to both of your responses, I was able to fully understand everything I needed.

I sincerely appreciate your help.
