Is Unified Memory pageable? Can it be swapped out to disk?

I understand that for memory allocated with cudaMallocManaged in Unified Memory (UM), CUDA has supported oversubscription since CUDA 8.0. In this model, when GPU memory runs out, pages are evicted to CPU memory, which acts as backing storage. I'm familiar with this mechanism.

My question concerns what happens when the Unified Memory region is first initialized on the CPU (e.g., by setting all values to 1), and later accessed extensively by the GPU. Specifically, if system-wide memory pressure causes the CPU-resident portion of the Unified Memory to be swapped out to disk by the operating system, how does CUDA handle eviction in such a case?
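To make the scenario concrete, here is roughly the pattern I have in mind (the size and kernel are arbitrary, purely for illustration):

```cpp
// Sketch of the scenario: managed allocation, first touched on the CPU,
// then accessed extensively by the GPU. Size and kernel are arbitrary.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float *data, size_t n)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;              // GPU access triggers on-demand migration
}

int main()
{
    const size_t n = 1 << 26;                // ~256 MB of floats
    float *data = nullptr;
    cudaMallocManaged(&data, n * sizeof(float));

    for (size_t i = 0; i < n; ++i) data[i] = 1.0f;   // initialized on the CPU first

    scale<<<(n + 255) / 256, 256>>>(data, n);        // later accessed heavily by the GPU
    cudaDeviceSynchronize();

    printf("data[0] = %f\n", data[0]);
    cudaFree(data);
    return 0;
}
```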

In oversubscription scenarios, if the GPU needs to evict a page and the corresponding CPU memory has been swapped out to disk, does the CUDA driver allocate a new region in CPU memory for the evicted page? Or does it wait for the OS to page the memory back in?

In summary, I would like to ask:
Is it possible for the CPU-resident portion of Unified Memory to be swapped out to disk under system memory pressure?
And if so, how is this handled by the Unified Memory driver during GPU evictions?

Personally, I would expect that if swap-out were generally allowed, the overhead during GPU evictions would be unacceptably high.

So intuitively, I would assume that CUDA tries to prevent Unified Memory pages from being swapped out, similar to how cudaMallocHost provides pinned (page-locked) memory to avoid paging in the traditional memcpy-based programming model (see the sketch below). Is that not the case?
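For reference, this is the kind of pinned-memory pattern I am comparing against (standard cudaMallocHost usage; the size is arbitrary):

```cpp
// Conventional pinned-memory pattern for comparison. cudaMallocHost memory
// is page-locked, so the OS will not swap it out, and cudaMemcpyAsync can
// DMA directly from it. Size is arbitrary.
#include <cuda_runtime.h>

int main()
{
    const size_t bytes = 1 << 28;            // 256 MB, arbitrary
    float *h_buf = nullptr, *d_buf = nullptr;
    cudaMallocHost(&h_buf, bytes);           // page-locked host allocation
    cudaMalloc(&d_buf, bytes);
    cudaMemcpyAsync(d_buf, h_buf, bytes, cudaMemcpyHostToDevice, 0);
    cudaStreamSynchronize(0);
    cudaFree(d_buf);
    cudaFreeHost(h_buf);
    return 0;
}
```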

AFAIK the UM system pages cannot be swapped out to disk. One indicator of this is that the upper bound for GPU oversubscribed allocations is the system memory size (you can test this, as in the sketch below; I'm not suggesting it's documented, nor is it a complete proof of the claim).
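For example, a rough probe along those lines could look like this (just a sketch; the 1.5x factor is my own arbitrary choice, and success here is an observation, not a documented guarantee):

```cpp
// Rough oversubscription probe: allocate more managed memory than the GPU
// has, touch all of it on the GPU, and see whether it succeeds while system
// memory can still back it. Illustration only, not a documented guarantee.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void touch(char *p, size_t n)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) p[i] = 1;
}

int main()
{
    size_t free_b = 0, total_b = 0;
    cudaMemGetInfo(&free_b, &total_b);

    size_t n = total_b + total_b / 2;        // 1.5x GPU memory: oversubscribed
    char *p = nullptr;
    cudaError_t err = cudaMallocManaged(&p, n);
    if (err != cudaSuccess) { printf("alloc failed: %s\n", cudaGetErrorString(err)); return 1; }

    touch<<<(unsigned)((n + 255) / 256), 256>>>(p, n);
    err = cudaDeviceSynchronize();
    printf("GPU touch of %zu bytes: %s\n", n, cudaGetErrorString(err));
    cudaFree(p);
    return 0;
}
```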

In any event, from what I can see here, this is all implementation detail. What I mean by that is that I don’t believe NVIDIA documents or specifies UM behavior to this level of detail.


Thank you for the quick response.

I have a follow-up question:
In a situation where system memory is not under pressure, does the Unified Memory region typically remain resident (i.e., not swapped out) and behave like pinned memory?
Or is this also an implementation-dependent detail?

I don't really understand the question. I just indicated that I believe the UM system pages cannot be swapped out to disk. So when you ask whether the region typically remains resident even without memory pressure, I would say the same thing: AFAIK, UM pages are never swapped out to disk.

The general UM behavior on Linux for Pascal or newer GPUs is one of migration. That is, pages migrate to the processor that wishes to access them (see the sketch below).
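As a concrete sketch of that (assuming a single Pascal-or-newer GPU on Linux, device 0; explicit prefetch just triggers the same movement that demand faults would):

```cpp
// Sketch of the migration model: pages move to whichever processor touches
// them; cudaMemPrefetchAsync moves them explicitly. Device 0 and the size
// are arbitrary assumptions.
#include <cuda_runtime.h>

int main()
{
    const size_t bytes = 1 << 26;                        // 64 MB, arbitrary
    char *p = nullptr;
    cudaMallocManaged(&p, bytes);

    p[0] = 1;                                            // CPU touch: pages live in system memory
    cudaMemPrefetchAsync(p, bytes, 0, 0);                // migrate the range to GPU 0
    cudaMemPrefetchAsync(p, bytes, cudaCpuDeviceId, 0);  // migrate it back to the CPU
    cudaStreamSynchronize(0);

    cudaFree(p);
    return 0;
}
```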

Under certain circumstances, the UM system can decide to not migrate a page, but instead convert it into a host-resident mapped page, which means that it has effectively become like host-pinned memory. This is not the default behavior; it must be arrived at via UM system heuristics. One of the heuristics driving this could be UM usage in a multi-GPU system where peer mappings are not possible.
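For what it's worth, there is also an explicit way to request host-resident, GPU-mapped placement via memory hints. The sketch below is that explicit analogue, not the heuristic path itself (device 0 and the size are arbitrary assumptions):

```cpp
// Explicitly requesting host-resident, GPU-mapped behavior with cudaMemAdvise.
// This is the explicit analogue of the heuristic described above, not the
// heuristic itself. Device 0 and the size are arbitrary assumptions.
#include <cuda_runtime.h>

int main()
{
    const size_t bytes = 1 << 26;
    char *p = nullptr;
    cudaMallocManaged(&p, bytes);

    // Keep the pages in system memory and map them into GPU 0's address
    // space, so GPU accesses go over the bus instead of migrating pages.
    cudaMemAdvise(p, bytes, cudaMemAdviseSetPreferredLocation, cudaCpuDeviceId);
    cudaMemAdvise(p, bytes, cudaMemAdviseSetAccessedBy, 0);

    cudaFree(p);
    return 0;
}
```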

Some other items that may be of interest:

  1. Differences between UM and pinned memory
  2. “File-backed” UM usage (note that this only applies to systems with “full” UM support, which means either HMM or ATS is in effect.)

I don’t think the “File-backed” case is “typical” currently, but if that constitutes your definition of “swapped out to disk” then I would amend all my previous comments to say “in the case where HMM or ATS is not in effect, or otherwise in the case of an ordinary, not file-backed, UM allocation…”


No worries at all; I apologize for asking the question so vaguely that it caused confusion.
Thanks to both of your responses, I was able to fully understand everything I needed.

I sincerely appreciate your help.
