I just ran one of our CUDA algorithms on my GeForce GTX 1070 (8 GB RAM) on Windows 10 (WDDM 2.0). In our GPU framework, we monitor the amount of GPU memory currently allocated - and I think the logging is implemented correctly.
When I run the algorithm (which is quite memory hungry), our own monitoring reports that ~10 GB of GPU RAM are currently allocated. How is that possible on an 8 GB card? When I check via GPU-Z, it reports ~6 GB of RAM allocated.
Is it therefore possible that the WDDM model supports virtual memory (paging out unused memory)?
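For reference, a simple way to cross-check our own bookkeeping would be to ask the driver directly via cudaMemGetInfo (minimal sketch, not our framework's actual code):

```
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    size_t freeBytes = 0, totalBytes = 0;
    // Ask the driver what it currently sees, independent of our own bookkeeping.
    if (cudaMemGetInfo(&freeBytes, &totalBytes) != cudaSuccess) {
        printf("cudaMemGetInfo failed: %s\n", cudaGetErrorString(cudaGetLastError()));
        return 1;
    }
    printf("GPU memory: %.2f GB in use / %.2f GB total\n",
           (totalBytes - freeBytes) / 1e9, totalBytes / 1e9);
    return 0;
}
```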
WDDM implements a virtual memory system (including a form of paging) on any WDDM GPU, and this has been in place for a while and is independent of any comments about Unified Memory in CUDA.
It is possible for WDDM to oversubscribe GPU memory.
Thanks for the information …
Yep, it must be independent of unified memory, as we use the normal allocation routines (cudaMalloc/cudaMallocPitch). It is just a bit unexpected for me - I cannot remember oversubscription being possible on previous Windows versions with WDDM < 2.0. But it is definitely a nice surprise.
Driver version is 387.XX
AFAIK the “oversubscription” I mentioned is a little different than what you might think of in the case of CUDA UM on a paging setup.
CUDA UM with paging (i.e. Linux + Pascal or newer GPU) can oversubscribe GPU memory even for a single CUDA allocation. This is not possible purely with the WDDM virtual memory (VM) system. The oversubscription there is perhaps more like context switching: the total allocation needed for any one task/context (e.g. graphics, or CUDA) cannot exceed GPU physical memory, but the total across all tasks/contexts can. The WDDM system then manages "context switches" by moving data en masse between GPU memory and host memory, to support whichever type of context is currently running.
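To illustrate the first point, something along these lines can succeed on Linux with a Pascal-or-newer GPU even though the single allocation exceeds physical GPU memory (rough sketch; the 12 GB size is just an example):

```
#include <cstdio>
#include <cuda_runtime.h>

// Touch every byte so pages actually migrate to the GPU on demand.
__global__ void touch(char *p, size_t n)
{
    size_t stride = (size_t)gridDim.x * blockDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        p[i] = 1;
}

int main()
{
    const size_t bytes = 12ULL << 30;   // 12 GB, more than the 8 GB on a GTX 1070
    char *p = nullptr;

    // On Linux with a Pascal-or-newer GPU this single oversubscribed allocation
    // can succeed; pages migrate between host and device as the kernel runs.
    // The same request via cudaMalloc (or under WDDM) fails up front.
    if (cudaMallocManaged(&p, bytes) != cudaSuccess) {
        printf("cudaMallocManaged failed: %s\n", cudaGetErrorString(cudaGetLastError()));
        return 1;
    }

    touch<<<1024, 256>>>(p, bytes);
    cudaDeviceSynchronize();
    printf("kernel status: %s\n", cudaGetErrorString(cudaGetLastError()));

    cudaFree(p);
    return 0;
}
```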
If I remember correctly, Windows’ GPU memory management described by txbob (“backing store”) was already present in WDDM 1.x. I am actually not clear how WDDM 2.0 changed the memory management, but there must have been significant changes since WDDM 2.0 reserves a much bigger chunk of GPU memory for its own use, as multiple CUDA programmers have figured out the hard way (per messages in these forums).
I concur. I remember running into a test case (a kernel launch took more time because WDDM was paging the context in) quite some time ago; I'm pretty sure it was in the WDDM 1.x era. So I'm not sure what is different between 1.x and 2.x. I wasn't trying to suggest that this is unique or specific to WDDM 2.x.
Thanks to both for the clarification, you are right.
I just did a test: on the GTX 1070 with 8 GB I can allocate at most 6.8 GB (in chunks of e.g. 100 megabytes). So it seems that the monitoring functionality in our framework does not report correct values.
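The test was essentially a loop like this (simplified sketch; the 100 MB chunk size is arbitrary):

```
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

int main()
{
    const size_t chunkBytes = 100ULL * 1024 * 1024;   // 100 MB per chunk
    std::vector<void *> chunks;

    // Allocate until the driver refuses; under WDDM this stops noticeably
    // below the 8 GB of physical memory on the card.
    for (;;) {
        void *p = nullptr;
        if (cudaMalloc(&p, chunkBytes) != cudaSuccess) {
            cudaGetLastError();   // reset the error state
            break;
        }
        chunks.push_back(p);
    }

    printf("Allocated %.2f GB in %zu chunks before cudaMalloc failed\n",
           chunks.size() * (double)chunkBytes / 1e9, chunks.size());

    for (void *p : chunks) cudaFree(p);
    return 0;
}
```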
So for oversubscription with CUDA UM we would have to use the TCC driver on Windows, or switch to Linux.
For now, CUDA on Linux seems preferable to me for “serious” applications, especially if you are an ambidextrous developer who is equally comfortable with Windows and Linux. Just stay away from Linux distros that use funny animal names :-) [Sorry, pet peeve of mine; I am starting to sound like old Cato in the Roman senate: Ceterum censeo …]