I just ran one of our CUDA algorithms on my GeForce GTX 1070 (8 GB RAM) on Windows 10 (WDDM 2.0). In our GPU framework, we monitor the amount of GPU memory currently allocated - and I think the logging is implemented correctly.
When I run the algorithm (which is quite memory hungry), our own monitoring reports that ~10 GB of GPU RAM are currently allocated. How is that possible on an 8 GB card? When I check via GPU-Z, it reports ~6 GB of RAM allocated.
Is it therefore possible that the WDDM model supports virtual memory (paging out unused memory)?
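For reference, a simple way to cross-check our own bookkeeping would be to ask the driver directly via cudaMemGetInfo (minimal sketch, not our framework's actual code):

```
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    size_t freeBytes = 0, totalBytes = 0;
    // Ask the driver what it currently sees, independent of our own bookkeeping.
    if (cudaMemGetInfo(&freeBytes, &totalBytes) != cudaSuccess) {
        printf("cudaMemGetInfo failed: %s\n", cudaGetErrorString(cudaGetLastError()));
        return 1;
    }
    printf("GPU memory: %.2f GB in use / %.2f GB total\n",
           (totalBytes - freeBytes) / 1e9, totalBytes / 1e9);
    return 0;
}
```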
WDDM implements a virtual memory system (including a form of paging) on any WDDM GPU, and this has been in place for a while and is independent of any comments about Unified Memory in CUDA.
It is possible for WDDM to oversubscribe GPU memory.
Thanks for the information …
Yep, it must be independent of unified memory, as we use the normal allocation routines (cudaMalloc/cudaMallocPitch). It is just a bit unexpected for me - I cannot remember oversubscription being possible on previous Windows versions with WDDM < 2.0. But it is definitely a nice surprise.
Driver version is 387.XX
AFAIK the “oversubscription” I mentioned is a little different than what you might think of in the case of CUDA UM on a paging setup.
CUDA UM with paging (i.e. Linux + Pascal or newer GPU) can oversubscribe GPU memory even for a single CUDA allocation. This is not possible purely with the WDDM virtual memory (VM) system. The oversubscription there is perhaps more like context switching: the total allocation needed for any one task/context (e.g. graphics, or CUDA) cannot exceed GPU physical memory, but the total across all tasks/contexts can. The WDDM system then manages "context switches" by moving data en masse between GPU memory and host memory, to support whichever type of context is currently running.
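To illustrate the first point, something along these lines can succeed on Linux with a Pascal-or-newer GPU even though the single allocation exceeds physical GPU memory (rough sketch; the 12 GB size is just an example):

```
#include <cstdio>
#include <cuda_runtime.h>

// Touch every byte so pages actually migrate to the GPU on demand.
__global__ void touch(char *p, size_t n)
{
    size_t stride = (size_t)gridDim.x * blockDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        p[i] = 1;
}

int main()
{
    const size_t bytes = 12ULL << 30;   // 12 GB, more than the 8 GB on a GTX 1070
    char *p = nullptr;

    // On Linux with a Pascal-or-newer GPU this single oversubscribed allocation
    // can succeed; pages migrate between host and device as the kernel runs.
    // The same request via cudaMalloc (or under WDDM) fails up front.
    if (cudaMallocManaged(&p, bytes) != cudaSuccess) {
        printf("cudaMallocManaged failed: %s\n", cudaGetErrorString(cudaGetLastError()));
        return 1;
    }

    touch<<<1024, 256>>>(p, bytes);
    cudaDeviceSynchronize();
    printf("kernel status: %s\n", cudaGetErrorString(cudaGetLastError()));

    cudaFree(p);
    return 0;
}
```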
If I remember correctly, Windows’ GPU memory management described by txbob (“backing store”) was already present in WDDM 1.x. I am actually not clear how WDDM 2.0 changed the memory management, but there must have been significant changes since WDDM 2.0 reserves a much bigger chunk of GPU memory for its own use, as multiple CUDA programmers have figured out the hard way (per messages in these forums).
I concur. I remember running into a test case (a kernel launch took more time because WDDM was paging the context in) quite some time ago; I'm pretty sure it was in the WDDM 1.x era. So I'm not sure what is different between 1.x and 2.x. I wasn't trying to suggest that this is unique or specific to WDDM 2.x.
Thanks to both for the clarification, you are right.
I just did a test: on the GTX 1070 with 8 GB I can allocate at most 6.8 GB (in chunks of e.g. 100 megabytes). So it seems that the monitoring functionality in our framework does not report correct values.
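The test was essentially a loop like this (simplified sketch; the 100 MB chunk size is arbitrary):

```
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

int main()
{
    const size_t chunkBytes = 100ULL * 1024 * 1024;   // 100 MB per chunk
    std::vector<void *> chunks;

    // Allocate until the driver refuses; under WDDM this stops noticeably
    // below the 8 GB of physical memory on the card.
    for (;;) {
        void *p = nullptr;
        if (cudaMalloc(&p, chunkBytes) != cudaSuccess) {
            cudaGetLastError();   // reset the error state
            break;
        }
        chunks.push_back(p);
    }

    printf("Allocated %.2f GB in %zu chunks before cudaMalloc failed\n",
           chunks.size() * (double)chunkBytes / 1e9, chunks.size());

    for (void *p : chunks) cudaFree(p);
    return 0;
}
```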
So for oversubscription with CUDA UM we would have to use the TCC driver on Windows, or switch to Linux.
For now, CUDA on Linux seems preferable to me for “serious” applications, especially if you are an ambidextrous developer who is equally comfortable with Windows and Linux. Just stay away from Linux distros that use funny animal names :-) [Sorry, pet peeve of mine; I am starting to sound like old Cato in the Roman senate: Ceterum censeo …]