cudaMalloc with sysmem fallback

Robert_Crovella · October 15, 2025, 6:47pm

cudaMalloc on a Windows platform with the WDDM driver model does not actually allocate memory directly on the GPU. Instead, it asks the WDDM driver (a Microsoft API) to perform the allocation.

The WDDM system actually manages GPU memory (plus sysmem as a fallback) through something like a demand-paged virtual memory system. In a WDDM setup, the CUDA subsystem is just one of potentially several users of the GPU (the other big one being the Windows display system). This memory management is not directly under the control of the NVIDIA driver, and it has a few notable effects:

- Oversubscription is possible. I haven’t personally witnessed oversubscription by a single client (e.g. CUDA), but I have witnessed it when multiple clients are making memory requests.
- Paging (i.e. movement/relocation) of data between device memory and sysmem is possible.
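One way to look at oversubscription from the CUDA side is a small probe that keeps calling cudaMalloc in fixed-size chunks and compares the total it obtained with what cudaMemGetInfo reports. This is a minimal sketch, not a diagnostic tool: the 256 MiB chunk size, the 2x cap, and the `toMiB` helper are arbitrary, hypothetical choices, and the outcome on a WDDM device will depend on other GPU clients (such as the display) and on the driver's sysmem fallback setting. If allocations do spill into sysmem, the probe can consume a large amount of system memory.

```cuda
// Sketch: compare what cudaMalloc hands out with what cudaMemGetInfo reports.
// Chunk size and the 2x cap are arbitrary; results on WDDM depend on other
// GPU clients and on the driver's sysmem fallback setting.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: bytes -> MiB for printing.
static double toMiB(size_t bytes) { return bytes / (1024.0 * 1024.0); }

int main() {
    size_t freeB = 0, totalB = 0;
    if (cudaMemGetInfo(&freeB, &totalB) != cudaSuccess) {
        std::printf("no usable CUDA device\n");
        return 0;
    }
    std::printf("before: %.0f MiB free of %.0f MiB\n", toMiB(freeB), toMiB(totalB));

    const size_t chunk = size_t(256) << 20;  // 256 MiB per request
    const size_t cap = 2 * totalB;           // never request more than 2x the reported total
    std::vector<void*> blocks;
    size_t allocated = 0;

    while (allocated + chunk <= cap) {
        void* p = nullptr;
        cudaError_t err = cudaMalloc(&p, chunk);
        if (err != cudaSuccess) {
            std::printf("cudaMalloc failed after %.0f MiB: %s\n",
                        toMiB(allocated), cudaGetErrorString(err));
            cudaGetLastError();  // clear the recorded (non-sticky) allocation error
            break;
        }
        blocks.push_back(p);
        allocated += chunk;
    }

    cudaMemGetInfo(&freeB, &totalB);
    std::printf("allocated %.0f MiB; now %.0f MiB free of %.0f MiB\n",
                toMiB(allocated), toMiB(freeB), toMiB(totalB));
    if (allocated > totalB)
        std::printf("allocations exceed reported device memory (oversubscription)\n");

    for (void* p : blocks) cudaFree(p);
    return 0;
}
```

Because the buffers are never touched by a kernel, the probe only shows what cudaMalloc accepts, not where WDDM keeps the data at any given moment.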

These characteristics are not under the control of the NVIDIA driver. The WDDM system is allowed to manage memory as it sees fit, and it may choose to move data from device memory to sysmem. However, my understanding is that when a CUDA kernel is running, for example, any device memory allocations it needs will be moved into, and actually be resident in, device memory for the CUDA client to use.

As you point out, cudaMallocManaged behaves somewhat differently: on Windows it is not demand-paged, but the data does migrate at certain points.
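The managed-memory model a device offers can be checked at runtime through the device attribute `cudaDevAttrConcurrentManagedAccess`. The CUDA documentation describes Windows as using the basic (non-demand-paged) Unified Memory model, so this attribute is expected to read 0 there; treat that as something to verify on your own system. In that model, managed data is migrated to the GPU when a kernel is launched, and the host must not access it until the GPU work has been synchronized. A minimal sketch that queries the attribute and follows that access rule; the `increment` kernel and the `managedModel` helper are hypothetical names for illustration:

```cuda
// Sketch: report the managed-memory model, then use managed memory in a way
// that is valid even when concurrent host/device access is not supported.
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical example kernel: add 1 to each element.
__global__ void increment(int* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Hypothetical helper: describe the model implied by cudaDevAttrConcurrentManagedAccess.
const char* managedModel(int concurrentManagedAccess) {
    return concurrentManagedAccess
        ? "demand paging: host and device may access managed memory concurrently"
        : "no demand paging: data migrates to the GPU at launch; host must wait for sync";
}

int main() {
    int dev = 0, concurrent = 0;
    if (cudaGetDevice(&dev) != cudaSuccess ||
        cudaDeviceGetAttribute(&concurrent, cudaDevAttrConcurrentManagedAccess, dev) != cudaSuccess) {
        std::printf("no usable CUDA device\n");
        return 0;
    }
    std::printf("device %d: %s\n", dev, managedModel(concurrent));

    const int n = 1 << 20;
    int* data = nullptr;
    if (cudaMallocManaged(&data, n * sizeof(int)) != cudaSuccess) return 1;
    for (int i = 0; i < n; ++i) data[i] = i;  // host initializes before any launch

    increment<<<(n + 255) / 256, 256>>>(data, n);
    // When concurrent == 0 (the Windows case), the host must not touch `data`
    // between the launch and the synchronization below.
    cudaDeviceSynchronize();

    std::printf("data[10] = %d\n", data[10]);  // host access is allowed again after sync
    cudaFree(data);
    return 0;
}
```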

This movement of data is not under the control of the NVIDIA driver. It is expected functionality, not a bug. There would be no reason to return, e.g., an out-of-memory error; in fact, the NVIDIA driver has no knowledge of whether the data will be moved to and become resident in sysmem, and therefore would have no basis for issuing such an error anyway.

If you don’t like this behavior, on certain kinds of NVIDIA GPUs you can select an alternate driver model, e.g. TCC, which takes the WDDM subsystem out of the picture. (This removes the possibility of using that GPU for display purposes.) Another option might be to switch to Linux.
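To see which driver model a GPU is currently using, the CUDA runtime exposes the device attribute `cudaDevAttrTccDriver` (also available as the `tccDriver` field of `cudaDeviceProp`). A minimal sketch, with a hypothetical `driverModelLabel` helper. Note that the attribute only says whether TCC is active: a value of 0 on Windows usually means WDDM, while on Linux it is simply 0 and says nothing about WDDM. Switching the driver model itself is done outside CUDA code (for example with nvidia-smi, typically with administrator rights and a reboot), and only on GPUs that support TCC.

```cuda
// Sketch: report, per device, whether the TCC driver model is active.
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: label for the value of cudaDevAttrTccDriver.
const char* driverModelLabel(int tccDriver) {
    return tccDriver ? "TCC" : "not TCC (on Windows this usually means WDDM)";
}

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        std::printf("no CUDA devices found\n");
        return 0;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        int tcc = 0;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess ||
            cudaDeviceGetAttribute(&tcc, cudaDevAttrTccDriver, dev) != cudaSuccess) {
            std::printf("device %d: query failed\n", dev);
            continue;
        }
        std::printf("device %d (%s): %s\n", dev, prop.name, driverModelLabel(tcc));
    }
    return 0;
}
```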

This WDDM behavior has been true for a long time (at least 5 years), and observing the basic data-movement effect does not depend on any new development in recent NVIDIA drivers.
