On my Windows 10 system, my exe program uses a custom image processing library that we wrote. This library contains many CUDA algorithms and GPU memory allocation code. When I run the exe, it initially allocates only one byte of GPU memory, but when I check with nvidia-smi I see that more than 500 MB of GPU memory is already in use. However, subsequent memory allocations do not increase the GPU memory usage further. I would like to confirm: does CUDA have a mechanism that pre-allocates a memory pool for a process to avoid memory fragmentation? I checked the static variables in the image processing library and none of them allocate GPU memory, and when I do not use this library, the exe does not allocate anywhere near 500 MB of GPU memory either.
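A minimal sketch of the pattern I'm describing (the actual library calls are omitted; this is just the allocation step):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Allocate a single byte of device memory, as described above.
    void* p = nullptr;
    cudaError_t err = cudaMalloc(&p, 1);
    if (err != cudaSuccess) {
        printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    // Pause here and inspect the process with nvidia-smi:
    // it reports hundreds of MB in use, not 1 byte.
    printf("Allocated 1 byte at %p; check nvidia-smi now.\n", p);
    getchar();
    cudaFree(p);
    return 0;
}
```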
Yes, to some small degree: not every instance of cudaMalloc in your code results in a noticeable increase in the application's memory consumption as reported by nvidia-smi, because allocations are served at some minimum granularity. The details are not published, and to my knowledge the granularity is nothing that approaches 500MB, but you can find reports from people who have tested this.
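If you want to observe the granularity yourself, here is a small sketch using cudaMemGetInfo; the exact number you see is undocumented behavior and will vary by GPU, driver, and driver model:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    size_t freeBefore, freeAfter, total;

    // Force context creation first so its overhead is not
    // counted against the 1-byte allocation below.
    cudaFree(0);
    cudaMemGetInfo(&freeBefore, &total);

    void* p = nullptr;
    cudaMalloc(&p, 1);  // request a single byte
    cudaMemGetInfo(&freeAfter, &total);

    // The drop in free memory is typically a whole allocation
    // page (often on the order of a few MB), not 1 byte.
    printf("1-byte cudaMalloc consumed %zu bytes of device memory\n",
           freeBefore - freeAfter);
    cudaFree(p);
    return 0;
}
```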
Additionally, it’s not uncommon for the baseline memory consumption of a WDDM GPU to be in the hundreds of megabytes, and initializing a CUDA context may add hundreds of megabytes on top of that. You may be witnessing the “overhead cost” of running a CUDA application. That overhead varies with factors such as the GPU, driver, and CUDA version, and loading a library that contains device code will also consume space in the CUDA context for that code.
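You can get a rough sense of that overhead by forcing context creation and then asking the runtime how much memory is already gone. A sketch (note that the reported figure also includes whatever the OS and display are using, so treat it as an upper bound on CUDA's share):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // cudaFree(0) is a common idiom to force CUDA context creation
    // without allocating anything.
    cudaFree(0);

    size_t freeMem, totalMem;
    cudaMemGetInfo(&freeMem, &totalMem);

    // Everything consumed at this point -- before any cudaMalloc in
    // application code -- is context/driver/OS overhead.
    printf("Total: %zu MB, free after context init: %zu MB, in use: %zu MB\n",
           totalMem >> 20, freeMem >> 20, (totalMem - freeMem) >> 20);
    return 0;
}
```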
Keep in mind that when you are using the WDDM driver (the default on Windows), the operating system, not CUDA, is in charge of GPU memory allocation. I am not aware of any Microsoft documentation that describes the details, but by observation it is frequently the case that Windows allocates more memory than a CUDA application requests, presumably for its own internal purposes.
As a consequence, the amount of usable GPU memory (i.e. memory available to a CUDA-accelerated application) is often smaller under the WDDM driver than what is available via the TCC driver (on supported GPUs) or when running on Linux.
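If you want to verify from code which driver model a device is running, the tccDriver field of cudaDeviceProp reports it. A minimal sketch:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int dev = 0;
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, dev);

    // tccDriver is 1 when the device runs the TCC driver,
    // 0 for WDDM (the Windows default for consumer GPUs).
    printf("%s: %s driver\n", prop.name, prop.tccDriver ? "TCC" : "WDDM");
    return 0;
}
```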