On my Windows 10 system, my exe program uses a custom image processing library that we wrote. This library contains many CUDA algorithms and GPU memory allocation code. When I run the exe, it initially allocates only one byte of GPU memory, but when I check with nvidia-smi I see that more than 500 MB of GPU memory is already in use. However, subsequent memory allocations do not increase the GPU memory usage further. I would like to confirm: does CUDA have a mechanism that pre-allocates a memory pool for a process to avoid memory fragmentation? I checked the static variables in the image processing library and none of them allocate GPU memory, and when I do not use this library, the exe does not allocate anywhere near 500 MB of GPU memory either.
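A minimal sketch of the pattern I'm describing (the actual library calls are omitted; this is just the allocation step):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Allocate a single byte of device memory, as described above.
    void* p = nullptr;
    cudaError_t err = cudaMalloc(&p, 1);
    if (err != cudaSuccess) {
        printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    // Pause here and inspect the process with nvidia-smi:
    // it reports hundreds of MB in use, not 1 byte.
    printf("Allocated 1 byte at %p; check nvidia-smi now.\n", p);
    getchar();
    cudaFree(p);
    return 0;
}
```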
Yes, to some small degree: not every instance of cudaMalloc in your code results in a noticeable increase in the application's memory consumption as reported by nvidia-smi, because allocations are served at some minimum granularity. The details are not published, and to my knowledge the granularity is nothing that approaches 500MB, but you can find reports from people who have tested this.
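If you want to observe the granularity yourself, here is a small sketch using cudaMemGetInfo; the exact number you see is undocumented behavior and will vary by GPU, driver, and driver model:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    size_t freeBefore, freeAfter, total;

    // Force context creation first so its overhead is not
    // counted against the 1-byte allocation below.
    cudaFree(0);
    cudaMemGetInfo(&freeBefore, &total);

    void* p = nullptr;
    cudaMalloc(&p, 1);  // request a single byte
    cudaMemGetInfo(&freeAfter, &total);

    // The drop in free memory is typically a whole allocation
    // page (often on the order of a few MB), not 1 byte.
    printf("1-byte cudaMalloc consumed %zu bytes of device memory\n",
           freeBefore - freeAfter);
    cudaFree(p);
    return 0;
}
```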
Additionally, it’s not uncommon for the baseline memory consumption of a WDDM GPU to be in the hundreds of megabytes, and initializing a CUDA context may add hundreds of megabytes on top of that. You may be witnessing the “overhead cost” of running a CUDA application. That overhead varies with factors such as the GPU, driver, and CUDA version, and loading a library that contains device code will also consume space in the CUDA context for that code.
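You can get a rough sense of that overhead by forcing context creation and then asking the runtime how much memory is already gone. A sketch (note that the reported figure also includes whatever the OS and display are using, so treat it as an upper bound on CUDA's share):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // cudaFree(0) is a common idiom to force CUDA context creation
    // without allocating anything.
    cudaFree(0);

    size_t freeMem, totalMem;
    cudaMemGetInfo(&freeMem, &totalMem);

    // Everything consumed at this point -- before any cudaMalloc in
    // application code -- is context/driver/OS overhead.
    printf("Total: %zu MB, free after context init: %zu MB, in use: %zu MB\n",
           totalMem >> 20, freeMem >> 20, (totalMem - freeMem) >> 20);
    return 0;
}
```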
Keep in mind that when you are using the WDDM driver (the default on Windows), the operating system, not CUDA, is in charge of GPU memory allocation. I am not aware of any Microsoft documentation that describes the details, but by observation it is frequently the case that Windows allocates more memory than a CUDA application requests, presumably for its own internal purposes.
As a consequence, the amount of usable GPU memory (i.e. memory available to a CUDA-accelerated application) is often smaller under the WDDM driver than what is available via the TCC driver (on supported GPUs) or when running on Linux.
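If you want to verify from code which driver model a device is running, the tccDriver field of cudaDeviceProp reports it. A minimal sketch:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int dev = 0;
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, dev);

    // tccDriver is 1 when the device runs the TCC driver,
    // 0 for WDDM (the Windows default for consumer GPUs).
    printf("%s: %s driver\n", prop.name, prop.tccDriver ? "TCC" : "WDDM");
    return 0;
}
```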