cudaMemGetInfo() not reporting the actual GPU memory stats

I need to check the amount of free GPU memory before running my CUDA algorithm, so I use cudaMemGetInfo() to query it. But the returned free memory does not take into account the GPU memory consumed by OpenGL applications running in parallel. The Windows Task Manager reports the actual usage of dedicated GPU memory including the OpenGL applications, but cudaMemGetInfo() does not account for the GPU memory they consume.

The OpenGL memory info extension (NVX_gpu_memory_info), on the other hand, properly reflects the free GPU memory, accounting even for CUDA allocations, except for the 512MB consumed by the CUDA driver.

Also, cudaMemGetInfo() reports only 3.2GB free out of 4GB when no GPU applications are running. But if I create a CUDA context first and then call cudaMemGetInfo(), it reports 3.2GB free out of 3.5GB. So where did the 512MB go? Is it taken by the CUDA context?

I use CUDA 9.2 with NVIDIA driver 452.57 on a Quadro P1000.

In summary, I would like to understand:

  1. Why does cudaMemGetInfo() not reflect the GPU memory consumed by OpenGL applications, when the Windows Task Manager does reflect it?

  2. Why does cudaMemGetInfo() report about 0.8GB used by default, even when no GPU applications are running?

  3. Why does creating a CUDA context consume 512MB of GPU memory, and why does that amount also disappear from the total memory reported by cudaMemGetInfo()?

    size_t free = 0, total = 0;
    cudaSetDevice(0);
    cudaMemGetInfo(&free, &total);
    std::cout << "CUDA memStat (free/total): \t\t" << free/(1024*1024) << "MB/"
              << total/(1024*1024) << "MB(" << (total-free)/(1024*1024) << ")\n";

    cuInit(0);                                // required before any driver API call
    CUcontext ctx;
    CUresult err = cuCtxCreate(&ctx, 0, 0);   // device ordinal 0
    if (err == CUDA_SUCCESS)
    {
        cuMemGetInfo(&free, &total);
        cuCtxDestroy(ctx);                    // cuCtxDetach() is deprecated
    }
    std::cout << "CUDA context memStat (free/total): \t" << free/(1024*1024) << "MB/"
              << total/(1024*1024) << "MB(" << (total-free)/(1024*1024) << ")\n";

    // NVX_gpu_memory_info reports values in KB
    GLint aDedicatedVidmem = 0;
    glGetIntegerv(GPU_MEMORY_INFO_DEDICATED_VIDMEM_NVX, &aDedicatedVidmem);
    total = size_t(aDedicatedVidmem) * 1024;
    GLint aCurrentAvailableVidmem = 0;
    glGetIntegerv(GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX, &aCurrentAvailableVidmem);
    free = size_t(aCurrentAvailableVidmem) * 1024;
    std::cout << "OpenGL memStat (free/total): \t" << free/(1024*1024) << "MB/"
              << total/(1024*1024) << "MB(" << (total-free)/(1024*1024) << ")\n";
    

I get the following output when around 1175MB is used by OpenGL applications:

    CUDA memStat (free/total):         3385MB/4096MB(710)
    CUDA context memStat (free/total): 3354MB/3583MB(229)
    OpenGL memStat (free/total):       2375MB/4096MB(1720)

When I allocate a 256MB CUDA cache using cudaMalloc(), the output is:

    CUDA memStat (free/total):         3129MB/4096MB(966)
    CUDA context memStat (free/total): 3098MB/3583MB(485)
    OpenGL memStat (free/total):       2026MB/4096MB(2069)

So CUDA allocations are accounted for by the OpenGL memory info extension, but OpenGL usage is not accounted for by cudaMemGetInfo()!

Since you are running OpenGL on it, your GPU is evidently in WDDM mode. In WDDM mode, GPU memory is not managed directly by CUDA or by the GPU driver; it is managed by the Windows OS, based on software provided by Microsoft. That Microsoft software treats GPU memory as a virtual resource, so CUDA doesn't know anything about actual memory usage; it only knows what the Microsoft components tell it about GPU memory. I won't be able to give any explanation beyond that.

If you don’t like this behavior (which is not under control of any software provided by NVIDIA) then my suggestion would be to get yourself another display GPU, and place your Quadro GPU into TCC mode. If you do that, the behavior will be much more understandable.
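For completeness, here is a sketch of how the driver model can be inspected and switched with nvidia-smi. This must be run from an elevated (administrator) prompt, TCC requires a GPU that supports it and is not driving a display, a reboot may be needed, and the exact flag behavior should be confirmed against `nvidia-smi -h` for your driver version:

```shell
# Show the current driver model (WDDM or TCC) for each GPU
nvidia-smi --query-gpu=index,name,driver_model.current --format=csv

# Switch GPU 0 to TCC (1 = TCC, 0 = WDDM); takes effect after a reboot
nvidia-smi -i 0 -dm 1
```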

Yes, the act of running a CUDA call on a device can trigger CUDA context creation, which can use up 0.5GB or more of GPU memory.

I won't be able to answer all the related questions, such as why OpenGL appears to behave differently. Since OpenGL also depends on an underlying Microsoft graphics driver component, it's quite possible the memory reporting system there is architected differently. Nevertheless, the cudaMemGetInfo call should be a useful indicator of how much memory can be allocated by CUDA at any given point in time. The nature of the virtual memory management system for the GPU is that, in some circumstances, the sum of all users of the GPU can allocate more than the physical RAM available on the GPU. The WDDM virtual memory manager can do something like demand-paging of GPU memory.


Thanks Robert_Crovella.

So in WDDM mode the GPU memory stats are better queried from Windows; in that case, can I use NVML to get accurate memory stats? I did a quick test with nvmlDeviceGetMemoryInfo(), and it returns the same stats as the NVX memory info extension. So I can probably use NVML / the NVX memory info extension to get more dependable GPU memory stats, since my deployment may have NVIDIA GPUs in either WDDM or TCC mode.
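For reference, a minimal NVML query along those lines might look like the sketch below. It assumes device index 0 and needs to be linked against the NVML library (nvml.lib on Windows, libnvidia-ml on Linux):

```cpp
// Sketch: query GPU memory via NVML, which reflects the OS view of
// dedicated memory rather than the CUDA runtime's view.
#include <cstdio>
#include <nvml.h>

int main() {
    if (nvmlInit() != NVML_SUCCESS) return 1;

    nvmlDevice_t dev;
    nvmlDeviceGetHandleByIndex(0, &dev);      // device index 0 assumed

    nvmlMemory_t mem;                         // total / free / used, in bytes
    nvmlDeviceGetMemoryInfo(dev, &mem);
    printf("NVML memStat (free/total): %llu MB / %llu MB (%llu MB used)\n",
           (unsigned long long)(mem.free  / (1024 * 1024)),
           (unsigned long long)(mem.total / (1024 * 1024)),
           (unsigned long long)(mem.used  / (1024 * 1024)));

    nvmlShutdown();
    return 0;
}
```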

Is there a way to programmatically calculate the memory that would be consumed by the CUDA context, i.e. 0.5GB in my case? Also, is this 0.5GB consumed by every CUDA context that we may create, in different threads and in different applications?

Not that I know of. It would certainly depend on a number of factors: GPU type, OS, CUDA version, driver version, etc. The context creation cost is incurred with every context creation. If you are using the driver API, you already know what that means. If you are using the runtime API, it is associated with every process that uses a specific GPU.
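It can, however, be measured empirically rather than predicted: sample NVML's used-memory counter before and after forcing runtime context creation. A sketch, assuming device 0 and that nothing else allocates GPU memory in between:

```cpp
// Sketch: measure the GPU memory cost of CUDA context creation in this
// process, by comparing NVML's used-memory counter before and after the
// runtime context is forced into existence.
#include <cstdio>
#include <cuda_runtime.h>
#include <nvml.h>

int main() {
    nvmlInit();
    nvmlDevice_t dev;
    nvmlDeviceGetHandleByIndex(0, &dev);      // device index 0 assumed

    nvmlMemory_t before, after;
    nvmlDeviceGetMemoryInfo(dev, &before);    // no CUDA context exists yet

    cudaFree(0);                              // forces runtime context creation

    nvmlDeviceGetMemoryInfo(dev, &after);
    printf("context overhead: ~%llu MB\n",
           (unsigned long long)((after.used - before.used) / (1024 * 1024)));

    nvmlShutdown();
    return 0;
}
```

This only reports the cost on the current GPU/driver/CUDA combination; it says nothing about other configurations.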

There are likely other possible overheads as well. For example, my discussion here is mostly about CUDA contexts. However, a GPU in WDDM mode will generally have a Windows display driver stack built on top of it by Windows. To a first-order approximation, this has nothing to do with CUDA.

I think it’s unlikely I would be able to respond to further detailed questions about these kinds of memory usages. From my perspective, they are mostly “opaque”.