cudaMemGetInfo no longer returns free memory for the whole system, only for the current process

I’m having an issue with cudaMemGetInfo. Until recently it returned the free GPU memory on a specific device across all processes using that device. Now it appears that cudaMemGetInfo only accounts for memory allocated by the calling process.

Does anyone know if this is a bug or how to fix this?

To be clear, I have multiple processes running and a single “watchdog” type application that needs to determine the total amount of free GPU memory in the system. I would rather not have to poll each sub-process for the amount of memory it is using, and in some cases that would not work anyway.

Not sure what you are referring to. I see what I consider to be expected behavior. Here’s a simple test case:

$ cat test.cu
#include <unistd.h>
#include <iostream>

int main(){

  size_t mf, ma;
  // report device-wide free/total memory before allocating anything
  cudaMemGetInfo(&mf, &ma);
  std::cout << "free: " << mf << " total: " << ma << std::endl;
  int *d;
  // allocate 1 GiB on the device
  cudaMalloc(&d, 1048576*1024);
  std::cout << "alloc 1 Gig" << std::endl;
  cudaMemGetInfo(&mf, &ma);
  std::cout << "free: " << mf << " total: " << ma << std::endl;
  // pause so a second instance of this program can run concurrently
  std::cout << "wait 10 seconds" << std::endl;
  usleep(1000000*10);
  cudaMemGetInfo(&mf, &ma);
  std::cout << "free: " << mf << " total: " << ma << std::endl;
  std::cout << "wait 10 seconds" << std::endl;
  usleep(1000000*10);
}

$ nvcc -o test test.cu
$

If I then run the above executable in two separate command prompts (so, two separate processes), with the second instance started slightly after the first (as quickly as I can click from one window to the other and hit enter), I get the following output on CUDA 10.0, driver 410.48, CentOS 7, Tesla P100:

Instance 1:

$ ./test
free: 16766599168 total: 17071734784
alloc 1 Gig
free: 15692857344 total: 17071734784
wait 10 seconds
free: 14324465664 total: 17071734784
wait 10 seconds
$

Instance 2:

$ ./test
free: 15398207488 total: 17071734784
alloc 1 Gig
free: 14324465664 total: 17071734784
wait 10 seconds
free: 14324465664 total: 17071734784
wait 10 seconds
$

The output seems correct to me. The free memory reported in the final output of each process is the same, and it reflects the memory consumption of both processes combined: the drop from the ~17 GB total to the final ~14.3 GB free corresponds to the two 1 GiB allocations plus roughly 300 MB of CUDA context overhead per process.

Can you test it on a Windows machine, specifically Windows 10, and let me know if you see the issue? We have tried multiple machines and cards, all on CUDA 10.0.

I think you have enough information here to run my test case on Windows if you wish. If time permits, I’ll run it myself when I have a chance.

On Windows, GPU memory for WDDM devices is managed by Windows, not by CUDA. The cudaMemGetInfo call will return information based on what the Windows GPU virtual memory manager (from Microsoft) tells it. This may change from time to time, depending on what Microsoft decides to do.

If you have GPUs that can be placed in TCC mode on Windows, that should take Microsoft mostly out of the reporting loop for this function.
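(For what it’s worth, on boards that support TCC the driver model can usually be switched with nvidia-smi from an administrator prompt, e.g. nvidia-smi -i 0 -dm 1, where 0 = WDDM and 1 = TCC, followed by a reboot. Check your driver’s nvidia-smi documentation; not all GPUs, notably most GeForce cards, allow TCC mode.)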

If you have problems you would like to report, you can file bugs at developer.nvidia.com

OK, yes, it works fine on Linux. After some digging, it looks like NVML is the thing to be using.
Thanks!
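
For anyone who finds this later, here is a minimal sketch of that NVML approach (my own illustration, not code from this thread): it assumes device index 0 and links against the NVML library that ships with the driver / CUDA toolkit (e.g. -lnvidia-ml on Linux, nvml.lib on Windows). nvmlDeviceGetMemoryInfo reports free/used/total for the whole device as seen by the driver, independent of which process made the allocations, so a watchdog process can call it without touching the CUDA runtime.

$ cat nvml_free.cpp
#include <iostream>
#include <nvml.h>

int main(){
  // nvmlInit/nvmlShutdown bracket all NVML usage
  if (nvmlInit() != NVML_SUCCESS){
    std::cerr << "nvmlInit failed" << std::endl;
    return 1;
  }
  nvmlDevice_t dev;
  nvmlMemory_t mem;
  if (nvmlDeviceGetHandleByIndex(0, &dev) == NVML_SUCCESS &&
      nvmlDeviceGetMemoryInfo(dev, &mem) == NVML_SUCCESS){
    // free/used/total are device-wide, regardless of which process owns the allocations
    std::cout << "free: " << mem.free << " used: " << mem.used
              << " total: " << mem.total << std::endl;
  }
  nvmlShutdown();
}

$ g++ -o nvml_free nvml_free.cpp -lnvidia-ml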