cudaMemGetInfo no longer returns free memory for the whole system, only for the current process

I’m having an issue with cudaMemGetInfo. Until recently it returned the free GPU memory on a specific device across all processes using that device. Now it appears that cudaMemGetInfo only accounts for memory allocated by the calling process.

Does anyone know if this is a bug or how to fix this?

To be clear, I have multiple processes running and a single “watchdog” type application that needs to determine the total amount of free GPU memory in the system. I would rather not have to poll each sub-process for the amount of memory it is using, and in some cases that would not work anyway.

Not sure what you are referring to. I see what I consider to be expected behavior. Here’s a simple test case:

$ cat test.cu
#include <unistd.h>
#include <iostream>

int main(){

  size_t mf, ma;
  // report device-wide free/total memory before allocating anything
  cudaMemGetInfo(&mf, &ma);
  std::cout << "free: " << mf << " total: " << ma << std::endl;
  int *d;
  // allocate 1 GiB on the device
  cudaMalloc(&d, 1048576*1024);
  std::cout << "alloc 1 Gig" << std::endl;
  cudaMemGetInfo(&mf, &ma);
  std::cout << "free: " << mf << " total: " << ma << std::endl;
  // pause so a second instance of this program can run concurrently
  std::cout << "wait 10 seconds" << std::endl;
  usleep(1000000*10);
  cudaMemGetInfo(&mf, &ma);
  std::cout << "free: " << mf << " total: " << ma << std::endl;
  std::cout << "wait 10 seconds" << std::endl;
  usleep(1000000*10);
}

$ nvcc -o test test.cu
$

If I then run the above executable in two separate command prompts (so, two separate processes), with the second instance started slightly after the first (as quickly as I can click from one window to the other and hit enter), I get the following output on CUDA 10.0, driver 410.48, CentOS 7, Tesla P100:

Instance 1:

$ ./test
free: 16766599168 total: 17071734784
alloc 1 Gig
free: 15692857344 total: 17071734784
wait 10 seconds
free: 14324465664 total: 17071734784
wait 10 seconds
$

Instance 2:

$ ./test
free: 15398207488 total: 17071734784
alloc 1 Gig
free: 14324465664 total: 17071734784
wait 10 seconds
free: 14324465664 total: 17071734784
wait 10 seconds
$

The output seems correct to me. The free memory reported in the final output of each process is the same, and it reflects the memory consumption of both processes combined: the drop from the ~17 GB total to the final ~14.3 GB free corresponds to the two 1 GiB allocations plus roughly 300 MB of CUDA context overhead per process.

Can you test it on a Windows machine, specifically Windows 10, and let me know if you see the issue? We have tried multiple machines and cards, all on CUDA 10.0.

I think you have enough information here to run my test case on Windows if you wish. If time permits, I’ll run it myself when I have a chance.

On Windows, GPU memory for WDDM devices is managed by Windows, not by CUDA. The cudaMemGetInfo call will return information based on what the Windows GPU virtual memory manager (from Microsoft) tells it. This may change from time to time, depending on what Microsoft decides to do.

If you have GPUs that can be placed in TCC mode on Windows, that should take Microsoft mostly out of the reporting loop for this function.
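(For what it’s worth, on boards that support TCC the driver model can usually be switched with nvidia-smi from an administrator prompt, e.g. nvidia-smi -i 0 -dm 1, where 0 = WDDM and 1 = TCC, followed by a reboot. Check your driver’s nvidia-smi documentation; not all GPUs, notably most GeForce cards, allow TCC mode.)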

If you have problems you would like to report, you can file bugs at developer.nvidia.com

OK, yes, it works fine on Linux. After some digging, it looks like NVML is the thing to be using.
Thanks!
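
For anyone who finds this later, here is a minimal sketch of that NVML approach (my own illustration, not code from this thread): it assumes device index 0 and links against the NVML library that ships with the driver / CUDA toolkit (e.g. -lnvidia-ml on Linux, nvml.lib on Windows). nvmlDeviceGetMemoryInfo reports free/used/total for the whole device as seen by the driver, independent of which process made the allocations, so a watchdog process can call it without touching the CUDA runtime.

$ cat nvml_free.cpp
#include <iostream>
#include <nvml.h>

int main(){
  // nvmlInit/nvmlShutdown bracket all NVML usage
  if (nvmlInit() != NVML_SUCCESS){
    std::cerr << "nvmlInit failed" << std::endl;
    return 1;
  }
  nvmlDevice_t dev;
  nvmlMemory_t mem;
  if (nvmlDeviceGetHandleByIndex(0, &dev) == NVML_SUCCESS &&
      nvmlDeviceGetMemoryInfo(dev, &mem) == NVML_SUCCESS){
    // free/used/total are device-wide, regardless of which process owns the allocations
    std::cout << "free: " << mem.free << " used: " << mem.used
              << " total: " << mem.total << std::endl;
  }
  nvmlShutdown();
}

$ g++ -o nvml_free nvml_free.cpp -lnvidia-ml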