Kernel maxing out GPU memory when it definitely should not be


I have a program that launches several thousand blocks on our P100, with the thread block size set by launch parameters. My pen-and-paper calculations say it should only be using ~200 MB of memory, but when I run nvidia-smi it shows ~600 MB in use.

What initially alerted me to the discrepancy was one of our performance monitoring tools reporting periods of 100% device memory utilization. The amount of memory I'm using shouldn't come anywhere near the 12 GB available, and it doesn't match the numbers nvidia-smi reports, so this was very alarming! The last strange part is that the tool only reports max memory usage (I have it email me whenever this happens) with one specific thread block configuration; other configurations don't appear to trigger the effect. I'm quite certain I'm the only one using the device when this happens.

Any ideas? I’m kind of at a loss as to what’s going on here.

When a GPU device is idle, it will report close to 0% memory utilization. Note that nvidia-smi's "memory utilization" figure is an activity metric (roughly, the fraction of time the memory controller is busy), not a measure of how much memory is allocated, so 100% memory utilization does not mean the memory is full.

When you run a program on the GPU, two things will consume memory:

  1. The memory needs of your program.
  2. The memory overhead of CUDA.

CUDA can easily use 400 MB of overhead on a GPU. However, this overhead should be roughly constant whether your program needs 200 MB of memory or 10 GB. So if your program uses 200 MB and nvidia-smi reports 600 MB used, that doesn't surprise me. Some things may affect this overhead, such as whether the GPU is driving a display, but that should already be evident in the idle state.
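If you want to measure the overhead directly rather than infer it, a minimal sketch along these lines would do it (untested here; it assumes the standard CUDA runtime API, and note that `cudaMemGetInfo()` reports device-wide free/total, so the figures include any other processes on the GPU):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    size_t free_b = 0, total_b = 0;

    // The first runtime call creates the CUDA context; until this
    // point the process holds no device memory at all.
    cudaFree(0);
    cudaMemGetInfo(&free_b, &total_b);
    size_t after_ctx = total_b - free_b;  // mostly context overhead

    // Now make an allocation the size of your own pen-and-paper
    // estimate (200 MB in this thread's example).
    void *buf = nullptr;
    cudaMalloc(&buf, 200u * 1024 * 1024);
    cudaMemGetInfo(&free_b, &total_b);
    size_t after_alloc = total_b - free_b;

    printf("context overhead: ~%zu MB\n", after_ctx / (1024 * 1024));
    printf("with allocation : ~%zu MB\n", after_alloc / (1024 * 1024));

    cudaFree(buf);
    return 0;
}
```

The difference between the two readings is your own allocation; whatever is left over after subtracting your allocations from nvidia-smi's "used" figure is the fixed cost I'm describing.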

As for the behavior of your performance monitoring tool, which you haven't named, I can't comment; it may have a bug. On the other hand, I'm pretty confident that nvidia-smi gives a reasonably accurate picture of the memory used on a device.

That information on CUDA’s constant memory overhead is very useful, thank you! That’s one anomaly potentially resolved, at least; I’ll have to take things up with our sysadmin to see if perhaps there’s a bug in the performance monitoring tool we’re using.