nvidia-smi shows high global memory usage, but low usage for the only process


Is there any reason nvidia-smi would underestimate the memory usage of a process? nvprof gives similar results: both nvprof and the per-process entry in nvidia-smi report 1.6GB, but the global counter shows 9.3GB in use. My own estimate of the program's memory footprint is around 9-10GB.


I reset the GPU with nvidia-smi --gpu-reset before running the experiment.

Thanks in advance for any insight/help!

I’ve run some tests:

  • with cudaMallocManaged I get the discrepancy described above
  • with cudaMalloc the per-process and global values are consistent
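For anyone who wants to repeat this comparison, here is a minimal sketch of the kind of test described above. It is not the original test program: the 1 GiB size, the `touch` kernel, the `report` helper, and the `m` command-line switch are all assumptions made for illustration. It allocates a buffer with either cudaMalloc or cudaMallocManaged, writes to it on the GPU, and prints the device-wide usage from cudaMemGetInfo at each step, then waits so that nvidia-smi can be checked from another terminal.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                         cudaGetErrorString(err_));                     \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

// Print device-wide used memory as seen by the runtime (total - free).
static void report(const char *label) {
    size_t free_b = 0, total_b = 0;
    CHECK(cudaMemGetInfo(&free_b, &total_b));
    std::printf("%-24s device-wide used: %.2f GiB\n", label,
                (total_b - free_b) / (1024.0 * 1024.0 * 1024.0));
}

// Write one byte per element so every page of the buffer is touched on the GPU.
__global__ void touch(char *p, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    size_t stride = (size_t)gridDim.x * blockDim.x;
    for (; i < n; i += stride) p[i] = 1;
}

int main(int argc, char **argv) {
    // Hypothetical switch: pass "m" to use cudaMallocManaged, anything else for cudaMalloc.
    bool managed = (argc > 1 && argv[1][0] == 'm');
    const size_t bytes = (size_t)1 << 30;  // 1 GiB, an arbitrary test size
    char *buf = nullptr;

    report("before allocation");
    if (managed) {
        CHECK(cudaMallocManaged(&buf, bytes));
    } else {
        CHECK(cudaMalloc(&buf, bytes));
    }
    report("after allocation");

    touch<<<256, 256>>>(buf, bytes);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());
    report("after touching on GPU");

    std::puts("Press Enter to exit (check nvidia-smi in another terminal now).");
    std::getchar();

    CHECK(cudaFree(buf));
    return 0;
}
```

Running it once with `m` and once without, and comparing the per-process and global figures in nvidia-smi while the program is paused, should show whether the discrepancy is tied to managed memory on a given driver and GPU.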

I now think the global counter includes memory that processes have reserved or claimed, whereas the per-process figure shows only memory actually in use. Under this hypothesis, the nvprof output was a bit of a red herring: because it showed the same values as the per-process figure, I assumed it was suspect as well.
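One way to look at both counters side by side is to query NVML directly, which is the library nvidia-smi is built on. The following is only a sketch under assumptions: it looks at device index 0, uses a fixed buffer of 64 process entries, and is host-only code (no kernels) that needs to be linked against the NVML library (typically `-lnvidia-ml`). It does not prove the hypothesis; it just prints the device-wide figure next to the sum of the per-process figures so the gap is visible.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <nvml.h>

// Abort with a message if an NVML call fails.
#define NVML_CHECK(call)                                                \
    do {                                                                \
        nvmlReturn_t r_ = (call);                                       \
        if (r_ != NVML_SUCCESS) {                                       \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                         nvmlErrorString(r_));                          \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

int main() {
    const double GiB = 1024.0 * 1024.0 * 1024.0;

    NVML_CHECK(nvmlInit());

    // Assumption: the GPU of interest is device index 0.
    nvmlDevice_t dev;
    NVML_CHECK(nvmlDeviceGetHandleByIndex(0, &dev));

    // Device-wide counter (the "global" figure).
    nvmlMemory_t mem;
    NVML_CHECK(nvmlDeviceGetMemoryInfo(dev, &mem));
    std::printf("device-wide used: %.2f GiB of %.2f GiB\n",
                mem.used / GiB, mem.total / GiB);

    // Per-process counters for compute processes on this device.
    // Assumption: at most 64 processes; a larger count makes the call fail.
    unsigned int count = 64;
    std::vector<nvmlProcessInfo_t> procs(count);
    NVML_CHECK(nvmlDeviceGetComputeRunningProcesses(dev, &count, procs.data()));

    unsigned long long sum = 0;
    for (unsigned int i = 0; i < count; ++i) {
        // usedGpuMemory may be reported as "not available" on some setups.
        if (procs[i].usedGpuMemory ==
            (unsigned long long)NVML_VALUE_NOT_AVAILABLE) {
            std::printf("pid %u: usage not available\n", procs[i].pid);
            continue;
        }
        std::printf("pid %u: %.2f GiB\n", procs[i].pid,
                    procs[i].usedGpuMemory / GiB);
        sum += procs[i].usedGpuMemory;
    }
    std::printf("sum of per-process usage: %.2f GiB\n", sum / GiB);

    NVML_CHECK(nvmlShutdown());
    return 0;
}
```

If the device-wide figure stays well above the per-process sum only while a cudaMallocManaged program is running, that would be consistent with the hypothesis above, though it would not explain what the driver is doing with the difference.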

I had the same question (regarding cudaMallocManaged). I have actually done some analysis/experiments myself.

However, I haven’t received any answer from NVIDIA, which is strange.