I am profiling the memory usage of my CUDA program in a single-GPU setup. The memory usage reported by nvidia-smi (8 GB) differs from the memory usage reported by Nsight Systems (~6 GB). I would like to understand the difference, e.g. what could be listed/detected by nvidia-smi that remains unreported in Nsight Systems.
Further information about my setup:
- How I read memory usage from nvidia-smi: under Processes, the row for my program, GPU Memory Usage column.
- How I read memory usage from Nsight Systems: under Timeline View > Processes row > the process I launched > CUDA HW > Memory usage (see the cross-check sketch after this list).
- Operating system: Ubuntu 20.04.6
- Nsight Systems version: 2024.2.1.106-242134037904v0 Linux
- GPU hardware: GeForce RTX 3090
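As a third data point outside either tool, the device's own view can be queried from inside the program with cudaMemGetInfo; total minus free includes driver-side reservations (and anything else using the GPU, such as the display), not just this process's cudaMalloc calls. A minimal sketch (file and function names are made up):

```cuda
// mem_check.cu -- hypothetical cross-check from inside the application.
// cudaMemGetInfo reports free/total device memory as the driver sees it, so
// (total - free) includes driver-side reservations and any other users of the
// GPU, not just this process's explicit cudaMalloc allocations.
#include <cstdio>
#include <cuda_runtime.h>

static void report_device_memory(const char *tag)
{
    size_t free_bytes = 0, total_bytes = 0;
    cudaError_t err = cudaMemGetInfo(&free_bytes, &total_bytes);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMemGetInfo failed: %s\n", cudaGetErrorString(err));
        return;
    }
    std::printf("[%s] used: %.1f MiB of %.1f MiB\n", tag,
                (total_bytes - free_bytes) / (1024.0 * 1024.0),
                total_bytes / (1024.0 * 1024.0));
}

int main()
{
    cudaFree(0);                         // force context creation
    report_device_memory("after context init");

    void *buf = nullptr;
    cudaMalloc(&buf, 512ull << 20);      // 512 MiB user allocation for comparison
    report_device_memory("after 512 MiB cudaMalloc");

    cudaFree(buf);
    return 0;
}
```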
I suspect the following could be missing from the Nsight Systems memory usage:
- Memory blobs such as a trained model
- CUDA object code
- NVIDIA driver memory per thread
Are there other contributors? Could these be missed by Nsight Systems? If yes, is there a way to profile/spot them?
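For what it's worth, the per-context reservations for the device-side stack, malloc heap, and printf buffer can be queried with cudaDeviceGetLimit; a small sketch (exact sizes and how they scale are driver/GPU dependent):

```cuda
// limits_check.cu -- sketch of querying per-context reservations that a trace of
// explicit allocations will not show.
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaFree(0);  // make sure the context exists

    size_t stack_bytes = 0, heap_bytes = 0, printf_bytes = 0;
    cudaDeviceGetLimit(&stack_bytes,  cudaLimitStackSize);       // per-thread local memory/stack
    cudaDeviceGetLimit(&heap_bytes,   cudaLimitMallocHeapSize);  // device-side malloc heap
    cudaDeviceGetLimit(&printf_bytes, cudaLimitPrintfFifoSize);  // device printf FIFO

    std::printf("stack per thread : %zu bytes\n", stack_bytes);
    std::printf("malloc heap      : %zu bytes\n", heap_bytes);
    std::printf("printf FIFO      : %zu bytes\n", printf_bytes);

    // The driver's local-memory reservation is roughly the per-thread stack size times
    // the maximum number of resident threads on the device, so a few KiB per thread
    // can become hundreds of MiB that never appear as a user allocation.
    return 0;
}
```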
I have seen several other similar questions, but none of them are answered.
My colleague was facing the same problem. He was seeing a sudden, unexplained 2 GB jump in the memory reported by nvidia-smi; nsys/Nsight did not show it, and no large allocations had occurred that would explain the jump.
The answer turned out to be that one of his CUDA files was compiled with -G (device debug). When a kernel from that file was launched, the nvidia-smi memory jumped: memory was clearly being reserved for debugging. When we removed -G, the jump disappeared.
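If you want to check whether -G is the culprit in your own build, one way to reproduce it (the kernel below is a made-up stand-in, not his code) is to build the same trivial kernel with and without -G and compare the nvidia-smi number after the first launch:

```cuda
// debug_jump.cu -- stand-in kernel to reproduce the -G jump. Build it both ways and
// compare the "GPU Memory Usage" column in nvidia-smi after the first launch:
//   nvcc -O2 debug_jump.cu -o release_build
//   nvcc -G  debug_jump.cu -o debug_build
#include <cstdio>
#include <cuda_runtime.h>

__global__ void touch(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i * 0.5f;
}

int main()
{
    const int n = 1 << 20;
    float *d = nullptr;
    cudaMalloc(&d, n * sizeof(float));

    touch<<<(n + 255) / 256, 256>>>(d, n);   // the jump appeared at the first launch
    cudaDeviceSynchronize();

    std::printf("kernel launched; check nvidia-smi, then press Enter to exit...\n");
    std::getchar();                          // keep the context alive while you look

    cudaFree(d);
    return 0;
}
```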
I believe the remaining difference is that nsys only shows user-allocated memory, whereas nvidia-smi also includes driver-allocated memory for internal data structures, local memory/stack, the device malloc heap, the printf buffer, and instructions.
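One way to see this directly is to enlarge one of those driver-managed reservations and watch nvidia-smi move while nothing is allocated by the user; a rough sketch (the 1 GiB heap size is arbitrary):

```cuda
// heap_grow.cu -- sketch: enlarge a driver-managed reservation (the device malloc heap)
// and watch nvidia-smi grow even though the host never calls cudaMalloc, so a tool
// tracking only user allocations stays flat.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void use_device_heap()
{
    // Touch the device-side heap so the reservation is definitely materialized.
    void *p = malloc(16);
    free(p);
}

int main()
{
    cudaFree(0);  // create the context
    std::printf("context created; note the nvidia-smi value, then press Enter...\n");
    std::getchar();

    // Must be set before any kernel that uses device malloc/free is launched.
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 1ull << 30);
    use_device_heap<<<1, 1>>>();
    cudaDeviceSynchronize();

    std::printf("heap reserved; check nvidia-smi again, then press Enter to exit...\n");
    std::getchar();
    return 0;
}
```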