I am profiling the memory usage of my CUDA program in a single-GPU setup. The memory usage reported by nvidia-smi (8 GB) differs from the memory usage reported by Nsight Systems (~6 GB). I would like to understand the difference, e.g. what could be listed/detected by nvidia-smi that remains unreported in Nsight Systems.
Further information about my setup:
- How I read memory usage from nvidia-smi: under Processes, the row for my program, GPU Memory Usage column.
- How I read memory usage from Nsight Systems: under Timeline View > Processes row > the process I launched > CUDA HW > Memory usage (see the cross-check sketch after this list).
- Operating system: Ubuntu 20.04.6
- Nsight Systems version: 2024.2.1.106-242134037904v0 Linux
- GPU hardware: GeForce RTX 3090
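As a third data point outside either tool, the device's own view can be queried from inside the program with cudaMemGetInfo; total minus free includes driver-side reservations (and anything else using the GPU, such as the display), not just this process's cudaMalloc calls. A minimal sketch (file and function names are made up):

```cuda
// mem_check.cu -- hypothetical cross-check from inside the application.
// cudaMemGetInfo reports free/total device memory as the driver sees it, so
// (total - free) includes driver-side reservations and any other users of the
// GPU, not just this process's explicit cudaMalloc allocations.
#include <cstdio>
#include <cuda_runtime.h>

static void report_device_memory(const char *tag)
{
    size_t free_bytes = 0, total_bytes = 0;
    cudaError_t err = cudaMemGetInfo(&free_bytes, &total_bytes);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMemGetInfo failed: %s\n", cudaGetErrorString(err));
        return;
    }
    std::printf("[%s] used: %.1f MiB of %.1f MiB\n", tag,
                (total_bytes - free_bytes) / (1024.0 * 1024.0),
                total_bytes / (1024.0 * 1024.0));
}

int main()
{
    cudaFree(0);                         // force context creation
    report_device_memory("after context init");

    void *buf = nullptr;
    cudaMalloc(&buf, 512ull << 20);      // 512 MiB user allocation for comparison
    report_device_memory("after 512 MiB cudaMalloc");

    cudaFree(buf);
    return 0;
}
```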
I suspect the following could be missing from the Nsight Systems memory usage:
- Memory blobs such as a trained model
- CUDA object code
- NVIDIA driver memory per thread
Are there other contributors? Could these be missed by Nsight Systems? If yes, is there a way to profile/spot them?
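For what it's worth, the per-context reservations for the device-side stack, malloc heap, and printf buffer can be queried with cudaDeviceGetLimit; a small sketch (exact sizes and how they scale are driver/GPU dependent):

```cuda
// limits_check.cu -- sketch of querying per-context reservations that a trace of
// explicit allocations will not show.
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaFree(0);  // make sure the context exists

    size_t stack_bytes = 0, heap_bytes = 0, printf_bytes = 0;
    cudaDeviceGetLimit(&stack_bytes,  cudaLimitStackSize);       // per-thread local memory/stack
    cudaDeviceGetLimit(&heap_bytes,   cudaLimitMallocHeapSize);  // device-side malloc heap
    cudaDeviceGetLimit(&printf_bytes, cudaLimitPrintfFifoSize);  // device printf FIFO

    std::printf("stack per thread : %zu bytes\n", stack_bytes);
    std::printf("malloc heap      : %zu bytes\n", heap_bytes);
    std::printf("printf FIFO      : %zu bytes\n", printf_bytes);

    // The driver's local-memory reservation is roughly the per-thread stack size times
    // the maximum number of resident threads on the device, so a few KiB per thread
    // can become hundreds of MiB that never appear as a user allocation.
    return 0;
}
```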
I have seen several other similar questions, but none of them are answered.
My colleague was facing the same problem. He was seeing a sudden, unexplained 2 GB jump in the memory reported by nvidia-smi; nsys/Nsight did not show it, and no large allocations had occurred that would explain the jump.
The answer turned out to be that one of his CUDA files was compiled with -G (device debug). When a kernel from that file was launched, the nvidia-smi memory jumped: memory was clearly being reserved for debugging. When we removed -G, the jump disappeared.
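If you want to check whether -G is the culprit in your own build, one way to reproduce it (the kernel below is a made-up stand-in, not his code) is to build the same trivial kernel with and without -G and compare the nvidia-smi number after the first launch:

```cuda
// debug_jump.cu -- stand-in kernel to reproduce the -G jump. Build it both ways and
// compare the "GPU Memory Usage" column in nvidia-smi after the first launch:
//   nvcc -O2 debug_jump.cu -o release_build
//   nvcc -G  debug_jump.cu -o debug_build
#include <cstdio>
#include <cuda_runtime.h>

__global__ void touch(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i * 0.5f;
}

int main()
{
    const int n = 1 << 20;
    float *d = nullptr;
    cudaMalloc(&d, n * sizeof(float));

    touch<<<(n + 255) / 256, 256>>>(d, n);   // the jump appeared at the first launch
    cudaDeviceSynchronize();

    std::printf("kernel launched; check nvidia-smi, then press Enter to exit...\n");
    std::getchar();                          // keep the context alive while you look

    cudaFree(d);
    return 0;
}
```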
I believe the remaining difference is that nsys only shows user-allocated memory, whereas nvidia-smi also includes driver-allocated memory for internal data structures, local memory/stack, the device malloc heap, the printf buffer, and instructions.
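One way to see this directly is to enlarge one of those driver-managed reservations and watch nvidia-smi move while nothing is allocated by the user; a rough sketch (the 1 GiB heap size is arbitrary):

```cuda
// heap_grow.cu -- sketch: enlarge a driver-managed reservation (the device malloc heap)
// and watch nvidia-smi grow even though the host never calls cudaMalloc, so a tool
// tracking only user allocations stays flat.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void use_device_heap()
{
    // Touch the device-side heap so the reservation is definitely materialized.
    void *p = malloc(16);
    free(p);
}

int main()
{
    cudaFree(0);  // create the context
    std::printf("context created; note the nvidia-smi value, then press Enter...\n");
    std::getchar();

    // Must be set before any kernel that uses device malloc/free is launched.
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 1ull << 30);
    use_device_heap<<<1, 1>>>();
    cudaDeviceSynchronize();

    std::printf("heap reserved; check nvidia-smi again, then press Enter to exit...\n");
    std::getchar();
    return 0;
}
```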