Explaining memory usage mismatch between nvidia-smi and Nsight Systems

I am profiling the memory usage of my CUDA program in a single-GPU setup. The memory usage reported by nvidia-smi (8 GB) differs from the memory usage reported by Nsight Systems (~6 GB). I would like to understand what explains the difference, e.g. what could be listed/detected by nvidia-smi that remains unreported in Nsight Systems.

Further information about my setup:
How I read memory usage from nvidia-smi: in the Processes table, the row for my program, under the GPU Memory Usage column (a command-line sketch of the same query is shown after this list).
How I read memory usage from Nsight Systems: Timeline View > Processes row > the process I launched > CUDA HW > Memory usage.
Operating system: Ubuntu 20.04.6
Nsight Systems version: 2024.2.1.106-242134037904v0 Linux.
GPU hardware: GeForce RTX 3090
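
For the nvidia-smi side, I can also read what appears to be the same per-process number from the command line; a minimal sketch of the query I mean, assuming the compute-apps query interface:

    nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv

As far as I understand, used_memory here corresponds to the GPU Memory Usage column in the Processes table.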

I suspect the following could be missing from the Nsight Systems memory usage:

  • Memory blobs such as a trained model's weights
  • CUDA object code (loaded modules/kernels)
  • Per-thread NVIDIA driver memory

Are there other factors that could contribute? Could these be missed by Nsight Systems? If yes, is there a way to profile/spot them?
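
The closest I have come to spotting such overhead myself is a rough check with cudaMemGetInfo, compiled with nvcc. This is only a sketch: it reports a device-wide number (including other processes and the display/driver), so it assumes nothing else is allocating on the GPU at the same time.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Print the device-wide memory picture, similar in spirit to what
    // nvidia-smi reports for the whole GPU (it also counts memory used by
    // other processes and the display/driver, not just this program).
    static void report(const char* label) {
        size_t free_b = 0, total_b = 0;
        cudaMemGetInfo(&free_b, &total_b);
        printf("%-26s used = %zu MiB of %zu MiB\n",
               label, (total_b - free_b) >> 20, total_b >> 20);
    }

    int main() {
        cudaFree(0);                        // force CUDA context creation
        report("after context creation:");  // context/module overhead is already counted here

        void* p = nullptr;
        cudaMalloc(&p, 1ull << 30);         // 1 GiB explicit allocation
        report("after 1 GiB cudaMalloc:");  // the delta is the kind of allocation a trace would show

        cudaFree(p);
        return 0;
    }

The gap between the "used" value right after context creation and the sum of my explicit allocations is what I currently attribute to context/driver/module overhead, but I do not know whether that fully explains the ~2 GB difference I see.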

I have seen several similar questions, but none of them are answered: