Discrepancies in memory profiling

I’m profiling a CUDA-enabled application to collect memory usage statistics. I run nsys with the following flags:

nsys profile --kill none --sampling-frequency 500 --verbose --delay 600 --duration 120 -s cpu -b fp --cudabacktrace memory --cuda-memory-usage true --stats false …app command line…

nsys-ui reports memory usage of 1.18 GiB and 2.45 MiB (CUDA HW / “Memory usage” and “Static memory usage”). On the other hand, jtop reports GPU memory occupancy of 3.1 GB. This discrepancy is present throughout the entire run.

  1. How should I interpret these results? Which value is correct?
  2. Is it possible to trace CUDA memory allocations (including requested sizes) and releases using the Nsight tooling?

Hello,
Please refer to the Nsight Systems user guide:
https://docs.nvidia.com/nsight-systems/UserGuide/#cuda-gpu-memory-graph
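Regarding question 2: enabling the CUDA trace records each CUDA runtime API call (including cudaMalloc/cudaFree) on the timeline, and the memory-usage option adds the GPU memory graph described in the guide. A minimal sketch of such an invocation (the output name `mem_report` and the `./app` placeholder are illustrative, not from your command line):

```
nsys profile --trace=cuda --cuda-memory-usage=true -o mem_report ./app
```

Opening the resulting report in nsys-ui should then let you inspect individual allocation and free events, with their requested sizes, in the CUDA API rows.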