Monitoring GPU utilization of dGPU on DRIVE AGX Pegasus


I’m trying to monitor the GPU utilization of the dGPU on the DRIVE AGX Pegasus platform, but I’m having trouble finding a way to do this. I can monitor the iGPU using the tegrastats utility. However, according to the thread “Can't detect the dGPU utilization through tegrastat”, tegrastats is not supported (and will not be supported) for the dGPU.

That thread suggests checking nvprof or Nsight for dGPU utilization. I tried nvprof and Nsight Compute; both report utilization stats for individual components of the GPU while a process is running, but no overall GPU utilization.
I heard (or read somewhere) that the Nsight Systems CLI might provide overall GPU utilization. However, I’m not able to find a version of the Nsight Systems CLI that works with the AGX platform. I tried the Nsight Systems GUI, but it does not seem to provide this statistic.

Any help would be appreciated. Thanks!

Software Version
DRIVE OS Linux 5.2.0
DRIVE OS Linux 5.2.0 and DriveWorks 3.5
NVIDIA DRIVE™ Software 10.0 (Linux)
NVIDIA DRIVE™ Software 9.0 (Linux)
other DRIVE OS version

Target Operating System

Hardware Platform
NVIDIA DRIVE™ AGX Xavier DevKit (E3550)
NVIDIA DRIVE™ AGX Pegasus DevKit (E3550)

SDK Manager Version

Host Machine Version
native Ubuntu 18.04

Hi @kenneth.chu ,
May I know whether this topic is for the DRIVE PX 2 or the DRIVE AGX devkit?

On your host system, you can try to locate the Nsight Systems package that SDK Manager installed with DRIVE OS Linux 5.2.0:
$ grep -ri nsight-systems ~/.nvsdkm/

It is for DRIVE AGX Pegasus devkit. I corrected the description in the original post.

I can install the Nsight Systems GUI from DRIVE OS Linux 5.2.0 and run it from the Linux host. However, I could not find the overall GPU utilization of the dGPU in the GUI. If it does have this capability, is there some documentation on how to get this info? I could not find a CLI version of Nsight Systems. I’m not sure if it’s true, but I’ve heard elsewhere that the Nsight Systems CLI does report overall GPU utilization.

It seems to be installed to /opt/nvidia/nsight-systems, so you can check whether the CLI and the relevant documentation are there. Thanks.

On the host, I can see the documentation in /opt/nvidia/nsight-systems/2020.3.3/documentation, but most of the files are empty (just a cover page). I found a reference to the docs at though. Not sure this applies to this version of Nsight Systems.

On the target, I can run /opt/nvidia/nsight_systems/nsys, but I still don’t see anything that tells me the overall GPU utilization. Any ideas about this? I’ll keep looking in the meantime. Thanks.

Please go ahead and check whether it has what you’re looking for. I don’t think there will be much difference.

I couldn’t find anywhere that could tell me the overall GPU utilization on AGX Pegasus. Just wanted to confirm with you that there is no way to monitor the overall GPU utilization on Pegasus right now. Thanks.

Please check whether an API that can measure or query the values of performance counters helps.

I took a look at CUPTI and it looks like its metrics are the same as (or similar to) the metrics from Nsight Compute. Is my understanding correct? I was able to run Nsight Compute on the processes running in our system.

With Nsight Compute, I can capture the GPU Speed Of Light, Compute Workload Analysis and Memory Workload Analysis sections. However, the issue I have is that we have multiple processes using the GPU and I do not know how to combine these metrics from multiple processes into an overall GPU utilization.

Ideally, I’d like the same information that is provided by nvidia-smi. Example:

| NVIDIA-SMI 450.102.04   Driver Version: 450.102.04   CUDA Version: 11.0     |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  GeForce RTX 208...  Off  | 00000000:01:00.0 Off |                  N/A |
| 30%   30C    P8   541W / 250W |     93MiB / 11019MiB |      0%      Default |
|                               |                      |                  N/A |

| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|    0   N/A  N/A      2066      G   /usr/lib/xorg/Xorg                 91MiB |

Notice the GPU utilization, memory usage, etc. I understand that this tool is for PC and uses the NVML library to get this information. However, the NVML library is not supported on ARM (the Pegasus platform).

Any ideas how I can get this same information on Pegasus? For example, how I can use the metrics from CUPTI or Nsight Compute to understand the overall GPU utilization?

Any more ideas about this? Please let me know either way whether it is possible to measure the overall GPU utilization on AGX Pegasus with the current libraries and tools. This is critical for our current project, because we need to understand how much load we can run on the current platform.

So far I’ve tried tegrastats, Nsight Compute and Nsight Systems, and I haven’t been able to find a way to measure the overall GPU utilization. With tegrastats, the utilization always shows 0%. With Nsight Compute, it appears that only a single process can be profiled at a time (according to the documentation, multiple processes are serialized). With Nsight Systems, it appears that the GPU metrics are not enabled in this version.

Thanks for your help and any ideas on how we can move forward.

Currently we don’t have a tegrastats-like tool to monitor dGPU usage.

The nsys CLI tool supports an option (--stats) that prints a summary of CUDA kernel and memory operation statistics on the target.
$ nsys --stats=true profile
This summary is closely related to GPU utilization. Alternatively, the qdrep file obtained after profiling can be used to obtain the stats with
$ nsys stats

I think there are some limitations with the DRIVE OS version of Nsight Systems as compared to the PC version documented in Nsight Systems User Guide :: Nsight Systems Documentation.

I could not run "nsys --stats=true profile" as specified in the documentation. I had to run "nsys profile" and then "nsys stats report.qdrep" afterwards. Another limitation seems to be that the --gpu-metrics* options do not exist in the DRIVE OS version of nsys.

I tried running the “nsys profile” and “nsys stats” on the sample CUDA application matrixMul and this is the output I get:

Generate SQLite file report3.sqlite from report3.qdrep
Exporting 11491 events: [=================================================100%]
Using report3.sqlite file for stats and reports.
Exporting [/opt/nvidia/nsight_systems/reports/cudaapisum report3.sqlite] to console...

 Time(%)  Total Time (ns)  Num Calls    Average    Minimum    Maximum           Name
 -------  ---------------  ---------  -----------  --------  ---------  ---------------------
    93.9        662003488          3  220667829.3     17152  661923904  cudaMalloc
     2.8         19747360        301      65605.8     23520     290688  cudaLaunchKernel
     2.7         18871840          1   18871840.0  18871840   18871840  cudaEventSynchronize
     0.5          3314240          3    1104746.7    160352    2887584  cudaMemcpy
     0.1           724224          3     241408.0     30688     613088  cudaFree
     0.0            31136          2      15568.0     15424      15712  cudaEventRecord
     0.0            19264          1      19264.0     19264      19264  cudaDeviceSynchronize
     0.0            15968          2       7984.0      5408      10560  cudaEventCreate

Exporting [/opt/nvidia/nsight_systems/reports/gpukernsum report3.sqlite] to console...

 Time(%)  Total Time (ns)  Instances  Average   Minimum  Maximum                            Name
 -------  ---------------  ---------  --------  -------  -------  --------------------------------------------------------
   100.0         39234208        301  130346.2   130048   130944  void MatrixMulCUDA<32>(float*, float*, float*, int, int)

Exporting [/opt/nvidia/nsight_systems/reports/gpumemtimesum report3.sqlite] to console...

 Time(%)  Total Time (ns)  Operations  Average  Minimum  Maximum      Operation
 -------  ---------------  ----------  -------  -------  -------  ------------------
    60.8            72640           2  36320.0    25120    47520  [CUDA memcpy HtoD]
    39.2            46880           1  46880.0    46880    46880  [CUDA memcpy DtoH]

Exporting [/opt/nvidia/nsight_systems/reports/gpumemsizesum report3.sqlite] to console...

  Total    Operations  Average  Minimum  Maximum      Operation
 --------  ----------  -------  -------  -------  ------------------
  800.000           1  800.000  800.000  800.000  [CUDA memcpy DtoH]
 1200.000           2  600.000  400.000  800.000  [CUDA memcpy HtoD]

Exporting [/opt/nvidia/nsight_systems/reports/osrtsum report3.sqlite] to console...

 Time(%)  Total Time (ns)  Num Calls   Average    Minimum    Maximum            Name
 -------  ---------------  ---------  ----------  --------  ---------  ----------------------
    55.8        778706336          8  97338292.0  78046304  100133120  sem_timedwait
    26.1        363706496         10  36370649.6     43456  128691872  poll
    17.4        242384800        454    533887.2      1408   23112736  ioctl
     0.4          5094112        416     12245.5       992     644704  sched_yield
     0.1          1799968         60     29999.5      7040      97728  mmap
     0.1          1452000          9    161333.3     72640     420192  sem_wait
     0.0           535648         23     23289.0      4416      54304  open
     0.0           506272         24     21094.7      3584      66368  fopen
     0.0           459104          2    229552.0    147936     311168  pthread_create
     0.0           417312         10     41731.2      6944      87520  write
     0.0           286976         13     22075.1      2976     161568  read
     0.0           180096          3     60032.0     27168      76608  fgets
     0.0            85376          1     85376.0     85376      85376  open64
     0.0            81376         25      3255.0      1248       4864  fcntl
     0.0            70944          3     23648.0     20384      25568  pipe2
     0.0            35232          1     35232.0     35232      35232  connect
     0.0            32512          2     16256.0      9024      23488  munmap
     0.0            26976          1     26976.0     26976      26976  socket
     0.0            23680          3      7893.3      4960      10688  fclose
     0.0             1248          1      1248.0      1248       1248  pthread_cond_broadcast

Exporting [/opt/nvidia/nsight_systems/reports/nvtxppsum report3.sqlite] to console... SKIPPED: report3.sqlite does not contain NV Tools Extension (NVTX) data.

Exporting [/opt/nvidia/nsight_systems/reports/openmpevtsum report3.sqlite] to console... SKIPPED: report3.sqlite does not contain OpenMP event data.

I’m not sure how to interpret this as overall GPU utilization.
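One rough way to read this kind of output as utilization, offered only as a sketch: NVML describes GPU-Util as the percentage of time over the sample period during which one or more kernels was executing. The same quantity can be approximated from per-kernel start and end timestamps by taking the union of the busy intervals over a chosen window. The function name `busy_fraction` and the interval format below are hypothetical, and the timestamps would have to be taken from the profiler's export.

```python
def busy_fraction(intervals, window_start, window_end):
    """Fraction of [window_start, window_end) covered by at least one interval.

    intervals: iterable of (start, end) timestamps in the same unit as the
    window, e.g. kernel start/end times in nanoseconds (hypothetical input).
    """
    if window_end <= window_start:
        raise ValueError("window must have positive length")

    # Clip every interval to the window and drop those entirely outside it.
    clipped = sorted(
        (max(s, window_start), min(e, window_end))
        for s, e in intervals
        if e > window_start and s < window_end
    )

    busy = 0
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # Gap before this interval: close the previous merged run.
            if cur_end is not None:
                busy += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlapping or touching interval: extend the current run.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        busy += cur_end - cur_start

    return busy / (window_end - window_start)


# Two overlapping kernels and one separate kernel in a 500-unit window.
kernels = [(0, 100), (50, 150), (300, 400)]
print(busy_fraction(kernels, 0, 500))  # 0.5
```

As a sanity check against the gpukernsum table above: the 301 MatrixMulCUDA instances total 39234208 ns, so if they do not overlap and the window were a hypothetical 1 s, the busy fraction would be about 3.9%. The real figure depends entirely on which window is chosen.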

Also, our system consists of multiple applications running at the same time. Is it possible to profile multiple applications (or processes) with nsys? I could not find how to do this in the manual.


Sorry, it should be:

$ nsys profile --stats=true

Please refer to Nsight Systems User Guide :: NVIDIA Nsight Systems Documentation on the NVIDIA DRIVE Documentation site.