I’m trying to monitor the GPU utilization of the dGPU on the AGX Pegasus platform, but I’m having trouble finding a way to do this. I can monitor the iGPU using the tegrastats utility; however, according to Can't detect the dGPU utilization through tegrastat, tegrastats is not supported (and will not be supported) for the dGPU.
That topic suggests checking nvprof or Nsight for dGPU utilization. I tried nvprof and Nsight Compute; they report utilization stats for individual components of the GPU while a process is running, but no overall GPU utilization.
I heard (or read somewhere) that the Nsight Systems CLI might provide overall GPU utilization, but I’m not able to find a version of the Nsight Systems CLI that works with the AGX platform. I tried the Nsight Systems GUI, but it does not seem to provide this statistic.
Any help would be appreciated. Thanks!
Please provide the following info (check/uncheck the boxes after clicking “+ Create Topic”):
Software Version
DRIVE OS Linux 5.2.0
DRIVE OS Linux 5.2.0 and DriveWorks 3.5
NVIDIA DRIVE™ Software 10.0 (Linux)
NVIDIA DRIVE™ Software 9.0 (Linux)
other DRIVE OS version
other
It is for the DRIVE AGX Pegasus devkit. I corrected the description in the original post.
I can install the Nsight Systems GUI from DRIVE OS Linux 5.2.0 and run it from the Linux host. However, I could not find the overall GPU utilization of the dGPU in the GUI. If it does have this capability, is there documentation on how to get this info? I also could not find a CLI version of Nsight Systems. I’m not sure if it’s true, but I’ve heard elsewhere that the Nsight Systems CLI does report overall GPU utilization.
On the host, I can see the documentation in /opt/nvidia/nsight-systems/2020.3.3/documentation, but most of the files are empty (just a cover page). I did find a reference to the docs at Nsight Systems Documentation, though I’m not sure it applies to this version of Nsight Systems.
On the target, I can run /opt/nvidia/nsight_systems/nsys, but I still don’t see anything that will tell me the overall GPU utilization. Any ideas about this? I’ll keep looking in the meantime. Thanks.
I couldn’t find anything that reports the overall GPU utilization on AGX Pegasus. I just wanted to confirm with you that there is currently no way to monitor the overall GPU utilization on Pegasus. Thanks.
I took a look at CUPTI, and it looks like its metrics are the same as (or similar to) the metrics from Nsight Compute. Is my understanding correct? I was able to run Nsight Compute on the processes running in our system.
With Nsight Compute, I can capture the GPU Speed Of Light, Compute Workload Analysis and Memory Workload Analysis sections. However, the issue I have is that we have multiple processes using the GPU, and I do not know how to combine these metrics from multiple processes into an overall GPU utilization figure.
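For reference, this is roughly the kind of invocation I’m using to collect those sections for a single process (./my_app is just a placeholder here, and on this DRIVE OS release the Nsight Compute CLI binary may be named nv-nsight-cu-cli rather than ncu):
$ncu --section SpeedOfLight --section ComputeWorkloadAnalysis --section MemoryWorkloadAnalysis -o single_proc_report ./my_app
Each run only covers the kernels of the one launched process, which is why I don’t see how to turn these per-process reports into a single GPU-wide number.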
Ideally, I’d like the same information that is provided by nvidia-smi. Example:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.102.04    Driver Version: 450.102.04    CUDA Version: 11.0   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:01:00.0 Off |                  N/A |
| 30%   30C    P8   541W / 250W |     93MiB / 11019MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2066      G   /usr/lib/xorg/Xorg                 91MiB |
+-----------------------------------------------------------------------------+
Notice the GPU utilization, memory usage, etc. I understand that this tool is for PCs and uses the NVML library to get this information; however, the NVML library is not supported on ARM (i.e., on the Pegasus platform).
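For context, on an x86 host these numbers can be read programmatically through NVML, e.g. via the pynvml Python bindings. A minimal sketch (which of course only runs where NVML is supported, so not on Pegasus) looks something like this:

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)        # first GPU

util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # percentages over the last sample period
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)         # bytes used / total

print("GPU util: %d%%  Mem util: %d%%" % (util.gpu, util.memory))
print("Memory  : %d MiB / %d MiB" % (mem.used // 2**20, mem.total // 2**20))

pynvml.nvmlShutdown()

This single utilization figure is exactly the kind of number I’m after on Pegasus.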
Any ideas how I can get this same information on Pegasus? For example, how can I use the metrics from CUPTI or Nsight Compute to understand the overall GPU utilization?
Any more ideas about this? Please let me know either way whether it is possible, with the current libraries and tools, to measure the overall GPU utilization on AGX Pegasus. This is critical for our current project, since we need to understand how many workloads we can run on the platform.
So far I’ve tried tegrastats, Nsight Compute and Nsight Systems, and I haven’t been able to find a way to measure the overall GPU utilization. With tegrastats, the utilization always shows 0%. With Nsight Compute, it appears that only a single process can be profiled at a time (according to the documentation, multiple processes are serialized), and with Nsight Systems, it appears that the GPU metrics options are not enabled in this version.
Thanks for your help and any ideas on how we can move forward.
Currently we don’t have a tegrastats-like tool to monitor dGPU usage.
The nsys CLI tool supports a --stats option that prints a summary of CUDA kernel and memory operation statistics on the target.
$nsys --stats=true profile
This summary is closely related to GPU utilization. Alternatively, the .qdrep file obtained after profiling can also be used to obtain the stats using
$nsys stats
I could not run “nsys --stats=true profile ” as specified in the documentation. I had to run “nsys profile ” and then “nsys stats report.qdrep” afterwards. Another limitation, it seems, is that the --gpu-metrics* options do not exist in the DRIVE OS version of nsys.
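To be concrete, the sequence that worked for me on the target was along these lines (matrixMul is the CUDA sample used below; nsys picks the reportN.qdrep name automatically, report3.qdrep in my case):
$nsys profile ./matrixMul
$nsys stats report3.qdrep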
I tried running “nsys profile” and “nsys stats” on the sample CUDA application matrixMul, and this is the output I get:
Generate SQLite file report3.sqlite from report3.qdrep
Exporting 11491 events: [=================================================100%]
Using report3.sqlite file for stats and reports.
Exporting [/opt/nvidia/nsight_systems/reports/cudaapisum report3.sqlite] to console...
Time(%) Total Time (ns) Num Calls Average Minimum Maximum Name
------- --------------- --------- ----------- -------- --------- ---------------------
93.9 662003488 3 220667829.3 17152 661923904 cudaMalloc
2.8 19747360 301 65605.8 23520 290688 cudaLaunchKernel
2.7 18871840 1 18871840.0 18871840 18871840 cudaEventSynchronize
0.5 3314240 3 1104746.7 160352 2887584 cudaMemcpy
0.1 724224 3 241408.0 30688 613088 cudaFree
0.0 31136 2 15568.0 15424 15712 cudaEventRecord
0.0 19264 1 19264.0 19264 19264 cudaDeviceSynchronize
0.0 15968 2 7984.0 5408 10560 cudaEventCreate
Exporting [/opt/nvidia/nsight_systems/reports/gpukernsum report3.sqlite] to console...
Time(%) Total Time (ns) Instances Average Minimum Maximum Name
------- --------------- --------- -------- ------- ------- --------------------------------------------------------
100.0 39234208 301 130346.2 130048 130944 void MatrixMulCUDA<32>(float*, float*, float*, int, int)
Exporting [/opt/nvidia/nsight_systems/reports/gpumemtimesum report3.sqlite] to console...
Time(%) Total Time (ns) Operations Average Minimum Maximum Operation
------- --------------- ---------- ------- ------- ------- ------------------
60.8 72640 2 36320.0 25120 47520 [CUDA memcpy HtoD]
39.2 46880 1 46880.0 46880 46880 [CUDA memcpy DtoH]
Exporting [/opt/nvidia/nsight_systems/reports/gpumemsizesum report3.sqlite] to console...
Total Operations Average Minimum Maximum Operation
-------- ---------- ------- ------- ------- ------------------
800.000 1 800.000 800.000 800.000 [CUDA memcpy DtoH]
1200.000 2 600.000 400.000 800.000 [CUDA memcpy HtoD]
Exporting [/opt/nvidia/nsight_systems/reports/osrtsum report3.sqlite] to console...
Time(%) Total Time (ns) Num Calls Average Minimum Maximum Name
------- --------------- --------- ---------- -------- --------- ----------------------
55.8 778706336 8 97338292.0 78046304 100133120 sem_timedwait
26.1 363706496 10 36370649.6 43456 128691872 poll
17.4 242384800 454 533887.2 1408 23112736 ioctl
0.4 5094112 416 12245.5 992 644704 sched_yield
0.1 1799968 60 29999.5 7040 97728 mmap
0.1 1452000 9 161333.3 72640 420192 sem_wait
0.0 535648 23 23289.0 4416 54304 open
0.0 506272 24 21094.7 3584 66368 fopen
0.0 459104 2 229552.0 147936 311168 pthread_create
0.0 417312 10 41731.2 6944 87520 write
0.0 286976 13 22075.1 2976 161568 read
0.0 180096 3 60032.0 27168 76608 fgets
0.0 85376 1 85376.0 85376 85376 open64
0.0 81376 25 3255.0 1248 4864 fcntl
0.0 70944 3 23648.0 20384 25568 pipe2
0.0 35232 1 35232.0 35232 35232 connect
0.0 32512 2 16256.0 9024 23488 munmap
0.0 26976 1 26976.0 26976 26976 socket
0.0 23680 3 7893.3 4960 10688 fclose
0.0 1248 1 1248.0 1248 1248 pthread_cond_broadcast
Exporting [/opt/nvidia/nsight_systems/reports/nvtxppsum report3.sqlite] to console... SKIPPED: report3.sqlite does not contain NV Tools Extension (NVTX) data.
Exporting [/opt/nvidia/nsight_systems/reports/openmpevtsum report3.sqlite] to console... SKIPPED: report3.sqlite does not contain OpenMP event data.
I’m not sure how to interpret this as overall GPU utilization.
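The closest thing I can think of is to post-process the SQLite file that “nsys stats” generates (report3.sqlite above) and treat the fraction of time with at least one kernel or memory copy in flight as a rough overall utilization. Below is a small Python sketch of that idea; the table and column names are assumptions based on what I see in this export and may differ between Nsight Systems versions, so this is only an approximation, not an official metric:

import sqlite3

# Rough utilization estimate from the nsys SQLite export: sum the time during
# which at least one kernel/memcpy/memset was running on the GPU, then divide
# by the span between the first and last GPU event in the trace.
DB = "report3.sqlite"
CANDIDATE_TABLES = [
    "CUPTI_ACTIVITY_KIND_KERNEL",
    "CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL",
    "CUPTI_ACTIVITY_KIND_MEMCPY",
    "CUPTI_ACTIVITY_KIND_MEMSET",
]

def merged_busy_ns(intervals):
    # Merge overlapping [start, end] intervals and return the total covered time in ns.
    busy, cur_start, cur_end = 0, None, None
    for s, e in sorted(intervals):
        if cur_end is None or s > cur_end:
            if cur_end is not None:
                busy += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            cur_end = max(cur_end, e)
    if cur_end is not None:
        busy += cur_end - cur_start
    return busy

conn = sqlite3.connect(DB)
existing = {row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")}

intervals = []
for table in CANDIDATE_TABLES:
    if table in existing:
        intervals += conn.execute(
            'SELECT "start", "end" FROM {}'.format(table)).fetchall()

if not intervals:
    print("No GPU activity tables found; check the table names in the export.")
else:
    busy_ns = merged_busy_ns(intervals)
    span_ns = max(e for _, e in intervals) - min(s for s, _ in intervals)
    print("GPU busy time          : %.2f ms" % (busy_ns / 1e6))
    print("Span of GPU activity   : %.2f ms" % (span_ns / 1e6))
    print("Approx. GPU utilization: %.1f %%" % (100.0 * busy_ns / span_ns))

Obviously this only counts the CUDA work captured by the trace (kernels and memory copies), so it is more of a lower bound than a true hardware utilization number, but it’s the best proxy I can come up with from the available output.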
Also, our system consists of multiple applications running at the same time. Is it possible to profile multiple applications (or processes) with nsys? I could not find how to do this in the manual.
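One workaround I’m considering (not yet verified on Pegasus) is to start all of our applications from a single launcher process, since as far as I understand nsys traces the launched process together with its children. Something like this hypothetical run_all.py, run under a single profiling session with “$nsys profile -o combined python3 run_all.py”:

import subprocess

# Hypothetical launcher: start our GPU applications as children of one process,
# so that a single "nsys profile" session can see all of them (this assumes
# nsys follows child processes, which I still need to verify on this platform).
APPS = [["./app_a"], ["./app_b"]]  # placeholders for our real processes

procs = [subprocess.Popen(cmd) for cmd in APPS]
for p in procs:
    p.wait()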