Nsys does not display the NVLink counters on GH200 that are described in the documentation

Hi,

I am using nsys to profile a GH200 machine.

According to the user manual, the NVLink performance data should be shown in the nsys report, but I do not see it in the GUI (image below), and nsys does not report any error saying that it could not be collected.

Platform	Linux
OS	Rocky Linux 9.3 (Blue Onyx)
Hardware platform	armv8
Serial number	Local (CLI)
GPU descriptions	NVIDIA GH200 120GB
NVIDIA driver version	550.54.15
Max EMC frequency	1.60 GHz
CPU context switch	supported
GPU context switch	supported
Tunnel traffic through SSH	no
Timestamp counter	supported
NVIDIA Nsight Systems version 2024.1.1.59-241133802077

@pkovalenko

What does diagnostics summary say? Are there any messages mentioning GPU Metrics?

This is the only message regarding GPU metrics in the diagnostics summary:

Information	Analysis		00:05.251	
Number of GPU Metrics events collected: 286,057.

Could you share the report? You can either upload it publicly, or send it to me privately to pkovalenko@nvidia.com, or, if you don’t want to share confidential data, collect and share a new report by profiling sleep 1. Note that 2024.1 is a fairly old release, so please try the latest one first.

I still have the same problem: the NVLink metrics are not visible in nsys. I am now running 2024.5. I have attached a report collected with nsys profile --gpu-metrics-set=gh100 --gpu-metrics-devices=all --cuda-um-cpu-page-faults=true --cuda-um-gpu-page-faults=true --event-sample=system-wide sleep 1
report4.nsys-rep.zip (255.5 KB)

Thanks for your patience. Looking at the report, it seems that the NVLink metrics have been scheduled and collected, but for some reason they are not displayed on the timeline. The latest nsys-ui shows your report exactly the same way, which means the problem hasn’t been fixed. I’ll do more digging and see what’s going on.

OK, so I must admit I got confused as well and didn’t notice there’s a single GPU in your system. NVLink metrics are only available on multi-GPU systems. If your goal is to observe the traffic between GPU and CPU (which also goes through NVLink), you should be looking at CTC Throughput metrics which are available in the latest website release.
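For anyone who runs into the same confusion, here is a rough sketch of a pre-check that counts the GPUs on the machine before looking for NVLink rows in the timeline. It assumes `nvidia-smi -L` prints one line per GPU starting with `GPU <index>:`. The helper names (`count_gpus`, `metrics_to_inspect`, `query_listing`) are made up for this illustration and are not part of nsys or any NVIDIA tool. The returned labels only restate the advice in this thread.

```python
import re
import shutil
import subprocess


def count_gpus(listing: str) -> int:
    """Count GPU entries in `nvidia-smi -L` style output.

    Assumes each physical GPU is listed on a line starting with "GPU <n>:".
    Indented sub-device lines (e.g. MIG instances) are not counted.
    """
    return sum(1 for line in listing.splitlines() if re.match(r"^GPU \d+:", line))


def metrics_to_inspect(gpu_count: int) -> str:
    """Hypothetical helper that encodes the advice from this thread."""
    if gpu_count >= 2:
        # NVLink metrics are reported for multi-GPU systems.
        return "NVLink"
    if gpu_count == 1:
        # Single GPU: CPU<->GPU traffic shows up under the CTC Throughput metrics.
        return "CTC Throughput"
    return "none"


def query_listing() -> str:
    """Return the output of `nvidia-smi -L`, or "" if the tool is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return ""
    result = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=False
    )
    return result.stdout


if __name__ == "__main__":
    gpus = count_gpus(query_listing())
    print(f"GPUs detected: {gpus}; metrics to look for: {metrics_to_inspect(gpus)}")
```

On a single-GPU GH200 node like the one in this thread, the script should point at CTC Throughput rather than NVLink.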