Probing NVlinks in HGX2

ziyue.zhang · February 1, 2024, 10:00am

Hello,

We are running experiments on an HGX-2 system (16 V100 GPUs), of which only a subset (4 V100 GPUs) is exposed to us through a Docker container.
When profiling applications with NCU we do observe NVLink activity (bytes sent and received), but the “nvlink_table” and “nvlink_topology” sections both report “Not supported for NVSwitch.”
I believe the V100 GPUs in the HGX-2 are indeed connected through NVSwitches, so what does this message mean?

Moreover, NCU seems to profile NVLink activity only at a very high level: it collects the counters per kernel and does not record the source and destination of packets. Is that correct?

Best regards,
Z.

veraj · February 1, 2024, 10:10am

Hi, @ziyue.zhang

That output means Nsight Compute does not support NVSwitch-related analysis. This is a current limitation of the tool.

ziyue.zhang · February 1, 2024, 10:20am

Hi, thanks for the reply.

Does this mean that Nsight Compute cannot trace the exact source and destination of data streams between GPUs? If so, do you know of any other tool that can?

Thanks!

veraj · February 6, 2024, 11:55am

Yes, your understanding is right. This is currently a known limitation of NVLink profiling.

As for other tools, I checked internally and was told that Nsight Systems also provides NVLink metrics; see the User Guide (nsight-systems 2024.1 documentation). However, this is only supported on Turing and newer GPUs, and the metrics are not per link but aggregated over all links on the GPU.
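For the static side of the question (which endpoint each link is wired to, not which GPU sent which packet), NVML exposes per-link state and remote PCI information. Below is a minimal sketch using the `pynvml` bindings; the function name `list_nvlink_endpoints` is made up for illustration, and it assumes `pynvml` is installed and the driver is reachable from inside the container. It does not measure traffic, so it does not lift the limitation described above.

```python
"""List the remote endpoint of every active NVLink on each visible GPU."""


def list_nvlink_endpoints():
    """Return (gpu_index, link_index, remote_pci_bus_id) for each active link.

    Returns an empty list if pynvml or the NVIDIA driver is unavailable.
    `list_nvlink_endpoints` is a hypothetical helper, not part of any tool.
    """
    try:
        import pynvml
    except ImportError:
        return []

    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        return []

    endpoints = []
    try:
        for gpu in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(gpu)
            for link in range(pynvml.NVML_NVLINK_MAX_LINKS):
                try:
                    state = pynvml.nvmlDeviceGetNvLinkState(handle, link)
                    if state != pynvml.NVML_FEATURE_ENABLED:
                        continue
                    info = pynvml.nvmlDeviceGetNvLinkRemotePciInfo(handle, link)
                except pynvml.NVMLError:
                    # Link index not present or query not supported on this GPU.
                    continue
                bus_id = info.busId
                if isinstance(bus_id, bytes):  # older bindings return bytes
                    bus_id = bus_id.decode()
                endpoints.append((gpu, link, bus_id))
    finally:
        pynvml.nvmlShutdown()
    return endpoints


if __name__ == "__main__":
    for gpu, link, bus_id in list_nvlink_endpoints():
        print(f"GPU {gpu} link {link} -> remote PCI {bus_id}")
```

Only the GPUs visible inside the container are enumerated. On an NVSwitch system the reported bus ID may belong to a switch rather than a peer GPU; this sketch has not been checked on an HGX-2.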
