Profiling communication for DGX2

malik747 · October 22, 2018, 5:54pm

Hi:

I am trying to profile communication among the GPUs in a DGX2 while running different deep learning algorithms, and I am planning to use the nvprof tool for this. Does nvprof support profiling NCCL collective communication? I could not find this information when I went through the list of features nvprof collects.

Robert_Crovella · October 22, 2018, 11:04pm

nvprof nvidia-smi can report traffic by/per NVLink link. Those would be the links that NCCL uses for communication.

Lennoxwu · July 8, 2019, 7:40pm

May I ask which nvprof command I should use?
I confirmed with nvidia-smi that there is a lot of traffic on NVLINK, but I didn’t see any NVLINK report in the nvprof output (opened in NVVP). Will the traffic be shown in the NVVP timeline?

Robert_Crovella · July 8, 2019, 7:55pm

Sorry, my previous post was not what I intended. I had intended to say nvidia-smi.

However, the visual profiler has an NVLink view:

https://docs.nvidia.com/cuda/profiler-users-guide/index.html#nvlink

and there are metrics that can be gathered by nvprof (or nvvp):

https://docs.nvidia.com/cuda/profiler-users-guide/index.html#metrics-reference-6x
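
For illustration, here is a minimal sketch of how an nvprof invocation that collects NVLink metrics could be assembled. The metric names nvlink_total_data_transmitted and nvlink_total_data_received are assumed from the metrics reference and may not be available on every GPU or CUDA version (nvprof --query-metrics lists what a given system supports). The helper name build_nvprof_command, the log file name, and the train.py application are hypothetical.

```python
from typing import List, Sequence

# Assumed metric names; confirm them with `nvprof --query-metrics` on the target system.
NVLINK_METRICS = ("nvlink_total_data_transmitted", "nvlink_total_data_received")


def build_nvprof_command(
    app: Sequence[str],
    metrics: Sequence[str] = NVLINK_METRICS,
    log_file: str = "nvlink_metrics.log",  # hypothetical output file name
) -> List[str]:
    """Return an argv list that runs `app` under nvprof and collects `metrics`."""
    if not app:
        raise ValueError("app must contain at least the program to profile")
    return ["nvprof", "--metrics", ",".join(metrics), "--log-file", log_file, *app]


# Hypothetical application: a training script started with the Python interpreter.
print(" ".join(build_nvprof_command(["python", "train.py"])))
```

The resulting list can be passed to subprocess.run, or the printed line can be pasted into a shell. The per-kernel metric values then appear in the log file rather than on the timeline.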

Lennoxwu · July 8, 2019, 8:09pm

Is it possible to watch the behavior of NVLINK in NVVP? I’m profiling a deep learning model, and I expect a lot of traffic on NVLINK because of ncclAllReduce. However, I cannot see any NVLINK traffic in the NVVP timeline, and the NVLINK view is almost empty.

Robert_Crovella · July 8, 2019, 8:22pm

I don’t know of any way to watch the behavior of NVLINK with the profilers. They capture their data, presumably for an entire application run, then you get to view or inspect it.

The nvidia-smi tool or the underlying NVML library could be used to get periodic updates on NVLINK traffic. I think scripting this with nvidia-smi should be fairly straightforward, but I wouldn’t be able to put together a script for you. If you decide to do it with the NVML library, you would need to write a program to do that. I don’t have any suggestions about getting started there, other than the NVML SDK, which includes example code.
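
As a rough sketch of the scripting approach, the following polls a command at a fixed interval and keeps timestamped snapshots of its output. It is not tied to a particular nvidia-smi option: the nvlink subcommand's options for traffic counters differ between driver versions, so the command to poll is left as a parameter (nvidia-smi nvlink -h lists what is available). The names run_command and poll_command are hypothetical.

```python
import subprocess
import time
from typing import Callable, List, Sequence, Tuple


def run_command(cmd: Sequence[str]) -> str:
    """Run `cmd` and return its standard output as text."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def poll_command(
    cmd: Sequence[str],
    samples: int,
    interval_s: float,
    runner: Callable[[Sequence[str]], str] = run_command,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[float, str]]:
    """Collect `samples` (timestamp, output) pairs, waiting `interval_s` between them."""
    log: List[Tuple[float, str]] = []
    for i in range(samples):
        log.append((time.time(), runner(cmd)))
        if i + 1 < samples:  # no need to wait after the last sample
            sleep(interval_s)
    return log
```

For example, poll_command(["nvidia-smi", "nvlink", "--status"], samples=5, interval_s=1.0) takes five snapshots one second apart while the training job runs in another process. The --status option here is only a placeholder that reports link state; substitute whichever counter option your driver's nvidia-smi nvlink -h lists, and compare successive snapshots to estimate traffic over time.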
