Profiling communication for DGX2

Hi:

I am trying to profile communication among the GPUs in a DGX2 using different deep learning algorithms, and I am planning to use the nvprof tool for this. Does nvprof support profiling NCCL collective communication? I could not find this information when I went through the list of features that nvprof collects.

nvprof (corrected below: nvidia-smi) can report traffic per NVLink link. Those would be the links that NCCL uses for communication.

May I ask which nvprof command I should use?
I confirmed with nvidia-smi that there is a lot of traffic on NVLINK, but I didn't see any NVLINK report in the nvprof output (opened in NVVP). Will the traffic be shown in the NVVP timeline?

Sorry, my previous post was not what I intended. I had intended to say nvidia-smi.

However, the visual profiler has an NVLink view:

[url]https://docs.nvidia.com/cuda/profiler-users-guide/index.html#nvlink[/url]

and there are metrics that can be gathered by nvprof (or nvvp):

[url]https://docs.nvidia.com/cuda/profiler-users-guide/index.html#metrics-reference-6x[/url]
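As a rough illustration of collecting those metrics from the command line, here is a minimal Python sketch that only builds an nvprof invocation. The metric names are the NVLink entries I believe appear in the metrics reference; whether they are available on a given GPU, and whether they capture NCCL traffic in a particular setup, is an assumption and not something confirmed in this thread. The script name `train.py` and the log file name are hypothetical.

```python
import shlex

# NVLink metric names taken from the nvprof metrics reference.
# Assumption: they are supported on the GPU being profiled.
NVLINK_METRICS = [
    "nvlink_total_data_transmitted",
    "nvlink_total_data_received",
    "nvlink_transmit_throughput",
    "nvlink_receive_throughput",
]


def build_nvprof_command(app_argv, log_file="nvlink_%p.csv"):
    """Return an nvprof argument list that collects NVLink metrics for app_argv."""
    return [
        "nvprof",
        "--profile-child-processes",   # also profile worker processes the app launches
        "--metrics", ",".join(NVLINK_METRICS),
        "--csv",                       # comma-separated output
        "--log-file", log_file,        # %p is replaced by the process ID
        *app_argv,
    ]


if __name__ == "__main__":
    # "train.py" is a placeholder for the actual training script.
    print(shlex.join(build_nvprof_command(["python", "train.py"])))
```

The printed command can be pasted into a shell, or the list can be passed to `subprocess.run`. Metric collection is slower than a plain timeline trace, so a short run is a sensible first try.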

Is it possible to watch the behavior of NVLINK in NVVP? I'm profiling a deep learning model, and I expect a lot of traffic on NVLINK because of ncclAllReduce. However, I cannot see any NVLINK traffic in the NVVP timeline, and the NVLINK view is almost empty.

I don’t know of any way to watch the behavior of NVLINK with the profilers. They capture their data, presumably for an entire application run, then you get to view or inspect it.

The nvidia-smi tool or the underlying NVML library could be used to get periodic updates on NVLINK traffic. Scripting this with nvidia-smi should be fairly straightforward, although I wouldn't be able to put together a script for you. If you decide to do it with the NVML library, you would need to write a program for that. I don't have any suggestions about getting started there, other than the NVML SDK, which includes example code.
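For the NVML route, a periodic poll could look roughly like the sketch below. It assumes the `pynvml` Python bindings are installed and that the driver still exposes the per-link NVLink utilization counters (newer drivers may deprecate them). The unit of the counters depends on how they were configured, so treat the numbers as relative. The helper names `counter_deltas`, `read_nvlink_counters`, and `poll` are made up for this example.

```python
import time


def counter_deltas(prev, curr):
    """Per-link (rx, tx) increase between two cumulative counter samples."""
    return {
        link: (curr[link][0] - prev[link][0], curr[link][1] - prev[link][1])
        for link in curr
        if link in prev
    }


def read_nvlink_counters(gpu_index=0, counter=0):
    """Read cumulative (rx, tx) counters for every active NVLink link of one GPU.

    Assumption: pynvml is installed and the driver supports these counters.
    """
    import pynvml  # imported lazily so the pure helpers work without a GPU

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_index)
        samples = {}
        for link in range(pynvml.NVML_NVLINK_MAX_LINKS):
            try:
                state = pynvml.nvmlDeviceGetNvLinkState(handle, link)
                if state != pynvml.NVML_FEATURE_ENABLED:
                    continue
                rx, tx = pynvml.nvmlDeviceGetNvLinkUtilizationCounter(
                    handle, link, counter
                )
            except pynvml.NVMLError:
                continue  # link not present or counter not supported
            samples[link] = (rx, tx)
        return samples
    finally:
        pynvml.nvmlShutdown()


def poll(read, interval_s=1.0, iterations=5, sleep=time.sleep):
    """Call read() repeatedly and return the per-interval counter deltas."""
    prev = read()
    history = []
    for _ in range(iterations):
        sleep(interval_s)
        curr = read()
        history.append(counter_deltas(prev, curr))
        prev = curr
    return history

# Example on a machine with NVLink (not executed here):
#   for deltas in poll(read_nvlink_counters, interval_s=1.0, iterations=10):
#       print(deltas)
```

Running the poll in a second terminal while the training job is active gives a coarse per-link view over time, which is the part the profilers do not provide. The reader is passed in as a function so the delta logic can be checked without any GPU present.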