Question about profiling NCCL kernels with Nsight Compute

Hi,
I would like to profile NCCL kernels and collect detailed metrics with Nsight Compute, but the profiling run always hangs. Can anybody give me some information about this? Thanks.

More details:
Tested in the NGC container: nvcr.io/nvidia/pytorch:21.07-py3
application: nccl-test/build/all_reduce_perf

PS: The same issue has been reported on GitHub, but there is no conclusion yet.

Nsight Compute serializes kernel launches across all profiled processes. If a kernel waits for other concurrent processes (or kernels), it cannot make forward progress and the profiling run hangs. Such applications therefore cannot be profiled with Nsight Compute.
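
To illustrate the reasoning above, here is a toy model in plain Python. It is not NCCL or Nsight Compute code: the function name `run_peers`, the two threads standing in for kernels, and the timeout are all assumptions made for the sketch. Each "kernel" must meet its peer at a barrier, much as the ranks of a collective wait for each other. A timeout stands in for the hang so that the script terminates.

```python
import threading


def run_peers(serialize, timeout=0.5):
    """Run two toy 'kernels' that must rendezvous at a barrier.

    serialize=True mimics a profiler that lets only one kernel run at a time.
    Returns one status string per peer: "done" or "stalled".
    """
    barrier = threading.Barrier(2)
    results = [None, None]

    def peer(rank):
        try:
            # Each peer needs the other one to arrive before it can continue.
            barrier.wait(timeout=timeout)
            results[rank] = "done"
        except threading.BrokenBarrierError:
            # The other peer never arrived in time (a real kernel would hang here).
            results[rank] = "stalled"

    threads = [threading.Thread(target=peer, args=(r,)) for r in range(2)]
    if serialize:
        # One at a time: the first peer waits for a partner that has not started.
        for t in threads:
            t.start()
            t.join()
    else:
        # Both run concurrently, so the rendezvous succeeds.
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return results


if __name__ == "__main__":
    print("concurrent:", run_peers(serialize=False))
    print("serialized:", run_peers(serialize=True))
```

In the serialized case the first peer gives up only because of the timeout. A real kernel that waits on its peer has no such timeout, which is why the profiling run appears to hang.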

Hi Sanjiv,
Thanks. Is there any plan to make ncu support NCCL kernel profiling?

Yes, we are looking into supporting these types of applications in the future, but there is no definite timeline yet for when such support will be released.