I would like to profile NCCL kernels and collect detailed metrics with Nsight Compute, but the profiling run always hangs. Can anybody give me some information about this? Thanks.
Additional details:
Tested on the NGC container: nvcr.io/nvidia/pytorch:21.07-py3
PS: The same issue has already been reported on GitHub, but there is no conclusion yet.