How to use the Nsight Compute "source counter" to see the CUDA kernel source code of PyTorch ops?

Hi, I want to use the “source counter” of Nsight Compute to see the execution of CUDA kernels correlated with their SASS instructions. I have rebuilt PyTorch with additional flags:

USE_CUDA=1 DEBUG_CUDA=1 python setup.py develop ===> PyTorch builds successfully

In PyTorch's CMakeLists.txt, DEBUG_CUDA enables “-lineinfo”. But when I profile torch.mm or torch.softmax again, I still cannot see any CUDA C kernel source correlated with the SASS in “source counter”. Where is the problem, or is this impossible to do?

I also tried adding “-lineinfo” to CUDA_NVCC_FLAGS, but it still did not work.

Resolved: I updated to the newest Nsight Compute, and now it works.
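For anyone landing here, a minimal sketch of the profiling workflow, assuming a PyTorch build with DEBUG_CUDA=1 (so “-lineinfo” is applied) and a recent Nsight Compute with the `ncu` CLI on PATH. The script name `profile_softmax.py`, the report name `softmax_report`, and the `--run` switch are hypothetical. Running the script with no arguments only prints the `ncu` command line; `ncu` then re-launches it with `--run` to execute the workload. One hedged caveat: torch.mm on CUDA usually dispatches to prebuilt cuBLAS kernels, which are not compiled as part of the PyTorch build, so CUDA C source correlation is more likely to appear for PyTorch's own kernels such as softmax than for the GEMM.

```python
# profile_softmax.py (hypothetical name)
import shlex
import sys
from typing import List


def build_ncu_command(script: str, report: str = "softmax_report") -> List[str]:
    """Return an argv list that profiles `script` with Nsight Compute.

    --section SourceCounters collects the per-instruction counters shown on
    the Source page; -o writes a report file to open in the Nsight Compute UI.
    """
    return [
        "ncu",
        "--section", "SourceCounters",
        "-o", report,
        sys.executable, script,
        "--run",  # hypothetical switch: tells this script to run the workload
    ]


def run_workload() -> None:
    # Imported lazily so build_ncu_command() works without torch installed.
    import torch

    x = torch.randn(4096, 1024, device="cuda")
    # softmax launches a kernel compiled from PyTorch's own .cu sources, so a
    # DEBUG_CUDA build should let Nsight Compute map its SASS back to source.
    y = torch.softmax(x, dim=1)
    # torch.mm usually goes through cuBLAS; expect SASS without CUDA C source.
    z = torch.mm(x, x.t())
    torch.cuda.synchronize()
    print(tuple(y.shape), tuple(z.shape))


if __name__ == "__main__":
    if "--run" in sys.argv[1:]:
        run_workload()
    else:
        # Print the command to paste into a shell; ncu re-invokes with --run.
        print(shlex.join(build_ncu_command(sys.argv[0])))
```

After `ncu` finishes, open the resulting report in the Nsight Compute UI and check the Source page; with line info present it should offer the CUDA C source next to the SASS for the softmax kernel.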

Thanks for letting us know that the update helped.