Hello, I would really appreciate any help or shared experience with profiling custom CUDA kernels that are integrated with PyTorch's autograd. In my project, the kernels are compiled as an extension with PyBind11 and setup.py so that they can be used for training. They are called from train.py during loss.backward(), which runs the backward pass and computes the gradients. I compile with the -g and -lineinfo flags.
However, when I profile these kernels with Nsight Compute, I see only the SASS and not the CUDA source code alongside it, even though I run Nsight Compute with the "--import-source yes" option. Has anyone successfully configured Nsight Compute to display the CUDA source next to the SASS for kernels built this way? Any tips for verifying that the compiled binaries contain the source information Nsight Compute needs would also be greatly appreciated.
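For reference, here is a minimal sketch of how the compile flags might be passed in setup.py. It assumes the extension is built with torch.utils.cpp_extension.CUDAExtension; the module name my_kernels, the source file names, and the helper names build_compile_args and run_setup are hypothetical. The point it illustrates is that -lineinfo has to sit under the "nvcc" key, since flags listed only under "cxx" go to the host compiler and not to the device code.

```python
import sys


def build_compile_args():
    """Return per-compiler flags for CUDAExtension's extra_compile_args."""
    # Host compiler flags: MSVC on Windows, GCC/Clang elsewhere.
    cxx_flags = ["/O2"] if sys.platform == "win32" else ["-O2", "-g"]
    # -lineinfo asks nvcc to emit line-number information for device code,
    # which a profiler can use to map SASS back to CUDA source lines.
    nvcc_flags = ["-O3", "-lineinfo"]
    return {"cxx": cxx_flags, "nvcc": nvcc_flags}


def run_setup():
    """Build the extension; intended to be called from setup.py."""
    # Imported lazily so the flag helper can be inspected without PyTorch.
    from setuptools import setup
    from torch.utils.cpp_extension import BuildExtension, CUDAExtension

    setup(
        name="my_kernels",  # hypothetical package name
        ext_modules=[
            CUDAExtension(
                name="my_kernels",  # hypothetical module name
                sources=["bindings.cpp", "kernels.cu"],  # hypothetical files
                extra_compile_args=build_compile_args(),
            )
        ],
        cmdclass={"build_ext": BuildExtension},
    )
```

In an actual setup.py, run_setup() would be called at the bottom of the file. Printing build_compile_args() before building is a quick way to confirm which flags each compiler receives.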
My environment is Windows, CUDA 12.3, Nsight Compute 2024.3, and PyTorch 2.3.1+cu121.
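One possible way to check the built binary in this kind of environment is to dump its embedded device ELF with cuobjdump from the CUDA toolkit and look for line-table sections. The sketch below is only an illustration under assumptions: that cuobjdump is on PATH and can read the built extension file, that the section names .debug_line and .nv_debug_line_sass are what nvcc emits for -lineinfo, and that those names appear in the text of the dump. The function names are hypothetical.

```python
import subprocess

# Assumption: section names that indicate device line information is present.
LINE_INFO_SECTIONS = (".debug_line", ".nv_debug_line_sass")


def find_line_info_sections(elf_dump):
    """Return the line-info section names that occur in an ELF dump text."""
    return [name for name in LINE_INFO_SECTIONS if name in elf_dump]


def dump_elf(binary_path):
    """Run `cuobjdump -elf` on a compiled binary and return its text output."""
    result = subprocess.run(
        ["cuobjdump", "-elf", binary_path],
        capture_output=True,
        text=True,
        check=True,  # raise if cuobjdump reports an error
    )
    return result.stdout
```

Calling find_line_info_sections(dump_elf(path_to_built_extension)) and getting an empty list would suggest that the line information never made it into the device code, which would point at the build flags and not at the Nsight Compute options.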
Thank you in advance for your help!