How to Display Source Code Alongside SASS in Nsight Compute for Custom CUDA Kernels Called from PyTorch Autograd?

Hello, I would really appreciate any help or shared experiences with profiling custom CUDA kernels integrated with PyTorch’s autograd! In my project, these kernels are built as a PyBind11 extension via setup.py and hooked into PyTorch’s autograd for training. They are called from train.py during loss.backward(), which launches the backward pass and gradient computation. I compile with the -g and -lineinfo flags. However, when I profile these kernels with Nsight Compute, I only see the SASS, not the CUDA source code alongside it, even though I run Nsight Compute with the “--import-source yes” option.

Has anyone successfully configured Nsight Compute to show the CUDA source alongside the SASS for kernels built this way? Any tips for verifying that the compiled binaries contain the source-line information Nsight Compute needs would also be greatly appreciated.
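For reference, here is a minimal setup.py sketch of the kind of build described above. The package, module, and source-file names are hypothetical. The key assumption is that -lineinfo must go through the "nvcc" entry of extra_compile_args in torch.utils.cpp_extension: the "cxx" entry goes to the host compiler (MSVC on Windows), and nvcc's -g only adds host-side debug info.

```python
# setup.py: minimal sketch; package, module, and file names are hypothetical.
import sys

# Host-compiler flags (MSVC on Windows). -lineinfo is an nvcc option, so it
# does nothing here, and host debug flags do not add line info to device code.
CXX_FLAGS: list[str] = []

# nvcc flags. -lineinfo records source-line tables for the device code, which
# is what lets a profiler correlate SASS with .cu lines. Unlike -G (full device
# debug), it leaves device-code optimizations enabled.
NVCC_FLAGS: list[str] = ["-lineinfo"]


def main() -> None:
    # Imported here so the flag lists above can be inspected without PyTorch.
    from setuptools import setup
    from torch.utils.cpp_extension import BuildExtension, CUDAExtension

    setup(
        name="my_cuda_ext",  # hypothetical package name
        ext_modules=[
            CUDAExtension(
                name="my_cuda_ext",  # hypothetical module name
                sources=["my_ext.cpp", "my_kernels.cu"],  # hypothetical files
                extra_compile_args={"cxx": CXX_FLAGS, "nvcc": NVCC_FLAGS},
            )
        ],
        cmdclass={"build_ext": BuildExtension},
    )


# setuptools exits with "no commands supplied" when run without a command,
# so only call setup() when one (e.g. build_ext) was given.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

It would be built with python setup.py build_ext --inplace, and the training run then profiled with something like ncu --import-source yes -o report python train.py.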

My environment: Windows, CUDA 12.3, Nsight Compute 2024.3, PyTorch 2.3.1+cu121.

Thank you in advance for your help!

Hi, @mahmoud52623

Do you use Numba (or another Python library) to write the kernel?
Or is it implemented directly in C++ (it sounds like it)?
If the latter, and the C++/CUDA sources are correctly built with -lineinfo, we would expect the SASS view to be correlated with a C++ file rather than a Python file.
So we are a little confused about your scenario.
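One way to check whether the built extension actually carries line information is to dump its embedded device ELF with cuobjdump and look for line-table sections. The sketch below assumes cuobjdump from the CUDA toolkit is on PATH and that a -lineinfo build contains sections named .nv_debug_line_sass and/or .debug_line; the script name and example path are hypothetical. If cuobjdump does not accept the .pyd directly, the intermediate object file nvcc writes to the build directory should contain the same device code.

```python
# check_lineinfo.py: sketch; assumes cuobjdump (CUDA toolkit) is on PATH.
import re
import subprocess
import sys

# ELF section names we assume a -lineinfo build emits into the device code.
LINE_INFO_SECTIONS = (".nv_debug_line_sass", ".debug_line")


def line_info_sections(elf_dump: str) -> set[str]:
    """Return which line-info section names appear in `cuobjdump -elf` text."""
    found = set()
    for name in LINE_INFO_SECTIONS:
        # \b stops ".debug_line" from matching longer names like ".debug_line_str".
        if re.search(re.escape(name) + r"\b", elf_dump):
            found.add(name)
    return found


def check_binary(path: str) -> set[str]:
    """Dump the embedded device ELF of `path` and look for line-info sections."""
    result = subprocess.run(
        ["cuobjdump", "-elf", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return line_info_sections(result.stdout)


if __name__ == "__main__":
    # Usage (hypothetical path): python check_lineinfo.py <path-to>/my_cuda_ext.pyd
    for binary in sys.argv[1:]:
        sections = check_binary(binary)
        status = f"found {sorted(sections)}" if sections else "NO line-info sections"
        print(f"{binary}: {status}")
```

If no such sections are reported, the -lineinfo flag most likely never reached nvcc.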

If possible, please share a repro so we can see how to help.