call stack on the cpu side for a kernel


I’ve played with the cuda-gdb and nvprof tools for some time, but one thing I could never retrieve is the call stack on the CPU side for a particular kernel. In cuda-gdb, backtrace gets you the call stack on the GPU side, but no stack information on the CPU side leading up to the launch of that kernel. I know this is due to the asynchronous execution of the CPU and GPU, but is there a way to know where a kernel was called from on the CPU side? The CUDA profilers like nvprof do not provide such information either, even though it is standard for pure CPU programs.

What is the major obstacle to providing such information here? Any ideas on how to get it?
Thanks a lot.

Hi, hzhang86

Please check if you can get what you need when using “nvprof --cpu-profiling on”

You can also use nvvp and enable “Profile execution on CPU”

Hello, Veraj

I’ve tried both, but neither gives me what I want: the CPU-side and GPU-side information is presented separately. I want an inclusive profiling result, meaning that for a given hot spot in the code, I can see its complete calling context, from the CPU call stack down into the GPU’s.

Here, I just want to know more about how the kernel ID value is assigned in cuda-gdb. Is it a monotonically increasing integer for each kernel call? E.g., if we have the following code:

__global__ void my_kernel1() {}

__global__ void my_kernel2() {}

int main() {
    my_kernel1<<<1, 1>>>();
    for (int i = 0; i < 3; i++)
        my_kernel2<<<1, 1>>>();
    return 0;
}

Are the kernel IDs of the 4 kernel calls (one for my_kernel1 and three for my_kernel2) predetermined as 0, 1, 2, 3? Once a kernel call has finished, will its ID be reused for future kernel calls?


I have posted your question to our dev team.

I will reply once I get a response.