How can I find which code is responsible for the blank (idle) parts of the GPU timeline?

Hi experts at NVIDIA, I'm profiling a Python program (vLLM) that runs LLM inference on a GPU.

I notice that the GPU row in the timeline has blank (idle) parts in every inference step, and I want to find out what causes them.

I added some annotations using NVTX, but I am still unable to narrow the gap down to a specific part of the code.
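For readers following along, here is a minimal sketch of this kind of NVTX annotation. It assumes the `nvtx` package from PyPI and falls back to a no-op if it is not installed. The phase names and the `fake_*` functions are hypothetical placeholders, not vLLM internals.

```python
import time
from contextlib import contextmanager

try:
    import nvtx  # assumption: NVIDIA's "nvtx" package from PyPI

    def phase_range(name):
        # nvtx.annotate can be used as a context manager; the range
        # appears under the NVTX row of the Nsight Systems timeline.
        return nvtx.annotate(name)
except ImportError:
    @contextmanager
    def phase_range(name):
        # No-op fallback so the sketch still runs without nvtx.
        yield


# Hypothetical stand-ins for the real phases of one inference step.
def fake_schedule():
    time.sleep(0.001)


def fake_forward():
    time.sleep(0.002)


def fake_postprocess():
    time.sleep(0.001)


PHASES = (
    ("schedule", fake_schedule),
    ("forward", fake_forward),
    ("postprocess", fake_postprocess),
)


def run_step(timings):
    """Run one step, wrapping each phase in its own NVTX range."""
    for name, work in PHASES:
        start = time.perf_counter()
        with phase_range(f"step/{name}"):
            work()
        # Also accumulate host-side wall time per phase as a cross-check.
        timings[name] = timings.get(name, 0.0) + (time.perf_counter() - start)


timings = {}
for _ in range(3):
    run_step(timings)

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e3:.2f} ms on the host")
```

If the gap always falls inside one range, splitting that range into smaller nested ranges is one way to keep narrowing it down.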

Can Nsight Systems be used to trace Python code, so that I can see which line of code is related to this GPU gap?

You probably want the python backtrace sampling, but I am going to recommend that you read the User Guide for our python features. User Guide — nsight-systems 2024.4 documentation (I promise, it is a direct link, our forum software just munges the names).
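As a rough sketch of how Python backtrace sampling might be switched on from the command line, the helper below only builds and prints an `nsys profile` command. The `--python-sampling` and `--python-sampling-frequency` options are taken from the 2024.x User Guide as I understand it, so please check them against your installed version. The script name `run_inference.py` and the output name are hypothetical.

```python
import shlex


def build_nsys_profile_cmd(script, script_args=(), output="vllm_profile",
                           sampling_hz=1000):
    """Build an `nsys profile` command line with Python sampling enabled.

    The option names are assumptions based on the Nsight Systems 2024.x
    User Guide; verify them with `nsys profile --help` on your system.
    """
    return [
        "nsys", "profile",
        "--trace=cuda,nvtx,osrt",                      # CUDA API, NVTX ranges, OS runtime
        "--python-sampling=true",                      # periodic Python backtraces
        f"--python-sampling-frequency={sampling_hz}",  # samples per second
        f"--output={output}",
        "python", script, *script_args,
    ]


cmd = build_nsys_profile_cmd("run_inference.py")  # hypothetical script name
print(shlex.join(cmd))
# To actually launch it: subprocess.run(cmd, check=True)
```

Building the command as a list keeps it easy to pass to `subprocess.run` without shell quoting problems.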

Thank you!

I tried this feature, but my problem is still not solved.

By default it samples 1000 times a second.

But I find that when the GPU is idle, the Python backtrace is also blank. Why is that?

I think you are CPU bound, but I think the CPU is doing something other than CUDA work during those gaps. Can you expand all the CPU threads and check whether there are any CUDA API calls in that range?

In the events pane, via the Expert Systems view, can you run the "Asynchronous memcpy with pageable memory" rule? I see that you are using async memory functions, and this will help you confirm that they are working correctly.
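The same check can, as far as I know, also be run from the command line with `nsys analyze`. The sketch below only builds and prints the command. The rule name `cuda_memcpy_async` is my understanding of the CLI name for this rule, and the report file name is hypothetical, so treat both as assumptions and confirm with `nsys analyze --help`.

```python
import shlex


def build_nsys_analyze_cmd(report, rule="cuda_memcpy_async"):
    """Build an `nsys analyze` command that runs one expert-system rule.

    Assumption: `cuda_memcpy_async` is the CLI name of the
    "async memcpy with pageable memory" rule in your nsys version.
    """
    return ["nsys", "analyze", "--rule", rule, report]


analyze_cmd = build_nsys_analyze_cmd("vllm_profile.nsys-rep")  # hypothetical report
print(shlex.join(analyze_cmd))
# To actually run it: subprocess.run(analyze_cmd, check=True)
```

If the rule does report copies from pageable memory, the usual remedy is to pin the host buffer first (in PyTorch, for example, `tensor.pin_memory()` followed by a `non_blocking=True` copy), since an async copy from pageable memory cannot fully overlap with other work.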