Nsight Compute fails to filter by NVTX ranges when profiling PyTorch

Hi, I am profiling an OpenAI Triton kernel in PyTorch with the ncu command.
The relevant part of the script is:

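import torch

# bm, warm_up, and repeat are defined elsewhere in the script.
torch.cuda.synchronize()
for _ in range(warm_up):  # warm-up iterations, outside the profiled region
    bm()
torch.cuda.synchronize()
torch.cuda.cudart().cudaProfilerStart()
torch.cuda.nvtx.range_push('NCU')  # open a push/pop NVTX range named "NCU"
for _ in range(repeat):
    bm()
torch.cuda.synchronize()
torch.cuda.nvtx.range_pop()  # close the "NCU" range
torch.cuda.cudart().cudaProfilerStop()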

The ncu command is:

ncu --nvtx --nvtx-include "Range NCU" -k _fwd_kernel_token_softmax_0d1d2d3d45c67c  python benchmarks/ncu_subprocess.py 

I believe this matches the documentation, and the NVTX range shows up correctly in nsys.

However, ncu does not filter by the NVTX range, so I wonder if anyone can help. I would appreciate it!

Also, although the NVTX range filter did not work, filtering by cudaProfilerStart/cudaProfilerStop did.

With ncu --nvtx --range-filter :[1]: -k _fwd_kernel_token_softmax_0d1d2d3d45c67c, profiling works as expected.

So I wonder whether something is wrong with my NVTX usage, which I rely on much more in my workload.

The syntax you used is incorrect; you can refer to the documentation for all the details. There is no "Range" prefix, unless you actually named your range "Range NCU". Also, since you used a push/pop range, you need to suffix its name with a / to distinguish it from start/stop ranges. A valid syntax would be --nvtx-include "NCU/"
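
For reference, here is a minimal Python sketch of the naming rule described in this reply. The helper names nvtx_include_expr and ncu_command are hypothetical, made up for illustration. Only the ncu options (--nvtx, --nvtx-include, -k) and the kernel and script names come from this thread. It assumes, based on the reply, that a start/stop range is passed by its bare name; please check the Nsight Compute documentation for the full filter syntax.

```python
import shlex


def nvtx_include_expr(range_name: str, push_pop: bool = True) -> str:
    """Build the value passed to ncu's --nvtx-include option.

    A single push/pop range needs a trailing '/' so ncu can tell it apart
    from a start/stop range. Passing a start/stop range by its bare name
    is an assumption drawn from the reply in this thread.
    """
    return f"{range_name}/" if push_pop else range_name


def ncu_command(range_name: str, kernel: str, script: str,
                push_pop: bool = True) -> list[str]:
    """Assemble the ncu command line as an argument list (hypothetical helper)."""
    return [
        "ncu", "--nvtx",
        "--nvtx-include", nvtx_include_expr(range_name, push_pop),
        "-k", kernel,
        "python", script,
    ]


if __name__ == "__main__":
    cmd = ncu_command(
        "NCU",
        "_fwd_kernel_token_softmax_0d1d2d3d45c67c",
        "benchmarks/ncu_subprocess.py",
    )
    # Print a shell-ready command line for a range opened with range_push('NCU').
    print(shlex.join(cmd))
```

Building the command as an argument list avoids shell-quoting mistakes if a range name contains spaces; the list can be handed directly to subprocess.run.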

Oh, it seems I made a dumb mistake. :-D

Thanks for the kind reply!
