Hi, I am profiling an OpenAI Triton kernel in PyTorch with the ncu command.
The relevant part of the script is:
torch.cuda.synchronize()
for _ in range(warm_up):
    bm()
torch.cuda.synchronize()
torch.cuda.cudart().cudaProfilerStart()
torch.cuda.nvtx.range_push('NCU')
for _ in range(repeat):
    bm()
torch.cuda.synchronize()
torch.cuda.nvtx.range_pop()
torch.cuda.cudart().cudaProfilerStop()
The syntax you used is incorrect; you can refer to the documentation for all the details. There is no "Range" prefix, unless you actually named your range "Range NCU". Also, since you used a push/pop range, you need to suffix it with a "/" to distinguish it from start/stop ranges. A valid syntax would be --nvtx-include "NCU/"
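Putting that together, an invocation along these lines should work for the script above (the script name is a placeholder; --nvtx enables NVTX support, and since the script calls cudaProfilerStart/Stop, --profile-from-start off defers collection to that region):

    # Sketch of an ncu invocation matching the push/pop range named 'NCU'.
    # 'bench.py' is a placeholder for your actual script.
    ncu --nvtx --nvtx-include "NCU/" \
        --profile-from-start off \
        -o report \
        python bench.py

This profiles only kernels launched inside the NCU push/pop range and writes the results to report.ncu-rep.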