Hi, I am profiling an OpenAI Triton kernel in PyTorch with the ncu command.
The relevant part of the script is:
torch.cuda.synchronize()
for _ in range(warm_up):
    bm()
torch.cuda.synchronize()
torch.cuda.cudart().cudaProfilerStart()
torch.cuda.nvtx.range_push('NCU')
for _ in range(repeat):
    bm()
torch.cuda.synchronize()
torch.cuda.nvtx.range_pop()
torch.cuda.cudart().cudaProfilerStop()
The syntax you used is incorrect; you can refer to the documentation for all the details. There is no "Range" prefix, unless you actually named your range "Range NCU". Also, since you used a push/pop range, you need to suffix it with a "/" to distinguish it from start/stop ranges. A valid syntax would be --nvtx-include "NCU/"
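Putting that together, an invocation along these lines should work for the script above (the script name is a placeholder; --nvtx enables NVTX support, and since the script calls cudaProfilerStart/Stop, --profile-from-start off defers collection to that region):

    # Sketch of an ncu invocation matching the push/pop range named 'NCU'.
    # 'bench.py' is a placeholder for your actual script.
    ncu --nvtx --nvtx-include "NCU/" \
        --profile-from-start off \
        -o report \
        python bench.py

This profiles only kernels launched inside the NCU push/pop range and writes the results to report.ncu-rep.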