How to use NVTX properly to mark function ranges efficiently?

Strictly speaking, this is a question about NVTX rather than about Nsight Systems. Since I couldn't find an NVTX category on this forum, I am posting it under Nsight Systems, which is my current use case.

It is my understanding that NVTX offers two sets of APIs: one for C++ only (#include <nvtx3/nvtx3.hpp>) and one for C/C++ (#include <nvtx3/nvToolsExt.h>). It is clear to me that NVTX3_FUNC_RANGE() can be used to mark the whole body of a function as a range. In my use case, I have many different OPs, each with its own Compute function. The problem with this approach is that the macro names the range after the enclosing function, so every range shows up as Compute and I cannot tell the Compute functions of different OPs apart in the Nsight Systems profile.

To work around this, I considered using nvtxRangePushA and nvtxRangePop to give each Compute function a name of my own. That raises a different issue: a function may have several conditional returns, and unless nvtxRangePop is called before every one of them, the range is not closed correctly.

Is there an easy way to (1) attach user-defined text to each function being profiled and (2) ensure the range ends whenever the function returns?

@jasoncohen to respond
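
For reference, here is a minimal sketch of one way to get both (1) and (2) with only the C API: wrap nvtxRangePushA / nvtxRangePop in a small RAII guard, so the destructor pops the range on every return path. The class name ScopedNvtxRange and the function conv_compute are made up for illustration and are not part of NVTX. The fallback stubs are there only so the sketch builds on a machine without the NVTX headers (this assumes C++17 for __has_include). I believe recent releases of nvtx3.hpp ship a similar RAII class (nvtx3::scoped_range, named nvtx3::thread_range in older releases), but please check the header you have.

```cpp
#include <cstdio>

#if __has_include(<nvtx3/nvToolsExt.h>)
#include <nvtx3/nvToolsExt.h>  // real NVTX C API
#else
// Fallback stubs (NOT part of NVTX) so this sketch compiles without the headers.
static int g_depth = 0;
static inline int nvtxRangePushA(const char* message) {
    std::printf("push %s\n", message);
    return g_depth++;
}
static inline int nvtxRangePop() {
    std::printf("pop\n");
    return --g_depth;
}
#endif

// Hypothetical RAII guard: pushes a named range on construction and
// pops it on destruction, i.e. on any exit from the enclosing scope.
class ScopedNvtxRange {
public:
    explicit ScopedNvtxRange(const char* name) { nvtxRangePushA(name); }
    ~ScopedNvtxRange() { nvtxRangePop(); }

    // Non-copyable, so each guard corresponds to exactly one push/pop pair.
    ScopedNvtxRange(const ScopedNvtxRange&) = delete;
    ScopedNvtxRange& operator=(const ScopedNvtxRange&) = delete;
};

// Hypothetical OP with a conditional early return.
int conv_compute(int n) {
    ScopedNvtxRange range("ConvOp::Compute");  // user-defined text per OP
    if (n < 0) {
        return -1;  // early return: the destructor still pops the range
    }
    return 2 * n;   // normal return: the destructor pops here as well
}
```

Each OP would pass its own string to the guard at the top of its Compute function, so the ranges become distinguishable in the timeline without a nvtxRangePop before every return. The guard also closes the range if the function exits via an exception.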