NVTX Markup Problems with Nsight Graphics

I’m seeing problems with Nsight Graphics’ support for NVTX. Specifically:

  1. When NVTX ranges are used on multiple threads, the “Frame Debugger” seems to consider all NVTX ranges to be emitted on the draw thread and completely fouls up the nesting.
  2. When NVTX ranges are used on multiple threads, the “GPU Trace Profiler” doesn’t show any NVTX ranges, except those enclosing SwapBuffers(). And it puts the entire frame in those.
    Removing those enclosing ranges results in “no” NVTX ranges reported by the GPU Trace Profiler, even though there are nested ranges enclosing the draw commands.

QUESTION : How should NVTX markup be used with Nsight Graphics?

Why I ask…

In our multithreaded GL rendering application, I’ve switched all of our timing ranges over from using:

  • KHR_debug markup: glPushDebugGroup() + glPopDebugGroup()
    to using:
  • NVTX markup: nvtxRangePush() + nvtxRangePop()

This works flawlessly in Nsight Systems. Named ranges are usable and shown properly in all threads, post to the profiler considerably faster than KHR_debug markup, and can enclose the frame delimiter (SwapBuffers()). This works great!

However, this seems to totally break timing range recognition in Nsight Graphics, both in the “Frame Debugger” and in the “GPU Trace Profiler” tools.

Previously, I was using KHR_debug debug groups with Nsight Graphics (draw thread only, of course) and that worked well. But for multiple reasons, I’d really like to kill that off and just use NVTX ranges always, for both Nsight Graphics and Nsight Systems. Is this possible?

Please tell me how I can successfully use NVTX ranges with Nsight Graphics, so I can standardize on that.

Thank you for using Nsight Graphics and sorry you ran into these issues with NVTX. I will contact the engineering team on your behalf and get back to you with a response.

NVIDIA NVTX’s intended usage is to delimit ranges of CPU execution.

To delimit ranges of GPU execution, and to visualize the corresponding GPU timings, please use the graphics-API-specific perf markers described in this documentation. For OpenGL, glPushDebugGroup() + glPopDebugGroup() are the preferred delimiters.
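As a rough illustration of that split (not an official NVIDIA recommendation beyond what is stated above), the sketch below routes CPU-side ranges to one marker API and GPU-side ranges to another. The names `RangeKind`, `MarkerApi`, `MarkerHooks`, `chooseMarkerApi`, `withRange`, and `routeDemo` are hypothetical, and the hooks are injected so the example compiles without the NVTX or OpenGL headers. In a real application the NVTX hooks would presumably forward to nvtxRangePushA() / nvtxRangePop(), and the KHR_debug hooks to glPushDebugGroup() / glPopDebugGroup() on the thread that owns the GL context.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical classification of what a range is meant to measure.
enum class RangeKind { CpuExecution, GpuExecution };
enum class MarkerApi { Nvtx, KhrDebugGroup };

// Policy taken from the reply above: NVTX for CPU ranges,
// graphics-API perf markers (KHR_debug groups for OpenGL) for GPU ranges.
inline MarkerApi chooseMarkerApi(RangeKind kind) {
    return kind == RangeKind::GpuExecution ? MarkerApi::KhrDebugGroup
                                           : MarkerApi::Nvtx;
}

// Injected push/pop hooks (assumption: real code would wrap
// nvtxRangePushA/nvtxRangePop or glPushDebugGroup/glPopDebugGroup here).
struct MarkerHooks {
    std::function<void(const char*)> push;
    std::function<void()> pop;
};

// Runs `body` inside a range emitted through the API chosen for `kind`.
inline void withRange(RangeKind kind, const char* name,
                      const MarkerHooks& nvtx, const MarkerHooks& khrDebug,
                      const std::function<void()>& body) {
    const MarkerHooks& hooks =
        chooseMarkerApi(kind) == MarkerApi::Nvtx ? nvtx : khrDebug;
    hooks.push(name);
    body();
    hooks.pop();
}

// Demo with recording hooks standing in for the real marker APIs.
inline std::vector<std::string> routeDemo() {
    std::vector<std::string> log;
    MarkerHooks nvtx{
        [&log](const char* n) { log.push_back(std::string("nvtx push ") + n); },
        [&log]() { log.push_back("nvtx pop"); }};
    MarkerHooks khrDebug{
        [&log](const char* n) { log.push_back(std::string("gl push ") + n); },
        [&log]() { log.push_back("gl pop"); }};

    withRange(RangeKind::CpuExecution, "Cull", nvtx, khrDebug, [] {});
    withRange(RangeKind::GpuExecution, "OpaquePass", nvtx, khrDebug, [] {});
    return log;
}
```

Whether emitting both kinds of markers side by side behaves well in every Nsight Graphics tool is not established in this thread, so treat this as a starting point for experimentation.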

Thanks @dwoods. However, that doesn’t really answer my question.

?: Does NVIDIA specifically recommend for or against using NVTX within Nsight Graphics to delimit ranges of CPU execution (e.g. ranges where GL commands are being queued)?

?: And if for, what is the recommended usage for proper function within Nsight Graphics?

For instance, should NVTX markup be disabled on all threads besides the draw submission thread when running under Nsight Graphics? Can NVTX be used for marking ranges at all, with glObjectLabel() (from KHR_debug) being used for GL resource naming? I was hoping you guys could tell me what the unstated limitations are rather than having to figure them out myself through trial and error. Because NVTX ranges across multiple threads, in a form that works flawlessly in Nsight Systems, don’t just work in Nsight Graphics.

In the “Performance Markers” URL you linked to:

it says:

I’m not seeing any limitations here, but I’m definitely experiencing some.

Thanks for looking at this!

Related to this, here are some findings on Nsight Graphics’ NVTX range support:

  1. If NVTX ranges are only used on the draw submission thread, the Scrubber in the Nsight Graphics Frame Debugger seems to display reasonably, with ranges properly nested and surrounding blocks of GL calls. However…
  2. If NVTX ranges are used across multiple application threads (including the draw submission thread), then the Scrubber display in the Nsight Graphics Frame Debugger is extremely confused, acting as if all NVTX range start/stop events had been emitted on the draw submission thread. This renders the Scrubber unusable, as it’s populated with a mish-mash of NVTX range start/stop events spanning multiple threads.

So as for Nsight Graphics’ limitations w.r.t. NVTX range support, this is definitely one of them.

For now, we forcibly disable NVTX range markup emission on all threads besides the draw submission thread when running our applications within the Nsight Graphics Frame Debugger. For Nsight Systems, this force-disable is not necessary, and NVTX ranges are properly captured and displayed within the respective application thread’s timeline.
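A minimal sketch of that kind of gating follows. It is not our production code: `RangeBackend`, `RangeGate`, `ScopedRange`, and `runGateDemo` are hypothetical names, and the push/pop hooks are injected so the example compiles without the NVTX headers. In a real build the hooks would presumably forward to nvtxRangePushA() and nvtxRangePop(), and the `drawThreadOnly` flag would be set only when running under Nsight Graphics (how that is detected is left open here).

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Injected marker hooks (assumption: real code would call
// nvtxRangePushA(name) and nvtxRangePop() here).
struct RangeBackend {
    std::function<void(const char*)> push;
    std::function<void()> pop;
};

// Decides whether the calling thread may emit ranges.
class RangeGate {
public:
    RangeGate(RangeBackend backend, std::thread::id drawThread,
              bool drawThreadOnly)
        : backend_(std::move(backend)),
          drawThread_(drawThread),
          drawThreadOnly_(drawThreadOnly) {}

    // True if ranges are unrestricted, or if we are on the draw thread.
    bool enabledOnThisThread() const {
        return !drawThreadOnly_ ||
               std::this_thread::get_id() == drawThread_;
    }
    const RangeBackend& backend() const { return backend_; }

private:
    RangeBackend backend_;
    std::thread::id drawThread_;
    bool drawThreadOnly_;
};

// RAII range: pushes on construction and pops on destruction, but only
// when the gate allows it, so push/pop always stay balanced per thread.
class ScopedRange {
public:
    ScopedRange(const RangeGate& gate, const char* name)
        : gate_(gate), active_(gate.enabledOnThisThread()) {
        if (active_) gate_.backend().push(name);
    }
    ~ScopedRange() {
        if (active_) gate_.backend().pop();
    }
    ScopedRange(const ScopedRange&) = delete;
    ScopedRange& operator=(const ScopedRange&) = delete;

private:
    const RangeGate& gate_;
    bool active_;
};

// Demo: records what would be emitted from the calling thread, given
// which thread the gate treats as the draw submission thread.
inline std::vector<std::string> runGateDemo(std::thread::id drawThread,
                                            bool drawThreadOnly) {
    std::vector<std::string> log;
    RangeBackend recorder{
        [&log](const char* n) { log.push_back(std::string("push:") + n); },
        [&log]() { log.push_back("pop"); }};
    RangeGate gate(recorder, drawThread, drawThreadOnly);
    {
        ScopedRange frame(gate, "Frame");
    }
    return log;
}
```

Sampling the gate once in the constructor means a range that was pushed is always popped, even if the policy were to change mid-scope.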