Missing Activity: GPU Trace

I am using NVIDIA Nsight Graphics 2019.2.1.0 on Ubuntu.

The documentation mentions four activities:

  • Frame Debugger
  • Frame Profiler
  • C++ Capture
  • GPU Trace

However, only three of these options show up for me. GPU Trace is missing.

Is GPU Trace not supported in the Linux build of Nsight Graphics?

I am trying to identify where and why the CPU and GPU sometimes synchronize, which causes bad performance and leaves the CPU spin-locking.

Is there any other tool I can use on Linux to identify where these synchronizations happen?

Hi Stolk,

Correct, GPU Trace is not supported on Linux currently. We will update the documentation to make this more clear!

As for other tools - Nsight Systems is designed for exactly this! I definitely recommend checking it out.

https://developer.nvidia.com/nsight-systems

Cheers,
Seth

Thanks, but I am having a hard time interpreting the Nsight Systems output.

So what does this graph tell me?
The GPU spends all its time clearing the framebuffer?
And the CPU spends its time swapping the buffers?


Hi Stolk,

Without having the full trace, it does look like you are indeed spending the vast majority of time presenting. ~4 ms for a buffer swap is certainly more expensive than I would expect. Based on the collapsed CPU row above, it doesn’t look like the CPU is active during those times. By the looks of it, it only spikes in activity to submit draws for the next frame.

I would imagine your GPU queue is fairly empty as well? This looks somewhat like a synchronization issue to me. But again, I’m guessing at this point. I would probably start by looking at your spin lock and investigating that clear.

Also, I’m going to move this discussion over to the Nsight Systems thread. :)

Thanks,
Seth

Thanks Seth,

So you mention “GPU queue.”

Is that the line labeled “OpenGL GPU work IDC”?
Because in Timeline View I only see two top-level nodes:

  • CPU
  • Threads

and both have 4 entries.

The GPU Work line shows very similar blobs to the API line, and is often also labeled with glXSwapBuffers.

So, I removed my glReadPixels and off-screen renders to make the OpenGL usage as simple as possible, to see what a normal, non sync-affected run looks like.

Should I expect glXSwapBuffers to show up prominently? I assume that the flushing and syncing normally happen during that swap, so it would take the bulk of the CPU cycles.

Thanks again,

Bram

Could you send/attach the .qdrep file so that we can see what all is in the run?

Hi,

Here is my .qdrep file:

https://stolk.org/tmp/bram.qdrep

What I am trying to achieve is to keep glMapBufferRange() from blocking the CPU.

I call glReadPixels() into a PBO, and thanks to the PBO the glReadPixels() call itself does not block. But when I later want to get the actual values, glMapBufferRange() blocks.
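
For reference, one common way to avoid this kind of stall is to keep several readbacks in flight and only map a PBO once the GPU has signaled that it is finished with it, typically by placing a fence after each glReadPixels() and polling it with a zero timeout. The following is only a minimal sketch of that bookkeeping in C, not a drop-in fix for the trace above. The names ReadbackBackend, ReadbackRing, FakeGpu, RING_SLOTS and simulate are hypothetical, and the GL calls appear only in comments; a stand-in "GPU" with a fixed latency in frames replaces them so the logic can run without a GL context.

```c
#include <assert.h>

#define RING_SLOTS 3  /* assumed number of PBOs kept in flight */

/* Hypothetical backend hooks (not part of any real API). In an OpenGL
 * program they would roughly wrap:
 *   issue    -> bind PBO[slot]; glReadPixels(...); fence[slot] = glFenceSync(...)
 *   is_ready -> glClientWaitSync(fence[slot], 0, 0), i.e. a zero timeout
 *   fetch    -> glMapBufferRange(...); copy the pixels; glUnmapBuffer(...)
 */
typedef struct {
    void (*issue)(void *ctx, int slot);
    int  (*is_ready)(void *ctx, int slot);  /* must return immediately */
    void (*fetch)(void *ctx, int slot);
    void *ctx;
} ReadbackBackend;

typedef struct {
    ReadbackBackend be;
    int head;       /* next slot to issue into */
    int tail;       /* oldest slot still in flight */
    int in_flight;  /* number of slots awaiting their result */
} ReadbackRing;

void ring_init(ReadbackRing *r, ReadbackBackend be) {
    r->be = be;
    r->head = r->tail = r->in_flight = 0;
}

/* Start a readback if a slot is free; returns 1 on success, 0 if full. */
int ring_issue(ReadbackRing *r) {
    if (r->in_flight == RING_SLOTS) return 0;
    r->be.issue(r->be.ctx, r->head);
    r->head = (r->head + 1) % RING_SLOTS;
    r->in_flight++;
    return 1;
}

/* Fetch finished readbacks oldest-first without waiting; returns count. */
int ring_poll(ReadbackRing *r) {
    int fetched = 0;
    while (r->in_flight > 0 && r->be.is_ready(r->be.ctx, r->tail)) {
        r->be.fetch(r->be.ctx, r->tail);
        r->tail = (r->tail + 1) % RING_SLOTS;
        r->in_flight--;
        fetched++;
    }
    return fetched;
}

/* Stand-in "GPU" so the sketch runs without a GL context. */
typedef struct {
    int now;                    /* current frame number */
    int latency;                /* frames until a readback completes */
    int issued_at[RING_SLOTS];  /* frame at which each slot was issued */
    int fetched;                /* total results consumed so far */
} FakeGpu;

void fake_issue(void *c, int s) { FakeGpu *g = c; g->issued_at[s] = g->now; }
int  fake_ready(void *c, int s) { FakeGpu *g = c; return g->now - g->issued_at[s] >= g->latency; }
void fake_fetch(void *c, int s) { FakeGpu *g = c; (void)s; g->fetched++; }

/* Run `frames` frames: collect finished results, then issue one readback.
 * Returns how many results were consumed in total. */
int simulate(int frames, int latency) {
    assert(frames >= 0 && latency >= 0);
    FakeGpu gpu = { .now = 0, .latency = latency };
    ReadbackBackend be = { fake_issue, fake_ready, fake_fetch, &gpu };
    ReadbackRing ring;
    ring_init(&ring, be);
    for (gpu.now = 0; gpu.now < frames; gpu.now++) {
        ring_poll(&ring);
        ring_issue(&ring);
    }
    return gpu.fetched;
}
```

The design choice here is that the CPU never waits: results arrive a few frames late, and if the GPU falls further behind than the ring is deep, new readbacks are skipped instead of stalling. Whether that trade-off is acceptable depends on how fresh the pixel values need to be.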

Were you able to open the .qdrep file?
Thanks!

Yes, I could. I gave it to an expert to analyze, but I have not heard back from him yet. Sorry, I should have pinged him before this.

Pinging him again.