MDI Draw Calls in Frame Debugger

I’m currently looking at some frames captured in Nsight Graphics 2019.3 with the Frame Debugger which make use of MDI draw calls (e.g. glMultiDrawElementsIndirect or glMultiDrawArraysIndirect).

I do see the MDI draw calls in the Events View, and these cite the correct drawcount (e.g. drawcount = 981). However, beneath each of these draw calls, Nsight Graphics has inserted drawCount “virtual” draw calls for this same MDI draw call, each a copy of the original, except with drawCount changed to 1 and the indirect offset bumped up to point to the next draw record, as if launching each draw in the multidraw separately. For instance, with drawcount = 981, there are 981 added MDI draw calls below the parent MDI draw call. See the example below.

A few questions:

  1. Why is Nsight Graphics doing this? It doesn't do this for non-indirect multi-draw calls (e.g. glMultiDrawArrays).
  2. How do I hide the virtual copies in the Events View?

Even applying the “Type: Draw Calls” filter in the Events View doesn’t hide these virtual draw calls.

426  "void glMultiDrawElementsIndirect(GLenum mode = GL_TRIANGLE_STRIP, GLenum type = GL_UNSIGNED_SHORT, GLvoid* indirect = 0x0000000000000000, GLsizei drawcount = 981, GLsizei stride = 0)"
 427  // Beginning of multi draw
 428  "void glMultiDrawElementsIndirect(GLenum mode = GL_TRIANGLE_STRIP, GLenum type = GL_UNSIGNED_SHORT, GLvoid* indirect = 0x0000000000000000, GLsizei drawcount = 1, GLsizei stride = 0)"
 429  "void glMultiDrawElementsIndirect(GLenum mode = GL_TRIANGLE_STRIP, GLenum type = GL_UNSIGNED_SHORT, GLvoid* indirect = 0x0000000000000014, GLsizei drawcount = 1, GLsizei stride = 0)"
...   
1407  "void glMultiDrawElementsIndirect(GLenum mode = GL_TRIANGLE_STRIP, GLenum type = GL_UNSIGNED_SHORT, GLvoid* indirect = 0x0000000000004C7C, GLsizei drawcount = 1, GLsizei stride = 0)"
1408  "void glMultiDrawElementsIndirect(GLenum mode = GL_TRIANGLE_STRIP, GLenum type = GL_UNSIGNED_SHORT, GLvoid* indirect = 0x0000000000004C90, GLsizei drawcount = 1, GLsizei stride = 0)"
1409  // End of multi draw
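
As a rough illustration of the expansion shown in the listing, the sketch below computes the indirect byte offset each virtual sub-draw would carry. It assumes the standard 20-byte DrawElementsIndirectCommand record from the OpenGL specification and the rule that stride = 0 means tightly packed records. The helper name sub_draw_offset is hypothetical and is not an Nsight Graphics or OpenGL function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Record layout read by glMultiDrawElementsIndirect (per the OpenGL spec). */
typedef struct {
    uint32_t count;
    uint32_t instanceCount;
    uint32_t firstIndex;
    int32_t  baseVertex;
    uint32_t baseInstance;
} DrawElementsIndirectCommand;

/* Five 32-bit fields, so consecutive records sit 20 (0x14) bytes apart. */
static_assert(sizeof(DrawElementsIndirectCommand) == 20,
              "indirect record is expected to be 20 bytes");

/* Hypothetical helper: byte offset of sub-draw i within the indirect buffer.
 * A stride of 0 means the records are tightly packed. */
static size_t sub_draw_offset(size_t base, size_t stride, size_t i)
{
    size_t step = stride ? stride : sizeof(DrawElementsIndirectCommand);
    return base + i * step;
}
```

With base = 0 and stride = 0, sub-draw 1 lands at 0x14 and sub-draw 980 (the last of 981) at 0x4C90, which matches events 429 and 1408 in the listing.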

Hi Dark_Photon,

Thanks for your feedback.

  1. The expansion lets you inspect each sub-draw of the original multi-draw separately. Nsight Graphics should insert the separate sub-draws for ALL multi-draw calls, including glMultiDrawArrays. Is this not the case for you? If not, that is a bug on our side.

  2. Unfortunately, they can’t be collapsed in the current version. We’ll fix this in a future release.

Thanks for the replies, JCLiang.

No, it’s not. I just rechecked this on the latest release (2019.4), and calls to glMultiDrawArraysEXT() are not being expanded as you describe, whereas calls to glMultiDrawElementsIndirect() are.

Great! Thank you.

I just confirmed that this is a bug: glMultiDrawArraysEXT doesn’t get the separate sub-draws, while glMultiDrawArrays does. I’ll log a case. Thank you for identifying the bug for us :)

More on this issue…

I’ve noticed a pattern with how long Nsight Graphics (2019.4) takes to capture a single frame from our engine and display it in the Frame Debugger. Here are the times:

  • 22 seconds - With all 14 MDI draw calls disabled (skipped).
  • > 3 minutes - With these 14 MDI draw calls enabled.

I suspect that the additional frame capture time may be due to Nsight Graphics (needlessly in my case) expanding out these 14 simple MDI draw calls into the thousands of component virtual sub-draw calls for display purposes.

It would be ideal if the Frame Debugger had an option “Expand MultiDraw calls: Y/N” so that Nsight Graphics users could avoid this excess frame capture time (as well as the Events view clutter) when it’s not needed.

Incidentally, this is the same “slow capture” problem I noted in this post.

Hi Dark_Photon,

Is the MDI using bindless buffers? What is the primitive/draw count scale of the MDI?

I can run some tests, but I tend to think the use of bindless buffers is the root cause of the slow capture.

Thanks for the follow-up, JCLiang.

Re Bindless Buffer access, I get the same Frame Capture times whether I use bindless buffer APIs or not: ~3 minutes 11 seconds, so I don’t think that’s the cause of the slow-down.

Re primitive/draw count scale, the 14 MDI draw calls I’m using for this test and the draw counts are as follows. Currently there are not many primitives/vertices per sub-draw (most < 50). **

Thanks for looking into this!

glMultiDrawElementsIndirect( mode = GL_TRIANGLES, type = GL_UNSIGNED_SHORT, indirect = 0x0, drawcount = 3268, stride = 0 )
glMultiDrawArraysIndirect  ( mode = GL_POINTS                             , indirect = 0x0, drawcount = 3268, stride = 0 )
glMultiDrawElementsIndirect( mode = GL_TRIANGLES, type = GL_UNSIGNED_SHORT, indirect = 0x0, drawcount = 1994, stride = 0 )
glMultiDrawArraysIndirect  ( mode = GL_POINTS                             , indirect = 0x0, drawcount = 1994, stride = 0 )
glMultiDrawElementsIndirect( mode = GL_TRIANGLES, type = GL_UNSIGNED_SHORT, indirect = 0x0, drawcount = 2100, stride = 0 )
glMultiDrawArraysIndirect  ( mode = GL_POINTS                             , indirect = 0x0, drawcount = 2100, stride = 0 )
glMultiDrawElementsIndirect( mode = GL_TRIANGLES, type = GL_UNSIGNED_SHORT, indirect = 0x0, drawcount = 4646, stride = 0 )
glMultiDrawArraysIndirect  ( mode = GL_POINTS                             , indirect = 0x0, drawcount = 4646, stride = 0 )
glMultiDrawElementsIndirect( mode = GL_TRIANGLES, type = GL_UNSIGNED_SHORT, indirect = 0x0, drawcount = 2175, stride = 0 )
glMultiDrawArraysIndirect  ( mode = GL_POINTS                             , indirect = 0x0, drawcount = 2175, stride = 0 )
glMultiDrawElementsIndirect( mode = GL_TRIANGLES, type = GL_UNSIGNED_SHORT, indirect = 0x0, drawcount = 2247, stride = 0 )
glMultiDrawArraysIndirect  ( mode = GL_POINTS                             , indirect = 0x0, drawcount = 2247, stride = 0 )
glMultiDrawElementsIndirect( mode = GL_TRIANGLES, type = GL_UNSIGNED_SHORT, indirect = 0x0, drawcount =  576, stride = 0 )
glMultiDrawArraysIndirect  ( mode = GL_POINTS                             , indirect = 0x0, drawcount =  576, stride = 0 )

** (I gather that GPU rendering isn’t very efficient in this case. However, that’s a separate issue from why Nsight Graphics Frame Capture with these 14 draw calls enabled takes so long. FWIW, my frame times with this test case running outside of Nsight Graphics w/o VSync are 1.90 msec/frame)
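
For scale, a small sketch that totals the sub-draws implied by the 14 calls listed above. It assumes, as the listing shows, that each of the seven draw counts is used by one glMultiDrawElementsIndirect call and one glMultiDrawArraysIndirect call. The names kDrawCounts and total_sub_draws are hypothetical and used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* The seven distinct drawcount values from the 14 MDI calls listed above. */
static const unsigned kDrawCounts[] = {3268, 1994, 2100, 4646, 2175, 2247, 576};

static_assert(sizeof kDrawCounts / sizeof kDrawCounts[0] == 7,
              "seven distinct draw counts are expected");

/* Hypothetical helper: total sub-draws when every count in the array is
 * issued by calls_per_count separate MDI calls. */
static unsigned long total_sub_draws(const unsigned *counts, size_t n,
                                     unsigned calls_per_count)
{
    unsigned long total = 0;
    for (size_t i = 0; i < n; ++i) {
        total += (unsigned long)counts[i] * calls_per_count;
    }
    return total;
}
```

Calling total_sub_draws(kDrawCounts, 7, 2) gives 34012, so a tool that expands every sub-draw into its own event would add roughly 34 thousand events for this one frame.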

Thanks for the extra information; it helps us investigate the issue. I’ll file a case for the dev team.

Thank you!

I wanted to update that this facility has been significantly sped up and the fix will be available in our next release, 2019.6.

That’s great to hear! Thanks!

Thanks for adding this! The frame captures containing MDI calls are much easier to sift through now!

That’s great to hear that the frame captures containing MDI calls are much easier to sift through now. Thanks for sharing the news.