Tips for Chasing Down a Severe IA Bottleneck

We have a long draw call, ~130ms, that shows up in Nsight as a 98% IA (input assembler) bottleneck. Does anyone have tips for digging in further and finding the actual problem? Perhaps there's a good resource online?

Up to this point, we've done the following, which brought it down from ~165ms:
- Removed a bad use of instancing that was slowing things down a bit. It's a straight Draw call now.
- Eliminated vertex data entirely. We compute vertices procedurally in the VS and set a NULL input layout.
- The VS input is just the vertex ID.
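With no vertex buffers and one vertex per voxel, the VS has to recover which voxel it is working on from its vertex ID alone. A minimal CPU-side sketch of that decoding in Python, assuming a hypothetical 128 x 64 x 96 grid stored x-fastest (the post does not say how its voxels are indexed); the same integer arithmetic would go in the vertex shader.

```python
# Hypothetical grid dimensions; the original post does not give the real ones.
GRID_X, GRID_Y, GRID_Z = 128, 64, 96


def voxel_from_vertex_id(vertex_id, dims=(GRID_X, GRID_Y, GRID_Z)):
    """Decode a linear vertex ID into (x, y, z) voxel coordinates, x varying fastest."""
    nx, ny, nz = dims
    if not 0 <= vertex_id < nx * ny * nz:
        raise ValueError("vertex_id is outside the voxel grid")
    x = vertex_id % nx
    y = (vertex_id // nx) % ny
    z = vertex_id // (nx * ny)
    return x, y, z


def vertex_id_from_voxel(x, y, z, dims=(GRID_X, GRID_Y, GRID_Z)):
    """Inverse mapping: the vertex ID that the decoder above maps to (x, y, z)."""
    nx, ny, _ = dims
    return x + nx * (y + ny * z)


# The draw would then be issued with one vertex per voxel.
vertex_count = GRID_X * GRID_Y * GRID_Z
print(vertex_count, voxel_from_vertex_id(vertex_count - 1))
```

Keeping x fastest means consecutive vertex IDs touch neighboring voxels along x, which tends to keep the texture reads used for the occupancy test reasonably coherent.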

For additional context, we voxelize space and submit the voxels to the GPU. For each voxel, we compute occupancy from some texture data we have. If a voxel is occupied, we emit geometry to up to 24 camera views via a geometry shader. So the geometry shader can produce anywhere from 0 to 96 vertices per voxel, which is pretty heavy but necessary for us at this point.
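The 0 to 96 range works out to 96 / 24 = 4 vertices per camera view. A small Python sketch of the per-voxel count, assuming each visible view receives exactly 4 vertices (for example one quad emitted as a triangle strip; the post does not say which primitive is used). The worst case is also the value maxvertexcount has to be declared with.

```python
MAX_VIEWS = 24
VERTS_PER_VIEW = 4  # assumption: 96 vertices / 24 views, e.g. one 4-vertex strip per view

# Worst case per voxel, i.e. the value maxvertexcount must cover.
MAX_VERTEX_COUNT = MAX_VIEWS * VERTS_PER_VIEW


def gs_vertices_for_voxel(occupied, visible_views, verts_per_view=VERTS_PER_VIEW):
    """Number of vertices the geometry shader actually appends for one voxel."""
    if not occupied:
        return 0
    if not 0 <= visible_views <= MAX_VIEWS:
        raise ValueError("visible_views must be between 0 and MAX_VIEWS")
    return visible_views * verts_per_view


print(MAX_VERTEX_COUNT, gs_vertices_for_voxel(True, 3), gs_vertices_for_voxel(False, 24))
```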

Hi dbamerso,

I'm not sure I follow: are you saying Nsight has a bug that makes it report a 98% IA bottleneck? Could you share your sample with us so we can reproduce and investigate it locally?


After further experiments on our side, it doesn't look like a bug per se. It appears that you pay a performance cost at the input assembler for all output from the geometry shader. That makes sense, but what we didn't expect was that you pay for maxvertexcount vertices regardless of whether you actually append that many vertices.

In our case, we had ~700K input points to the geometry shader and, by some back-of-the-envelope math, about 10M output vertices. Reducing the number of vertices we emitted via Append had no effect unless we also reduced maxvertexcount, which was not possible for us.
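To put numbers on the observation above: if the cost scales with the declared maxvertexcount rather than with the vertices actually appended, the pipeline is budgeting for far more output than it receives. A quick Python calculation, assuming maxvertexcount was declared as 96 (the worst case from the original post) and using the rough 700K input / 10M output figures. Note that "you pay for maxvertexcount" is the poster's empirical finding, not documented hardware behavior.

```python
def gs_output_budget(input_points, max_vertex_count, actual_vertices):
    """Vertices reserved by the declared maxvertexcount, and the fraction actually used."""
    reserved = input_points * max_vertex_count
    return reserved, actual_vertices / reserved


reserved, utilization = gs_output_budget(700_000, 96, 10_000_000)
print(f"reserved={reserved:,}  used={utilization:.1%}")  # reserved=67,200,000  used=14.9%
```

Under that cost model, roughly 85% of the geometry shader output budget goes unused, which is consistent with emitting fewer vertices via Append having no visible effect.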

Our final conclusion is that geometry shaders aren't a good fit for this workload. We switched to compute shaders plus DrawInstancedIndirect, which ended up being almost two orders of magnitude faster.
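The post doesn't describe the compute pass, so what follows is a hedged sketch of one common pattern (stream compaction), not the author's implementation. A compute pass writes one work item per (occupied voxel, visible view) pair into an append buffer, and the resulting count becomes the InstanceCount for DrawInstancedIndirect. The draw then processes only vertices that actually exist, with nothing reserved per input point. The Python below stands in for the GPU logic. The dataclass field order mirrors D3D11_DRAW_INSTANCED_INDIRECT_ARGS, and 4 vertices per instance is an assumption based on 96 vertices for 24 views. In D3D11, the count could be copied into the args buffer on the GPU, for example with ID3D11DeviceContext::CopyStructureCount.

```python
from dataclasses import dataclass


@dataclass
class DrawInstancedIndirectArgs:
    # Field order mirrors D3D11_DRAW_INSTANCED_INDIRECT_ARGS.
    vertex_count_per_instance: int
    instance_count: int
    start_vertex_location: int = 0
    start_instance_location: int = 0


def compact_and_build_args(voxels, verts_per_view=4):
    """voxels: sequence of (occupied, visible_view_ids) per voxel index.

    Returns the compacted work items (voxel_index, view_id) and the indirect args.
    """
    work_items = []  # stands in for an append buffer filled by the compute pass
    for voxel_index, (occupied, views) in enumerate(voxels):
        if not occupied:
            continue  # empty voxels produce no work at all
        for view in views:
            work_items.append((voxel_index, view))
    args = DrawInstancedIndirectArgs(
        vertex_count_per_instance=verts_per_view,
        instance_count=len(work_items),
    )
    return work_items, args


items, args = compact_and_build_args([(False, [0, 1]), (True, [2, 5]), (True, []), (True, [0])])
print(items, args)
```

Each instance would then look up its (voxel, view) pair via its instance ID and emit its 4 vertices from the VS, so the vertex work scales with actual output instead of with maxvertexcount.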


Hi dba,

Glad that works for you. It's hard to say that maxvertexcount has a direct relationship with the IA bottleneck, since that metric focuses on vertex attribute fetch. Maybe you simply increased the load on some other stage of the pipeline, which made the IA bottleneck decrease. It's hard to say why without more information.

Glad you found a way to solve your performance issue.