Tips for Chasing Down Severe IA Bottleneck

dbamerso · July 31, 2015, 3:47pm

We have a long draw call, ~130ms, that shows up as a 98% IA (input assembler) bottleneck in Nsight. Does anyone have tips for digging in further and finding the actual problem? Perhaps there's a good resource online?

Up to this point, we've done the following, which brought it down from ~165ms:
-Removed a bad use of instancing that was slowing things down a bit. It's a straight Draw call now.
-Eliminated our vertex data entirely. We generate everything procedurally in the VS and set a NULL input layout.
-The VS input is just the vertex ID (SV_VertexID).
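
With no vertex buffer, the VS has to turn the vertex ID into a voxel position on its own. The Python below is a CPU-side sketch of one way to do that, assuming a dense nx × ny × nz grid with x varying fastest; the grid dimensions, the ordering, and both function names are hypothetical, not taken from the poster's shader.

```python
from typing import Tuple


def vertex_id_to_voxel(vertex_id: int, nx: int, ny: int, nz: int) -> Tuple[int, int, int]:
    """Map a linear vertex ID (one point per voxel) to (x, y, z), x varying fastest."""
    if not 0 <= vertex_id < nx * ny * nz:
        raise ValueError("vertex_id is outside the voxel grid")
    x = vertex_id % nx
    y = (vertex_id // nx) % ny
    z = vertex_id // (nx * ny)
    return x, y, z


def voxel_to_vertex_id(x: int, y: int, z: int, nx: int, ny: int) -> int:
    """Inverse mapping: the vertex ID that a given voxel would be drawn with."""
    return x + nx * (y + ny * z)
```

With this layout, a single Draw of nx * ny * nz vertices touches every voxel exactly once; for a 4 × 4 × 4 grid, vertex ID 21 maps to voxel (1, 1, 1).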

For additional context, we're taking a voxelization of space and throwing it at the GPU. For each voxel, we compute occupancy from some texture data we have. If the voxel is occupied, a geometry shader pushes data out to up to 24 camera views. So the geometry shader can emit anywhere from 0 to 96 vertices, which is pretty fat but necessary for us at this point.
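
A rough model of the per-voxel geometry shader output described above. The split of 96 vertices into 4 per view (one quad as a triangle strip) is an assumption inferred from 96 / 24, and the threshold test is a hypothetical stand-in for the texture-based occupancy check.

```python
MAX_VIEWS = 24
VERTS_PER_VIEW = 4  # assumption: one 4-vertex quad per view, since 96 / 24 = 4
MAX_VERTEX_COUNT = MAX_VIEWS * VERTS_PER_VIEW  # what [maxvertexcount(N)] would have to declare


def gs_output_vertex_count(occupancy: float, threshold: float, visible_views: int) -> int:
    """Vertices one GS invocation appends for one voxel (hypothetical occupancy test)."""
    if not 0 <= visible_views <= MAX_VIEWS:
        raise ValueError("visible_views must be between 0 and MAX_VIEWS")
    if occupancy < threshold:
        return 0  # empty voxel: nothing is appended
    return visible_views * VERTS_PER_VIEW
```

The declared maximum has to cover the worst case (an occupied voxel seen by all 24 views), even if most invocations append far fewer vertices or none at all.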

AYan · August 3, 2015, 6:45am

Hi dbamerso,

I'm not sure I follow: do you mean Nsight has a bug that causes it to report a 98% IA bottleneck? Could you share your sample with us so we can reproduce and investigate it locally?

Thanks
An

dbamerso · August 5, 2015, 4:10pm

After further experiments on our side, it doesn't look like a bug per se. It appears that you pay a performance cost at the input assembler for all geometry shader output. That makes sense, but what we didn't expect was that you pay for the full maxvertexcount vertices regardless of how many you actually Append.

In our case, we had ~700K input points to the geometry shader and, based on some back-of-the-envelope math, about 10M output vertices. Reducing the number of vertices we emitted via Append had no effect unless we also reduced maxvertexcount, which was not possible for us.
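
The gap between what is reserved and what is emitted can be put into numbers. This sketch assumes maxvertexcount is 96 (the worst case from the first post) and models the observation reported here, that cost follows the declared maximum rather than the Append count; it is not a documented hardware rule.

```python
def gs_output_budget(input_primitives: int, max_vertex_count: int) -> int:
    """Vertices the GS output is sized for if every invocation reserves max_vertex_count."""
    return input_primitives * max_vertex_count


INPUT_POINTS = 700_000          # ~700K input points, from the post
MAX_VERTEX_COUNT = 96           # assumption: declared maximum equals the 96-vertex worst case
EMITTED_VERTICES = 10_000_000   # ~10M, the post's back-of-the-envelope estimate

budget = gs_output_budget(INPUT_POINTS, MAX_VERTEX_COUNT)  # 67,200,000 vertices reserved
utilization = EMITTED_VERTICES / budget                    # fraction of the reservation actually used
overprovision = budget / EMITTED_VERTICES                  # reserved vertices per emitted vertex
```

Under these assumptions roughly 67.2M vertices' worth of output is provisioned for about 10M actually emitted, around 15% utilization, which would explain why trimming Append calls alone changed nothing.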

Our final conclusion is that geometry shaders aren't a good fit for this workload. We switched to compute shaders plus DrawInstancedIndirect, which ended up being almost two orders of magnitude faster.
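
A CPU-side sketch of one common way to structure the compute-plus-indirect approach; the thread does not say how the poster arranged it. A "compute pass" appends one entry per (voxel, view) pair that needs a quad, and the draw arguments are filled from the resulting count, so no worst case has to be declared up front. The field order mirrors D3D11_DRAW_INSTANCED_INDIRECT_ARGS. On the GPU, the count would typically come from an append buffer's hidden counter (for example via ID3D11DeviceContext::CopyStructureCount), and the VS would look up its entry by SV_InstanceID and expand the quad from SV_VertexID. The 4-vertex quad, the threshold test, and `views_for_voxel` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence, Tuple

VERTS_PER_QUAD = 4  # assumption: one camera-facing quad per (voxel, view) pair


@dataclass
class DrawInstancedIndirectArgs:
    # Same field order as D3D11_DRAW_INSTANCED_INDIRECT_ARGS.
    vertex_count_per_instance: int
    instance_count: int
    start_vertex_location: int = 0
    start_instance_location: int = 0


def compact_pass(
    occupancy: Sequence[float],
    threshold: float,
    views_for_voxel: Callable[[int], Iterable[int]],
) -> Tuple[List[Tuple[int, int]], DrawInstancedIndirectArgs]:
    """CPU stand-in for a compute shader that appends one entry per (voxel, view)."""
    entries: List[Tuple[int, int]] = []
    for voxel_id, occ in enumerate(occupancy):
        if occ < threshold:
            continue  # empty voxels generate no work at all
        for view in views_for_voxel(voxel_id):
            entries.append((voxel_id, view))
    # One instance per appended entry; each instance draws a single quad.
    args = DrawInstancedIndirectArgs(VERTS_PER_QUAD, len(entries))
    return entries, args
```

For example, `compact_pass([0.0, 0.8, 0.2, 0.9], 0.5, lambda v: range(3))` keeps voxels 1 and 3, each seen by 3 views, giving 6 instances of 4 vertices. The draw's size now tracks the work actually produced rather than a per-input maximum.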

dba

AYan · August 6, 2015, 7:19am

Hi dba,

Glad that works for you. It's really hard to say that maxvertexcount has any relationship with the IA bottleneck, since that metric focuses on vertex attribute fetch. Maybe you just increased the load on some other stage of your pipeline, which made the IA bottleneck decrease; it's hard to say why without more information.

Glad that you find some way to solve your performance issue.

Thanks
An

