Input Assembly 100% bottleneck but 0% utilization

I’m rendering a mesh as GL_TRIANGLES, heavily tessellating it, letting the tessellation evaluation shader output points (nothing besides gl_Position is computed), and rendering these points with size 1 to a depth buffer. No additional vertex attributes, no additional render targets.
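For reference, a minimal sketch of the kind of tessellation evaluation shader described above. It is not the actual shader: it assumes three-vertex patches (submitted as GL_PATCHES with GL_PATCH_VERTICES set to 3), a pass-through vertex shader that writes gl_Position, and plain barycentric interpolation; the spacing and winding qualifiers are placeholders.

```glsl
#version 430 core

// Assumption: triangle patches; point_mode makes the tessellator emit one
// point per generated vertex instead of triangles.
layout(triangles, equal_spacing, ccw, point_mode) in;

void main()
{
    // Barycentric interpolation of the three patch corner positions.
    // gl_Position is the only output; no other attributes are written.
    gl_Position = gl_TessCoord.x * gl_in[0].gl_Position
                + gl_TessCoord.y * gl_in[1].gl_Position
                + gl_TessCoord.z * gl_in[2].gl_Position;
}
```

With GL_PROGRAM_POINT_SIZE left disabled, the point size comes from glPointSize(1.0f) on the application side, so the shader does not need to write gl_PointSize.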

Nsight tells me my draw calls are 70-100% bottlenecked by “Input Assembly”, followed by 30-60% “Frame Buffer” and 10% Shader. BUT utilization is something like 30% Frame Buffer (ok so far), 100% Shader, 0% Input Assembly and 0% Tessellator.

Setup: Nsight 5.1 RC, driver 364.72, GTX 750 Ti, using OpenGL 4.3 features.

so my questions are:

  1. Why are the numbers so weird? E.g. high bottleneck % vs 0% utilization, and the tessellator not showing up. FWIW I didn’t find any raw counter related to tessellation; “Tessellator SOL” does not show up anywhere. Edit: I’m using Nsight 5.0 now, and the Tessellator SOL counter is back.

  2. Although I render simple GL_TRIANGLES with no vertex attributes, IA is a bottleneck. As far as I understand, tessellation output goes through an input assembly step as well. Is that true? If yes, could you help me understand why outputting points with no attributes besides position creates any serious load on IA? And if yes, the “summary” figure is wrong (there is an arrow from Input Assembly to the shaders, but not back), please fix it :)

Two slightly off-topic questions:

  3. Is there any comprehensive documentation on the performance counters? I couldn’t find any. The PerfKit user guide is only partly helpful and incomplete.

  4. Could you confirm that compute shaders lack support for the memory, bottleneck, and utilization views etc. in the frame profiler? I only get the raw counters.

Hi karyon,

That’s possible: 100% bottleneck and 0% utilization. IIRC, I also ran into this some time ago. You can imagine a highway: the slip road (the on-ramp you drive through to get onto the highway) is very busy, but the highway itself is nearly empty, so you could practically fly down it. In that case the highway’s bottleneck is 100%, but its utilization is 0%.

I have to say the metaphor is not precise, but I hope it makes things clear.