Using Mesh Shaders for Professional Graphics

Originally published at:

Mesh shaders were introduced with the Turing architecture and are shipping with Ampere as well. In this post, I take a detailed look at our experiences with mesh shaders on these hardware architectures so far. The results were gathered primarily in CAD and DCC viewport or VR-centric contexts. However, some of the findings may also apply to games…

One more thing to add: while the blog mostly shows GLSL in the context of OpenGL and Vulkan, the tips apply to DirectX 12 Ultimate as well. The main difference is that in DirectX, one would use shared memory for primitive culling, then allocate the mesh output via SetMeshOutputCounts, and only do the writes after that.
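The following is a minimal CPU-side C++ emulation of that DirectX ordering, not actual HLSL: per-primitive visibility flags are first written to a "shared" array, an exclusive prefix sum turns them into compacted output slots and a final count, and only once the count is known is the output sized (a stand-in for SetMeshOutputCounts) and written. The `Triangle` struct, `emulateMeshletOutput`, and the culling predicate passed in are hypothetical; real culling would test against the frustum, backfaces, and so on.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical triangle record: three meshlet-local vertex indices.
struct Triangle { std::uint8_t a, b, c; };

// Matches the 64-vertex / 84-primitive meshlet configuration discussed here.
constexpr std::size_t kMaxPrims = 84;

// CPU emulation of the DirectX-style flow: cull into "shared memory" first,
// then fix the output count, then write the surviving primitives.
std::vector<Triangle> emulateMeshletOutput(const std::vector<Triangle>& prims,
                                           bool (*isVisible)(const Triangle&)) {
  assert(prims.size() <= kMaxPrims);

  // Phase 1: each "thread" stores its visibility flag in shared memory.
  std::array<std::uint32_t, kMaxPrims> visible{};
  for (std::size_t i = 0; i < prims.size(); ++i) visible[i] = isVisible(prims[i]) ? 1u : 0u;

  // Phase 2: an exclusive prefix sum gives each survivor its compacted slot.
  std::array<std::uint32_t, kMaxPrims> slot{};
  std::uint32_t count = 0;
  for (std::size_t i = 0; i < prims.size(); ++i) {
    slot[i] = count;
    count += visible[i];
  }

  // Phase 3: the output size is now known (stand-in for SetMeshOutputCounts)...
  std::vector<Triangle> out(count);
  // ...and only after that are the surviving primitives written.
  for (std::size_t i = 0; i < prims.size(); ++i)
    if (visible[i]) out[slot[i]] = prims[i];
  return out;
}
```

In a real mesh shader the three phases would be separated by group-shared-memory barriers, and the prefix sum could use wave intrinsics; the sequential loops here only model the data flow.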

If you have any questions or comments, let us know.

Great work and very inspiring. I really enjoy it whenever you come out with a new mesh shader post :-)

I have a couple of questions about Table 2. Maybe I missed something while reading.

  1. How many triangles does the Lucy model have? I am a little bit confused, because the site you link suggests that it has more like 28 million triangles. Can you provide the exact triangle and vertex counts, please?

  2. Can you also provide byte exact numbers for that table?

  3. Were those numbers achieved with the 64-vertex/84-triangle meshlet configuration?

  4. To how many bits were the vertex positions quantized? The text only states "a few bits".

Thank you very much in advance!



Hi @quirin.meyer1, here are some statistics from a slightly different version of the model (there are slightly different variants depending on the vertex-merging setup, and I don't remember exactly which one was used in the article):

Size of vertex data: 224808928 bytes (128 bits per vertex: 3 x fp32 position, 2 x unorm16 octant normal)
Size of index data: 336668864 bytes (32-bit indices)
Total triangles: 28055738
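As a quick sanity check of these figures (assuming the sizes are in bytes), the per-vertex layout and the implied element counts can be worked out at compile time in C++:

```cpp
#include <cstdint>

// Sizes quoted above, assumed to be in bytes.
constexpr std::uint64_t kVertexBytes = 224808928;
constexpr std::uint64_t kIndexBytes = 336668864;

// 3 x fp32 position (96 bits) + 2 x unorm16 normal (32 bits) = 16 bytes.
constexpr std::uint64_t kBytesPerVertex = 3 * 4 + 2 * 2;
constexpr std::uint64_t kBytesPerIndex = 4;  // 32-bit indices

static_assert(kBytesPerVertex * 8 == 128, "vertex layout is 128 bits");
static_assert(kVertexBytes % kBytesPerVertex == 0, "whole number of vertices");

constexpr std::uint64_t kVertexCount = kVertexBytes / kBytesPerVertex;  // 14050558
constexpr std::uint64_t kIndexCount = kIndexBytes / kBytesPerIndex;     // 84167216
```

That is roughly 14 million vertices for the 28 million triangles, about two triangles per vertex, which is typical for a closed triangle mesh.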

Meshlets with 64 vertices and 84 primitives:

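Legend for the result lines below:
meshlets; <number of meshlets>;
prim; <number of total triangles, degenerate triangles removed>; <ratio>;
vertex; <number of vertices within meshlets, i.e. how much fetching/transforming etc. we will do>; <ratio>;
followed by the total size in KB.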

Each meshlet always has a 128-bit header (stored in a dense array), followed by a variable amount of raw data (primitives, vertices, indices, etc.) whose offset is stored within the header. The raw data is typically aligned to 32 bits, sometimes 64 bits, to simplify the decoding logic and get appropriate load instructions.
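A minimal C++ sketch of this layout, under assumptions: the fields inside the 128-bit header (`dataOffset`, `numVertices`, `numPrims`, `flags`, `reserved`) and the helper `appendMeshlet` are hypothetical and not the article's actual layout; only the 16-byte header size and the offset into aligned raw data follow the description above.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical 128-bit meshlet header; the real field layout may differ.
struct MeshletHeader {
  std::uint32_t dataOffset;   // byte offset of this meshlet's raw data
  std::uint8_t numVertices;   // up to 64
  std::uint8_t numPrims;      // up to 84
  std::uint16_t flags;        // hypothetical spare bits
  std::uint32_t reserved[2];  // e.g. culling info; left unspecified here
};
static_assert(sizeof(MeshletHeader) == 16, "header must be exactly 128 bits");

// Round a byte size up to a power-of-two alignment (e.g. 4 or 8 bytes).
constexpr std::uint32_t alignUp(std::uint32_t size, std::uint32_t alignment) {
  return (size + alignment - 1) & ~(alignment - 1);
}

// Append one meshlet's raw payload, pad so the next payload starts aligned,
// and return a header pointing at the payload.
MeshletHeader appendMeshlet(std::vector<std::uint8_t>& raw,
                            const std::vector<std::uint8_t>& payload,
                            std::uint8_t numVertices, std::uint8_t numPrims,
                            std::uint32_t alignment) {
  MeshletHeader h{};
  h.dataOffset = static_cast<std::uint32_t>(raw.size());  // already aligned
  h.numVertices = numVertices;
  h.numPrims = numPrims;
  raw.insert(raw.end(), payload.begin(), payload.end());
  raw.resize(alignUp(static_cast<std::uint32_t>(raw.size()), alignment), 0);
  return h;
}
```

Keeping the headers in their own dense array means a task or mesh shader can fetch one 16-byte header per meshlet with a single vector load before touching the variable-size data.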

meshlet basic packing (1)
meshlets; 342266; prim; 28055736; 1.00; vertex; 19784789; 0.90; 168644 KB

meshlet with delta indices
meshlets; 334288; prim; 28055736; 1.00; vertex; 19718844; 0.92; 117053 KB

meshlet with quantized position (10-bit unorm components packed into a single 32-bit word, relative to the meshlet cluster's fp32 bounding box, which is also stored within the meshlet)
meshlets; 334288; prim; 28055736; 1.00; vertex; 19718844; 0.92; 165128 KB

meshlet with quantized position and delta indices for other attributes
meshlets; 334288; prim; 28055736; 1.00; vertex; 19718844; 0.92; 243869 KB
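The post does not spell out the delta-index encoding used in the "meshlet with delta indices" variant, so the following C++ sketch assumes one plausible scheme: each meshlet stores a 32-bit base (its smallest global vertex index) plus 16-bit per-vertex deltas, falling back to full 32-bit indices when the range does not fit. `DeltaIndices`, `encodeDeltas`, and `decodeDeltas` are hypothetical names.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

struct DeltaIndices {
  std::uint32_t base;                 // smallest global index in the meshlet
  std::vector<std::uint16_t> deltas;  // (index - base) for each meshlet vertex
};

// Returns std::nullopt if the index range does not fit in 16 bits,
// in which case a packer would fall back to full 32-bit indices.
std::optional<DeltaIndices> encodeDeltas(const std::vector<std::uint32_t>& globalIndices) {
  if (globalIndices.empty()) return DeltaIndices{0, {}};
  auto [lo, hi] = std::minmax_element(globalIndices.begin(), globalIndices.end());
  if (*hi - *lo > std::numeric_limits<std::uint16_t>::max()) return std::nullopt;
  DeltaIndices out{*lo, {}};
  out.deltas.reserve(globalIndices.size());
  for (std::uint32_t idx : globalIndices)
    out.deltas.push_back(static_cast<std::uint16_t>(idx - *lo));
  return out;
}

// Reconstruct the global indices by adding the base back to each delta.
std::vector<std::uint32_t> decodeDeltas(const DeltaIndices& d) {
  std::vector<std::uint32_t> out;
  out.reserve(d.deltas.size());
  for (std::uint16_t delta : d.deltas) out.push_back(d.base + delta);
  return out;
}
```

Because meshlets are built from spatially coherent triangles, their vertex indices tend to cluster, which is what makes a narrow per-meshlet delta range likely in the first place.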

(1) nvmeshlet_packbasic.hpp in nvpro-samples/gl_vk_meshlet_cadscene (GitHub, master branch)
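The quantized-position variant above (10-bit unorm components in one 32-bit word, relative to the meshlet's fp32 bounding box) can be sketched in C++ as follows. The rounding mode, the x | y << 10 | z << 20 bit order, and the names `BBox`, `packPosition`, and `unpackPosition` are assumptions for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstdint>

struct BBox { std::array<float, 3> min, max; };  // per-meshlet fp32 bounds

constexpr std::uint32_t kBits = 10;
constexpr std::uint32_t kMaxQ = (1u << kBits) - 1;  // 1023

// Quantize each component to a 10-bit unorm relative to the bbox,
// packed as x | y << 10 | z << 20 (assumed bit order).
std::uint32_t packPosition(const std::array<float, 3>& p, const BBox& box) {
  std::uint32_t packed = 0;
  for (int i = 0; i < 3; ++i) {
    float extent = box.max[i] - box.min[i];
    float t = extent > 0.0f ? (p[i] - box.min[i]) / extent : 0.0f;
    t = std::clamp(t, 0.0f, 1.0f);
    auto q = static_cast<std::uint32_t>(std::lround(t * kMaxQ));
    packed |= q << (kBits * i);
  }
  return packed;  // the top 2 bits stay unused
}

// Expand the 10-bit values back into the meshlet's bbox.
std::array<float, 3> unpackPosition(std::uint32_t packed, const BBox& box) {
  std::array<float, 3> p{};
  for (int i = 0; i < 3; ++i) {
    std::uint32_t q = (packed >> (kBits * i)) & kMaxQ;
    p[i] = box.min[i] + (box.max[i] - box.min[i]) * (static_cast<float>(q) / kMaxQ);
  }
  return p;
}
```

With 10 bits per axis, the worst-case reconstruction error per component is half a quantization step, i.e. the meshlet's bbox extent on that axis divided by 2 x 1023; small meshlet bounds are what keep this error acceptable.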