Scalarization of a fragment shader using subgroup intrinsics in GLSL possible?

WoodyPWX · June 18, 2020, 10:18am

Hello there,
I’m trying to optimize rendering on a Tegra X1 GPU. Our lighting is based on a standard Forward+ approach: visible lights are culled per 8x8-pixel tile using a compute shader. Later, while rendering an object, I can fetch a bitmask of visible lights (lightIndices) for any rendered pixel and light it accordingly. The shader compiler's static analysis reports 5 divergent branches when the lighting code is present. If I replace the fetch of lightIndices from a texture with a uniform value from a constant buffer, the number of divergent branches goes down to 0, and some other values, like the latency and throughput limiters, improve as well. Because on other platforms I was able to reduce VGPR usage with a simple scalarization of the lighting, I tried a similar approach here.
I’ve enabled the GL_KHR_shader_subgroup extension and used subgroupOr(lightIndices), or even subgroupBroadcastFirst(subgroupOr(lightIndices)), to get the same lightIndices bitmask for the whole warp/subgroup, expecting the GLSL shader compiler to treat this value as uniform and reduce divergence. Unfortunately, nothing like that happened, and there is no measurable improvement in a GPU capture. Why is that?
Regards,
Tomas
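
For readers following along, here is a minimal sketch of the kind of scalarized light loop described above. It is not the poster's actual shader: the sampler name lightMaskTex, the one-mask-per-tile texture layout, the 32-light limit, and the placeholder shadeLight function are all assumptions made for illustration. Whether a particular compiler really treats the subgroupOr result as uniform is exactly the open question in this post.

```glsl
#version 450
#extension GL_KHR_shader_subgroup_basic : require
#extension GL_KHR_shader_subgroup_arithmetic : require

// Hypothetical resources; names and layouts are assumptions for illustration.
layout(binding = 0) uniform usampler2D lightMaskTex; // one 32-bit light mask per 8x8 tile
layout(location = 0) out vec4 outColor;

// Placeholder standing in for the real per-light shading code.
vec3 shadeLight(uint lightIndex) {
    return vec3(0.01) * float(lightIndex + 1u);
}

void main() {
    // Per-pixel mask of lights visible in this pixel's 8x8 tile.
    // Pixels of one subgroup may fall into different tiles, so this can diverge.
    ivec2 tile = ivec2(gl_FragCoord.xy) / 8;
    uint lightIndices = texelFetch(lightMaskTex, tile, 0).x;

    // Union of the masks of all active invocations; the same value in each of them.
    uint uniformMask = subgroupOr(lightIndices);

    vec3 color = vec3(0.0);
    while (uniformMask != 0u) {
        uint lightIndex = uint(findLSB(uniformMask)); // lowest set bit, same in every invocation
        uniformMask &= uniformMask - 1u;              // clear that bit
        // The loop control depends only on uniformMask; only this per-pixel test can diverge.
        if ((lightIndices & (1u << lightIndex)) != 0u) {
            color += shadeLight(lightIndex);
        }
    }
    outColor = vec4(color, 1.0);
}
```

The idea is that the loop iterates over the union mask, so every invocation in the subgroup visits the same lights in the same order, and the per-pixel mask is used only to decide whether a light contributes to that pixel.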
