I managed to reproduce the following issue on a GeForce GTX 760 with the 399.24 driver and a GeForce 950M with the 416.34 driver. The issue does not show up on a GTX 650 with the 390.87 driver. All these GPUs return 1024 for GL_MAX_VERTEX_UNIFORM_VECTORS, so I hardcoded 1000 as the vec4 array size in the vertex shader.
Let’s try to compile and link the following shader program.
Vertex shader:
#version 330
uniform vec4 vs[1000];
uniform sampler2D tex;
void main()
{
    gl_Position = vs[42];
    //gl_Position += texture(tex, vs[0].xy); // (1)
}
Fragment shader:
#version 330
out vec4 outFragColor;
void main()
{
    outFragColor = vec4(0.0);
}
Everything is OK as long as line (1) is commented out, so that the tex sampler is optimized away. But if we uncomment it, linking fails with the following log:
-----------
Internal error: assembly compile error for vertex shader at offset 427:
-- error message --
line 16, column 9: error: invalid parameter array size
line 20, column 16: error: out of bounds array access
line 22, column 30: error: out of bounds array access
-- internal assembly text --
!!NVvp5.0
OPTION NV_internal;
OPTION NV_gpu_program_fp64;
OPTION NV_bindless_texture;
Error: could not link.
# cgc version 3.4.0001, build date Oct 10 2018
# command line args:
#vendor NVIDIA Corporation
#version 3.4.0.1 COP Build Date Oct 10 2018
#profile gp5vp
#program main
#semantic vs
#semantic tex
#var float4 gl_Position : $vout.POSITION : HPOS : -1 : 1
#var float4 vs[0] : : c[0], 1000 : -1 : 1
#var ulong tex : : c[1000] : -1 : 1
PARAM c[2002] = { program.local[0..2001] };
TEMP R0;
LONG TEMP D0;
TEMP T;
PK64.U D0.x, c[1000];
TEX.F R0, c[0], handle(D0.x), 2D;
ADD.F result.position, R0, c[42];
END
# 3 instructions, 1 R-regs, 1 D-regs
Here we see that the array takes registers 0…999 and the sampler takes register 1000. Registers above 1000 are not referenced anywhere except in the line
PARAM c[2002] = { program.local[0..2001] };
Further experiments with the array size showed that 2002 is not a constant but twice the number of registers required (here 2 × 1001).
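The doubling can be put into a few lines of arithmetic. The sketch below is only a model of what the log suggests, not of the driver's actual logic. It assumes that each sampler handle costs one register, that the declared PARAM size is always twice the referenced count, and that this doubled size is what gets checked against the 1024 limit (the last point is a guess). The function names are made up for illustration.

```python
# Model of the PARAM sizing seen in the log (assumptions, not driver internals).
MAX_VERTEX_UNIFORM_VECTORS = 1024  # value reported by the GPUs in this post


def referenced_registers(array_size, samplers=1):
    """Registers actually referenced: one per vec4 plus one per sampler handle."""
    return array_size + samplers


def declared_param_size(array_size, samplers=1):
    """Size of the PARAM declaration, assuming it is always doubled."""
    return 2 * referenced_registers(array_size, samplers)


def largest_array_that_fits(limit=MAX_VERTEX_UNIFORM_VECTORS, samplers=1):
    """Largest vec4 array size whose doubled declaration stays within limit."""
    return limit // 2 - samplers


print(declared_param_size(1000))   # 2002, matches PARAM c[2002] in the log
print(largest_array_that_fits())   # 511 under this model
```

If this model holds, a vec4 array of 511 elements would be the largest that links alongside one sampler. That threshold is a prediction of the model, not something the experiments above confirm.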
The documentation states that OpenGL implementations are allowed to reject shaders for implementation-dependent reasons.
So is there a workaround to use all available registers along with a sampler in a vertex shader?
If not, what might be the rationale behind this behavior? Obviously, this shader does not use any registers for temporary computation results. Is it a misoptimization in the shader compiler?
PS. This post is a duplicate of my SO question.