Hello,
I have the following situation.
I dispatch a compute shader for a 2423 (width) x 1310 (height) image.
If I do
glDispatchCompute((GLuint)std::ceilf(width / 39.0f), (GLuint)std::ceilf(height / 39.0f), 1);
and in the shader
layout(local_size_x = 39, local_size_y = 39) in;
everything works as expected everywhere (different workstations with different Quadro cards).
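For reference, the same group counts can be computed with integer ceiling division instead of going through float. This is only a minimal sketch: groupCount is a name I made up for illustration and is not an OpenGL function.

```cpp
#include <cstdint>

// Hypothetical helper (not part of OpenGL): integer ceiling division.
// Returns the smallest number of groups of size `localSize` that covers `pixels`.
constexpr std::uint32_t groupCount(std::uint32_t pixels, std::uint32_t localSize)
{
    return (pixels + localSize - 1u) / localSize;
}

// 2423 x 1310 image with a 39 x 39 local size.
static_assert(groupCount(2423u, 39u) == 63u, "62 groups would cover only 2418 columns");
static_assert(groupCount(1310u, 39u) == 34u, "33 groups would cover only 1287 rows");
```

With these counts the dispatch covers 2457 x 1326 invocations, so the shader has to skip invocations whose coordinates fall outside the 2423 x 1310 image.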
If I do
const GLuint numGroupsX = static_cast<GLuint>(std::ceilf(width / static_cast<float>(39)));
const GLuint numGroupsY = static_cast<GLuint>(std::ceilf(height / static_cast<float>(39)));
glDispatchComputeGroupSizeARB(numGroupsX, numGroupsY, 1, 39, 39, 1);
and in the shader
layout( local_size_variable ) in;
the application hangs (so there is no GL error or debug output) in the next OpenGL command, which is glDeleteTextures. This happens only on the Quadro 4000 and only with a local group size larger than 32 x 32. On a Quadro K2000 everything is fine (39 x 39 or 32 x 32). I use Windows 7 Ultimate 64-bit and the NVIDIA driver 377.35.
For the Quadro 4000:
GL_MAX_COMPUTE_VARIABLE_GROUP_SIZE_ARB
returns 1536,1024,64 for x,y,z
GL_MAX_COMPUTE_VARIABLE_GROUP_INVOCATIONS_ARB
returns 1536
GL_ARB_compute_variable_group_size
extension is available
If I use a local group size of only 32 x 32, everything is okay (33 x 33 hangs again).
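To make the numbers explicit, here is a minimal sketch of the arithmetic behind these observations. The helper names and the 1024 constant are my own assumptions for illustration; only 1536 is a value actually reported by the driver.

```cpp
#include <cstdint>

// Hypothetical helpers, named only for this illustration (not OpenGL calls).
constexpr std::uint32_t invocationsPerGroup(std::uint32_t x, std::uint32_t y, std::uint32_t z)
{
    return x * y * z;
}

constexpr bool fitsLimit(std::uint32_t x, std::uint32_t y, std::uint32_t z,
                         std::uint32_t maxInvocations)
{
    return invocationsPerGroup(x, y, z) <= maxInvocations;
}

// Value returned for GL_MAX_COMPUTE_VARIABLE_GROUP_INVOCATIONS_ARB on the Quadro 4000.
constexpr std::uint32_t kReportedLimit = 1536u;
// Assumption: the limit I suspect the hardware really has. Not a documented value.
constexpr std::uint32_t kSuspectedLimit = 1024u;

static_assert(invocationsPerGroup(32u, 32u, 1u) == 1024u, "32 x 32 works");
static_assert(invocationsPerGroup(33u, 33u, 1u) == 1089u, "33 x 33 hangs");
static_assert(invocationsPerGroup(39u, 39u, 1u) == 1521u, "39 x 39 hangs");

// All three sizes are legal according to the reported limit ...
static_assert(fitsLimit(39u, 39u, 1u, kReportedLimit), "1521 <= 1536");
// ... but only 32 x 32 would be legal under a limit of 1024.
static_assert(fitsLimit(32u, 32u, 1u, kSuspectedLimit), "1024 <= 1024");
static_assert(!fitsLimit(33u, 33u, 1u, kSuspectedLimit), "1089 > 1024");
```

So every size I tried is below the reported 1536 invocations, and the hang starts exactly at the first size above 1024.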
So to me it looks like either GL_MAX_COMPUTE_VARIABLE_GROUP_INVOCATIONS_ARB returns the wrong value (maybe it should only be 1024) or the implementation of glDispatchComputeGroupSizeARB has a bug.
Or maybe I overlooked something?
Every hint is appreciated.
Thanks
Marc