I’ve been trying to implement GL_ARB_parallel_shader_compile (which is a very welcome extension!), but I ran into a few issues:
1. The spec states that the initial value of GL_MAX_SHADER_COMPILER_THREADS_ARB is 0xFFFFFFFF, which I confirmed is indeed the initial value using glGetIntegerv. However, program queries of GL_COMPLETION_STATUS_ARB always return GL_TRUE (implying the link happened synchronously) unless I explicitly call glMaxShaderCompilerThreadsARB().
   This seems like a bug; an explicit call to set the thread count to 0xFFFFFFFF works to enable async linking, but it shouldn’t be necessary since that’s already the initial value.
2. Calling glGetShaderiv with GL_COMPLETION_STATUS_ARB always blocks until the compilation has finished. This is not too critical (since I’m more interested in the result of the link) but it does seem like a bug, which took me some time to figure out.
3. After starting an async link, glGetProgramiv with GL_COMPLETION_STATUS_ARB will never return GL_TRUE, no matter how long I wait. I do get the impression that it did most of the work asynchronously (given that a subsequent program status query usually completes quickly), but it means I have no way to find out when the work is completed. This is the most frustrating one since I can’t find a workaround.
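A small diagnostic for issue 3 might look like the sketch below: poll the completion query at a fixed interval until it reports done or a timeout expires. It assumes `isComplete` wraps the non-blocking `glGetProgramiv(program, GL_COMPLETION_STATUS_ARB, &done)` query for a program whose link has already been started; `waitForCompletion` and its timeout/interval parameters are hypothetical, and the sleeping loop is only meant for a test harness, not a render loop.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Diagnostic for issue 3: polls a completion query at a fixed interval until it
// reports done or the timeout expires. Returns true if completion was observed.
// Test-harness use only; a render loop should poll once per frame instead of sleeping.
bool waitForCompletion(const std::function<bool()>& isComplete,
                       std::chrono::milliseconds timeout,
                       std::chrono::milliseconds interval) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (true) {
        // In real code, isComplete would (hypothetically) wrap:
        //   GLint done = GL_FALSE;
        //   glGetProgramiv(program, GL_COMPLETION_STATUS_ARB, &done);
        //   return done == GL_TRUE;
        // and must run on the thread where the GL context is current.
        if (isComplete()) {
            return true;
        }
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;
        }
        std::this_thread::sleep_for(interval);
    }
}
```

With the behavior described in issue 3, a call like this would keep returning false however long the timeout is, while a working implementation should eventually return true.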
Any feedback would be appreciated. Driver is 368.22 on GeForce GT 650M on Windows 10 x64. For testing I circumvented the driver cache by writing a unique string in a comment at the end of each shader.
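The cache-busting trick mentioned here can be sketched as a helper that appends a unique GLSL comment to the source before it is handed to glShaderSource. `withCacheBuster` and its `runSeed` parameter are hypothetical; the assumption, as reported in this thread, is that any change to the source text, even inside a trailing comment, is enough to miss the driver's shader cache. Appending at the end keeps any `#version` line first, as GLSL requires.

```cpp
#include <atomic>
#include <cstdint>
#include <string>

// Appends a unique comment to a GLSL source string so its text (and therefore
// any cache key derived from it) differs from previous compiles.
// A per-process counter alone would repeat across runs, so the caller mixes in
// a run-specific seed as well.
std::string withCacheBuster(const std::string& source, std::uint64_t runSeed) {
    static std::atomic<std::uint64_t> counter{0};
    std::string out = source;
    if (!out.empty() && out.back() != '\n') {
        out += '\n';  // make sure the comment starts on its own line
    }
    out += "// cache-buster: " + std::to_string(runSeed) + "-" +
           std::to_string(counter.fetch_add(1)) + "\n";
    return out;
}
```

A run-specific seed could come from `std::chrono::system_clock::now().time_since_epoch().count()`, so a second launch of the test program does not reproduce the first run's strings.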
I can confirm that the exact things you describe still happen on the latest drivers (376.33) on my GTX 770.
Parallel shader compilation never happens unless you explicitly call glMaxShaderCompilerThreadsARB() at least once.
glGetProgramiv() with GL_COMPLETION_STATUS_ARB never returns GL_TRUE, making it completely useless.
I ran all my tests compiling shaders and linking shader programs on a second, shared OpenGL context, and circumvented the shader cache by adding a random value to the calculations in my shader (though a comment, as you did, would have been enough).
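A minimal sketch of the workaround for issue 1 (calling glMaxShaderCompilerThreadsARB() explicitly once), assuming the entry point has already been loaded through the platform's GetProcAddress mechanism. The alias below stands in for glext.h's `PFNGLMAXSHADERCOMPILERTHREADSARBPROC` so the snippet compiles without GL headers; real code should use the glext.h typedef, which also carries the APIENTRY calling convention. `enableParallelShaderCompile` is a hypothetical helper name.

```cpp
// Stand-in for PFNGLMAXSHADERCOMPILERTHREADSARBPROC, i.e. void(GLuint count).
// Real code should use the glext.h typedef (it includes APIENTRY).
using MaxShaderCompilerThreadsFn = void (*)(unsigned int count);

// Spec value meaning "implementation-specific maximum". It is also the
// documented initial value, but per this thread the driver only backgrounds
// links after it has been set explicitly.
constexpr unsigned int kMaxCompilerThreadsAuto = 0xFFFFFFFFu;

// Call once after the context is created and made current.
// Returns false if the extension entry point is unavailable.
bool enableParallelShaderCompile(MaxShaderCompilerThreadsFn maxThreads) {
    if (maxThreads == nullptr) {
        return false;
    }
    maxThreads(kMaxCompilerThreadsAuto);
    return true;
}
```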
Sorry for the thread necro, but I’m still having this issue in 2019 on Linux with drivers 410.48 and 418.43 on a GTX 1080.
For me in particular, the fact that glGetShaderiv blocks with GL_COMPLETION_STATUS_ARB is critical, since I’m trying to do async shader compilation.
To reproduce, just call glGetShaderiv with GL_COMPLETION_STATUS_ARB immediately after any glCompileShader call. It always returns true, which means it’s blocking. I’m setting my number of threads explicitly, as suggested above.
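One way to check whether the COMPLETION_STATUS query itself is what blocks, rather than glCompileShader doing the work synchronously before the query is even reached, is to time the two calls separately. `timeCall` is a hypothetical helper; the commented usage assumes a current GL context and an existing shader object `shader`.

```cpp
#include <chrono>
#include <utility>

// Runs f once and returns how long the call took (steady clock, microseconds).
template <typename F>
std::chrono::microseconds timeCall(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(f)();
    return std::chrono::duration_cast<std::chrono::microseconds>(
        std::chrono::steady_clock::now() - start);
}

// Hypothetical usage with a current GL context and a shader object `shader`:
//   auto compileTime = timeCall([&] { glCompileShader(shader); });
//   GLint done = GL_FALSE;
//   auto queryTime = timeCall([&] {
//       glGetShaderiv(shader, GL_COMPLETION_STATUS_ARB, &done);
//   });
// A large queryTime suggests the query is blocking; a large compileTime with a
// tiny queryTime suggests the work happened inside glCompileShader itself.
```

A query that returns TRUE right away only shows that the work is done by then; the timings show which call actually paid for it.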
Any chance we can get someone from Nvidia to look at this?
From my testing, issues 1 and 2 from the original post still exist (although I disagree that 1 is an issue). Issue 3, however, is fixed, since the program no longer reports that it has finished linking immediately after the link call.
Also, sorry that the shader isn’t super “heavy.” I’m a bit green to this stuff and not really sure how to bloat shader compilation times. I get the same results on a different program with non-trivial shader compilation times, but I can’t share that one.
–
Edit:
I added some ‘big’ shaders that clearly show the application halting on GL_COMPLETION_STATUS_KHR for shaders. Also, I haven’t yet been able to finish linking with the ‘big’ shaders; I waited at least 10 minutes. In htop I can see multiple threads doing the linking. I’m going to let it run overnight to see if it finishes, but I don’t think these shaders should take that long to link…
That is the expected output, which indicates that there is a bug.
The line that says “Error: Shader compilation finished immediately!” indicates that glGetShaderiv with GL_COMPLETION_STATUS_KHR is halting execution until shader compilation completes. You may think that this is just because the shaders are trivial, but if you change line 74 / 76 to use the “big” shaders, you can clearly see the application hang and then print that error message (indicating that it is hanging on the supposedly non-blocking GL_COMPLETION_STATUS_KHR query).
Also, it appears your linking is completing immediately. I suspect this was the second time you ran this test program? Remember that shaders get cached, so you’ll need to either delete your shader cache or (what I do) add a comment inside the shader to defeat the hash check and force recompilation. On your first run you should see “Program still linking!”, which indicates that glGetProgramiv with GL_COMPLETION_STATUS_KHR is behaving as expected (see the expected output in Readme.md).
Sorry the test program isn’t formatted more like a unit test. If you’d like me to explain or discuss anything over voice chat, we can arrange that over PMs.
What KHR_parallel_shader_compile makes “parallel” on the NVIDIA driver is glLinkProgram, not glCompileShader. glCompileShader is expected to be synchronous and to take a very short amount of time.
This isn’t a bug.
I would agree that 1 is a driver bug, which we may want to fix, but 2 isn’t, for the reason above.
Does this help?
“”
Add to 7.13 “Shader, Program, and Program Pipeline Queries” under the
descriptions for “pname” for “GetShaderiv”,
If <pname> is COMPLETION_STATUS_KHR, TRUE is returned if the shader
compilation has completed, FALSE otherwise.
“”
The intent is that after you call glLinkProgram / glCompileShader, you can query the status in a non-blocking manner. Once that query reports that compilation is finished, you can use the program / shader without blocking the OpenGL context. This matters most in the use case mentioned in the spec, where you may want to compile a shader in a separate thread at runtime without dropping a frame due to the compile / link.
The goal according to the spec should be that GetShaderiv and GetProgramiv work in the same non-halting manner with COMPLETION_STATUS_KHR.
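The intended usage described above might be sketched as a per-frame poll over in-flight links. `PendingLink` and `pollPendingLinks` are hypothetical names; each `isComplete` is assumed to wrap the non-blocking `glGetProgramiv(program, GL_COMPLETION_STATUS_KHR, &done)` query, and only programs reported as complete are handed on for GL_LINK_STATUS checks and use, since those queries may wait for the link to finish.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// One in-flight program link. isComplete is assumed to wrap:
//   GLint done = GL_FALSE;
//   glGetProgramiv(program, GL_COMPLETION_STATUS_KHR, &done);
//   return done == GL_TRUE;
struct PendingLink {
    std::string name;
    std::function<bool()> isComplete;
};

// Called once per frame: polls every pending link without waiting, returns the
// names of links that finished this frame and keeps the rest queued.
std::vector<std::string> pollPendingLinks(std::vector<PendingLink>& pending) {
    std::vector<std::string> finished;
    std::vector<PendingLink> stillRunning;
    for (auto& link : pending) {
        if (link.isComplete()) {
            finished.push_back(link.name);
        } else {
            stillRunning.push_back(std::move(link));
        }
    }
    pending = std::move(stillRunning);
    return finished;
}
```

The frame never waits on a link; programs simply become available on some later frame. That only works if the shader-side query behaves the same way, which is the point being argued here.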
Checking COMPLETION_STATUS_KHR isn’t blocking in our driver. But, since glCompileShader isn’t backgrounded, we will always return TRUE for it.
So this:
std::cout << "Error: Shader compilation finished immediately!" << std::endl;
isn’t a bug at all, it is expected behavior.
Linking isn’t expected to always return TRUE, and it’s not supposed to block either.
I’ve talked to Geoff and Timothy who authored the KHR / ARB versions on this extension. If glCompileShader doesn’t do the same automatic thread creation / backgrounding, then that’s a fault in the implementation of this extension.
I agree with your reasoning that glGetShaderiv w/ COMPLETION_STATUS_KHR isn’t bugged, but that just means that glCompileShader is bugged w.r.t. KHR_parallel_shader_compile.
Put another way, what is the point of adding a query that always returns true? How is the user supposed to use this query, in its current implementation, any differently from what they could do before?
Another relevant section from the spec:
“”
where <count> is the number of background threads. A <count> of zero
specifies a request for no parallel compiling or linking and a <count> of
0xFFFFFFFF requests an implementation-specific maximum.
An implementation may combine the maximum compiler thread request from
multiple contexts in a share group in an implementation-specific way.
“”
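To make the thread-count semantics quoted above concrete, an application might map a user-facing setting onto the value it passes to glMaxShaderCompilerThreadsKHR/ARB. The setting strings ("off", "auto", or a number) and the function name are hypothetical; only the meaning of 0 and 0xFFFFFFFF comes from the spec text.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>

// Spec semantics for glMaxShaderCompilerThreadsKHR/ARB(count):
//   0          requests no parallel compiling or linking
//   0xFFFFFFFF requests an implementation-specific maximum
//   any other  requests that many background threads
constexpr unsigned int kNoParallelCompile = 0u;
constexpr unsigned int kImplementationMax = 0xFFFFFFFFu;

// Hypothetical mapping from a user setting ("off", "auto" or a thread count)
// to the value passed to the GL call. Returns std::nullopt for invalid input.
std::optional<unsigned int> compilerThreadRequest(const std::string& setting) {
    if (setting == "off") {
        return kNoParallelCompile;
    }
    if (setting == "auto") {
        return kImplementationMax;
    }
    try {
        std::size_t used = 0;
        const unsigned long n = std::stoul(setting, &used);
        // Reject trailing garbage and values that would collide with the
        // 0xFFFFFFFF "maximum" sentinel (or not fit in unsigned int).
        if (used != setting.size() || n >= kImplementationMax) {
            return std::nullopt;
        }
        return static_cast<unsigned int>(n);
    } catch (const std::invalid_argument&) {
        return std::nullopt;
    } catch (const std::out_of_range&) {
        return std::nullopt;
    }
}
```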
The wording “parallel compiling or linking” indicates that parallel compilation should be supported in an implementation of this extension.
(P.S. Sorry if I’m coming across as rude. I realize that for the most part parallel linking is the important part of this extension)
Edit:
I also want to add that, when I talked to Geoff and Timothy, I asked:
“Is it necessary to have multiple OpenGL contexts to use this extension? Or will multiple calls to glCompileShader spawn threads as needed?”
To which Geoff responded (Timothy gave a similar response):
“Nope, the driver will handle spawning threads to do the work in parallel.”