Hi, I’m encountering an issue where fragment shader interlock behaves almost as if it has no effect on the RTX GPU driver, while the same code with ARB_fragment_shader_interlock enabled works perfectly on GTX GPU drivers and Intel drivers.
The symptom is flickering black texels moving around at the bottom right of the screen, which is where I use shader interlock to do limited programmable blending. It doesn’t seem to function properly on the newest RTX GPU driver, while on the 1050 Ti driver given below and on Intel drivers it seems to work properly.
Are there any updates on this? Are you tracking the issue internally? Thanks.
I think D3D12 ROV is broken on Turing too (maybe related). I will confirm it with another dev later.
Hi, for future readers: we did not put a memory barrier before each draw call, so a new draw call didn’t wait for the previous draw call to finish its image stores, causing a race condition.
Before each draw call, we now issue a barrier like this:
glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT | GL_TEXTURE_FETCH_BARRIER_BIT);
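For anyone who wants to see the pattern in context, here is a minimal C++ sketch of a draw loop with the barrier in front of every draw. It is an illustration only, not our actual renderer: the helper name `drawPassesWithBarrier`, the constant names, and the two callbacks are hypothetical. The callbacks stand in for `glMemoryBarrier` and the real draw call so the call order can be checked without a GL context, and the two constants mirror the registry values of the GL enums that a loader header would normally provide.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Same values as GL_TEXTURE_FETCH_BARRIER_BIT and GL_SHADER_IMAGE_ACCESS_BARRIER_BIT.
// In a real project these come from the GL loader header instead.
constexpr std::uint32_t kTextureFetchBarrierBit      = 0x00000008u;
constexpr std::uint32_t kShaderImageAccessBarrierBit = 0x00000020u;

// Hypothetical helper: issues `passCount` draws, each preceded by a memory barrier.
// `memoryBarrier` is expected to forward to glMemoryBarrier, and `draw` to the
// actual draw call for the given pass index.
inline void drawPassesWithBarrier(int passCount,
                                  const std::function<void(std::uint32_t)>& memoryBarrier,
                                  const std::function<void(int)>& draw)
{
    assert(memoryBarrier && draw);  // both callbacks must be set

    for (int pass = 0; pass < passCount; ++pass) {
        // Make image stores from earlier draws visible to the image
        // loads/stores and texture fetches of the draw that follows.
        memoryBarrier(kShaderImageAccessBarrierBit | kTextureFetchBarrierBit);
        draw(pass);
    }
}
```

With a live context, the first callback would simply be a lambda that calls `glMemoryBarrier(bits)`, and the second would bind state and issue the draw for that pass.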
Thanks for the update, glad the added barrier sorted out the issue.
For what it’s worth, here are the relevant sections from the OpenGL 4.5 spec (7.12.2) outlining why this is necessary:
Explicit synchronization is required to ensure that the effects of buffer and texture data stores performed by shaders will be visible to subsequent operations using the same objects and will not overwrite data still to be read by previously requested operations. Without manual synchronization, shader stores for a “new” primitive may complete before processing of an “old” primitive completes. Additionally, stores for an “old” primitive might not be completed before processing of a “new” primitive starts.
And then a few pages down in the glMemoryBarrier guidelines:
Data written to image variables in one rendering pass and read by the shader in a later pass need not use coherent variables or memoryBarrier. Calling MemoryBarrier with the SHADER_IMAGE_ACCESS_BARRIER_BIT set in barriers between passes is necessary.
Hi, it’s 2022 and I’m still having the same problem, but in a slightly different situation. I’m using instanced rendering, so how do I guarantee a barrier between instances within a single drawing command?
Also, my code (without memory barriers) works fine on a GTX 1060 but doesn’t produce correct results on my RTX 2060.
Looking forward to your reply!