[Windows] Possible driver bug: Fragment shader interlock has almost no effect on RTX GPU driver

Hi, I’m encountering an issue where fragment shader interlock appears to have almost no effect on the RTX GPU driver, while the same code with ARB_fragment_shader_interlock enabled works perfectly on GTX GPU drivers and Intel drivers.

The symptom is flickering black texels moving around the bottom right of the screen. That is where I use shader interlock to do limited programmable blending, but it doesn’t seem to function properly on the newest RTX GPU driver, while on the GTX 1050 Ti driver listed below and on Intel drivers it seems to work properly.

Here is a GIF of what happens (RTX 2070 SUPER, driver 435.80):

And here is what I expected (running on a GTX 1050 Ti with driver 430.86; a GTX 1070 with driver 435.27 also works):

VS solution and executable to reproduce the issue (in the x64 folder):

External issue:

I’m not sure whether this is a regression, whether it only affects the RTX driver, or whether it’s our fault. Anyway, I would appreciate any help, thanks!

Are there any updates on this? Are you tracking the issue internally? Thanks.
I think D3D12 ROV is broken on Turing too (possibly related); I will confirm it with another developer later.

Hi, driver 419.67 works for us (on a non-SUPER RTX card), in case that helps you trace the issue.

Hi, for future readers: we did not put a memory barrier before each draw call, so a new draw call didn’t wait for the previous draw call to finish storing to the image, causing a race condition.

Before each draw call, we now put a barrier like this:
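
A minimal sketch of that pattern, not the exact code from the project. It assumes the barrier in question is glMemoryBarrier with GL_SHADER_IMAGE_ACCESS_BARRIER_BIT, which is what the spec text quoted further down the thread calls for. `GlCalls`, `drawWithBarriers` and `kShaderImageAccessBarrierBit` are hypothetical names, used so the snippet compiles without a GL context or loader; in a real renderer you would call glMemoryBarrier and your draw function directly.

```cpp
#include <cstdint>
#include <functional>

// Same value as GL_SHADER_IMAGE_ACCESS_BARRIER_BIT in the OpenGL headers.
constexpr std::uint32_t kShaderImageAccessBarrierBit = 0x00000020;

// Hypothetical wrapper so the ordering can be shown without a GL context.
// In real code: memoryBarrier -> glMemoryBarrier, draw -> glDrawElements etc.
struct GlCalls {
    std::function<void(std::uint32_t)> memoryBarrier;
    std::function<void(int)> draw;  // argument: index of the draw call
};

// Issue an image-access barrier before every draw call, so that image stores
// from the previous draw are visible before the next draw touches the image.
void drawWithBarriers(const GlCalls& gl, int drawCount) {
    for (int i = 0; i < drawCount; ++i) {
        gl.memoryBarrier(kShaderImageAccessBarrierBit);
        gl.draw(i);
    }
}
```

Note that the interlock only orders overlapping fragment invocations within a draw; the barrier is what orders image stores and loads across separate draw calls.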

Thanks for the update, glad the added barrier sorted out the issue.

For what it’s worth, here are the relevant sections from the 4.5 spec (7.12.2) outlining why this is necessary:

Explicit synchronization is required to ensure that the effects of buffer and texture data stores performed by shaders will be visible to subsequent operations using the same objects and will not overwrite data still to be read by previously requested operations. Without manual synchronization, shader stores for a “new” primitive may complete before processing of an “old” primitive completes. Additionally, stores for an “old” primitive might not be completed before processing of a “new” primitive starts.

And then a few pages down in the glMemoryBarrier guidelines:

Data written to image variables in one rendering pass and read by the shader in a later pass need not use coherent variables or memoryBarrier. Calling MemoryBarrier with the SHADER_IMAGE_ACCESS_BARRIER_BIT set in barriers between passes is necessary.