I’ve encountered what appears to be a performance pessimization after using the same attachment as both a depth/stencil (read-only) attachment and an input attachment (for decal rendering).
All later passes that use the same depth/stencil attachment have EarlyZ disabled automatically (observed in Nsight Graphics). However, EarlyZ can still be enabled explicitly in those passes by adding layout(early_fragment_tests) in; to their fragment shaders, which suggests that this is not a hardware limitation.
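For reference, this is roughly what the explicit opt-in looks like in a fragment shader of one of the later passes. It is only a minimal sketch: the inColor varying and the output location are placeholders, not my actual shader interface.

```glsl
#version 450

// Explicitly request that depth/stencil tests run before this shader executes.
layout(early_fragment_tests) in;

// Placeholder interface for illustration only.
layout(location = 0) in vec3 inColor;
layout(location = 0) out vec4 outColor;

void main() {
    // No discard and no gl_FragDepth write here, so early tests do not
    // change the visible result compared to late tests.
    outColor = vec4(inColor, 1.0);
}
```

Note that with this qualifier any write to gl_FragDepth is ignored, and a discard no longer prevents the depth/stencil update, so it is only a drop-in workaround for shaders that use neither.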
Here’s the setup:
A clears the depth/stencil attachment and draws with depth write and stencil write (layout =
B uses the same image as its depth/stencil attachment (depth test and stencil test, no writes) and an input attachment (layout =
C continues to draw with depth write (layout =
If the input attachment is replaced with a sampled image (texelFetch), the issue goes away, which suggests that it’s limited to cases where the driver knows that the same image is being used as both a depth/stencil attachment and an input attachment. But this difference doesn’t make much sense, because the layout already allows only read access.
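To make the comparison concrete, the two ways of reading depth in B look roughly like this. This is a sketch under assumptions: the USE_INPUT_ATTACHMENT switch, the sceneDepth name, and the set/binding/attachment indices are made up for illustration, and the output is just a depth visualization rather than real decal shading.

```glsl
#version 450

// Hypothetical compile-time switch between the two read paths.
#ifdef USE_INPUT_ATTACHMENT
// Path that triggers the issue: depth is read as an input attachment.
layout(input_attachment_index = 0, set = 0, binding = 0) uniform subpassInput sceneDepth;
#else
// Path without the issue: the same image is bound as a sampled image.
layout(set = 0, binding = 0) uniform sampler2D sceneDepth;
#endif

layout(location = 0) out vec4 outColor;

// Both paths fetch the depth value at the current fragment's pixel.
float loadSceneDepth() {
#ifdef USE_INPUT_ATTACHMENT
    return subpassLoad(sceneDepth).r;
#else
    return texelFetch(sceneDepth, ivec2(gl_FragCoord.xy), 0).r;
#endif
}

void main() {
    float depth = loadSceneDepth();
    // Placeholder output; real decal shading would reconstruct position from depth.
    outColor = vec4(vec3(depth), 1.0);
}
```

Both variants read exactly one texel at the fragment's own coordinates, so from the shader's point of view the access pattern is the same; only the descriptor type differs.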
I’ve tried both separate VkRenderPasses and a single VkRenderPass with multiple subpasses, and the result is the same.
The exact same behavior has also been observed when passes A and B are combined into one subpass with a self-dependency between depth/stencil attachment writes in the first shader and input attachment reads in the second. The attachment layout in this case is
GENERAL, but non-attachment access isn’t allowed anyway. In this case I’ve tried to “hide” the GENERAL layout as much as possible, only exposing it in VkAttachmentReference2, but that didn’t help either.
Has anyone encountered similar behavior?
For reference: I’m using an RTX 3090 with Game Ready Driver 546.33 on Windows 11.