Anyhit payload lost when calling “AcceptHitAndEndSearch()” or "IgnoreHit()"

natemorrical · February 14, 2023, 10:43pm

Hello,

I’m running into a bug with Vulkan raytracing on my RTX 4090 (using HLSL as my shading language, and compiling to SPIR-V).

Whenever I call AcceptHitAndEndSearch() or IgnoreHit() in an anyhit program, all previous writes to the ray payload are lost.

This is an issue, since there are times when it is important to be able to terminate traversal early in an anyhit program, or to process the hit primitive without decreasing the ray’s current T value.

For example, I might be using an order-independent transparency method, using an anyhit shader to combine color values up until a saturation point, at which point ray traversal should end.

Or, I might want to implement a range query, tracing a zero-length ray against a set of bounding boxes, and when overlap becomes severe, I need to be able to terminate traversal in an anyhit program, storing results in a ray payload as the ray progresses through primitives.
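To make the first use case concrete, here is a minimal sketch of an any-hit shader that accumulates into the payload and ends traversal once a saturation threshold is reached. The payload layout, the 0.01 threshold, and the placeholder material are assumptions made for illustration; they are not taken from the reproducer.

```hlsl
// Hypothetical payload for this sketch; field names are not from the reproducer.
struct OITPayload {
  float3 color;          // order-independent weighted color sum
  float  transmittance;  // product of (1 - alpha) over visited hits; caller sets 1.0
};

[shader("anyhit")]
void OITAnyHit(inout OITPayload payload,
               in BuiltInTriangleIntersectionAttributes attr) {
  // Placeholder material: a real shader would look up color and alpha.
  float3 surfaceColor = float3(attr.barycentrics, 0.0f);
  float  alpha        = 0.5f;

  // Both updates are commutative, so the result does not depend on hit order.
  payload.color         += alpha * surfaceColor;
  payload.transmittance *= (1.0f - alpha);

  if (payload.transmittance < 0.01f) {
    // Saturated: end traversal. This intrinsic does not return.
    AcceptHitAndEndSearch();
  }
  // Otherwise keep traversing without committing this hit.
  IgnoreHit();
}
```

Accumulating like this only gives a meaningful result if each primitive is visited once, so the geometry would likely need VK_GEOMETRY_NO_DUPLICATE_ANY_HIT_INVOCATION_BIT_KHR set.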

I have a reproducer of the bug here: GitHub - natevm/GPRT-AHBugRepro

A screen capture of the bug is here. When I check the radio button, AcceptHitAndEndSearch() is conditionally called. The bug is that the spheres go black, despite having written to the ray payload before terminating traversal.

belakampis1 · February 14, 2023, 11:38pm

I would like to add a +1 here to see if this behaviour can be changed / fixed (perhaps optionally, through a flag).

My use case is transparent objects with alpha blending, using path tracing and the NRD denoiser. I was told by the NRD author(s) that it simply does not support stochastic transparency, and that the workaround was to use rasterization in a separate pass for primary-visibility transparencies only. But this is unacceptable, since I need the transparency to work in reflected / indirect rays as well.

So the solution I tried, to avoid noisy translucencies that NRD couldn’t denoise properly, was to store the transparency + albedo in a separate channel in the payload and always call IgnoreHit regardless of the alpha value (other than 0). I would store the alpha / base colour separately in the G-buffer, denoise the opaque content, and then blend the transparency back in during compositing, post-denoise but pre-upscaling. DLSS would then take care of whatever noise remained.

I’m using Vulkan RT with HLSL shaders (compiled to SPIR-V through DXC.exe), rather than DX12/DXR, but I expect it has the same issues or bugs in either API.

natemorrical · February 15, 2023, 4:00am

I discovered that this bug with the AcceptHitAndEndSearch and IgnoreHit calls occurs when these intrinsics are called from nested function calls inside the anyhit entry point. If I call the intrinsics directly in the main body of the entry point, they behave as expected.
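As a rough illustration of the two cases, the sketch below puts the same payload write and the same intrinsic call in a helper function and directly in the entry point. The struct and function names are hypothetical; only the first pattern is the one reported here as losing the write when compiled to SPIR-V.

```hlsl
struct Payload { float4 data; };
struct HitAttr { float3 bary; };

// Helper that writes the payload and then terminates traversal.
void markAndEnd(inout Payload localPayload) {
  localPayload.data.x += 1.0f;
  AcceptHitAndEndSearch();  // called from a nested function
}

// Pattern reported as losing the payload write.
[shader("anyhit")]
void NestedAnyHit(inout Payload p, in HitAttr h) {
  markAndEnd(p);
}

// Pattern reported as behaving as expected: the intrinsic is
// called from the main body of the entry point.
[shader("anyhit")]
void DirectAnyHit(inout Payload p, in HitAttr h) {
  p.data.x += 1.0f;
  AcceptHitAndEndSearch();
}
```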

belakampis1 · February 15, 2023, 11:44pm

I will test this myself shortly in my AnyHit and report back, but I trust it’s probably the same for me. However, my original bug wasn’t using any nested function calls, and I’ve been using HLSL → dxc → SPV pipeline for ~2 years. I admit I haven’t tested it recently so it may indeed be fixed (or could have just been a bug in my code, but I don’t think so since it’s dead simple).

MarkusHoHo · February 24, 2023, 12:40pm

Welcome to the NVIDIA developer forums @natemorrical !

Good to hear that you could figure out how to fix your issue!

natemorrical · February 24, 2023, 3:57pm

I found a workaround, but this isn’t a solution. There does appear to be a bug in NVIDIA’s drivers here. If it’s part of the spec that I’m allowed to call IgnoreHit / AcceptHitAndEndSearch from nested functions, that shouldn’t give me undefined behavior…

MarkusHoHo · February 27, 2023, 10:58am

Thanks for clarifying this!

I forwarded the information, let’s see if we can get some engineering response.

MarkusHoHo · February 27, 2023, 2:48pm

Good news! This is already being tracked as an internal bug.

As it is with these things I cannot make any statement on whether it can be fixed and how long it might take.

But I will try to keep track of it and update here when it has been resolved.

alelenv · April 19, 2023, 3:42am

Thanks natemorrical and belakampis1 for reporting the issue!

We believe this is a DXC codegen bug when translating from HLSL to SPIRV and not a bug in NVIDIA’s drivers.

You can find exact details of the issue here:

github.com/microsoft/DirectXShaderCompiler

[spirv] Inout semantics in presence of early thread termination do no match DXIL codegen

opened 03:38AM - 19 Apr 23 UTC

alelenv

For the following any-hit shader

```
struct Payload { float4 data; };
struct HitAttr { float3 bary; };

void userFunc(inout Payload localPayload) {
  localPayload.data.x += 1.0f;
  AcceptHitAndEndSearch();
}

[shader("anyhit")]
void SphereAnyHit(inout Payload p, in HitAttr h) {
  userFunc(p);
}
```

DXC generates the following DXIL

```
; Function Attrs: noreturn nounwind
define void @"\01?SphereAnyHit@@YAXUPayload@@UHitAttr@@@Z"(%struct.Payload* noalias nocapture %p, %struct.HitAttr* nocapture readnone %h) #0 {
  %1 = getelementptr inbounds %struct.Payload, %struct.Payload* %p, i32 0, i32 0
  %2 = load <4 x float>, <4 x float>* %1, align 4, !alias.scope !19
  %3 = extractelement <4 x float> %2, i32 0
  %4 = fadd fast float %3, 1.000000e+00
  %5 = getelementptr inbounds %struct.Payload, %struct.Payload* %p, i32 0, i32 0, i32 0
  store float %4, float* %5, align 4, !alias.scope !19
  call void @dx.op.acceptHitAndEndSearch(i32 156)  ; AcceptHitAndEndSearch()
  unreachable
}
```

Here the payload passed in to 'userFunc' as an 'inout' argument is passed in by reference and hence mutated.

Note the HLSL spec states (https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-function-parameters) that values are copied in and copied out. (I guess pass by reference is a valid implementation of the semantics.)

The code generated for SPIRV is

```
OpCapability RayTracingKHR
OpExtension "SPV_KHR_ray_tracing"
OpMemoryModel Logical GLSL450
OpEntryPoint AnyHitNV %SphereAnyHit "SphereAnyHit"
OpSource HLSL 660
OpName %SphereAnyHit "SphereAnyHit"
%void = OpTypeVoid
%3 = OpTypeFunction %void
%SphereAnyHit = OpFunction %void None %3
%4 = OpLabel
OpTerminateRayKHR
OpFunctionEnd
```

Note modification of payload is completely lost. This happens because for SPIRV codegen, 'inout' parameter semantics are implemented as pass by value-result, i.e. they are copied in and then copied out. (This can be seen by generating code at O0.) However, since 'AcceptHitAndEndSearch' is a thread terminating instruction, we never perform the copy-out. This is technically a legal implementation of 'inout'.

Probably a couple of questions arise out of this:

1. HLSL folks, I guess this was specifically done for payloads, otherwise you can never write to a payload in any non-entry function. Are there any other scenarios where DXC does something special?
2. DXC folks, if we were to make this work, we have to implement some form of unwind/cleanup for this special case? (Ugly way is to track using a boolean and unwind all the way to main and then write out to payload before exiting?)

In order to make this work, you will have to implement ‘unwind/cleanup’ semantics to copy out to payload manually in the shader as I have explained in the DXC bug above
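One way to read “unwind/cleanup” in practice is sketched below: helper functions never call the terminating intrinsics themselves; they return a request code that is propagated back to the entry point, which calls the real intrinsic only after every 'inout' copy-out has completed. The status constants, helper names, and the 4.0 threshold are hypothetical and only illustrate the pattern.

```hlsl
struct Payload { float4 data; };
struct HitAttr { float3 bary; };

// Hypothetical request codes returned by helpers instead of calling the intrinsics.
static const uint HIT_CONTINUE       = 0;
static const uint HIT_IGNORE         = 1;
static const uint HIT_ACCEPT_AND_END = 2;

uint innerHelper(inout Payload payload) {
  payload.data.x += 1.0f;
  // Request termination rather than terminating here.
  return (payload.data.x >= 4.0f) ? HIT_ACCEPT_AND_END : HIT_IGNORE;
}

uint outerHelper(inout Payload payload) {
  // Propagate the request upward unchanged.
  return innerHelper(payload);
}

[shader("anyhit")]
void UnwindAnyHit(inout Payload p, in HitAttr h) {
  uint request = outerHelper(p);
  // The helpers have returned normally, so p now holds their writes.
  if (request == HIT_ACCEPT_AND_END) AcceptHitAndEndSearch();
  if (request == HIT_IGNORE) IgnoreHit();
  // HIT_CONTINUE falls through and the hit is accepted as usual.
}
```

The cost of this approach is that every function between the entry point and the decision has to forward the request, which is why it is invasive in larger shaders.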

Thanks

belakampis1 · September 29, 2023, 1:50am

Thanks, sorry I didn’t notice your reply, a workaround is totally fine. After reading all that, I still don’t really understand what the workaround is, exactly, though.

Can you walk me through it? (even just pseudo code or a starting point for a search)

@natemorrical, can you explain what your own workaround was? I need a fix for this so I don’t have to fall back to doing it in inline raytracing mode instead (which is slower for my use case).

natemorrical · October 16, 2023, 4:41pm

Sorry for the slow response.

I also responded briefly over email to @belakampis1, but I figured I’d also give a public response for anyone who might come across this thread.

I wouldn’t be surprised to discover that this is another DXC bug. I’ve had luck in the past in adding a reproducer to Sascha Willems’ Vulkan SDK examples. I think part of the issue is that many folks just don’t understand parts of the ray tracing pipeline like the shader binding table, anyhit, custom intersection shaders, callable shaders, etc. So, an example in Sascha’s SDK can be both educational for the community and can also demonstrate that the equivalent code in GLSL works while HLSL/DXC doesn’t, which can help DXC devs reproduce the issue and put pressure on them to fix it.

My workaround at the moment is rather invasive, but I am currently wrapping all HLSL entry points with my own C style macro, which allows me to inject code at the beginning and end of each entry point body. The bug occurs when intrinsics are called inside of functions, so you need to call the intrinsics only from the main entrypoint body.

So, I currently create two static booleans right above the macro, and write my own namespaced versions of the intrinsics:

static bool _ignoreHit = false;
static bool _acceptHitAndEndSearch = false;
namespace gprt {
  void ignoreHit() {_ignoreHit = true; }
  void acceptHitAndEndSearch() { _acceptHitAndEndSearch = true; }
}

#define GPRT_ANY_HIT_PROGRAM(progName, RecordDecl, PayloadDecl, AttributeDecl)                                         \
  /* fwd decl for the kernel func to call */                                                                           \
  inline void progName(in RAW(TYPE_NAME_EXPAND) RecordDecl, inout RAW(TYPE_NAME_EXPAND) PayloadDecl,                   \
                       in RAW(TYPE_NAME_EXPAND) AttributeDecl);                                                        \
                                                                                                                       \
  [[vk::shader_record_ext]] ConstantBuffer<RAW(TYPE_EXPAND RecordDecl)> CAT(RAW(progName),                             \
                                                                            RAW(TYPE_EXPAND RecordDecl));              \
                                                                                                                       \
  [shader("anyhit")] void __anyhit__##progName(inout RAW(TYPE_NAME_EXPAND) PayloadDecl,                                \
                                               in RAW(TYPE_NAME_EXPAND) AttributeDecl) {                               \
    progName(CAT(RAW(progName), RAW(TYPE_EXPAND RecordDecl)), RAW(NAME_EXPAND PayloadDecl),                            \
             RAW(NAME_EXPAND AttributeDecl));                                                                          \
    if (_ignoreHit)                                                                                                    \
      IgnoreHit();                                                                                                     \
    if (_acceptHitAndEndSearch)                                                                                        \
      AcceptHitAndEndSearch();                                                                                         \
  }                                                                                                                    \
                                                                                                                       \
  /* now the actual device code that the user is writing: */                                                           \
  inline void progName(in RAW(TYPE_NAME_EXPAND) RecordDecl, inout RAW(TYPE_NAME_EXPAND) PayloadDecl,                   \
                       in RAW(TYPE_NAME_EXPAND) AttributeDecl) /* program args and body supplied by user ... */

Both flags start out false when the entry point is called, and the macro then injects the user’s actual entrypoint code. If either virtual intrinsic has set its flag to true by the end of the entrypoint, I call the real intrinsic at the end of the function.
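For readers who do not want to untangle the macro, the sketch below is a hand-written approximation of what it produces for a single program, with the shader record argument left out for brevity. The Payload and HitAttr structs, the program name userAnyHit, and the 4.0 threshold are hypothetical. One behavioural difference is worth noting: the flag setters return normally, unlike the real intrinsics, so user code needs an explicit return if nothing after the call should run.

```hlsl
struct Payload { float4 data; };
struct HitAttr { float3 bary; };

// Deferred-termination flags and their setters, as in the post above.
static bool _ignoreHit = false;
static bool _acceptHitAndEndSearch = false;
namespace gprt {
  void ignoreHit() { _ignoreHit = true; }
  void acceptHitAndEndSearch() { _acceptHitAndEndSearch = true; }
}

// User-written program body (hypothetical).
void userAnyHit(inout Payload payload, in HitAttr attr) {
  payload.data.x += 1.0f;
  if (payload.data.x >= 4.0f) {
    gprt::acceptHitAndEndSearch();
    return;  // the setter does not terminate the shader by itself
  }
  gprt::ignoreHit();
}

// Wrapper entry point: the real intrinsics are only called here,
// in the main body, after the user function has returned.
[shader("anyhit")]
void __anyhit__userAnyHit(inout Payload payload, in HitAttr attr) {
  userAnyHit(payload, attr);
  if (_ignoreHit)
    IgnoreHit();
  if (_acceptHitAndEndSearch)
    AcceptHitAndEndSearch();
}
```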

Ideally I wouldn’t have to do this, since I wouldn’t be surprised if this completely destroys any ability to profile my kernels with debug information… With all the workarounds I have for HLSL, I’ve been starting to consider migrating to Slang instead…

belakampis1 · October 24, 2023, 10:47pm

Thanks so much! I think I get how the workaround works, I’ll try it in my code to see if it fixes my issue (I want to avoid introducing stochastic alpha channel sampling / noise into my final renders, i.e. never stop the hit in the anyhit shader, unless alpha = exactly 1). My only alternative is, I think, inline ray tracing and basically just disable the transparency flag entirely. But I have no idea yet how this would all fit in to Opacity Micro Maps, which I should probably implement at the same time as this workaround (or even beforehand).

I need to support N torch flame billboards at various distances, one in front of the other (e.g. long corridors with flames + smoke, which need to accumulate properly).
