Anyhit payload lost when calling “AcceptHitAndEndSearch()” or “IgnoreHit()”

Hello,

I’m running into a bug with Vulkan ray tracing on my RTX 4090 (using HLSL as my shading language, compiled to SPIR-V).

Whenever I call AcceptHitAndEndSearch() or IgnoreHit() in an anyhit program, all previous writes to the ray payload are lost.

This is an issue, since there are times when it is important to be able to terminate traversal early in an anyhit program, or to process the hit primitive without shrinking the ray’s current T value (RayTCurrent()).

For example, I might be using an order-independent transparency method, using an anyhit shader to combine color values up until a saturation point, at which point ray traversal should end.

Or, I might want to implement a range query, tracing a zero-length ray against a set of bounding boxes, and when overlap becomes severe, I need to be able to terminate traversal in an anyhit program, storing results in a ray payload as the ray progresses through primitives.
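To make the first use case concrete, here is a minimal sketch of such an anyhit program. The `Payload` layout, the `SurfaceColor` helper, and the 0.99 saturation threshold are hypothetical and are not taken from the reproducer. The accumulation is a plain weighted sum, so it does not depend on the order in which hits are visited.

```hlsl
// Hypothetical payload for illustration only.
struct Payload {
  float3 color;  // sum of alpha-weighted colors
  float  alpha;  // sum of alphas seen so far
};

// Hypothetical stand-in for a material/texture lookup returning RGBA.
float4 SurfaceColor(float2 bc) {
  return float4(bc.x, bc.y, 1.0 - bc.x - bc.y, 0.5);
}

[shader("anyhit")]
void OitAnyHit(inout Payload payload, in BuiltInTriangleIntersectionAttributes attr) {
  float4 c = SurfaceColor(attr.barycentrics);

  // Commutative accumulation: the result is the same for any visit order.
  payload.color += c.a * c.rgb;
  payload.alpha += c.a;

  // Saturation point reached: stop traversal. These payload writes are
  // exactly the ones that must survive the call.
  if (payload.alpha >= 0.99) {
    AcceptHitAndEndSearch();
  }

  // Otherwise discard the hit so the ray's T is not shortened and
  // traversal continues to the remaining primitives.
  IgnoreHit();
}
```

Both intrinsics end the anyhit invocation immediately, so the payload writes have to be complete before either one is reached.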

I have a reproducer of the bug here: GitHub - natevm/GPRT-AHBugRepro

A screen capture of the bug is here. When I check the radio button, AcceptHitAndEndSearch() is conditionally called. The bug is that the spheres go black, despite having written to the ray payload before terminating traversal.

I would like to add a +1 here to see if this behaviour can be changed / fixed (perhaps optionally, through a flag).

My use case is alpha-blended transparent objects in a path tracer that uses the NRD denoiser. I was told by the NRD author(s) that it simply does not support stochastic transparency, and that the workaround is to rasterize transparent surfaces in a separate pass for primary visibility only. That is unacceptable for me, since I need transparency to work in reflected / indirect rays as well.

So, to avoid noisy translucency that NRD couldn’t denoise properly, I tried storing the transparency + albedo in a separate channel in the payload and always calling IgnoreHit whenever alpha was anything other than 0. I store the alpha / base colour separately in the G-buffer, denoise the opaque result, then blend the transparency back in during compositing (post-denoise, but pre-upscaling). DLSS would then take care of whatever noise remains.

I’m using Vulkan RT with HLSL shaders (compiled to SPIR-V through DXC.exe), rather than DX12/DXR, but I expect it has the same issues or bugs in either API.

I discovered that this bug with the AcceptHitAndEndSearch and IgnoreHit calls has to do with these intrinsics being called from nested functions inside the anyhit entry point. If I call the intrinsics directly in the entry point’s main body, they behave as expected.
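A minimal sketch of the two patterns, assuming a hypothetical one-field `Payload` and made-up entry point names (this is not the code from the reproducer):

```hlsl
// Hypothetical payload for illustration only.
struct Payload {
  float3 color;
};

// Helper that reaches the intrinsic from inside a nested function.
void EndSearch() {
  AcceptHitAndEndSearch();
}

// Pattern that shows the problem: the payload write happens first,
// but the intrinsic is called through a helper.
[shader("anyhit")]
void AnyHitNested(inout Payload payload, in BuiltInTriangleIntersectionAttributes attr) {
  payload.color = float3(1.0, 0.0, 0.0);
  EndSearch();  // reported behavior: the write above is lost
}

// Pattern that behaves as expected: the intrinsic is called directly
// in the entry point body.
[shader("anyhit")]
void AnyHitDirect(inout Payload payload, in BuiltInTriangleIntersectionAttributes attr) {
  payload.color = float3(1.0, 0.0, 0.0);
  AcceptHitAndEndSearch();  // the write above reaches the caller
}
```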

I will test this myself shortly in my AnyHit and report back, but I trust it’s probably the same for me. However, my original bug wasn’t using any nested function calls, and I’ve been using HLSL → dxc → SPV pipeline for ~2 years. I admit I haven’t tested it recently so it may indeed be fixed (or could have just been a bug in my code, but I don’t think so since it’s dead simple).

Welcome to the NVIDIA developer forums @natemorrical !

Good to hear that you could figure out how to fix your issue!

I found a workaround, but this isn’t a solution. There does appear to be a bug in NVIDIA’s drivers here. If the spec allows me to call IgnoreHit / AcceptHitAndEndSearch from nested functions, that shouldn’t give me undefined behavior…

Thanks for clarifying this!

I forwarded the information, let’s see if we can get some engineering response.

Good news! This is already being tracked as an internal bug.

As it is with these things I cannot make any statement on whether it can be fixed and how long it might take.

But I will try to keep track of it and update here when it has been resolved.

Thanks natemorrical and belakampis1 for reporting the issue!

We believe this is a DXC codegen bug in the translation from HLSL to SPIR-V, and not a bug in NVIDIA’s drivers.

You can find exact details of the issue here:

To make this work, you will have to implement ‘unwind/cleanup’ semantics manually in the shader, copying the results out to the payload yourself, as I have explained in the DXC bug above.

Thanks

Thanks, sorry I didn’t notice your reply, a workaround is totally fine. After reading all that, I still don’t really understand what the workaround is, exactly, though.

Can you walk me through it? (even just pseudo code or a starting point for a search)

natemorrical, can you explain what your own workaround was? I need a fix for this so I don’t have to fall back to inline ray tracing instead (which is slower for my use case).

Sorry for the slow response.

I also responded briefly over email to @belakampis1, but I figured I’d also give a public response for anyone who might come across this thread.

I wouldn’t be surprised to discover that this is another DXC bug. I’ve had luck in the past adding a reproducer to Sascha Willems’ Vulkan SDK examples. I think part of the issue is that many folks just don’t understand parts of the ray tracing pipeline like the shader binding table, anyhit, custom intersection shaders, callable shaders, etc. So, an example in Sascha’s SDK can be both educational for the community and can also demonstrate that the equivalent code in GLSL works while HLSL/DXC doesn’t, which can help DXC devs reproduce the issue and put pressure on them to fix it.

My workaround at the moment is rather invasive: I wrap all HLSL entry points in my own C-style macro, which lets me inject code at the beginning and end of each entry point body. The bug occurs when the intrinsics are called inside nested functions, so you need to call them only from the main entry point body.

So, I currently create two static booleans right above the macro, and write my own namespaced versions of the intrinsics:

static bool _ignoreHit = false;
static bool _acceptHitAndEndSearch = false;
namespace gprt {
  void ignoreHit() {_ignoreHit = true; }
  void acceptHitAndEndSearch() { _acceptHitAndEndSearch = true; }
}

#define GPRT_ANY_HIT_PROGRAM(progName, RecordDecl, PayloadDecl, AttributeDecl)                                         \
  /* fwd decl for the kernel func to call */                                                                           \
  inline void progName(in RAW(TYPE_NAME_EXPAND) RecordDecl, inout RAW(TYPE_NAME_EXPAND) PayloadDecl,                   \
                       in RAW(TYPE_NAME_EXPAND) AttributeDecl);                                                        \
                                                                                                                       \
  [[vk::shader_record_ext]] ConstantBuffer<RAW(TYPE_EXPAND RecordDecl)> CAT(RAW(progName),                             \
                                                                            RAW(TYPE_EXPAND RecordDecl));              \
                                                                                                                       \
  [shader("anyhit")] void __anyhit__##progName(inout RAW(TYPE_NAME_EXPAND) PayloadDecl,                                \
                                               in RAW(TYPE_NAME_EXPAND) AttributeDecl) {                               \
    /* reset the deferred flags for this invocation */                                                                 \
    _ignoreHit = false;                                                                                                \
    _acceptHitAndEndSearch = false;                                                                                    \
    progName(CAT(RAW(progName), RAW(TYPE_EXPAND RecordDecl)), RAW(NAME_EXPAND PayloadDecl),                            \
             RAW(NAME_EXPAND AttributeDecl));                                                                          \
    if (_ignoreHit)                                                                                                    \
      IgnoreHit();                                                                                                     \
    if (_acceptHitAndEndSearch)                                                                                        \
      AcceptHitAndEndSearch();                                                                                         \
  }                                                                                                                    \
                                                                                                                       \
  /* now the actual device code that the user is writing: */                                                           \
  inline void progName(in RAW(TYPE_NAME_EXPAND) RecordDecl, inout RAW(TYPE_NAME_EXPAND) PayloadDecl,                   \
                       in RAW(TYPE_NAME_EXPAND) AttributeDecl) /* program args and body supplied by user ... */

I set these values to false when the entry point is called, then inject the user’s actual entrypoint code. If I determine that these virtual intrinsics have been set to true by the end of the entrypoint, then I call the real intrinsics at the end of the function.
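For anyone who does not want the macro machinery, the same idea can be sketched without it: helpers return a result code instead of calling the intrinsics, and only the entry point body calls the real intrinsics. The `Payload` layout, the `HIT_*` constants, and `ProcessHit` below are hypothetical names for illustration, not part of GPRT.

```hlsl
// Hypothetical payload for illustration only.
struct Payload {
  float3 color;
  float  alpha;
};

// Hypothetical result codes that helpers return instead of calling intrinsics.
static const uint HIT_ACCEPT = 0;  // accept the hit and keep traversing
static const uint HIT_IGNORE = 1;  // request IgnoreHit()
static const uint HIT_END    = 2;  // request AcceptHitAndEndSearch()

// Nested helper: writes to the payload and reports what should happen next,
// but never calls IgnoreHit() or AcceptHitAndEndSearch() itself.
uint ProcessHit(inout Payload payload, float2 bc) {
  payload.color += float3(bc, 0.0);
  payload.alpha += 0.5;
  return (payload.alpha >= 1.0) ? HIT_END : HIT_IGNORE;
}

[shader("anyhit")]
void AnyHitMain(inout Payload payload, in BuiltInTriangleIntersectionAttributes attr) {
  uint action = ProcessHit(payload, attr.barycentrics);

  // The real intrinsics are only ever called from the entry point body,
  // after all payload writes have completed.
  if (action == HIT_IGNORE) {
    IgnoreHit();
  }
  if (action == HIT_END) {
    AcceptHitAndEndSearch();
  }
}
```

The cost is that every helper on the call path has to propagate the result code back up, which is what the static flags in the macro version avoid.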

Ideally I wouldn’t have to do this, since I wouldn’t be surprised if this completely destroys any ability to profile my kernels with debug information… With all the workarounds I have for HLSL, I’ve been starting to consider migrating to Slang instead…

Thanks so much! I think I get how the workaround works. I’ll try it in my code to see if it fixes my issue (I want to avoid introducing stochastic alpha channel sampling / noise into my final renders, i.e. never stop at the hit in the anyhit shader unless alpha is exactly 1). My only alternative, I think, is inline ray tracing, basically disabling the transparency flag entirely. But I have no idea yet how this would all fit in with Opacity Micro Maps, which I should probably implement at the same time as this workaround (or even beforehand).

I need to support N torch flame billboards at various distances, one in front of the other (e.g. long corridors with flames + smoke, which need to accumulate properly).