Any-hit shaders are not guaranteed to be called in depth order, and in practice you will see them called out of order. To guarantee you have a hit program called in depth order, you would need to use closest-hit, and potentially re-launch your rays.
I think your any-hit plan sounds reasonable with slight modifications, so it's worth trying out. My gut reaction is that you might see some contention using atomicAdd() that could slow things down, so it could be worth thinking about alternative approaches, depending on how efficient this part of your pipeline needs to be. With the any-hit approach, you would probably need to keep track of and sort the hits you want to remember via your payload, and wait until the ray traversal is complete before writing into your HitBuffer structure in raygen.
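To make the "write once in raygen" idea concrete, here is a minimal host-side C++ sketch, not OptiX code. I'm assuming your HitBuffer is an append-style array with an atomic counter; `HitBufferSketch`, `HitRecord` and `appendHits` are names I made up for illustration, and `std::atomic::fetch_add` stands in for the device-side atomicAdd(). The point is only that each ray reserves all of its slots with one atomic operation after traversal, instead of one per any-hit invocation.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical hit record; the real layout depends on your application.
struct HitRecord {
    float        t;         // distance along the ray
    unsigned int sbtIndex;  // which SBT record was hit
};

// Hypothetical append buffer standing in for the HitBuffer structure.
struct HitBufferSketch {
    std::vector<HitRecord>    records;
    std::atomic<unsigned int> count{0};
    explicit HitBufferSketch(std::size_t capacity) : records(capacity) {}
};

// Called once per ray, after traversal has finished (i.e. from raygen).
// Reserves n contiguous slots with a single atomic add, then copies the
// hits that were accumulated (and already sorted) in the payload.
inline bool appendHits(HitBufferSketch& buf, const HitRecord* hits, unsigned int n)
{
    if (n == 0) return true;  // nothing to write, no atomic needed
    assert(hits != nullptr);
    const unsigned int base = buf.count.fetch_add(n);
    if (static_cast<std::size_t>(base) + n > buf.records.size()) {
        // Out of space: the hits are dropped. Note that the counter stays
        // incremented, so a real version needs an overflow policy.
        return false;
    }
    for (unsigned int i = 0; i < n; ++i) buf.records[base + i] = hits[i];
    return true;
}
```

Whether this actually reduces contention enough to matter is something you would have to measure on your own scene.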
Another way to structure the problem might be to launch a kernel of primary closest-hit rays, with any-hit shaders disabled for extra performance, and then collect a buffer of hits to relaunch in a second kernel. This time, use an any-hit program that ignores intersections that match the SBT index of the first hit. Your closest-hit program would finish the job by recording the SBT and GAS index of the second hit, which is guaranteed to be different from the first hit. You could do this without atomics, and after that, perform a reduction on the two buffers. This may or may not be faster than using atomics; it would need to be tested.
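The logic of the two-kernel idea can be sketched on the CPU like this. It is plain C++ rather than OptiX code, and everything in it (`RayHit`, `nearestHit`, `nearestHitExcluding`, `traceTwoPasses`) is a hypothetical stand-in: a list of candidate intersections replaces real traversal, the first function plays the role of the closest-hit result from the first launch, and the `continue` in the second function plays the role of the any-hit program ignoring intersections with the first hit's SBT index.

```cpp
#include <cassert>
#include <cfloat>
#include <vector>

// Hypothetical record of one intersection along a ray.
struct RayHit {
    float        t;         // distance along the ray
    unsigned int sbtIndex;  // which SBT record was hit
    bool         valid;     // false means "miss"
};

// Pass 1 stand-in: the hit a closest-hit program would report.
inline RayHit nearestHit(const std::vector<RayHit>& candidates)
{
    RayHit best = { FLT_MAX, ~0u, false };
    for (const RayHit& c : candidates)
        if (c.t < best.t) best = { c.t, c.sbtIndex, true };
    return best;
}

// Pass 2 stand-in: same search, but intersections whose SBT index matches
// the first hit are skipped, as the any-hit program would ignore them.
inline RayHit nearestHitExcluding(const std::vector<RayHit>& candidates,
                                  unsigned int excludedSbt)
{
    RayHit best = { FLT_MAX, ~0u, false };
    for (const RayHit& c : candidates) {
        if (c.sbtIndex == excludedSbt) continue;  // ignored intersection
        if (c.t < best.t) best = { c.t, c.sbtIndex, true };
    }
    return best;
}

struct TwoPassResult { RayHit first; RayHit second; };

// Runs both passes for one ray; no atomics are involved because each ray
// writes only its own two results.
inline TwoPassResult traceTwoPasses(const std::vector<RayHit>& candidates)
{
    TwoPassResult r;
    r.first  = nearestHit(candidates);
    r.second = { FLT_MAX, ~0u, false };
    if (r.first.valid) {
        r.second = nearestHitExcluding(candidates, r.first.sbtIndex);
        // The second hit, if any, can never share the first hit's SBT index.
        assert(!r.second.valid || r.second.sbtIndex != r.first.sbtIndex);
    }
    return r;
}
```

Note that skipping by SBT index also skips any further intersections with the same object (for example the exit point of a closed mesh), which is the behavior described above but may or may not be what you want.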
Or you could mix the ideas and use a single kernel with any-hit, store the first and second hits into a buffer that matches your image pixel dimensions, and reduce on that single buffer after tracing is complete. You would need to perform your own depth sort on the two hits in your any-hit program as you go, by comparing each incoming hit to the two stored hits and either shuffling the stored hits or discarding the incoming hit as necessary. It takes a little work, but I would guess it is doable.
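The compare-and-shuffle step might look something like the following. This is a plain C++ sketch of the logic only, with hypothetical names (`Hit`, `TwoHits`, `insertHit`); in a real any-hit program the incoming distance would typically come from optixGetRayTmax(), the two slots would live in the payload or the per-pixel buffer, and you would call optixIgnoreIntersection() afterwards so traversal keeps going and later hits are still reported.

```cpp
#include <cassert>
#include <cfloat>

// Hypothetical per-hit data; extend with GAS or primitive index as needed.
struct Hit {
    float        t;         // distance along the ray
    unsigned int sbtIndex;  // which SBT record was hit
};

// The two nearest hits seen so far, kept sorted so first.t <= second.t.
// Empty slots are marked with t = FLT_MAX.
struct TwoHits {
    Hit first  = { FLT_MAX, ~0u };
    Hit second = { FLT_MAX, ~0u };
};

// Called for every incoming hit, in whatever order traversal reports them.
inline void insertHit(TwoHits& h, const Hit& in)
{
    if (in.t < h.first.t) {
        // New nearest hit: the old nearest is shuffled into the second slot.
        h.second = h.first;
        h.first  = in;
    } else if (in.t < h.second.t) {
        // Between the two stored hits: replaces the second slot.
        h.second = in;
    }
    // Otherwise the incoming hit is farther than both and is discarded.
    assert(h.first.t <= h.second.t);
}
```

For example, feeding it hits at t = 5.0, 1.5, 9.0 and 3.0 in that order leaves the t = 1.5 hit in the first slot and the t = 3.0 hit in the second, regardless of arrival order.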