Optix payload value incorrect

michael.blazej · May 29, 2024, 6:45pm

I have been running into an odd situation where a payload value is not coming through properly in the closesthit program. I’m able to print the value and confirm it changes in the raygen program, but when I read it back with optixGetPayload_0(), I just get a value of 1 (the value is a uint and changes often). It doesn’t happen all the time either; it seems to depend on which GLTF file I load. The GLTF files consist of one or two flat panels made up of two triangles.

Additionally, I’ve passed the same value into an additional payload position and I can successfully read it with optixGetPayload_22().

System Specs:

Ubuntu 22.04.4 (64-bit)
GeForce GTX 1080 Ti
Driver Version: 550.54.15
CUDA 12.4
OptiX 7.7.0

dhart · May 30, 2024, 6:17pm

Hi @michael.blazej,

I don’t have any theory as to what might be going wrong. Is it possible to share a complete minimal reproducer?

Are you using the payload semantics API at all?

In theory the payload should have nothing to do with GLTF, but maybe there’s a memory corruption. Have you reviewed your GLTF files and code with an eye toward buffer alignment, i.e., check that the device buffers are correctly aligned? Some GLTF files have buffers that aren’t binary compatible with the GPU, so if you copy them directly without somehow transcoding the data, they won’t work as expected.

–
David.

michael.blazej · May 30, 2024, 7:47pm

I will try to create a minimal reproducer if we get stuck but your mention of “buffer alignment” reminded me of something. I did add an unsigned int entry in the MaterialData struct (from the SDK). I hadn’t counted the number of bytes in that type before but would that mess up the assumed alignment?

To generate my GLTF files, I normally export them from Blender and read them in with the tiny_gltf library that comes with the SDK. I haven’t had any issues with that before.

I don’t believe I’m using the payload semantics API as that doesn’t sound familiar but I’ll look into it.

Thank you for the help.
-Mike

dhart · May 30, 2024, 8:16pm

Usually when alignment is wrong, you’ll get a crash, but there are some ways a mistake might sneak through and corrupt something. I probably wouldn’t assume there’s an alignment problem with your MaterialData. I only wondered out loud about alignment since it seemed like GLTF was somehow involved.

For GLTF, if you can load the file and see the model in your application, then buffer alignment probably isn’t an issue. Just for context, the issue I’m referring to could happen with Blender and tiny_gltf. It’s possible to have fully correct files according to the GLTF spec that just can’t be copied directly into GPU device buffer memory for use in OptiX. It’s because OptiX is more strict about certain alignments than GLTF is, not because anything is wrong with the file.

So you have a local uint variable you write to in raygen, and then you call optixTrace(), passing this variable as the first payload value, and then in closesthit, the value is different from what was passed? Is that an accurate description of the setup? Is there any other code in between your setting and getting of the payload value, or can you print in raygen immediately before trace and print again as the first thing in closesthit and see the value is different than what was passed? I’m not sure this would be fruitful, but if you felt like diving deep, you could also try taking a look at the SASS of your hit and raygen programs in Nsight Compute and see if you spot the payload register being overwritten somewhere.

You’d probably know it if you were using payload types & semantics. It’s optional and lets you customize the visibility of payload entries between your different OptiX shader programs. For a description, see the Payload chapter of the OptiX Programming Guide, or take a look at the SDK samples optixPathTracer or optixCompileWithTasks, as these both make use of the payload API.

This might all be red herrings, so don’t waste too much time on my stream-of-consciousness ideas. I guess we just need to take a look at the reproducer and figure out what’s happening before I speculate too much.

–
David.

michael.blazej · May 31, 2024, 12:51am

That is an accurate representation of the problem. In trying to generate a minimal reproducer, I did identify what was happening, but I still don’t understand why it happens with only some GLTF inputs.

I did this to myself by trying to take a shortcut. Here is what happened.

When casting rays to sample the scene, I don’t care about misses, so I just left that function empty.

I have a transmitter/receiver setup where they are not always co-located, so I need to check that the receiver is visible from the hit point within the closest hit function, so I thought I’d repurpose the miss function so that if a ray cast from the hit point toward the receiver is a miss, then the receiver is visible from the hit point. That test is satisfied by the miss function setting payload0 to 1 (an arbitrary value I chose), and that value is checked in the closest hit function. I’m not sure why some GLTF inputs cause the miss function to modify the payload prior to entering the closest hit function (the miss test happens well after I grab payload0 and print it), but that is the cause of the changed value.

I’ve since changed the miss function to modify payload 31 with a unique value so it doesn’t step on any pre-existing values. I know there is probably a better way to do this, but it does work for now.

droettger · May 31, 2024, 12:16pm

Your description sounds like you either didn’t properly initialize all your payload registers or something is stomping on them inadvertently.

Is anything reported by the OptiX validation mode when enabling that with a logger callback?
Links to example code here: https://forums.developer.nvidia.com/t/optixhello-embeded-in-new-application-run-in-release-but-not-in-debug-mode/253037/2

Is there any exception reported when enabling all OptiX exceptions?
When you do not have your own OptiX exception program the validation mode would insert one which prints the exceptions.

When you say you only write results inside the miss program, did you initialize all payload values to proper values in case the miss program is not reached?
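
The initialization question can be illustrated with a small model in plain C++. This is only a sketch and is not OptiX code: traceModel is a hypothetical stand-in for a visibility trace whose miss program would call optixSetPayload_0(1), and receiverVisible is an invented helper name. The point it shows is that the flag gets an explicit default before the trace, so the result is well defined when the miss code never runs.

```cpp
#include <cstdint>

// Hypothetical stand-in for a visibility trace. This is NOT the OptiX API.
// On a miss it writes 1 into the payload, mimicking a miss program that
// calls optixSetPayload_0(1). On a hit nothing writes the payload, mimicking
// a visibility ray for which no hit program runs.
void traceModel(bool hitsGeometry, std::uint32_t& payload0)
{
    if (!hitsGeometry) {
        payload0 = 1u;
    }
}

// Returns true when nothing blocks the path to the receiver.
bool receiverVisible(bool pathBlocked)
{
    // Explicit default: the flag stays 0 if the miss code is never reached.
    // It is a dedicated local variable, separate from any other payload value.
    std::uint32_t visible = 0u;
    traceModel(pathBlocked, visible);
    return visible == 1u;
}
```

In this model, receiverVisible(false) reports the receiver as visible and receiverVisible(true) reports it as blocked. Without the explicit default, the blocked case would read whatever happened to be in the variable.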

michael.blazej wrote: “… I’d repurpose the miss function so that if a ray cast from the hit point toward the receiver is a miss, then the receiver is visible from the hit point.”

Is that the miss program on the same ray type which reached that closest hit program?
Then you implemented a recursive algorithm and would need to protect against that with some flag which indicates that this is already a recursive optixTrace inside the closesthit program.
That would also require that you set the maximum recursion depth on the OptiX stack size calculation correctly.

In all these cases where you used different payload registers (22 and 31), have you also set the OptixPipelineCompileOptions numPayloadValues field to the proper count?
https://raytracing-docs.nvidia.com/optix8/api/struct_optix_pipeline_compile_options.html
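
As a rough illustration of what "the proper count" means: payload indices are zero-based, so the count has to be at least the highest index in use plus one. The helper below is a plain C++ sketch with an invented name (requiredPayloadCount); it is not part of OptiX and only does the arithmetic.

```cpp
#include <algorithm>
#include <cstdint>
#include <initializer_list>

// Smallest payload count that covers every payload index in use.
// Indices are zero-based, so using index 31 needs a count of 32.
// Hypothetical helper for illustration; not an OptiX function.
std::uint32_t requiredPayloadCount(std::initializer_list<std::uint32_t> usedIndices)
{
    std::uint32_t count = 0u;
    for (std::uint32_t index : usedIndices) {
        count = std::max(count, index + 1u);
    }
    return count;
}
```

For the registers mentioned in this thread, requiredPayloadCount({0u, 22u, 31u}) gives 32, which is the value the pipeline compile options would need to carry.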

There would also be newer display driver branches to be tested in case this is some error inside the OptiX/CUDA compilation. There are R555 display drivers released for your board.

Just to exclude issues with the glTF asset itself or the rather limited glTF loader routines inside the OptiX SDK examples, in case you based your code on those, there is also this GLTF_renderer in my OptiX Advanced Examples which handles more glTF features, in case you need a second opinion on the glTF file contents.

michael.blazej · June 6, 2024, 4:23am

Using the miss function for the same ray type as the closesthit function was not the right thing to do. No matter what I did, I still stomped on the pre-existing values.

I ran with validation mode, and there was no additional reporting, nor when enabling all OptiX exceptions. The only proof that I am stomping over values is when I use printf statements where it makes sense.

I haven’t tested the GLTF renderer you proposed, but through other tests, I’m confident the GLTF is formatted correctly. I do like the idea of using that to visualize the scene, though. Thank you for the reference.

I’m guessing that the most resilient approach would be to create a new miss program for a different ray type than what I’m using for the closesthit function with a single payload value for confirming a miss/hit. Would this be an addition to the SBT?

droettger · June 6, 2024, 6:58am

michael.blazej wrote: “I’m guessing that the most resilient approach would be to create a new miss program for a different ray type than what I’m using for the closesthit function with a single payload value for confirming a miss/hit. Would this be an addition to the SBT?”

Yes, the number of ray types is set with the optixTrace argument SBTstride for the hit records, and the miss program is selected with the explicit missSBTIndex (this becomes important below).
https://raytracing-docs.nvidia.com/optix8/guide/index.html#shader_binding_table#sbt-trace-offset
See the SBT indexing formula here:
https://raytracing-docs.nvidia.com/optix8/guide/index.html#shader_binding_table#accelstruct-sbt
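
For reference, the hit-record indexing formula from that chapter can be written as a small function. This is a plain C++ sketch for doing the arithmetic by hand; the function and parameter names are made up for illustration and are not OptiX identifiers. The miss record is not covered here because it is selected directly by missSBTIndex.

```cpp
#include <cstdint>

// Hit-record index following the SBT indexing formula in the
// OptiX Programming Guide:
//   sbt-index = sbt-instance-offset
//             + (sbt-GAS-index * sbt-stride-from-trace-call)
//             + sbt-offset-from-trace-call
// Names are illustrative only.
std::uint32_t hitRecordIndex(std::uint32_t instanceSbtOffset, // per-instance SBT offset
                             std::uint32_t gasSbtIndex,       // SBT index within the GAS
                             std::uint32_t sbtStride,         // optixTrace SBTstride
                             std::uint32_t sbtOffset)         // optixTrace SBToffset
{
    return instanceSbtOffset + gasSbtIndex * sbtStride + sbtOffset;
}
```

With a stride of 2 and an instance offset of 0, the second geometry entry (index 1) traced with offset 1 lands on hit record 3, while the same entry traced with stride 1 and offset 0 lands on record 1.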

Mind that this SBTstride value is only four bits wide, so there is a maximum of 16 ray types, which is usually not a problem because it’s easily possible to let the same ray type handle different things by indicating the current use case with some payload flag. That’s effectively what you tried and I would still expect that to work.

Usually (see below!) if you previously only used one ray type and now use two, your optixTrace SBTstride value changes from 1 to 2 and you would need to provide twice as many miss and hit records inside your SBT. See the missRecordCount and hitgroupRecordCount arguments here:
https://raytracing-docs.nvidia.com/optix8/guide/index.html#shader_binding_table#layout

The additional hit records are not required when your SBT contains no anyhit programs!

michael.blazej wrote: “… I’d repurpose the miss function so that if a ray cast from the hit point toward the receiver is a miss, then the receiver is visible from the hit point.”

Note that the fastest visibility ray implementation (which only needs an additional miss program inside the SBT when there are no anyhit programs on the visibility ray) is described here:
https://forums.developer.nvidia.com/t/anyhit-program-as-shadow-ray-with-optix-7-2/181312/2

I’m using that inside my examples.
Read this comment describing the visibility/shadow ray implementation and why it does not need additional hit records inside the SBT!
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/rtigo12/src/Device.cpp#L692
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/rtigo12/shaders/miss.cu#L42
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/rtigo12/shaders/brdf_diffuse.cu#L186

As a contrasting example, the slower but general-purpose shadow ray with anyhit programs is implemented here:
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/rtigo9/src/Device.cpp#L683
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/rtigo9/shaders/anyhit.cu#L88
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/rtigo9/shaders/closesthit.cu#L216

(Looks like something isn’t working right with these links on github right now.
They sometimes jump to the wrong line. Scroll down to the highlighted line number then.)

Note the different ray flags and SBT indices and strides inside the shadow ray optixTrace call of the two examples’ closest hit program!

Another fast method is to use the Shader Execution Reordering optixTraverse call and check that for hit or miss. That is shown inside the OptiX SDK optixPathTracer example.
