Payload Usage Annotation

shocker.0x15 · November 27, 2021, 7:38am

Hi, I’m adding a payload usage annotation feature to my OptiX wrapper library.

The release notes say “Refer to the programming guide for additional information”, but I could not find any such information, and there seems to be no related sample code in the SDK.

So I am trying to implement it using clues from the comments in the OptiX header files and from error messages at runtime.

My test code seems almost complete: the OptiX runtime no longer outputs error messages (initially it reported several errors, such as inconsistencies between the OptixPayloadTypes set on the host side and optixSetPayloadTypes in the kernels), and the program runs to completion without crashing.
However, the resulting image is slightly corrupted compared to the one from a run without the annotation.

What I have done for the annotation so far:

Set an array of OptixPayloadType in OptixModuleCompileOptions for optixModuleCreateFromPTX.
Set an OptixPayloadType in OptixProgramGroupOptions for each of the hit program groups (CH, AH, IS) and miss program groups.
Use the optixTrace variants that take an OptixPayloadTypeID parameter.
Call optixSetPayloadTypes in CH, AH, MS (and IS, although I have no IS for now).
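
For readers following along, the host-side part of these steps could be organized roughly as below. This is only a sketch under stated assumptions: `PayloadTypeDesc`, `ModuleCompileOptionsSketch` and `ProgramGroupOptionsSketch` are hypothetical stand-ins that mirror just the payload-related fields I believe `OptixPayloadType`, `OptixModuleCompileOptions` and `OptixProgramGroupOptions` have, and `payloadTypeIdBit` assumes that the i-th entry of the array corresponds to `OPTIX_PAYLOAD_TYPE_ID_i` as bit i. Real code would use the types from the OptiX headers.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-ins mirroring only the payload-related fields that the
// real OptiX structs are assumed to have. Not the actual OptiX definitions.
struct PayloadTypeDesc {             // stand-in for OptixPayloadType
    unsigned int        numPayloadValues;
    const unsigned int* payloadSemantics;
};
struct ModuleCompileOptionsSketch {  // stand-in for OptixModuleCompileOptions
    unsigned int           numPayloadTypes;
    const PayloadTypeDesc* payloadTypes;
};
struct ProgramGroupOptionsSketch {   // stand-in for OptixProgramGroupOptions
    const PayloadTypeDesc* payloadType;
};

// Assumption: array entry i is addressed on the device by the ID bit (1 << i).
constexpr unsigned int payloadTypeIdBit(unsigned int index) { return 1u << index; }

// Owns the semantics arrays so the pointers handed to the option structs
// stay valid for as long as this object lives.
struct PayloadSetup {
    std::vector<std::vector<unsigned int>> semantics;
    std::vector<PayloadTypeDesc>           types;

    PayloadSetup() = default;
    PayloadSetup(PayloadSetup&&) = default;
    PayloadSetup(const PayloadSetup&) = delete;  // a copy would leave dangling pointers

    // Step 1: one array of payload types for the whole module.
    ModuleCompileOptionsSketch moduleOptions() const {
        return {static_cast<unsigned int>(types.size()), types.data()};
    }
    // Step 2: exactly one payload type per hit or miss program group.
    ProgramGroupOptionsSketch groupOptionsFor(std::size_t index) const {
        assert(index < types.size());
        return {&types[index]};
    }
};

inline PayloadSetup makeSetup(std::vector<std::vector<unsigned int>> semanticsPerType) {
    PayloadSetup s;
    s.semantics = std::move(semanticsPerType);
    for (const auto& regs : s.semantics)
        s.types.push_back({static_cast<unsigned int>(regs.size()), regs.data()});
    return s;
}
```

Steps 3 and 4 happen on the device: the trace call passes the ID for the chosen entry, and each CH/AH/MS/IS program declares the same ID with optixSetPayloadTypes. Keeping one owner for the semantics arrays is a design choice to make the host and device sides harder to get out of sync.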

My usage of this feature in a simple path tracing program looks like the following.
For example, an RG program calls optixTrace with these payloads:

rng: random number generator type (8 bytes)
The RG reads it from a buffer and passes it to the trace call, the CH consumes it and writes it back to the payload, then the RG writes it back to the buffer.
trace_caller_read_write | ch_read_write
alpha: float3 type (12 bytes)
The CH returns a coefficient through the payload, then the RG reads it to update the path throughput. The total path throughput is used only by the RG; the CH/MS do not need to know it.
trace_caller_read | ch_write
contribution: float3 type (12 bytes)
The CH/MS returns the contribution from a shading point through the payload, then the RG reads it, multiplies it by the current path throughput, and accumulates it into the total path contribution.
trace_caller_read | ch_write | ms_write
origin: float3 type (12 bytes)
direction: float3 type (12 bytes)
The CH returns a new ray origin/direction through the payload, then the RG reads it to update the ray origin/direction used for the next optixTrace.
trace_caller_read | ch_write
flags: uint32_t type (4 bytes)
Several pieces of information, such as flags, are packed into these 4 bytes to communicate between the RG and the CH/MS.
trace_caller_read_write | ch_read_write | ms_read_write

The number of payload registers should be 2 + 3 + 3 + 3 + 3 + 1 = 15. I believe I set the proper access flags for the payload variables above (e.g. trace_caller_read | ch_write | ms_write for contribution is set on payload registers 5, 6 and 7).
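
The register layout described above can be written down as a small table and expanded into a per-register semantics array. The following is a sketch, not the code from the sample program: the constants in `sem` are hypothetical stand-ins for the `OPTIX_PAYLOAD_SEMANTICS_*` flags, and their bit positions are an assumption modeled on the OptiX headers. `PayloadField`, `kFields` and `buildSemantics` are names made up for this illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the OPTIX_PAYLOAD_SEMANTICS_* flags. The bit
// positions are an assumption; real code should use the OptiX enum instead.
namespace sem {
constexpr std::uint32_t kCallerRead      = 1u << 0;
constexpr std::uint32_t kCallerReadWrite = 3u << 0;
constexpr std::uint32_t kChWrite         = 2u << 2;
constexpr std::uint32_t kChReadWrite     = 3u << 2;
constexpr std::uint32_t kMsWrite         = 2u << 4;
constexpr std::uint32_t kMsReadWrite     = 3u << 4;
}  // namespace sem

struct PayloadField {
    const char*   name;
    std::size_t   numRegisters;  // number of 32-bit payload registers
    std::uint32_t semantics;     // access flags shared by all its registers
};

constexpr std::size_t kNumRegisters = 15;  // 2 + 3 + 3 + 3 + 3 + 1

// The payload variables from the post, in register order.
constexpr std::array<PayloadField, 6> kFields = {{
    {"rng",          2, sem::kCallerReadWrite | sem::kChReadWrite},
    {"alpha",        3, sem::kCallerRead | sem::kChWrite},
    {"contribution", 3, sem::kCallerRead | sem::kChWrite | sem::kMsWrite},
    {"origin",       3, sem::kCallerRead | sem::kChWrite},
    {"direction",    3, sem::kCallerRead | sem::kChWrite},
    {"flags",        1, sem::kCallerReadWrite | sem::kChReadWrite | sem::kMsReadWrite},
}};

// Expands the field table into one semantics word per payload register.
inline std::array<std::uint32_t, kNumRegisters> buildSemantics() {
    std::array<std::uint32_t, kNumRegisters> out{};
    std::size_t reg = 0;
    for (const PayloadField& f : kFields) {
        for (std::size_t i = 0; i < f.numRegisters; ++i) {
            assert(reg < kNumRegisters);
            out[reg++] = f.semantics;
        }
    }
    assert(reg == kNumRegisters);  // the table must fill every register exactly
    return out;
}
```

With this layout, rng occupies registers 0 and 1, alpha 2 to 4, contribution 5 to 7, origin 8 to 10, direction 11 to 13, and flags register 14. The resulting array is the kind of data that the payload type's semantics pointer would refer to, with the value count set to 15.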

I defined this payload usage as OPTIX_PAYLOAD_TYPE_ID_0. The RG calls optixTrace with OPTIX_PAYLOAD_TYPE_ID_0 and the CH and MS call optixSetPayloadTypes(OPTIX_PAYLOAD_TYPE_ID_0).
I also defined one more payload type, OPTIX_PAYLOAD_TYPE_ID_1, used in the CH for shadow rays (an AH updating a visibility value calls optixSetPayloadTypes(OPTIX_PAYLOAD_TYPE_ID_1)).

Is there anything I forgot to do?
Are there any pitfalls in using this feature?

Thanks.

dhart · November 29, 2021, 5:42pm

Hi @shocker.0x15,

Sorry to hear you’re hitting snags with the new annotation API. You’re right, there is no SDK sample code for this yet; we are still working on that part and hoping to release it ASAP.

My very first question is: have you tried turning on debug exceptions? There is a new exception code, OPTIX_EXCEPTION_CODE_PAYLOAD_TYPE_MISMATCH, that may help detect conflicting or overlapping semantics.

Let me know if you tried that, or if it doesn’t give you some indication of what the issue is. Corruption in the image does sound like it would be caused by overloaded register usage. Does the type of corruption you see give you any hints about which payload values are colliding?

–
David.

shocker.0x15 · November 29, 2021, 7:31pm

Hello, thanks for the reply.

I actually tried using OPTIX_EXCEPTION_FLAG_DEBUG in the pipeline options. Unfortunately, with this flag turned on, the issue disappears completely and the program produces the correct image.

The image corruption varies from run to run, but it basically falls into three types.
I uploaded the corrupted images and the correct image:
https://drive.google.com/drive/folders/1LsTqQpNATh17E9fooqeDVdB6JgjVLcpS?usp=sharing

For example, comparing corruption_a.png with expected_image.png, there are ghost bunnies. This may indicate that some thread (or, so to speak, whatever is associated with a ray) incorrectly brings values of the path throughput alpha and/or the contribution from another thread back to the RG.

My sample program for testing this feature is here, in case you want to see what I am doing (note that I wrote the code on top of my OptiX wrapper, so it is not minimized):

Best,

dhart · November 29, 2021, 8:16pm

Oh no! Bugs that go away when you turn on debugging features are the worst.

To me it looks like all three corrupted images might have incorrect ray direction values for the secondary reflection rays. That could easily cause all the artifacts I noticed, including the ghost bunnies. Maybe it’s even possible that the three types of corruption correspond to the three axes of the direction vector? (x,y,z)

Thank you for the images and sample code, I will study this today and see if I can spot any problems. Certainly if there is a payload mismatch that we are not catching, there could be an OptiX bug in here.

–
David.

shocker.0x15 · November 29, 2021, 8:21pm

Ah, that makes sense, you’re right.
The corruption can be explained by the secondary ray directions!
Thanks for checking.

dhart · November 29, 2021, 11:32pm

An update here after examining your code:

As far as I can tell there seems to be a compiler bug on our end. I have filed it and expect someone on the team will fix it in the next couple of days. If that goes well, it should be released in a driver update soon. In either case, I’ll post again with the solution when this is resolved.

The problem here happens with output-only payload values, so I can suggest as a temporary workaround marking your ray-direction payload values as _TRACE_CALLER_READ_WRITE. This seems to make the corruption go away for me for now (but I’m not sure if the corruption just moved to a different payload value; several others are output-only as well.)
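
As a rough illustration of that workaround, output-only registers can be widened from trace-caller read to trace-caller read-write before the semantics array is handed to OptiX. This is a sketch under assumptions: the constants are hypothetical stand-ins for the trace-caller bits of the `OPTIX_PAYLOAD_SEMANTICS_*` flags with assumed bit positions, and `widenOutputOnly` and `widenRange` are names invented here, not part of OptiX or of the wrapper discussed in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the trace-caller bits of the payload semantics
// flags. The bit positions are an assumption, not copied from the headers.
constexpr std::uint32_t kCallerMask      = 3u << 0;
constexpr std::uint32_t kCallerRead      = 1u << 0;
constexpr std::uint32_t kCallerReadWrite = 3u << 0;

// If the trace caller only reads this register (an output-only payload
// value), mark it as read-write; all other semantics are left untouched.
constexpr std::uint32_t widenOutputOnly(std::uint32_t semantics) {
    return (semantics & kCallerMask) == kCallerRead
               ? (semantics | kCallerReadWrite)
               : semantics;
}

// Applies the workaround to registers [first, first + n) of a semantics array.
inline void widenRange(std::uint32_t* semantics, std::size_t count,
                       std::size_t first, std::size_t n) {
    assert(first + n <= count);
    for (std::size_t i = first; i < first + n; ++i)
        semantics[i] = widenOutputOnly(semantics[i]);
}
```

For the 15-register layout earlier in the thread, the ray direction would sit in registers 11 to 13, so `widenRange(regs, 15, 11, 3)` would cover it. The device code does not have to change for this; the caller simply never uses its extra write permission.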

Thanks for the reproducer code!! This makes it so much easier to track down issues like this. We liked seeing your payload management code and how it automatically handles groups of payload values like float3 vector objects; that is a nice touch.

–
David.

shocker.0x15 · November 30, 2021, 1:23pm

Thanks for investigating!

As you can see from the project name, my sample program is meant to demonstrate payload annotation with my wrapper, so I can’t use the workaround there :D
However, I’ll use the workaround in another project where I use my wrapper.

And thanks for looking at my wrapper implementation. I’m proud to hear that from an insider!

I’ll wait for your update, then mark this issue as resolved.

Best

shocker.0x15 · March 9, 2022, 5:44pm

Was this issue resolved in the latest driver?
The program seems to generate the expected image with driver 511.79.

Best,

droettger · March 10, 2022, 6:58am

Yes, the 511.79 drivers were built after the change that fixed the issue.

shocker.0x15 · March 10, 2022, 2:36pm

Thanks, I’ll mark this as resolved.
