Hi,
The doc says for optixUndefinedValue():
In some use cases, register consumption can be reduced by writing optixUndefinedValue() to payload values that are no longer used.
I’m a bit curious whether writing optixUndefinedValue() to a payload value before optixTrace() helps the compiler generate better code.
e.g.
float3 color; // undefined value here, will be written during the optixTrace() (closest-hit or any-hit and so on)
optixTrace(..., (uint32_t&)color.x, (uint32_t&)color.y, (uint32_t&)color.z);
I mean writing optixUndefinedValue() (with some casting) to color.
In addition, I’d like to know which use case the doc originally had in mind.
Could you provide a brief example?
Thanks,
optixUndefinedValue() returns an unsigned int and is intended for device programs that write to the payload registers with optixSetPayload calls.
When some programs do not actually write all of the payload registers you provided in the optixTrace() call, you can set the unused ones to optixUndefinedValue(), and the compiler will know that this specific device program is not using these registers to return any data.
That’s potentially useful for algorithms which have different (amounts of) payload results and you encode what was returned in which payload register.
It might also let the following code itself use the claimed payload register to optimize the remaining function code. That should be the more useful case. In that case, call optixSetPayload_<number>(optixUndefinedValue()) as early as possible in the called program to free that otherwise unused register for better code generation.
It’s not required if you always write all payload registers provided in optixTrace().
Your case doesn’t make sense because the function is not about the payload register references you provide to the optixTrace() call.
I never needed that function because I either always wrote all payload registers, or because there are only two constant payload registers holding a split 64-bit pointer to some bigger payload struct (which means optixSetPayload isn’t used at all).
Thanks for the quick reply.
It seems I misunderstood what the function is for.
As you say, it is likely that I either always write all payload registers, or just use two registers holding a pointer to some struct.
However, I’m confused about the latter case you described, that is:
It might also let the following code itself use the claimed payload register to optimize the remaining function code…
Let’s say there is some optixTrace() call with three payload registers R0, R1, R2.
Then if there is a closest hit function doing this at the end to return three values:
optixSetPayload_0(valueA); // Uses R0 as out
optixSetPayload_1(valueB); // Uses R1 as out
optixSetPayload_2(valueC); // Uses R2 as out
Everything as usual so far.
Now assume there is another closest hit program which could be reached by the same optixTrace() call using three payload registers but only needs to return two values (e.g. if R0 is some switch evaluated by the caller later to find out which of the payload registers R1, R2 contain valid data for different events):
optixSetPayload_0(valueC); // Uses R0 as out
optixSetPayload_1(valueD); // Uses R1 as out
Then the register R2 could be used for other standard calculations but normally isn’t because it’s a reserved payload register. Then you could write
optixSetPayload_2(optixUndefinedValue()); // Flags R2 as unused.
at the beginning of the second closest hit program, and OptiX could detect that and translate the code for that second CH program in a way that uses register R2 for other purposes locally.
The fewer memory accesses, the better the performance.
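To make the R0/R1/R2 example above concrete, here is a minimal sketch of both closest hit programs plus a caller that evaluates R0 as the switch. The names Params, HitKind, the program names, the output buffer, and the values being returned are all hypothetical placeholders for illustration; only the optix* calls and the __float_as_uint/__uint_as_float intrinsics come from OptiX and CUDA. Whether the compiler actually saves a register here is not something this sketch can show.

```cuda
#include <optix.h>

// Hypothetical tags written to R0 so the caller knows which registers are valid.
enum HitKind : unsigned int { HIT_NONE = 0u, HIT_THREE_VALUES = 1u, HIT_TWO_VALUES = 2u };

// Hypothetical launch parameters for this sketch.
struct Params
{
    OptixTraversableHandle handle;
    float*                 out; // one float per launch index
};
extern "C" __constant__ Params params;

// First closest hit program: returns three values, writes R0, R1 and R2.
extern "C" __global__ void __closesthit__three_values()
{
    const float3 dir = optixGetWorldRayDirection();
    optixSetPayload_0(HIT_THREE_VALUES);                    // R0: switch
    optixSetPayload_1(__float_as_uint(optixGetRayTmax()));  // R1: hit distance
    optixSetPayload_2(__float_as_uint(dir.z));              // R2: placeholder third value
}

// Second closest hit program: reached by the same optixTrace(), returns two values.
extern "C" __global__ void __closesthit__two_values()
{
    // Flag R2 as unused as early as possible so the rest of this program
    // may use that register for local computations.
    optixSetPayload_2(optixUndefinedValue());

    optixSetPayload_0(HIT_TWO_VALUES);                      // R0: switch
    optixSetPayload_1(__float_as_uint(optixGetRayTmax()));  // R1: hit distance
}

extern "C" __global__ void __raygen__switch_on_r0()
{
    // A miss program is omitted in this sketch, so R0 keeps HIT_NONE on a miss.
    unsigned int r0 = HIT_NONE, r1 = 0u, r2 = 0u;

    optixTrace(params.handle,
               make_float3(0.0f, 0.0f, 0.0f),  // ray origin (placeholder)
               make_float3(0.0f, 0.0f, 1.0f),  // ray direction (placeholder)
               0.0f, 1.0e16f, 0.0f,            // tmin, tmax, ray time
               OptixVisibilityMask(255), OPTIX_RAY_FLAG_NONE,
               0, 1, 0,                        // SBT offset, SBT stride, miss index
               r0, r1, r2);

    float result = 0.0f;
    if (r0 == HIT_THREE_VALUES)
        result = __uint_as_float(r1) + __uint_as_float(r2); // R1 and R2 are valid
    else if (r0 == HIT_TWO_VALUES)
        result = __uint_as_float(r1);                       // R2 is undefined, never read it

    params.out[optixGetLaunchIndex().x] = result;
}
```

The important part on the caller side is that R2 is only read on the branch where the closest hit program actually wrote it.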
Or just ignore the whole thing and write algorithms which aren’t doing any of that strange stuff. ;-)
Thanks for the detailed example.
Why don’t we need the function for the case where we use two registers holding a pointer?
In my examples, because the payload register values remain unchanged around any optixTrace() call.
They are only written once inside the ray generation program holding the local payload structure.
Used with that assumption here
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/nvlink_shared/shaders/closesthit.cu#L276
Well, if you do not have any recursive optixTrace() calls like that shadow ray, then you could try to set it to undefined in the other program domains after the payload pointer has been merged.
(Not inside the anyhit program though, because then you would break it for the closesthit and miss programs.)
That would require splitting the payload pointer for each iteration in the ray generation program, though.
That said, I need to move that out of my main loop. :-)
No idea if that has any benefit. As I said, I never used it.
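For reference, the two-register pointer payload described above could look roughly like the following sketch. PerRayData, packPointer, unpackPointer, Params and the program names are hypothetical names made up for this illustration and are not taken from the linked repository; the ray origin and direction are placeholders.

```cuda
#include <optix.h>

// Hypothetical per-ray payload struct living in the ray generation program.
struct PerRayData
{
    float3 radiance;
    float  distance;
};

// Split a 64-bit pointer into two 32-bit payload register values.
static __forceinline__ __device__ void packPointer(const void* ptr, unsigned int& lo, unsigned int& hi)
{
    const unsigned long long bits = reinterpret_cast<unsigned long long>(ptr);
    lo = static_cast<unsigned int>(bits & 0xFFFFFFFFull);
    hi = static_cast<unsigned int>(bits >> 32);
}

// Merge the two 32-bit payload register values back into the pointer.
static __forceinline__ __device__ PerRayData* unpackPointer(unsigned int lo, unsigned int hi)
{
    const unsigned long long bits = (static_cast<unsigned long long>(hi) << 32) | lo;
    return reinterpret_cast<PerRayData*>(bits);
}

// Hypothetical launch parameters for this sketch.
struct Params
{
    OptixTraversableHandle handle;
    float*                 out; // one float per launch index
};
extern "C" __constant__ Params params;

extern "C" __global__ void __raygen__pointer_payload()
{
    PerRayData prd = {};

    // The two registers are written once here and stay constant afterwards.
    unsigned int p0, p1;
    packPointer(&prd, p0, p1);

    optixTrace(params.handle,
               make_float3(0.0f, 0.0f, 0.0f),  // ray origin (placeholder)
               make_float3(0.0f, 0.0f, 1.0f),  // ray direction (placeholder)
               0.0f, 1.0e16f, 0.0f,            // tmin, tmax, ray time
               OptixVisibilityMask(255), OPTIX_RAY_FLAG_NONE,
               0, 1, 0,                        // SBT offset, SBT stride, miss index
               p0, p1);

    // Results come back through the struct, not through the registers.
    params.out[optixGetLaunchIndex().x] = prd.distance;
}

extern "C" __global__ void __closesthit__pointer_payload()
{
    // Only reads the payload registers; optixSetPayload is never called.
    PerRayData* prd = unpackPointer(optixGetPayload_0(), optixGetPayload_1());
    prd->distance = optixGetRayTmax();
}
```

Because no program writes the payload registers, there is nothing to flag as unused. The untested variant mentioned above would add optixSetPayload_0(optixUndefinedValue()) and optixSetPayload_1(optixUndefinedValue()) right after unpackPointer() in the closest hit program, which in turn would mean calling packPointer() again before every optixTrace() in the ray generation program.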
Thanks for the explanation! It makes my understanding clearer.