Porting an app from OptiX 3.8 (32-bit) to OptiX 6.5 (64-bit): need some help, please

I use OptiX in a Delphi application (not C++).
I have no way to add a console to my app.
Is there a way to get the strings back without the console? (I have read the documentation but found no mention of how to do that.)

Is there a way to get back the strings without the console?

Not that I know of. The rtPrintf output goes to the standard output stream.
You can also use CUDA’s native printf function instead, which goes to some host stream, but I don’t know how to redirect that stream either.

The pragmatic approach then would be to implement a switch-case for all RTexception enum values and either write a different color or use an additional integer output buffer and directly write the exception code per pixel.
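For an application without a console, the integer-buffer variant can be read back on the host after the launch. The following is only a minimal host-side sketch in C++, under stated assumptions: the exception program stores the value of rtGetExceptionCode() at the failing pixel, all other pixels keep the value 0 that the host wrote beforehand, and the buffer has been mapped and copied into a std::vector<int>. The names kNoException, CountExceptionCodes and ExceptionCodeAt are hypothetical helpers, not part of OptiX.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Assumed convention for this sketch: pixels without an exception hold 0,
// pixels with an exception hold the code written by the exception program.
constexpr int kNoException = 0;

// Count how often each exception code occurs in the read-back buffer.
std::map<int, std::size_t> CountExceptionCodes(const std::vector<int>& codes)
{
    std::map<int, std::size_t> histogram;
    for (int code : codes) {
        if (code != kNoException) {
            ++histogram[code];
        }
    }
    return histogram;
}

// Look up the code stored for pixel (x, y) in a row-major buffer.
int ExceptionCodeAt(const std::vector<int>& codes,
                    std::size_t width, std::size_t x, std::size_t y)
{
    assert(x < width && y * width + x < codes.size());
    return codes[y * width + x];
}
```

The resulting histogram can be shown in any GUI control or written to a log file, so no standard output stream is needed. The same loop is straightforward to port to Delphi.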

I’ll do more tests

…and the winner is…



I was able to pinpoint the instruction that generates the exceptions…
If I comment out “rtTrace”, there are no exceptions. If I return false (not shadowed) or true (shadowed), the program works either way (no red dots).

static __device__ __inline__ bool TraceShadowRay(
        optix::float3 Position,      // Ray starting point
        optix::float3 Direction,     // Ray shooting direction
               float  MaxDistance,   // Ray max allowed distance
        optix::float3 &ShadowColor)  // Output shadow color (1.0,1.0,1.0) => No shadow, (0.0,0.0,0.0) => Complete shadow
{
    float  SceneEpsilon = GlobalSettings[0].SceneEpsilon;
    TShadowRayData ShadowRayData;
    // Initialize values
    ShadowRayData.Shadowed = false;
    ShadowRayData.ShadowColor = make_float3(1.0f);
    // Create OptiX ray
    optix::Ray ShadowRay = optix::make_Ray( Position, Direction, SHADOW_RAY_TYPE, SceneEpsilon, MaxDistance + SceneEpsilon );
    // Launch ray (commented out for the test described above)
    //rtTrace(top_object, ShadowRay, ShadowRayData);
    // Get results
    ShadowColor = ShadowRayData.ShadowColor;
    // Done. The tail of the function was not posted; returning the Shadowed flag is an assumption.
    return ShadowRayData.Shadowed;
}

That’s most likely a known issue with bool types in payload structures.
We’ve seen this before:

Please try changing your TShadowRayData Shadowed member from bool to int or unsigned int and use 0 and 1 to set it instead.

No change… (red dots again)

Nothing I can do about that with the given information.

I neither know your system configuration, nor do I have any means of reproducing or experimenting with this.
Again, please always provide the following system configuration information when asking about OptiX issues:
OS version, installed GPU(s), VRAM amount, display driver version, OptiX (major.minor.micro) version, CUDA toolkit version (major.minor) used to generate the input PTX, host compiler version.

The PTX code generation depends on the CUDA toolkit. I would recommend CUDA 10.1 or 10.2 for OptiX 6.5.0.
The microcode generation is driver dependent.
All of the system configuration information above is required to reduce the turnaround time on OptiX questions.

Did you set the recursion depth to let OptiX calculate the correct stack size?
Do you have any other payload structures with bool types in them? Replace them as well.
As said in the second link above, I would also recommend checking that all your structures place their members at offsets matching their native CUDA alignment restrictions, to avoid unnecessary padding or bugs.

Cuda version: 11.4
Optix library version: 6.8.1
Device name: NVIDIA GeForce RTX 3060 Laptop GPU
Windows 11 Pro (21H2)
NVIDIA System Information.txt (3.8 KB)


Your OptiX version is from the driver query, I assume the CUDA version as well?

I meant the OptiX SDK version and that is 6.5.0.
If you’re actually using the CUDA 11.4 version to translate your *.cu files to *.ptx files, please try using the CUDA Toolkit 10.1 or 10.2 for OptiX 6.5.0 instead.
Always read the OptiX Release Notes before setting up a development system.


As explained before with links to the respective programming guide chapter, that function does nothing in the OptiX 6.5.0 RTX execution strategy.
You must use rtContextSetMaxTraceDepth instead (and rtContextSetMaxCallableProgramDepth when using callable programs).

There are also 11 newer display driver versions available for your system if you’re not getting this to work.


I’ll try…

P.S. The “transparency rays” work well; only the shadow rays fail…

Here is an example (shadows off)

I finally solved it (sort of…)

The problem is the call “rtContextSetEntryPointCount(Self.FContextHandle,CAMERA_COUNT)”

In my program I use two cameras (“pinhole_camera.cu” and “radiosity_camera.cu”). The first is the standard OptiX implementation; the second is a “buffer based” implementation that I use to calculate “per vertex” radiosity.

If I call rtContextSetEntryPointCount(ctx,2) I get the error (red points)
If I call rtContextSetEntryPointCount(ctx,1) I get no error (no red points)



I get RT_EXCEPTION_PAYLOAD_ACCESS_OUT_OF_BOUNDS (added in OptiX 6) in the shadow rays, and only when I enable two entry points in my program. The second entry point has different “callbacks” (no shadows, for example). How can I define two entry points (two camera types) without triggering the new exception?
Do I have to define the “any_shadow_hit” program for the second entry point as well, even if it does not use the shadow calculation?
Of course, in the example above I only launch the first program (the pinhole camera).

I have never experienced that error. I cannot say what’s going wrong in your case without having a minimal and complete reproducer project containing all required code.

Using multiple entry points should work. There is even an example in chapter 3.1.1 of the OptiX 6.5.0 programming guide showing how to set up two ray generation programs and two exception programs, one per entry point.

If you switch between them, you need to make sure the recursion depth is set to the higher of the two values required by the ray generation programs.
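As a minimal sketch of that rule in C++: the trace depth is a single context-wide setting, so it has to cover the deepest entry point. EntryPointInfo and RequiredMaxTraceDepth are hypothetical names used only for illustration, and the depth numbers in the usage note are made up. The result is the value that would be passed to rtContextSetMaxTraceDepth.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical per-entry-point description, not an OptiX type.
struct EntryPointInfo {
    const char*  name;           // e.g. "pinhole_camera"
    unsigned int maxTraceDepth;  // deepest chain of nested rtTrace calls it can trigger
};

// One context-wide value must cover every entry point, so take the maximum.
unsigned int RequiredMaxTraceDepth(const std::vector<EntryPointInfo>& entryPoints)
{
    unsigned int depth = 0;
    for (const EntryPointInfo& ep : entryPoints) {
        depth = std::max(depth, ep.maxTraceDepth);
    }
    return depth;  // value to pass to rtContextSetMaxTraceDepth
}
```

For example, if the pinhole camera needed a depth of 4 (reflections plus shadow rays) and the radiosity camera only 2, the context would be configured with 4 regardless of which entry point is launched.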

Looking over your structures in your provided header files, there is another bool inside the TRadianceRayData structure.
Did you replace that as well?

I would revisit the layout of these structures:
TRadianceRayData contains a bool sub_sample.
TShadowRayData contains a bool Shadowed.
TOptixEA3DTextureData contains an unsigned char MappingMode.
TOptixEA3DGlobalSettings ends with 19 uchars.
struct_TOptixEA3DVertexData contains a float2.
If that structure is shared between OpenGL and OptiX/CUDA vertex attributes, that happens to work because the float2 lies on an 8-byte offset.
Note that OpenGL does not have the same alignment restrictions as CUDA. For example, a tightly packed interleaved array with { float3 position; float2 texcoord; } would crash with misaligned access errors on the float2 in CUDA because that is not 8-byte aligned.
For identical structures using CUDA vector types on the host and the device, the host compiler and the CUDA compiler will both pad the structure members according to the CUDA alignment restrictions.
But to avoid any inadvertent padding and potential misaligned access violations, I usually place structure members according to their CUDA alignment restrictions from big to small, that is, float4 (16-byte aligned), then float2 (8-byte aligned), then float3 and float (both 4-byte aligned), then shorts (2-byte aligned) and chars (1-byte “aligned”).
When using these in arrays, I pad the structure size to a multiple of the largest member alignment. The host and CUDA compiler should do that automatically though.
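The padding rules above can be checked at compile time. This is only a sketch in plain C++: Float2, Float3 and Float4 are stand-ins for the CUDA vector types, declared with the alignments named above so that it compiles without CUDA headers. Unordered, Ordered and Vertex are hypothetical structures, not the ones from the posted headers.

```cpp
#include <cstddef>

// Stand-ins for the CUDA vector types (assumed alignments: 16, 8 and 4 bytes).
struct alignas(16) Float4 { float x, y, z, w; };
struct alignas(8)  Float2 { float x, y; };
struct             Float3 { float x, y, z; };

// Arbitrary member order: padding appears before every stricter-aligned member.
struct Unordered {
    unsigned char mode;    // offset 0
    Float2        uv;      // offset 8  (7 bytes of padding before it)
    float         weight;  // offset 16
    Float4        color;   // offset 32 (12 bytes of padding before it)
    Float3        normal;  // offset 48
};                         // size 64, rounded up to a multiple of 16

// Same members ordered from big to small alignment, with an int flag instead of bool.
struct Ordered {
    Float4        color;     // offset 0
    Float2        uv;        // offset 16
    Float3        normal;    // offset 24
    float         weight;    // offset 36
    int           shadowed;  // offset 40, use 0 and 1 instead of false and true
    unsigned char mode;      // offset 44
};                           // size 48, only 3 bytes of tail padding

// The interleaved vertex example: tightly packed, texcoord would sit at byte 12.
constexpr std::size_t kPackedTexcoordOffset = 3 * sizeof(float);
static_assert(kPackedTexcoordOffset % alignof(Float2) != 0,
              "offset 12 is not 8-byte aligned");

// With the aligned type, the compiler moves texcoord to offset 16 instead.
struct Vertex { Float3 position; Float2 texcoord; };

static_assert(offsetof(Unordered, color) == 32, "padding before color");
static_assert(sizeof(Unordered) == 64, "unordered layout wastes space");
static_assert(sizeof(Ordered) == 48, "ordered layout is tighter");
static_assert(offsetof(Vertex, texcoord) == 16, "texcoord is moved to an 8-byte boundary");
static_assert(sizeof(Vertex) == 24, "stride differs from the packed 20 bytes");
```

Similar static_assert checks on the real structures, compiled on both the host and the CUDA side, would expose any layout disagreement at build time. A Delphi host would additionally need matching record layouts.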

Some progress…
Thanks again

More progress, ported ambient occlusion, anti-aliasing, better refractions, gamma correction, etc…
…Optix Rocks !!!


And if you’re still using custom triangle primitives with your own intersection routine, you haven’t tapped into the hardware ray-triangle intersection performance yet. That requires the built-in triangle primitives and attribute programs, which calculate the final vertex attributes deferred.

Also, your scene hierarchy is deeper than two acceleration structures. For maximum BVH traversal performance, the recommendation is to flatten the scene to an OptiX render graph representation with only two acceleration structures on the path from the root to the geometry, that is, a single Group node at the top level and everything else in GeometryGroups (what OptiX 7 describes as an IAS → GAS structure). That one instance transform is fully hardware accelerated by the BVH traversal on RTX boards.
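Flattening here means accumulating the transforms along each path from the root to a leaf, so that every piece of geometry ends up with exactly one instance transform. A minimal host-side sketch in C++, under loud assumptions: SceneNode, Instance and Flatten are hypothetical and not OptiX API types, and transforms are reduced to translations to keep it short. A real scene would concatenate full affine matrices instead.

```cpp
#include <cassert>
#include <vector>

// Simplification for this sketch: translation-only transforms.
struct Translation { float x, y, z; };

// Hypothetical nested scene description, not an OptiX type.
struct SceneNode {
    Translation            local;       // transform relative to the parent
    int                    geometryId;  // >= 0 for a leaf with geometry, -1 for a pure group
    std::vector<SceneNode> children;
};

// One entry per geometry: what would become one instance under the top-level Group.
struct Instance {
    Translation world;       // accumulated root-to-leaf transform
    int         geometryId;
};

// Depth-first walk that concatenates transforms and emits a flat instance list.
void Flatten(const SceneNode& node, Translation parent, std::vector<Instance>& out)
{
    const Translation world{parent.x + node.local.x,
                            parent.y + node.local.y,
                            parent.z + node.local.z};
    if (node.geometryId >= 0) {
        out.push_back(Instance{world, node.geometryId});
    }
    for (const SceneNode& child : node.children) {
        Flatten(child, world, out);
    }
}
```

Each Instance in the result would then map to one transform plus one GeometryGroup below the single top-level Group, regardless of how deep the original hierarchy was.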

I wanted to share with you my progress of integrating Optix 6.5 with our interior design program.

What’s going on with the inconsistent shadows of the couches and tables?