TL;DR: If you happen to have a CUdeviceptr that actually needs all 64 bits and stick it in the SBT as the base for any of the entries, your optixLaunch will fail spectacularly.
So to start this off…
Let’s note that on the “device side” (actual CUDA code) sizeof(CUdeviceptr) is 4 while the alignment is 8, unless I’m using NVRTC wrong somehow (I don’t see an explicit 64-bit option).
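By the way, an easy way to catch this at JIT-compile time rather than at runtime is to drop a static_assert next to the OptiX programs (assuming CUdeviceptr is already visible through the same headers the programs include):

// Fails the NVRTC compile if the device side picked a 32-bit CUdeviceptr,
// rather than letting pointers get truncated silently at runtime.
static_assert( sizeof( CUdeviceptr ) == 8, "CUdeviceptr is not 64 bits in the device compile" );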
I added some debug output to my program and saw that my CUdeviceptr values inside the SBT on the host side are using some of the upper 32 bits of the storage.
Here are my CUdeviceptr values for the raygen, miss and hit pointers in the SBT; as you can see, the 9th hex digit is 2:
0000000203800000 0000000203A00000 0000000203C00000
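For context, those are the values that ended up in the raygenRecord, missRecordBase and hitgroupRecordBase fields of the OptixShaderBindingTable, roughly like this (the record struct types and pointer variables below are placeholder names from my setup):

// Host side: where the three addresses above end up.
OptixShaderBindingTable sbt = {};
sbt.raygenRecord                = raygenRecordPtr;   // 0x0000000203800000
sbt.missRecordBase              = missRecordPtr;     // 0x0000000203A00000
sbt.missRecordStrideInBytes     = sizeof( MissSbtRecord );
sbt.missRecordCount             = 1;
sbt.hitgroupRecordBase          = hitRecordPtr;      // 0x0000000203C00000
sbt.hitgroupRecordStrideInBytes = sizeof( HitGroupSbtRecord );
sbt.hitgroupRecordCount         = 1;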
And here’s the result of printing the pointer I get from
RayGenData* rtData = (RayGenData*)optixGetSbtDataPointer();
As you can see, the leading 2 got truncated.
I had a look at the implementation of
optixGetSbtDataPointer and it does an unsigned long long to CUdeviceptr cast.
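To make the failure mode concrete: with a 32-bit CUdeviceptr typedef on the device side, that cast behaves like this (just an illustration using the raygen address from above, not the SDK source):

unsigned long long full = 0x0000000203800000ull;  // what the SBT record base actually holds
CUdeviceptr truncated   = (CUdeviceptr)full;      // becomes 0x03800000 once the upper 32 bits are dropped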
Now I’d guess the examples work because they’re not using OpenGL interop, the CUDA Driver API, or JIT compilation (unlike me, where OpenGL is the owner of all buffers and images), and you’re unlikely to get a pointer outside the 32-bit range from ordinary CUDA allocations, whereas cuGraphicsResourceGetMappedPointer does return some pretty high values.
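For completeness, this is roughly how those pointers are obtained in my setup (a minimal Driver API interop sketch; glBuf is a placeholder for the GL buffer name, GL headers are assumed to be set up as usual, and error checking is omitted):

#include <cuda.h>
#include <cudaGL.h>   // Driver API OpenGL interop

// Map an OpenGL buffer through the Driver API and fetch its device address.
CUdeviceptr mapGlBuffer( unsigned int glBuf )
{
    CUgraphicsResource resource = nullptr;
    cuGraphicsGLRegisterBuffer( &resource, glBuf, CU_GRAPHICS_REGISTER_FLAGS_NONE );
    cuGraphicsMapResources( 1, &resource, /*stream=*/0 );

    CUdeviceptr ptr  = 0;   // 64 bits on the host side
    size_t      size = 0;
    cuGraphicsResourceGetMappedPointer( &ptr, &size, resource );
    // in my runs this comes back as something like 0x0000000203800000,
    // i.e. well outside the 32-bit range
    return ptr;
}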
I’ve made a tag with a reproducible example from my engine, as well as a binary build (unfortunately, running it requires your OptiX CUDA headers to be present under “C:/ProgramData/NVIDIA Corporation/OptiX SDK 7.0.0/include”, because I’m JIT-compiling the ray tracing programs).
This simple workaround fixes everything:
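(A sketch of the idea, assuming the 32-bit fallback comes from cuda.h, which only typedefs CUdeviceptr as unsigned long long when _WIN64 or __LP64__ is defined, and NVRTC defines neither on its own: pass the matching define when JIT-compiling the programs. rtSource below is a placeholder for the program source string.)

#include <nvrtc.h>

nvrtcProgram prog = nullptr;
nvrtcCreateProgram( &prog, rtSource, "raytrace.cu", 0, nullptr, nullptr );

const char* options[] = {
    "-D_WIN64",   // on Windows; use -D__LP64__ on Linux so cuda.h picks the 64-bit CUdeviceptr
    "--include-path=C:/ProgramData/NVIDIA Corporation/OptiX SDK 7.0.0/include"
};
nvrtcCompileProgram( prog, 2, options );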