Except for internal unit tests, I haven’t found a public example using OPTIX_BUILD_INPUT_TYPE_INSTANCE_POINTERS either.
This API reference explains the alignment requirements for OptixInstances and the arrays inside the build input:
If OptixBuildInput::type is OPTIX_BUILD_INPUT_TYPE_INSTANCE_POINTERS instances and aabbs should be interpreted as arrays of pointers instead of arrays of structs.
This pointer must be a multiple of OPTIX_INSTANCE_BYTE_ALIGNMENT if OptixBuildInput::type is OPTIX_BUILD_INPUT_TYPE_INSTANCES.
The array elements must be a multiple of OPTIX_INSTANCE_BYTE_ALIGNMENT if OptixBuildInput::type is OPTIX_BUILD_INPUT_TYPE_INSTANCE_POINTERS.
That is pretty clear about the alignment requirements (with OPTIX_INSTANCE_BYTE_ALIGNMENT == 16ull):
When using an array of OptixInstances, the device pointer to that array needs to be 16-byte aligned.
Since the OptixInstance struct is manually padded to a size of 80 bytes, which is a multiple of 16, all OptixInstance elements in that array are then 16-byte aligned as well.
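A small host-side sketch of that arithmetic. It uses a hypothetical OptixInstanceMirror struct that copies the field layout of OptixInstance as I understand it from the OptiX 7/8 optix_types.h, only so the snippet compiles without the SDK; in real code you would use OptixInstance and OPTIX_INSTANCE_BYTE_ALIGNMENT directly. The helper names instanceAddress and isInstanceAligned are illustrative, not OptiX API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical host-side mirror of the OptixInstance layout (assumption: OptiX 7/8 headers).
struct OptixInstanceMirror
{
    float              transform[12];     // 48 bytes, 3x4 row-major matrix
    unsigned int       instanceId;        // 4
    unsigned int       sbtOffset;         // 4
    unsigned int       visibilityMask;    // 4
    unsigned int       flags;             // 4
    unsigned long long traversableHandle; // 8 (OptixTraversableHandle)
    unsigned int       pad[2];            // 8 bytes of manual padding
};

// Same value as OPTIX_INSTANCE_BYTE_ALIGNMENT.
constexpr std::uint64_t kInstanceAlignment = 16;

// 80 bytes is a multiple of 16, so stepping through a contiguous array preserves alignment.
static_assert(sizeof(OptixInstanceMirror) == 80, "expected an 80-byte instance");
static_assert(sizeof(OptixInstanceMirror) % kInstanceAlignment == 0,
              "instance size must be a multiple of 16");

// Device address of element i in a contiguous instance array starting at base.
constexpr std::uint64_t instanceAddress(std::uint64_t base, std::uint64_t i)
{
    return base + i * sizeof(OptixInstanceMirror);
}

// True if the address satisfies the 16-byte instance alignment.
constexpr bool isInstanceAligned(std::uint64_t address)
{
    return (address & (kInstanceAlignment - 1)) == 0;
}
```

With a 256-byte aligned base, as returned by cudaMalloc() or cuMemAlloc(), every element address passes the check; only a misaligned base breaks it.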
If you're using an array of pointers to OptixInstances, then each pointer in that array must point to a 16-byte aligned device address, because the OptixInstance itself needs to be 16-byte aligned.
A CUdeviceptr itself is 64-bit and needs to be 8-byte aligned.
The alignment of either the build input instance array or the instance pointer array shouldn't be a problem when allocating the memory with cudaMalloc() or cuMemAlloc(), which return addresses aligned to at least 256 bytes.
So in your case, you first need to make sure that the individual pointers to the OptixInstances are all aligned to 16 bytes.
Just add an assert((device_pointer & 15ull) == 0) for each individual OptixInstance pointer in your build input array.
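A sketch of that check, run on the host copy of the pointer array before it is uploaded for an OPTIX_BUILD_INPUT_TYPE_INSTANCE_POINTERS build. DevicePtr is a stand-in for CUdeviceptr (a 64-bit unsigned integer in the CUDA driver API) so the snippet compiles without cuda.h, and both helper functions are hypothetical. Returning the first offending index, in addition to the plain assert, makes it easier to see which element of your own structure array is misplaced.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for CUdeviceptr (assumption: 64-bit unsigned device address).
using DevicePtr = std::uint64_t;

// Same value as OPTIX_INSTANCE_BYTE_ALIGNMENT.
constexpr DevicePtr kInstanceAlignment = 16;

// Returns the index of the first pointer that is not 16-byte aligned,
// or pointers.size() if all of them are properly aligned.
std::size_t findMisalignedInstancePointer(const std::vector<DevicePtr>& pointers)
{
    for (std::size_t i = 0; i < pointers.size(); ++i)
    {
        if ((pointers[i] & (kInstanceAlignment - 1)) != 0)
            return i;
    }
    return pointers.size();
}

// The plain debug-build assert suggested above, applied to every pointer.
void assertInstancePointersAligned(const std::vector<DevicePtr>& pointers)
{
    for (DevicePtr p : pointers)
        assert((p & 15ull) == 0 && "OptixInstance pointer is not 16-byte aligned");
}
```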
If that assertion fires in the debugger, you need to place the OptixInstance field inside your own structures at a 16-byte aligned offset and potentially pad your structure's size to a multiple of 16 bytes.
Since OptixInstance itself isn't declared with __align__(OPTIX_INSTANCE_BYTE_ALIGNMENT) (which I think should have been added inside the OptiX SDK), it might have been placed at a misaligned offset in your structure, either in the first element or in later ones.
You can use that __align__ to let the compiler place the field automatically in your own structures, but beware of the additional padding this can introduce inside the struct.
Many of the OptiX SDK examples use it for their Shader Binding Table record structures.
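The following host-side sketch illustrates both the misalignment and the extra padding, using standard C++ alignas(16) in the role that CUDA's __align__(16) plays in device code. It assumes a typical 64-bit ABI where unsigned long long is 8-byte aligned; OptixInstanceMirror, RecordUnaligned, RecordAligned and instanceOffsetInArray are hypothetical names. Without the attribute, the instance sits at offset 8 in an 88-byte record, so it is misaligned in every even-indexed array element. With it, the instance moves to offset 16 and the record grows to 96 bytes, 12 of which are padding after the float.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in with the size (80 bytes) and natural alignment (8 bytes)
// of OptixInstance, which carries no alignment attribute of its own.
struct OptixInstanceMirror
{
    float              transform[12];
    unsigned int       instanceId, sbtOffset, visibilityMask, flags;
    unsigned long long traversableHandle;
    unsigned int       pad[2];
};

// Instance lands at offset 8; sizeof is 88, which is not a multiple of 16.
struct RecordUnaligned
{
    float               weight;
    OptixInstanceMirror instance;
};

// alignas(16) forces offset 16 (12 bytes of padding after weight), size 96.
struct RecordAligned
{
    float                           weight;
    alignas(16) OptixInstanceMirror instance;
};

static_assert(alignof(RecordAligned) == 16, "member alignment propagates to the struct");

// Device address of the instance member inside element i of a record array.
constexpr std::uint64_t instanceOffsetInArray(std::uint64_t base, std::uint64_t i,
                                              std::uint64_t recordSize,
                                              std::uint64_t memberOffset)
{
    return base + i * recordSize + memberOffset;
}
```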
My approach for device-side structures is to order their fields by CUDA alignment requirements from largest to smallest and to pad them manually to the largest alignment needed in the struct.
The compilers normally handle the alignment of built-in types anyway, but this ordering also makes sure no inadvertent padding is added between fields, which keeps the structures as small as possible.
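A sketch of that largest-to-smallest ordering with manual padding, using a hypothetical InstanceRecord and the same hypothetical OptixInstanceMirror stand-in for OptixInstance. Without the explicit pad, the record would be 88 bytes (rounded up only to its 8-byte natural alignment), so the instance in every odd-indexed element would land on an 8-byte boundary; the two padding words bring it to 96 bytes, a multiple of 16. Since no alignas is used, the 16-byte alignment of element 0 relies on the allocation base, e.g. the 256-byte alignment of cudaMalloc().

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in with the size (80 bytes) of OptixInstance.
struct OptixInstanceMirror
{
    float              transform[12];
    unsigned int       instanceId, sbtOffset, visibilityMask, flags;
    unsigned long long traversableHandle;
    unsigned int       pad[2];
};

// Fields ordered by alignment requirement from largest to smallest,
// then padded manually to a multiple of 16 bytes.
struct InstanceRecord
{
    OptixInstanceMirror instance;      // offset  0, 80 bytes, needs 16-byte alignment
    float               weight;        // offset 80
    unsigned int        materialIndex; // offset 84
    unsigned int        pad[2];        // offset 88, manual padding up to 96 bytes
};

static_assert(offsetof(InstanceRecord, instance) == 0, "instance must come first");
static_assert(sizeof(InstanceRecord) % 16 == 0,
              "record size must keep the instance 16-byte aligned in arrays");
```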