Issues running OptiX concurrently with a CUDA kernel that uses shared memory

I want to emphasize again that this idea of communicating between an OptiX kernel and a native CUDA kernel will never work as efficiently as the previously recommended implementation: a wavefront renderer in which OptiX does the ray-primitive intersection and native CUDA kernels do the ray generation and shading, with all the native CUDA features you want. That holds even once future drivers support __threadfence in OptiX device code, which they will; I just don't know the driver version yet. Please give the wavefront approach a try.
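To make the structure of that wavefront approach concrete, here is a minimal sketch of the stage separation in plain host-side C++, not actual OptiX or CUDA code. Everything in it is an assumption for illustration: the names (generateRays, intersectRays, shadeHits, renderWavefront), the Ray and Hit layouts, the orthographic camera, and the single analytic sphere that stands in for the scene. In a real renderer, stage 1 and stage 3 would be native CUDA kernels, stage 2 would be an OptiX launch, and the buffers would live in device memory.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical buffer layouts shared by all stages.
struct Ray { float ox, oy, oz, dx, dy, dz; int pixel; };
struct Hit { float t; float nz; int pixel; };  // t < 0 marks a miss

// Stage 1: ray generation (would be a native CUDA kernel).
// One orthographic ray per pixel, starting at z = -3 and pointing along +z.
std::vector<Ray> generateRays(int width, int height) {
    std::vector<Ray> rays;
    rays.reserve(static_cast<std::size_t>(width) * height);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float px = (x + 0.5f) / width * 2.0f - 1.0f;   // pixel center in [-1, 1]
            float py = (y + 0.5f) / height * 2.0f - 1.0f;
            rays.push_back({px, py, -3.0f, 0.0f, 0.0f, 1.0f, y * width + x});
        }
    }
    return rays;
}

// Stage 2: intersection only (the part OptiX would do).
// Stand-in scene: a sphere of radius 0.5 at the origin. The test below
// relies on every ray direction being (0, 0, 1), as generated above.
std::vector<Hit> intersectRays(const std::vector<Ray>& rays) {
    const float r = 0.5f;
    std::vector<Hit> hits;
    hits.reserve(rays.size());
    for (const Ray& ray : rays) {
        float d2 = ray.ox * ray.ox + ray.oy * ray.oy;
        if (d2 <= r * r) {
            float h = std::sqrt(r * r - d2);            // half chord length along z
            hits.push_back({-ray.oz - h, h / r, ray.pixel});
        } else {
            hits.push_back({-1.0f, 0.0f, ray.pixel});
        }
    }
    return hits;
}

// Stage 3: shading (would be a native CUDA kernel, free to use shared
// memory, fences, and so on). Returns the rays for the next wave;
// this sketch spawns no secondary rays, so the result is always empty.
std::vector<Ray> shadeHits(const std::vector<Hit>& hits, std::vector<float>& image) {
    for (const Hit& hit : hits) {
        image[hit.pixel] = (hit.t >= 0.0f) ? hit.nz : 0.0f;
    }
    return {};
}

// Wavefront loop: each stage runs to completion over the whole buffer
// before the next one starts, so no stage has to talk to a running one.
std::vector<float> renderWavefront(int width, int height) {
    std::vector<float> image(static_cast<std::size_t>(width) * height, 0.0f);
    std::vector<Ray> rays = generateRays(width, height);
    while (!rays.empty()) {
        std::vector<Hit> hits = intersectRays(rays);
        rays = shadeHits(hits, image);
    }
    return image;
}
```

Calling renderWavefront(4, 4) returns one float per pixel. The point of the sketch is only the data flow: the stages exchange ray and hit buffers between launches, so nothing depends on an OptiX kernel and a CUDA kernel synchronizing with each other while both are running.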