Is there an efficient way to remove a ray from a buffer of rays?
An O(1) approach is to copy the last ray over the one being removed and shrink the effective buffer count by one. Note that this does not preserve the order of the rays.
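A minimal host-side sketch of that swap-with-last removal, in case it helps. The `Ray` struct and the `removeRay` function are hypothetical names for illustration, not OptiX Prime API; adapt the layout to whatever ray buffer format you actually use.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical ray layout, for illustration only.
struct Ray {
    float origin[3];
    float tmin;
    float direction[3];
    float tmax;
};

// Removes rays[index] in O(1) by overwriting it with the last active ray.
// Only the first activeCount entries are considered live.
// Returns the new active count. Ray order is not preserved.
std::size_t removeRay(std::vector<Ray>& rays, std::size_t activeCount, std::size_t index)
{
    assert(activeCount <= rays.size());
    assert(index < activeCount);
    rays[index] = rays[activeCount - 1]; // harmless self-assignment if index is the last ray
    return activeCount - 1;
}
```

The buffer itself is never reallocated; you only pass the smaller count to the next query. If you keep per-ray data in parallel arrays (pixel index, throughput and so on), the same move has to be applied to them.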
What you actually want to do for efficient ray tracing with OptiX Prime is to stage multiple smaller queries (each still bigger than 65536 rays, e.g. around 200,000) and run them asynchronously. While one query is running, you build the next ray buffer from the rays that are still active and replace the terminated ones with new rays from the overall ray pool, if any are left, so that the pipeline doesn't drain.
An example of how to use multiple asynchronous queries can be found in the OptiX SDK example primeMultiBuffering.
I don't think there is an example of how to pool rays and generate queries this way, though. But as long as you have unhandled rays left in a bigger pool, you can just replace the terminated rays in the query with them.
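Since there is no SDK example for the refill step, here is a rough host-side sketch of the idea. Everything in it is an assumption for illustration: the `Ray` layout, the per-slot `terminated` flags and the `refillQuery` function are hypothetical and not part of OptiX Prime.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical ray layout, for illustration only.
struct Ray {
    float origin[3];
    float tmin;
    float direction[3];
    float tmax;
};

// Overwrites every terminated slot of the query buffer with an unhandled ray
// taken from the back of the pool and clears its flag.
// terminated[i] != 0 means the ray in slot i is finished.
// Returns the number of terminated slots that could not be refilled
// because the pool ran empty.
std::size_t refillQuery(std::vector<Ray>& query,
                        std::vector<char>& terminated,
                        std::vector<Ray>& pool)
{
    assert(query.size() == terminated.size());
    std::size_t unfilled = 0;
    for (std::size_t i = 0; i < query.size(); ++i) {
        if (!terminated[i]) {
            continue; // ray is still active, keep it in place
        }
        if (!pool.empty()) {
            query[i] = pool.back();
            pool.pop_back();
            terminated[i] = 0;
        } else {
            ++unfilled; // no work left for this slot
        }
    }
    return unfilled;
}
```

As long as the return value is 0, the query keeps its full size. A nonzero return value means the pool is exhausted and the slots that are still flagged need to be removed before the next query.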
Only when you run out of work would you need to shrink the query. That's a compaction problem, and you should be able to find efficient solutions for it on the CUDA forum. For example, thrust::copy_if() implements stream compaction. https://thrust.github.io/doc/group__stream__compaction.html
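To show what compaction means here, this is a sequential CPU sketch in plain C++. It is not the Thrust implementation; on the GPU you would let thrust::copy_if() (which also has an overload taking a stencil range) do the same thing in parallel. The `Ray` layout, the `terminated` flags and `compactRays` are again hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical ray layout, for illustration only.
struct Ray {
    float origin[3];
    float tmin;
    float direction[3];
    float tmax;
};

// Stable in-place compaction: moves all rays whose flag is 0 to the front
// of the buffer, keeping their relative order.
// Returns the number of active rays, which is the size of the next query.
std::size_t compactRays(std::vector<Ray>& rays, const std::vector<char>& terminated)
{
    assert(rays.size() == terminated.size());
    std::size_t write = 0;
    for (std::size_t read = 0; read < rays.size(); ++read) {
        if (!terminated[read]) {
            rays[write] = rays[read];
            ++write;
        }
    }
    return write;
}
```

Unlike the swap-with-last removal, this keeps the remaining rays in their original order, at the cost of touching the whole buffer once.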