Hi,
I have a couple of questions about interleaving OptiX and CUDA kernels.
I compute my image in several steps (in a loop; the whole image is computed within a single frame) by repeatedly launching two distinct OptiX kernels (ray generation programs):
while (some condition)
{
    if (initial kernel)
    {
        // only run once per frame
        context->launch(ENTRY_POINT_INITIAL, some dimensionality);
    }
    else
    {
        // run every subsequent iteration until the image is computed
        context->launch(ENTRY_POINT_SUBSEQUENT, some reduced dimensionality);
    }
}
Every subsequent OptiX kernel is launched with reduced dimensionality, so I assumed that more and more GPU resources become available as the subsequent OptiX launches shrink. That is why I decided to try interleaving a post-processing routine that can easily be applied to partial images. The code above is modified slightly, so that the if-statement is followed by an invocation of a CUDA kernel:
if-statement
testCUDAKernel(...)
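To make the call order explicit, here is a minimal host-side sketch of the modified loop. It is not my real code: LaunchLog, launchOptix, renderFrame and the halving of the launch size are placeholders made up for illustration, and the stubs only record which call was issued instead of calling context->launch(...) or launching a real CUDA kernel.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the GPU work: it only records the order of calls.
struct LaunchLog {
    std::vector<std::string> calls;
};

// Placeholder for context->launch(entryPoint, size) in the real code.
void launchOptix(LaunchLog& log, const char* entryPoint, unsigned size) {
    log.calls.push_back(std::string(entryPoint) + ":" + std::to_string(size));
}

// Placeholder for the real CUDA post-processing kernel launch.
void testCUDAKernel(LaunchLog& log) {
    log.calls.push_back("testCUDAKernel");
}

// One frame: a single initial launch, then subsequent launches with a
// shrinking launch size. Halving the size each pass is an assumption made
// only so that the loop terminates; the real condition is different.
LaunchLog renderFrame(unsigned initialSize) {
    LaunchLog log;
    bool initial = true;
    unsigned size = initialSize;
    while (size > 0) {
        if (initial) {
            launchOptix(log, "ENTRY_POINT_INITIAL", size);
            initial = false;
        } else {
            launchOptix(log, "ENTRY_POINT_SUBSEQUENT", size);
        }
        // Post-processing is issued right after every OptiX launch.
        testCUDAKernel(log);
        size /= 2;
    }
    return log;
}
```

With this sketch, renderFrame(8) records ENTRY_POINT_INITIAL:8, testCUDAKernel, ENTRY_POINT_SUBSEQUENT:4, testCUDAKernel, and so on down to a launch size of 1.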
According to Nsight, the CUDA kernel is indeed interleaved with the OptiX kernels, but only between “ENTRY_POINT_SUBSEQUENT” kernels (see the image below).
The OptiX kernels here are represented by “cuEventSynchronize” (as far as I understand). What I don’t understand is why “testCUDAKernel” is always serialized between “ENTRY_POINT_INITIAL” and “ENTRY_POINT_SUBSEQUENT” (here the “cuMemcpy2D_v2” event separates the initial and subsequent runs), regardless of the dimensionality of the OptiX kernel and the CUDA kernel.
- Is there some kind of implicit synchronization taking place when switching between ray generation programs? Can it be avoided?
- What is the cuMemcpy2D_v2 event reported by Nsight?
- Initially I also tried running “testCUDAKernel” in a separate stream, but running it in the default stream interleaves the kernels in the same way. Are OptiX 6.5 kernels run in separate streams?
I am using OptiX 6.5, a Quadro P4000, and driver 442.74.