After calling optixTraverse, the CUDA core enters an idle state while the RT core is traversing. Can we use this idle period to perform independent work that does not depend on the traversal result? Would that improve performance?
For instance:
// in raygen
for (int i = 0; i < n; i++) {
    // do something
    optixTraverse(/* ... */);
    // carry out other calculations that do not rely on the traversal result
    // handle the traversal result
}
Does OptiX handle this scheduling itself? Will it suspend the thread after the call to optixTraverse, switch to running another thread, and resume the original thread only once optixTraverse has completed? If so, inserting independent work after the call seems pointless.
"After calling optixTraverse, the CUDA core enters an idle state while the RT core is traversing."
No, do not assume that. CUDA cores will automatically switch to doing any other useful work they can. Generally speaking, each SM is usually juggling at least 3 or 4 different warps at any given time, and a warp that is waiting on a result simply yields its issue slot to another warp that is ready. The hardware does this scheduling on a per-instruction basis.
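As a rough illustration of how much work the hardware has available to switch between, the sketch below asks the CUDA runtime how many warps of a plain CUDA kernel can be resident on one SM at once. It is only a sketch under stated assumptions: dummyKernel, residentWarps, and the block size of 128 are hypothetical names and values chosen for the example, and an OptiX pipeline is not launched as a plain kernel like this, so for OptiX code the occupancy figures reported by Nsight Compute are the place to look.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel; any __global__ function can be queried the same way.
__global__ void dummyKernel(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 2.0f + 1.0f;
}

// Warps resident on one SM for a given number of resident blocks and block size.
// A partially filled warp still occupies a full warp slot, hence the round-up.
int residentWarps(int blocksPerSM, int blockSize, int warpSize)
{
    int warpsPerBlock = (blockSize + warpSize - 1) / warpSize;
    return blocksPerSM * warpsPerBlock;
}

int main()
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        std::printf("No CUDA device available\n");
        return 1;
    }

    const int blockSize = 128;  // assumed launch configuration
    int blocksPerSM = 0;
    cudaError_t err = cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &blocksPerSM, dummyKernel, blockSize, 0 /* dynamic shared memory bytes */);
    if (err != cudaSuccess) {
        std::printf("Occupancy query failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    int warps    = residentWarps(blocksPerSM, blockSize, prop.warpSize);
    int maxWarps = prop.maxThreadsPerMultiProcessor / prop.warpSize;
    std::printf("Resident warps per SM: %d of %d possible\n", warps, maxWarps);
    return 0;
}
```

The more warps that are resident, the more candidates the scheduler has to issue from while another warp is waiting, which is why manually interleaving independent work is often unnecessary.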
You can use Nsight Compute to profile and analyze stalls in your kernel code and see where any unwanted idle time might exist. You can also use Nsight Systems to profile and analyze stalls in between CUDA kernels or between CUDA and host code, and make sure the GPU is saturated and operating at capacity. It’s a good idea to start with Nsight Systems first, since system and kernel level idle times are typically much bigger and easier to solve than instruction level idle times.