Is there any room for optimization in the idle time after calling optixTraverse?

  1. After calling optixTraverse, the CUDA core enters an idle state while the RT core is traversing. Can we use this idle period to perform independent work that has no dependency on the traversal result? Would this help performance?

For instance:

```
// in raygen
for (int i = 0; i < n; i++) {
    // do something
    optixTraverse(...);
    // carry out other calculations that do not depend on the traversal
    // ...
    // then handle the traversal result
}
```
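For reference, the split the pseudocode is aiming at exists explicitly in the OptiX 8 API (the Shader Execution Reordering functions): optixTraverse performs traversal and records a hit object, and a later optixInvoke call runs the corresponding closest-hit or miss program. A minimal, non-authoritative sketch of a raygen program using that split — the ray values are placeholders, and `params` stands in for a hypothetical launch-params struct holding the traversable handle:

```cpp
// Sketch only: requires the OptiX 8 SDK headers and a configured pipeline.
#include <optix.h>

extern "C" __global__ void __raygen__rg()
{
    const float3 origin    = make_float3(0.f, 0.f, 0.f);  // placeholder ray
    const float3 direction = make_float3(0.f, 0.f, 1.f);
    unsigned int p0 = 0u;                                 // one payload register

    // Start traversal; the resulting hit object is stored, but no
    // closest-hit or miss program runs yet.
    optixTraverse(params.handle, origin, direction,
                  0.0f, 1e16f, 0.0f,             // tmin, tmax, ray time
                  OptixVisibilityMask(255), OPTIX_RAY_FLAG_NONE,
                  0, 1, 0,                       // SBT offset, stride, miss index
                  p0);

    // Independent ALU work can be placed here, between traverse and invoke.

    // Execute the closest-hit or miss program for the stored hit object.
    optixInvoke(p0);
}
```

Whether manually placing work in that gap helps in practice is exactly what the answer below addresses: the hardware's own warp scheduling usually hides this latency already.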

  2. Does OptiX handle this scheduling itself? Will it suspend the thread after calling optixTraverse, switch to running another thread, and resume only once optixTraverse completes? If so, manually inserting independent work seems pointless.

After calling optixTraverse, the CUDA core enters an idle state while the RT core is traversing.

No, do not assume that. The CUDA cores will automatically switch to any other useful work they can find. Generally speaking, each warp scheduler is usually juggling at least 3 or 4 resident warps at any given time, and the hardware does this scheduling on a per-instruction basis.

You can use Nsight Compute to profile and analyze stalls in your kernel code and see where any unwanted idle time might exist. You can also use Nsight Systems to profile and analyze stalls in between CUDA kernels or between CUDA and host code, and make sure the GPU is saturated and operating at capacity. It’s a good idea to start with Nsight Systems first, since system and kernel level idle times are typically much bigger and easier to solve than instruction level idle times.
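As a concrete starting point, both tools have command-line front ends (this assumes the `nsys` and `ncu` executables from the Nsight Systems and Nsight Compute installs are on your PATH; `./my_optix_app` is a placeholder for your application):

```shell
# System-level timeline first: look for gaps between kernels
# and stalls between CUDA and host code.
nsys profile -o timeline ./my_optix_app

# Then drill into instruction-level stalls inside a kernel.
ncu --set full -o kernel_report ./my_optix_app
```

Both commands write report files (`timeline.nsys-rep`, `kernel_report.ncu-rep`) that open in the respective GUIs for analysis.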


David.

Thank you. This explains the strange performance I was seeing.
