There are callable programs in the OptiX API, but not in GLSL, CUDA, or HLSL.
I know OptiX is meant only for ray tracing, so callable programs make more sense there.
In the GTX pipeline (not RTX), there are only the rasterization and GPGPU pipelines …
and we can't use callable programs (function pointers) when we use GLSL, CUDA, or HLSL.
I think that is good API design, given the design of the GPU (a group of SMs (streaming multiprocessors) that process the same instruction).
But callable programs are possible if we use the OptiX API on a GTX graphics card.
How do callable programs work?
I think the GPU branches across SMs (i.e., runs the callable on another SM) in hardware.
Is that correct?
In OptiX, a callable program is an index rather than a function pointer. A switch…case is applied to find the real function for the given index. This is more a virtualization provided by the API than a new GPU feature.
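To make the index idea concrete, here is a minimal sketch in plain host-side C++. It is not OptiX code and not how OptiX is actually implemented; the function names, the `CallableId` enum, and `call_by_index` are all hypothetical and only illustrate dispatching through a switch on an index.

```cpp
// Hypothetical "callable programs": ordinary functions sharing one signature.
float shade_diffuse(float x)  { return 0.5f * x; }
float shade_mirror(float x)   { return x; }
float shade_emissive(float x) { return x + 1.0f; }

// Hypothetical ids standing in for the callable-program index.
enum CallableId : int { kDiffuse = 0, kMirror = 1, kEmissive = 2 };

// Dispatch by index: the call site goes through one switch,
// so no real function pointer is ever taken or stored.
float call_by_index(int id, float x) {
    switch (id) {
        case kDiffuse:  return shade_diffuse(x);
        case kMirror:   return shade_mirror(x);
        case kEmissive: return shade_emissive(x);
        default:        return 0.0f;  // unknown index: fall back to a neutral value
    }
}
```

Because every target is visible at the switch, a compiler is free to inline the cases, which is one reason this style can be cheaper than a true indirect call.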
You are right that intra-warp divergence is potentially problematic. AFAIK, reducing divergence is more the programmer's duty than the GPU's. In ray tracing, we can sort rays to increase ray coherence. More commonly, when rendering to a 2D image, a 2D launch index is usually better than a 1D launch index, since materials and geometry are usually distributed more coherently in 2D.
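As a rough illustration of why sorting helps, the sketch below (plain host-side C++, not GPU code) counts how many groups of 32 consecutive rays contain more than one material id, before and after sorting by material. The group size of 32 mirrors the warp size; `count_divergent_groups` and `sort_by_material` are hypothetical helper names, and a real renderer would sort full ray records by a key rather than bare ids.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

constexpr std::size_t kWarpSize = 32;  // threads per warp on NVIDIA GPUs

// Count groups of kWarpSize consecutive rays that hold more than one material id.
// Each such group would have to run more than one shading path.
std::size_t count_divergent_groups(const std::vector<int>& material_ids) {
    std::size_t divergent = 0;
    for (std::size_t begin = 0; begin < material_ids.size(); begin += kWarpSize) {
        const std::size_t end = std::min(begin + kWarpSize, material_ids.size());
        for (std::size_t i = begin + 1; i < end; ++i) {
            if (material_ids[i] != material_ids[begin]) {
                ++divergent;
                break;  // one mismatch is enough to mark the group
            }
        }
    }
    return divergent;
}

// Return a copy of the ids sorted so that equal materials become neighbours.
std::vector<int> sort_by_material(std::vector<int> material_ids) {
    std::sort(material_ids.begin(), material_ids.end());
    return material_ids;
}
```

For example, 64 rays that alternate between two materials give two mixed groups, while the sorted copy gives none, since each group of 32 then holds a single material.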
The GPU itself may also help a little. If you write code in CUDA C directly (OptiX limits the language available to its kernels), you may be able to guide the GPU to help more. For more information, feel free to read from page 55: https://docs.nvidia.com/pdf/CUDA_C_Best_Practices_Guide.pdf
Also, keep in mind that one of the primary purposes of the OptiX engine (and its underlying scheduling and JIT compilation) is to mitigate such divergence. OptiX is free to schedule around callable programs, both to decrease divergence at such call sites and to reconverge after the callable program returns.
That said, for many ray-tracing use cases (e.g., path tracing), divergence can still be a major performance limiter.