Why do callable programs only exist in the OptiX API?

syc9624 · January 3, 2019, 7:49am

There are callable programs in the OptiX API, but not in GLSL, CUDA, or HLSL.
I know OptiX is built only for ray tracing, so callable programs suit OptiX better.

On GTX cards (not RTX), there are only the rasterization and GPGPU pipelines …
and we can't use callable programs (function pointers) when we use GLSL, CUDA, or HLSL.

I think that is good API design, given how the GPU is built (a group of SMs (streaming multiprocessors), each processing the same instruction).

But callable programs are possible if we use the OptiX API on a GTX graphics card.

How do callable programs work?
I think the GPU branches between SMs (i.e., runs another SM) in hardware.
Is that correct?

yashiz · January 3, 2019, 9:19am

In OptiX, a callable program is an index rather than a function pointer. A switch…case is applied to find the real function for the given index. This is more like virtualization by the API than a new GPU feature.
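
A minimal Python sketch of the index-based dispatch described above. It is not OptiX's actual implementation; the shading functions and `call_by_index` are hypothetical names used only for illustration.

```python
# Hypothetical shading routines standing in for callable programs.
def shade_diffuse(payload):
    return ("diffuse", payload)

def shade_mirror(payload):
    return ("mirror", payload)

def shade_glass(payload):
    return ("glass", payload)

def call_by_index(index, payload):
    """Resolve a callable by its integer index, in the spirit of a switch...case."""
    if index == 0:
        return shade_diffuse(payload)
    elif index == 1:
        return shade_mirror(payload)
    elif index == 2:
        return shade_glass(payload)
    raise ValueError(f"unknown callable index: {index}")

# The caller only holds an integer, never a function pointer.
result = call_by_index(1, 0.5)
```

The point of the sketch is that the caller stores and passes an integer, and the mapping from integer to code lives in one dispatch site.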

syc9624 · January 4, 2019, 1:05am

Thank you for your reply.

But if each thread takes a different path, the complexity will be O(N).

Example, where N is the number of shading types:

if shadingType1
    shading1(…)
else if shadingType2
    shading2(…)
…
…
else if shadingTypeN
    shadingN(…)

If every thread (in a single SM) takes the same path, the complexity will be O(1),
but if each thread (in a single SM) takes a different path, the complexity will be O(N).

To avoid O(N), does the GPU do anything automatically (e.g., sort threads by path, then run threads that share a path on the same SM)?
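
The O(1) versus O(N) cost described above can be sketched with a simplified model, assuming that threads executing in lockstep (a warp of 32 threads on NVIDIA GPUs) run each distinct branch one after another. `serialized_passes` is a hypothetical helper, and real hardware scheduling is more involved than this count.

```python
WARP_SIZE = 32  # threads that execute in lockstep on NVIDIA GPUs

def serialized_passes(shading_types):
    """Simplified cost model: one pass per distinct branch taken within a warp."""
    if len(shading_types) > WARP_SIZE:
        raise ValueError("expected at most one warp of threads")
    return len(set(shading_types))

# Every thread takes the same path: a single pass.
coherent = [0] * WARP_SIZE

# Threads spread over 8 shading types: 8 passes in this model.
divergent = [i % 8 for i in range(WARP_SIZE)]

coherent_cost = serialized_passes(coherent)
divergent_cost = serialized_passes(divergent)
```

In this model the cost is bounded by the smaller of N and the warp size, since a warp cannot take more distinct paths than it has threads.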

yashiz · January 4, 2019, 9:56am

You are right, intra-warp divergence is potentially problematic. AFAIK, reducing divergence is more the programmer's duty than the GPU's. In ray tracing, we can sort rays to increase ray coherence. More commonly, when rendering a 2D image, a 2D launch index is usually better than a 1D launch index, since the material and geometry distributions are usually more coherent in 2D.
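
A rough sketch of why sorting helps, using a simplified cost model that assumes each warp of 32 consecutive threads pays one pass per distinct shading type it contains. The function `total_passes` and the random workload are hypothetical and only illustrate the idea; they do not measure real GPU performance.

```python
import random

WARP_SIZE = 32

def total_passes(shading_types, warp_size=WARP_SIZE):
    """Sum over consecutive warps of the number of distinct shading types per warp."""
    total = 0
    for start in range(0, len(shading_types), warp_size):
        warp = shading_types[start:start + warp_size]
        total += len(set(warp))
    return total

# Hypothetical workload: 1024 rays, each hitting one of 4 shading types at random.
rng = random.Random(0)
types = [rng.randrange(4) for _ in range(1024)]

unsorted_cost = total_passes(types)        # most warps contain several types
sorted_cost = total_passes(sorted(types))  # most warps contain a single type
```

After sorting, only the warps that straddle a boundary between two shading types still diverge, so the total number of passes drops close to the number of warps.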

The GPU itself may also help a little. If you write code in CUDA C directly (as OptiX limits the language for its kernels), you can guide the GPU to help more. For more information, feel free to read from page 55 of:
https://docs.nvidia.com/pdf/CUDA_C_Best_Practices_Guide.pdf

syc9624 · February 11, 2019, 4:35am

I am sorry for the late reply.

It was a great help to me! Thanks a lot.

Keith_Morley · February 11, 2019, 3:50pm

Also, keep in mind that one of the primary purposes of the OptiX engine (and its underlying scheduling and JIT compilation) is to allow for the mitigation of such divergence. OptiX is free to schedule around callable programs, both to decrease divergence at such call sites and to reconverge after the callable program returns.

That said, for many ray-tracing use cases (e.g., path tracing), divergence can still be a major performance limiter.