Parallel compilation with OptiX 7.4


I’ve been trying out the new parallel compilation features of OptiX 7.4.

In my experiments so far, I have observed the following:
The optixTaskExecute function returns more than one additional task only for relatively large PTX files. For smaller PTX files, compilation is effectively serial: at best, optixTaskExecute returns a single additional task. Note that these smaller files are not single functions; each one contains about 50 direct callables.

In our use case, we are still interested in speeding up the compilation of smaller PTX files. Are there any specific factors in the PTX code that would generate more concurrent tasks with optixTaskExecute?
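
For readers unfamiliar with the task pattern described above, here is a minimal sketch of a worker pool that drains tasks which may each spawn further tasks. It does not call OptiX: MockTask and executeMockTask are hypothetical stand-ins for OptixTask and optixTaskExecute, and the split threshold is an invented placeholder for whatever heuristic decides how many additional tasks come back. In real code the first task would come from optixModuleCreateFromPTXWithTasks.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical stand-in for an OptixTask: just an amount of remaining work.
struct MockTask { int work; };

// Hypothetical stand-in for optixTaskExecute. A task larger than the
// threshold splits into two follow-up tasks; a small one returns none.
std::vector<MockTask> executeMockTask(const MockTask& t, int splitThreshold) {
    if (t.work > splitThreshold) {
        return { MockTask{t.work / 2}, MockTask{t.work - t.work / 2} };
    }
    return {};
}

// Runs the task graph rooted at `first` on numThreads workers and
// returns the total number of tasks executed.
int drainTasks(MockTask first, int splitThreshold, unsigned numThreads) {
    std::mutex m;
    std::condition_variable cv;
    std::queue<MockTask> pending;
    int inFlight = 0;  // tasks currently executing outside the lock
    int executed = 0;
    pending.push(first);

    auto worker = [&] {
        std::unique_lock<std::mutex> lock(m);
        for (;;) {
            // Sleep until there is work, or until no running task can add more.
            cv.wait(lock, [&] { return !pending.empty() || inFlight == 0; });
            if (pending.empty()) return;
            MockTask t = pending.front();
            pending.pop();
            ++inFlight;
            lock.unlock();
            std::vector<MockTask> more = executeMockTask(t, splitThreshold);
            lock.lock();
            for (const MockTask& n : more) pending.push(n);
            --inFlight;
            ++executed;
            cv.notify_all();
        }
    };

    if (numThreads == 0) numThreads = 1;
    std::vector<std::thread> threads;
    for (unsigned i = 0; i < numThreads; ++i) threads.emplace_back(worker);
    for (std::thread& th : threads) th.join();
    assert(inFlight == 0 && pending.empty());
    return executed;
}
```

With this mock, a task that never splits keeps every worker but one idle, which is the serial behaviour I see for the smaller files.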




Hi @njroussel, welcome!

This is a great question. The granularity of tasks is currently driven by heuristics that we’ve tuned to prevent diminishing returns, based on example code from our customers. There is a point at which the granularity will cost more time in overhead than you can save with parallel compiles, and we’d prefer to make it hard to accidentally slow things down. How long are the small file compiles currently taking, and what kind of timing are you hoping for?

There isn’t currently any way to control the task granularity for small PTX files, but we do have options we could discuss. One is whether such heuristics could or should be exposed, although this could limit our future options. Another is for us to do some tuning based on your code, if you can share a reproducer with us and elaborate on your goals. If you prefer not to share any code, a third option is to gather some statistics on your code and compile timings, which we could then discuss to estimate how much you would benefit from a finer compile granularity. If you’re able to share publicly on the forum, that will help others reading this thread, but if you prefer or require privacy for sharing stats or code, feel free to get in touch with optix-help.


It may be possible to speed up OptiX compilation of many small pieces of code by using one process for each hardware thread on your CPU. I use this approach in some CUDA code that uses NVRTC, so it may work for OptiX as well. I get a compilation speedup that scales linearly with the number of hardware threads. This works around a limitation of the NVRTC compiler, which is not multithreaded and can’t be called from multiple threads in a single process. For IPC I used pipes on Windows. It would be great if NVIDIA could allow their compiler to support multiple threads in a single process, though.
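
To illustrate the process-per-hardware-thread idea, here is a rough sketch. It is not my actual code: it uses POSIX fork and pipe instead of Windows pipes, and compileMock is a hypothetical placeholder for the real NVRTC or OptiX compile call. A real worker would more likely be a separate executable that sets up its own compiler state and sends back PTX or a module rather than a single integer.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Hypothetical placeholder for compiling one source; returns a fake result.
int compileMock(int sourceId) { return sourceId * sourceId; }

// Forks up to numProcs workers. Worker p handles items p, p + numProcs, ...
// and sends (index, result) pairs to the parent over its own pipe.
// If a pipe or fork call fails, the items of the missing workers stay at -1.
std::vector<int> compileInProcesses(const std::vector<int>& sources,
                                    unsigned numProcs) {
    if (numProcs == 0) numProcs = 1;
    std::vector<int> results(sources.size(), -1);
    struct Worker { pid_t pid; int readFd; };
    std::vector<Worker> workers;

    for (unsigned p = 0; p < numProcs; ++p) {
        int fds[2];
        if (pipe(fds) != 0) break;
        pid_t pid = fork();
        if (pid < 0) { close(fds[0]); close(fds[1]); break; }
        if (pid == 0) {
            // Child: compile this worker's slice and write results back.
            close(fds[0]);
            for (std::size_t i = p; i < sources.size(); i += numProcs) {
                int msg[2] = { static_cast<int>(i), compileMock(sources[i]) };
                if (write(fds[1], msg, sizeof msg) !=
                    static_cast<ssize_t>(sizeof msg)) _exit(1);
            }
            close(fds[1]);
            _exit(0);  // skip the parent's atexit handlers and stdio flush
        }
        // Parent: keep only the read end so EOF arrives when the child exits.
        close(fds[1]);
        workers.push_back({pid, fds[0]});
    }

    for (Worker& w : workers) {
        int msg[2];
        while (read(w.readFd, msg, sizeof msg) ==
               static_cast<ssize_t>(sizeof msg)) {
            assert(msg[0] >= 0 &&
                   static_cast<std::size_t>(msg[0]) < results.size());
            results[msg[0]] = msg[1];
        }
        close(w.readFd);
        int status = 0;
        waitpid(w.pid, &status, 0);
    }
    return results;
}
```

In practice numProcs would be something like std::thread::hardware_concurrency(), and the static round-robin split could be replaced by a shared work queue if compile times vary a lot between files.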


Agreed, it would be awesome to run NVRTC in parallel. Consider this a +1.