Maximum Launch Dimension in OptiX 6

bnbailey · January 10, 2022, 9:43pm

I have been trying for some time to transition my code from OptiX 5 to 6, however I seem to be running into an issue with the maximum launch dimension, which appears different between OptiX 5 and 6.

The documentation for the launch functions (rtContextLaunch[*]D) states “For 3D launches, the product of width and depth must be smaller than 4294967296 (2^32).” I have found this to be true, except in OptiX 6. It appears that for v6+, the product of the launch dimensions cannot exceed 1024^3 (2^30). This seems to be the case for launches of any dimensionality. When the total launch size exceeds 1024^3, I receive the error:

OptiX Error: 'Unknown error (Details: Function “RTresult _rtContextLaunch2D(RTcontext, unsigned int, RTsize, RTsize)” caught exception: Encountered a rtcore error: m_api.launch3D( cmdlist, pipeline, launchBufferVA, scratchBufferVA, raygenSbtRecordVA, exceptionSbtRecordVA, firstMissSbtRecordVA, missSbtRecordSize, missSbtRecordCount, firstInstanceSbtRecordVA, instanceSbtRecordSize, instanceSbtRecordCount, firstCallableSbtRecordVA, callableSbtRecordSize, callableSbtRecordCount, toolsOutputVA, toolsOutputSize, scratchBufferSizeInBytes, width, height, depth ) returned (9): Launch failure)

This can be reproduced using one of the SDK examples; in my case I used “optixSphere”. I made the following changes:

optixSphere.cpp line 42: change “width” to 2048*1024 and “height” to 1024 (or any width*height>1024^3).
optixSphere.cpp line ~150: decrease buffer size to avoid a potential out-of-memory error: RT_CHECK_ERROR( rtBufferSetSize2D( *output_buffer_obj, 1, 1 ) );
pinholeCamera.cu line 60: add ‘return;’ to beginning of launch function to isolate issue to launch configuration.

In v6.0+ (I tested several versions), this produces the error above at launch. In earlier versions, it runs without error. As a side note, my much more complicated production code exhibits the same behavior. This is problematic for me, since I have fairly large launches. I could tile them to 1024^3, but that will result in a lot of launches (sidebar question: what is the potential performance hit for this?). I’d like to upgrade to OptiX 7 one day, but I don’t have the time for it anytime soon.

OS: tried on Debian 9 and Ubuntu 18.04.6
CUDA: tried CUDA 9.0, 9.2, and 11.2
GPU: tried Titan Xp, V100

Thank you.

dhart · January 10, 2022, 11:53pm

Hi @bnbailey,

Indeed, the OptiX 6 maximum launch size is exactly 2^30, or 1024^3. The documentation stating it’s 2^32 is out of date and wrong. The new launch size of 2^30 is intentional, not a bug, and it’s unfortunately not going to change. So, we should perhaps discuss strategies for ways to work around this limit, or whether your launch really needs to be that large.
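
As a rough illustration, a host-side check along these lines could catch an oversized launch before it reaches the launch call. This is only a sketch: the names kMaxLaunchThreads and fitsLaunchLimit are made up for this example and are not part of the OptiX API, and the limit is assumed to be "at most 2^30 launch indices in total", as described in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Assumed limit, taken from this thread: 2^30 (1024^3) launch indices in total.
constexpr std::uint64_t kMaxLaunchThreads = std::uint64_t{1} << 30;

// Hypothetical helper: returns true if width * height * depth <= 2^30.
// Uses division instead of one big multiplication so nothing can overflow.
bool fitsLaunchLimit(std::uint64_t width, std::uint64_t height,
                     std::uint64_t depth = 1) {
    assert(width > 0 && height > 0 && depth > 0);
    if (width > kMaxLaunchThreads) return false;
    if (height > kMaxLaunchThreads / width) return false;  // width*height too big
    const std::uint64_t wh = width * height;               // safe: wh <= 2^30
    return depth <= kMaxLaunchThreads / wh;
}
```

With the reproduction above, fitsLaunchLimit(2048 * 1024, 1024) would be false (2^31 indices), while a 1024 x 1024 x 1024 launch would pass.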

Your sidebar question is relevant here, because once your launch dimensions exceed 2^30, if the threads are doing any work at all, it’s likely that the overhead of the launch itself is very small compared to the runtime of your kernel. This means that you can safely launch multiple times without incurring noticeable overheads. I just timed this hypothesis on my machine – a launch size of 2^30 with a raygen() function that has a return at the top takes approximately 10 milliseconds to complete. The launch function itself will be on the order of a few dozen microseconds I believe, so the overhead of the launch in this case is maybe in the range of 0.1% - 0.2%, and that’s for an empty kernel. If you do some rendering and store a result then your kernel is going to take longer, which will shrink the launch overhead even further.
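
To make the arithmetic behind that estimate explicit, here is a tiny sketch. The function name is made up for this example, and the 10 to 20 microsecond per-launch cost is simply the value implied by the 0.1% - 0.2% figure and the ~10 ms empty-kernel timing; it is not a separate measurement.

```cpp
#include <cassert>

// Hypothetical helper: fraction of the kernel runtime spent on launch overhead.
// launchMicros: assumed host-side cost of one launch call, in microseconds.
// kernelMillis: measured runtime of one launch, in milliseconds.
double launchOverheadFraction(double launchMicros, double kernelMillis) {
    assert(launchMicros >= 0.0 && kernelMillis > 0.0);
    return launchMicros / (kernelMillis * 1000.0);
}
```

For example, launchOverheadFraction(10.0, 10.0) is about 0.001 (0.1%) and launchOverheadFraction(20.0, 10.0) is about 0.002 (0.2%). If every tile is a full-size launch, each one pays roughly the same relative cost, so the fraction does not grow with the number of tiles.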

My first question is, are you writing a separate value to global memory for every separate thread in your launch? Or are you perhaps using atomics or indexing to write to a buffer that is smaller than your launch? Would your setup potentially allow for putting a loop in raygen, for example, to reduce the launch size? Could your launch size be equal to the number of pixels you’re rendering?
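
A sketch of the "loop in raygen" idea, written as plain host-side C++ so the index math can be checked in isolation. All names here are hypothetical; in a real program the loop body would live inside the raygen program and tid would come from the launch index.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical plan for covering totalItems work items with a smaller launch.
struct LoopedLaunch {
    std::uint64_t launchSize;      // threads to launch, never more than maxThreads
    std::uint64_t itemsPerThread;  // upper bound on loop iterations per thread
};

LoopedLaunch planLoopedLaunch(std::uint64_t totalItems, std::uint64_t maxThreads) {
    assert(totalItems > 0 && maxThreads > 0);
    // Smallest per-thread item count that lets the launch fit under maxThreads.
    const std::uint64_t itemsPerThread = (totalItems + maxThreads - 1) / maxThreads;
    // Launch only as many threads as that item count requires.
    const std::uint64_t launchSize = (totalItems + itemsPerThread - 1) / itemsPerThread;
    return {launchSize, itemsPerThread};
}

// What each thread would do: handle items tid, tid + launchSize, tid + 2*launchSize, ...
template <typename F>
void forEachItemOfThread(std::uint64_t tid, const LoopedLaunch& plan,
                         std::uint64_t totalItems, F&& f) {
    for (std::uint64_t item = tid; item < totalItems; item += plan.launchSize) {
        f(item);
    }
}
```

For the 2^31-index reproduction, planLoopedLaunch(2^31, 2^30) yields a launch of 2^30 threads with two items each. The strided assignment keeps neighboring threads on neighboring items, which is usually friendlier to memory access than giving each thread a contiguous chunk.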

Tiling by launches of 2^30 is, as you noted, a potential option. The overhead of multiple launches should be pretty low, though depending on whether you have long-running threads, the overhead certainly could be higher than the ~0.2% I quoted above (because of the long-tail effects of multiple separate tiles).
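
One possible way to compute the tiles for a 2D launch is sketched below. It splits the launch into bands of rows so that each band stays within the limit. The Tile struct and tileRows function are illustrative names only, and the sketch assumes a single row already fits within the limit (otherwise the x dimension would need splitting too).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one sub-launch.
struct Tile {
    std::uint64_t yOffset;  // first row of the full launch covered by this tile
    std::uint64_t width;    // same as the full launch width
    std::uint64_t height;   // number of rows in this tile
};

// Split a width x height launch into row bands with width * rows <= maxThreads.
std::vector<Tile> tileRows(std::uint64_t width, std::uint64_t height,
                           std::uint64_t maxThreads) {
    assert(width > 0 && height > 0);
    assert(width <= maxThreads);  // assumption: one row fits in a single launch
    const std::uint64_t rowsPerTile = maxThreads / width;
    std::vector<Tile> tiles;
    for (std::uint64_t y = 0; y < height; y += rowsPerTile) {
        tiles.push_back({y, width, std::min(rowsPerTile, height - y)});
    }
    return tiles;
}
```

Each tile would then correspond to one rtContextLaunch2D call with the tile's width and height, with yOffset passed to the device through a variable of your own (not shown here) so that raygen can add it to the y launch index. For the 2048*1024 x 1024 reproduction and a 2^30 limit, this gives two tiles of 512 rows each.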

OptiX 7, BTW, has the same launch size limit, so while we do encourage upgrading when possible, it wouldn’t actually help with this specific issue. One thing using OptiX 7 could potentially help with here is that it’s much easier in OptiX 7 to use multiple CUDA streams, and if you’re tiling your launches, it might be advantageous for you to further eliminate launch overheads by launching multiple kernels at the same time on different streams. Just something to consider for the future when you have more time.

–
David.
