In OptiX, how much memory can a single optixLaunch allocate?

I define multiple fixed-length arrays in a .cu file as members of a struct variable that is used within a single optixLaunch. The specific details are as follows:



When I increased only the length of these arrays, I got the runtime error "an illegal memory access was encountered" in the .cu files, although the program compiled without errors. The error is as follows:


I suspect that the array length is too large, or that there is a problem with the way the .cu file is compiled. What's even weirder is that the error still occurs when I change the array length back to the value that ran correctly before. Because the error message is vague, I couldn't find a specific reason. The compilation method in CMake is as follows.

What is your system configuration?
OS version, installed GPU(s), VRAM amount, display driver version, OptiX major.minor.micro version, CUDA toolkit version used to generate the module inputs, host compiler version.

The answer to your topic’s title is: As much VRAM as is installed on your GPU. It depends on what you programmed.

The amount of memory really allocated for the OptiX kernel depends a lot on the required local memory and the OptiX stack size, which in turn depends on the ray tracing algorithms (traversal depth, max recursions, and how much memory the program domains use).

In your case, your per-ray payload structure is rather big with 1460 bytes, if I counted that correctly.
You define that inside the ray generation program. That means it is allocated per thread, so once for every launch index.
Then I assume you pass a pointer to that payload structure in two 32-bit payload registers in optixTrace.
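As an illustration of passing a pointer through two 32-bit payload registers, here is a minimal host-side C++ sketch of the usual split and rejoin. The names packPointer and unpackPointer are assumptions for this example, not taken from the original code; in device code the same functions would typically carry __forceinline__ __device__ qualifiers.

```cpp
#include <cstdint>

// Split a 64-bit pointer into two 32-bit halves, the form in which it can be
// handed to two payload registers.
inline void packPointer(const void* ptr, uint32_t& hi, uint32_t& lo)
{
    const uint64_t bits = static_cast<uint64_t>(reinterpret_cast<uintptr_t>(ptr));
    hi = static_cast<uint32_t>(bits >> 32);
    lo = static_cast<uint32_t>(bits & 0xFFFFFFFFull);
}

// Rebuild the pointer from the two halves, for example inside a closest-hit
// or miss program after reading both payload registers.
inline void* unpackPointer(uint32_t hi, uint32_t lo)
{
    const uint64_t bits = (static_cast<uint64_t>(hi) << 32) | static_cast<uint64_t>(lo);
    return reinterpret_cast<void*>(static_cast<uintptr_t>(bits));
}
```

Only the pointer travels through the registers; the structure itself stays in the local memory of the thread that declared it, which is why its size counts per thread.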

You should always calculate the OptiX stack size and look at the resulting parameters. Mind that there is a hard upper limit for the stack size which is 64kB.
Since that is allocated per thread as well, there can easily be some GBs used just by that.
Example code inside the OptiX SDK examples or here:
https://github.com/NVIDIA/OptiX_Apps/blob/master/apps/MDL_renderer/src/Device.cpp#L932
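To get a feeling for how per-thread data adds up, the following C++ sketch multiplies a per-thread byte count by the launch dimensions. It is only an upper-bound estimate under the assumption of one copy per launch index; the driver may allocate less than that. The function names and the 1920x1080 resolution are hypothetical.

```cpp
#include <cstdint>

// Upper-bound estimate: one copy of the per-thread data for every launch index.
constexpr uint64_t perLaunchBytes(uint64_t bytesPerThread,
                                  uint64_t width,
                                  uint64_t height,
                                  uint64_t depth = 1)
{
    return bytesPerThread * width * height * depth;
}

// Convert a byte count to GiB for easier reading.
constexpr double toGiB(uint64_t bytes)
{
    return static_cast<double>(bytes) / (1024.0 * 1024.0 * 1024.0);
}
```

For the 1460-byte payload at an assumed 1920x1080 launch, perLaunchBytes(1460, 1920, 1080) gives 3,027,456,000 bytes, roughly 2.8 GiB, before any OptiX stack is counted.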

The general recommendation is to reduce the required stack size as much as possible for performance and memory reasons.
Use iterative instead of recursive algorithms.
Don’t define too much local data inside the individual program domains. (If you generate PTX code for the modules and not OptiX-IR, then you can look at the “local depot” sizes at the beginning of the programs.)

That you get an illegal memory access error instead of an out of memory error means that you accessed something out of bounds.
When changing the array sizes inside that Payload structure, have you made sure all other source code also accesses these arrays inside their bounds? (I would define that size 50 with a single define instead to make sure all code adheres to the sizes.)

All elements inside your Payload structure have a CUDA memory alignment requirement of 4 bytes, so there shouldn’t be inadvertent padding inside that struct or misaligned memory accesses, which would have reported a different error message.
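The two points above (one shared size constant, and 4-byte members without padding) can be sketched as follows. The real Payload structure is not shown in the thread, so the member names, MAX_HITS, and appendHit are hypothetical; only the length 50 comes from the discussion.

```cpp
#include <cstddef>

// Single source of truth for the array length, so every access uses the same bound.
constexpr int MAX_HITS = 50;

// Hypothetical payload layout: only 4-byte members, so no padding is expected.
struct Payload
{
    float distances[MAX_HITS];
    int   primitiveIds[MAX_HITS];
    int   count;
};

// Compile-time checks on common platforms where float and int are 4 bytes.
static_assert(alignof(Payload) == 4, "Payload should only need 4-byte alignment");
static_assert(sizeof(Payload) == (2 * MAX_HITS + 1) * 4, "Payload should contain no padding");

// Bounds-checked append: refuses to write past the end of the arrays.
inline bool appendHit(Payload& p, float distance, int primitiveId)
{
    if (p.count < 0 || p.count >= MAX_HITS)
        return false;
    p.distances[p.count]    = distance;
    p.primitiveIds[p.count] = primitiveId;
    ++p.count;
    return true;
}
```

Changing MAX_HITS then resizes every array and every bounds check together, which removes one common cause of out-of-bounds writes after editing array lengths.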

Related threads to figure out how much memory is used overall:
https://forums.developer.nvidia.com/t/is-there-a-way-to-know-how-much-gpu-memory-optix-will-use/272769
https://forums.developer.nvidia.com/t/understanding-optix-internal-memory-use/276406

If you haven’t enabled OptiX validation mode when debugging this, here is example code doing that.
Check if there is additional information with that enabled.
https://forums.developer.nvidia.com/t/need-help-understanding-why-optixlaunch-is-failing/275372/4

(PS: To all readers: Please do not attach screenshots of source code or command prompts on developer forums.
Instead just copy and paste the text itself into a code block for better readability and easier handling.)

OS version: ubuntu 20.08
GPU: nvidia RTX 3090,24GB
CUDA version: 11.4
driver version: 470.141.03
Optix version:7.2

I have enabled OptiX validation mode; the additional information is as follows:

[ 2][       ERROR]: Validation mode caught builtin exception OPTIX_EXCEPTION_CODE_STACK_OVERFLOW
Error recording resource event on user stream (CUDA error string: unspecified launch failure, CUDA error code: 719)
Optix call (optixLaunch( pipeline,stream, launchParamsBuffer.d_pointer(), launchParamsBuffer.sizeInBytes, &sbt, launchParams.frame.size.x, launchParams.frame.size.y, 1 )) failed with code 7053 (line 665)

Well, as predicted, somehow the stack size is too big inside your raytracing algorithm.

Did you calculate the OptiX stack size yourself?
https://raytracing-docs.nvidia.com/optix8/guide/index.html#program_pipeline_creation#pipeline-stack-size

If not, you should implement the necessary code to be able to call optixPipelineSetStackSize inside your code and see what the arguments are for your pipeline and algorithm.
Then you should analyze how the arguments change when adjusting your source code.
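As a rough illustration of how those arguments react to code changes, here is a standalone C++ sketch of a conservative continuation stack estimate. It is modeled on my reading of the helper optixUtilComputeStackSizes in the SDK header optix_stack_size.h and omits direct callables; the SDK header remains the authoritative version. StackSizes is a hypothetical stand-in that borrows the css* field names from OptixStackSizes, and the 64 kB constant is the hard limit mentioned earlier in this thread.

```cpp
#include <algorithm>

// Hypothetical stand-in for the accumulated per-program-group stack sizes (bytes).
struct StackSizes
{
    unsigned int cssRG = 0;  // ray generation
    unsigned int cssMS = 0;  // miss
    unsigned int cssCH = 0;  // closest hit
    unsigned int cssAH = 0;  // any hit
    unsigned int cssIS = 0;  // intersection
    unsigned int cssCC = 0;  // continuation callables
};

// Hard upper limit for the stack size as stated in the thread.
constexpr unsigned int kMaxStackBytes = 64u * 1024u;

// Conservative upper bound for the continuation stack of one thread.
inline unsigned int estimateContinuationStackSize(const StackSizes& s,
                                                  unsigned int maxTraceDepth,
                                                  unsigned int maxCCDepth)
{
    // Stack used by a call tree of continuation callables.
    const unsigned int ccTree = maxCCDepth * s.cssCC;
    // Stack used by one closest-hit or miss invocation including its callables.
    const unsigned int chOrMs = std::max(s.cssCH, s.cssMS) + ccTree;

    return s.cssRG + ccTree
         + (std::max(maxTraceDepth, 1u) - 1u) * chOrMs
         + std::min(maxTraceDepth, 1u) * std::max(chOrMs, s.cssIS + s.cssAH);
}

inline bool exceedsLimit(unsigned int bytes)
{
    return bytes > kMaxStackBytes;
}
```

With this estimate, every additional level of trace depth adds another closest-hit or miss frame, so large local data inside those programs is multiplied by the recursion depth.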

The above recommendations apply:

  • Reduce the required stack size as much as possible for performance and memory reasons.
  • Use iterative instead of recursive algorithms.
  • Don’t define too much local data inside the individual program domains.
  • If you generate PTX code for the modules and not OptiX-IR, then you can look at the “local depot” sizes at the beginning of the programs inside the PTX code to see how that changes when adjusting the code.

There have been some driver changes that make the stack size calculation inside OptiX more accurate, so I would really recommend updating the display driver from that old 470 branch to something a lot more recent. Then you would also be able to use more current OptiX SDK versions than 7.2.0.

I don’t expect that to solve your issue automatically. That can only be done by changing your raytracing implementation.