cudaDeviceSynchronize() call after optixLaunch(...) results in cudaErrorIllegalAddress


Over the past few days I have been trying to integrate an OptiX example that uses all sorts of gdt::vectors into an existing OpenGL renderer that uses glm::vectors. I basically just swapped the vector types, and everything runs without errors until the cudaDeviceSynchronize() check right after the (error-checked) optixLaunch call, which returns cudaErrorIllegalAddress.

Is there a way to find out what exactly causes this illegal-address error?

Kind regards and thank you


Hi Markus,

The very first thing I recommend is to run with the environment variable CUDA_LAUNCH_BLOCKING set to 1, which makes CUDA kernel launches synchronous. This will implicate or rule out async issues. If the error goes away, then everything is probably set up correctly and you just need to figure out where to synchronize.
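The usual way is to set the variable in the shell or in the debugger's environment settings (for example `CUDA_LAUNCH_BLOCKING=1 ./renderer`, where `./renderer` stands for your executable). If that is inconvenient, a minimal sketch on POSIX systems is to set it from inside the program. This assumes the CUDA runtime reads the variable when it initializes, so the call has to come before any CUDA or OptiX API call; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// True if CUDA_LAUNCH_BLOCKING is currently set to "1" in this process.
bool launchBlockingEnabled()
{
    const char* value = std::getenv("CUDA_LAUNCH_BLOCKING");
    return value != nullptr && std::strcmp(value, "1") == 0;
}

// Sets CUDA_LAUNCH_BLOCKING=1 for this process. Call it first thing in main(),
// before any CUDA or OptiX API call initializes the runtime (assumption:
// the runtime only reads the variable at initialization).
// setenv() is POSIX; with MSVC, _putenv_s("CUDA_LAUNCH_BLOCKING", "1") is the
// equivalent.
bool enableLaunchBlocking()
{
    // Third argument 1: overwrite any value already in the environment.
    if (setenv("CUDA_LAUNCH_BLOCKING", "1", 1) != 0) {
        return false;
    }
    assert(launchBlockingEnabled()); // debug-build sanity check
    return true;
}
```

Call `enableLaunchBlocking()` as the first statement of `main()`; if it returns false, fall back to setting the variable in the shell.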

Next, I'd suggest putting a cudaDeviceSynchronize() call and error check before the optixLaunch, to make sure the problem actually occurs during the launch and wasn't carried over from an earlier asynchronous operation.

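The next thing to try is to turn on OptiX validation mode (set validationMode to OPTIX_DEVICE_CONTEXT_VALIDATION_MODE_ALL in the OptixDeviceContextOptions passed to optixDeviceContextCreate, and install a log callback so its messages are printed) to see whether it reports anything.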

If you're on Windows, you could try compiling everything in debug mode with optimizations disabled and launching the CUDA debugger (Nsight Visual Studio Edition) from Visual Studio. We are still working on the debugging features and they're not expected to work seamlessly yet, but you might be able to catch the approximate location of the illegal memory access, so it's worth a shot.

I don't know whether gdt::vector and glm::vector have the same size and layout in memory, but if they do, then the vector type change might be a red herring. Are they both equivalent to float3?
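A quick way to rule the vector types in or out is a compile-time check that the host-side type and the type the device code reads have the same size and alignment and can be copied bytewise. The sketch below uses stand-in structs so it compiles on its own; in the real project you would instantiate the check with e.g. `glm::vec3` and CUDA's `float3` (an assumption about which types are involved). Note that it does not verify member order.

```cpp
#include <type_traits>

// Stand-ins so this compiles without GLM or CUDA headers. In a real project,
// replace them with the actual host and device vector types (assumed names).
struct DeviceFloat3 { float x, y, z; };   // what the device code reads
struct HostVec3     { float x, y, z; };   // what the host code writes

// True if objects of type A can be memcpy'd into a buffer the device reads as B:
// same size, same alignment, and both trivially copyable.
template <typename A, typename B>
constexpr bool layoutCompatible()
{
    return sizeof(A) == sizeof(B)
        && alignof(A) == alignof(B)
        && std::is_trivially_copyable<A>::value
        && std::is_trivially_copyable<B>::value;
}

static_assert(layoutCompatible<HostVec3, DeviceFloat3>(),
              "host vector type does not match the device-side layout");

// A padded 16-byte vector, for example, would be reported as incompatible.
struct PaddedVec3 { float x, y, z, pad; };
static_assert(!layoutCompatible<PaddedVec3, DeviceFloat3>(),
              "padded type should be reported as incompatible");
```

If the real types pass this check, the crash is more likely caused by one of the issues listed next than by the vector swap itself.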

If the debugger doesn't yield any useful info, the next thing to do is isolate the cause. Common causes in a setup like yours include:

  • a misconfigured Shader Binding Table
  • a shader reading/writing memory out of bounds
  • an OpenGL interop problem
  • a misconfigured Acceleration Structure
  • a stack overflow

Isolate the cause by systematically disabling OptiX features in your renderer until it runs without the error. (Or, if it makes more sense, disable all features and re-enable them one at a time until you hit the error.) For example, if you have multiple hit groups, see whether disabling one or more of them fixes the error; if so, the issue may be in the SBT or in a shader program. To test the shaders, comment out the trace call(s) in your raygen program and see whether the error still occurs: if it persists, the problem is in the raygen program itself; if it goes away, the issue is in the miss, closest-hit, or any-hit programs. Unplug things until you figure out which one is the problem. You can rule out shader code by putting a return statement at the top of a program, and bisect it by moving the return statement further down.

See whether you can trigger the error with smaller launch dimensions; ideally, you can reproduce it with a 1x1 launch. Once the launch size is small enough that the amount of output is manageable, printf() in device code can be an effective tool.

See how far you get isolating, and if you get stuck we can toss out a few more ideas about how to dig further. I hope that helps!


Hello David,

Thank you for your help. The validation mode was actually the tip I needed: it turned out that, for some reason, the resize method, which is responsible for creating the colour_buffer OptiX draws into, wasn't called, so OptiX was writing to a nullptr. That was the first error, and fixing it also got rid of the sync-check errors. Now I only have issues transferring the colour_buffer into an OpenGL texture, but that's something I might figure out myself, or at least I'll try first ;-)

Thank you again!
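The bug described above (the resize step never ran, so the launch wrote through a null colour_buffer) can also be caught on the host right before the launch. A minimal sketch, assuming a hypothetical `LaunchParams` struct that holds the output pointer and its dimensions, similar to the launch-parameter structs in many OptiX examples; `Pixel`, `LaunchParams` and `readyToLaunch` are illustrative names, not taken from the code in this thread.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical stand-in for the uchar4 element type the device writes.
struct Pixel { std::uint8_t r, g, b, a; };

// Hypothetical launch parameters, mirroring what the raygen program reads.
struct LaunchParams {
    Pixel*        colorBuffer = nullptr; // device pointer, allocated on resize
    std::uint32_t width  = 0;
    std::uint32_t height = 0;
};

// Returns false (and says why) if the launch would write through an
// unallocated buffer or use an empty launch size.
bool readyToLaunch(const LaunchParams& p)
{
    if (p.colorBuffer == nullptr) {
        std::fprintf(stderr, "colorBuffer is null: was resize() called?\n");
        return false;
    }
    if (p.width == 0 || p.height == 0) {
        std::fprintf(stderr, "empty launch size %ux%u\n",
                     static_cast<unsigned>(p.width),
                     static_cast<unsigned>(p.height));
        return false;
    }
    return true;
}
```

Calling this right before `optixLaunch(...)` and skipping the launch when it returns false turns a later cudaErrorIllegalAddress into an explicit message; OptiX validation mode remains the more thorough check.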



Different methods for transferring data from CUDA to OpenGL textures can be found in my OptiX 7 examples.
Search the whole source code for the m_interop variable to see how the different methods set up the CUDA and OpenGL resources and register either an OpenGL PBO or an OpenGL texture image with CUDA.
Then these three cases show how to transfer data from OptiX/CUDA to an OpenGL texture:

Those cases use float4 (RGBA32F) buffers.
Note the internal format GL_RGBA32F and the user data format and type GL_RGBA, GL_FLOAT.
(Similarly for half float: GL_RGBA16F, GL_RGBA, GL_HALF_FLOAT_ARB, as shown in the intro_denoiser example.)

When using uchar4 render buffers, I'd recommend the GL_RGBA8 internal format with GL_BGRA, GL_UNSIGNED_BYTE as the input user data format and type, to hit the fastest path.
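The combinations above can be kept in one place so the three format-related arguments of `glTexImage2D(target, level, internalFormat, width, height, border, format, type, pixels)` are always passed consistently. A sketch under these assumptions: `BufferElement`, `TexFormat` and `texFormatFor` are hypothetical names, and the `#define`s are fallbacks with the values from the OpenGL headers, present only so the snippet compiles without a GL loader. The uchar4 case assumes the raygen program stores the channels in B, G, R, A order, because GL_BGRA tells OpenGL that the first byte is blue.

```cpp
// Fallback definitions so this compiles without GL headers; the values match
// the OpenGL headers. With a GL loader included first, these are skipped.
#ifndef GL_RGBA
#define GL_RGBA          0x1908
#endif
#ifndef GL_BGRA
#define GL_BGRA          0x80E1
#endif
#ifndef GL_FLOAT
#define GL_FLOAT         0x1406
#endif
#ifndef GL_HALF_FLOAT
#define GL_HALF_FLOAT    0x140B  // same value as GL_HALF_FLOAT_ARB
#endif
#ifndef GL_UNSIGNED_BYTE
#define GL_UNSIGNED_BYTE 0x1401
#endif
#ifndef GL_RGBA8
#define GL_RGBA8         0x8058
#endif
#ifndef GL_RGBA16F
#define GL_RGBA16F       0x881A
#endif
#ifndef GL_RGBA32F
#define GL_RGBA32F       0x8814
#endif

// Element type of the OptiX render buffer.
enum class BufferElement { Float4, Half4, UChar4 };

struct TexFormat {
    unsigned internalFormat; // texture storage, e.g. GL_RGBA32F
    unsigned format;         // channel layout of the uploaded data
    unsigned type;           // component type of the uploaded data
};

// Maps the render-buffer element type to the matching glTexImage2D arguments.
constexpr TexFormat texFormatFor(BufferElement e)
{
    switch (e) {
    case BufferElement::Float4: return {GL_RGBA32F, GL_RGBA, GL_FLOAT};
    case BufferElement::Half4:  return {GL_RGBA16F, GL_RGBA, GL_HALF_FLOAT};
    case BufferElement::UChar4: return {GL_RGBA8,   GL_BGRA, GL_UNSIGNED_BYTE};
    }
    return {GL_RGBA8, GL_RGBA, GL_UNSIGNED_BYTE}; // not reached
}

// Usage, with a GL context current and the texture bound:
//   const TexFormat f = texFormatFor(BufferElement::Float4);
//   glTexImage2D(GL_TEXTURE_2D, 0, static_cast<GLint>(f.internalFormat),
//                width, height, 0, f.format, f.type, hostPixels);
```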

Thank you, this would definitely have been a follow-up question at some point, so this is valuable information.

My issue yesterday was much simpler (I had just mixed up the format and type arguments in the glTexImage2D call, which I'm using for now). I'll switch to the interop approach quite soon, though.
