Debugging optixLaunch error: illegal memory access

After some refactoring of my OptiX 7.0 program, I keep getting the error “an illegal memory access was encountered” when calling optixLaunch.

I have tried creating a pipeline with an empty raygen program, and the error persists, so I know the error is not in my CUDA code.

I have already spent quite some time trying to debug this, but I just can’t seem to make the error go away.

Running cuda-memcheck gives the following error:

========= Host API memory access error at host access to 0x7f6885e09cf8 of size 1164 bytes
=========     Invalid range on access by cudaMemcopy source.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuMemcpyDtoDAsync_v2 + 0x219) [0x293d59]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.440.26 [0x14b2bb]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.440.26 [0x14bd0d]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libnvoptix.so.1 [0x167998]
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libnvoptix.so.1 [0x72c875]
=========     Host Frame:godray.8447d32d/godray [0x13f00]
=========     Host Frame:godray.8447d32d/godray [0xfabb]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:godray.8447d32d/godray [0x1068a]

To my understanding, since I am launching with an empty raygen program, the error must be in either the pipeline or the SBT I am passing to the optixLaunch call. But I am having a lot of trouble locating what exactly the error is.

How should I best proceed to debug the error, and what could potentially be the cause?

I ran cuda-gdb on the program, which gave the following error:
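One way to narrow this down is to sanity-check the SBT values on the host right before the launch. The following is a minimal sketch, not an official OptiX utility: `SbtView` is a hypothetical stand-in that mirrors some fields of `OptixShaderBindingTable` so the snippet compiles without the SDK, and `validateSbt` is a made-up helper. The constants 16 and 32 are assumed to match `OPTIX_SBT_RECORD_ALIGNMENT` and `OPTIX_SBT_RECORD_HEADER_SIZE` from the OptiX 7 headers; check them against your SDK version.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Assumed to match OPTIX_SBT_RECORD_ALIGNMENT and OPTIX_SBT_RECORD_HEADER_SIZE.
constexpr std::uint64_t kSbtRecordAlignment  = 16;
constexpr std::uint64_t kSbtRecordHeaderSize = 32;

// Hypothetical stand-in for a subset of OptixShaderBindingTable.
// Device addresses are held as 64-bit integers, like CUdeviceptr.
struct SbtView {
    std::uint64_t raygenRecord                = 0;
    std::uint64_t missRecordBase              = 0;
    std::uint32_t missRecordStrideInBytes     = 0;
    std::uint32_t missRecordCount             = 0;
    std::uint64_t hitgroupRecordBase          = 0;
    std::uint32_t hitgroupRecordStrideInBytes = 0;
    std::uint32_t hitgroupRecordCount         = 0;
};

// Checks one record array; appends a message for every problem found.
inline void checkRecordArray(const std::string& name, std::uint64_t base,
                             std::uint32_t stride, std::uint32_t count,
                             std::vector<std::string>& problems)
{
    if (count == 0) return;  // nothing to check for an unused array
    if (base == 0)
        problems.push_back(name + ": base is null but count is nonzero");
    else if (base % kSbtRecordAlignment != 0)
        problems.push_back(name + ": base address is not 16-byte aligned");
    if (stride < kSbtRecordHeaderSize)
        problems.push_back(name + ": stride is smaller than the record header");
    else if (stride % kSbtRecordAlignment != 0)
        problems.push_back(name + ": stride is not a multiple of 16");
}

// Returns a list of human-readable problems; empty means nothing was flagged.
inline std::vector<std::string> validateSbt(const SbtView& sbt)
{
    std::vector<std::string> problems;
    if (sbt.raygenRecord == 0)
        problems.push_back("raygen: record address is null");
    else if (sbt.raygenRecord % kSbtRecordAlignment != 0)
        problems.push_back("raygen: record address is not 16-byte aligned");
    checkRecordArray("miss", sbt.missRecordBase, sbt.missRecordStrideInBytes,
                     sbt.missRecordCount, problems);
    checkRecordArray("hitgroup", sbt.hitgroupRecordBase,
                     sbt.hitgroupRecordStrideInBytes, sbt.hitgroupRecordCount,
                     problems);
    return problems;
}

// Debug-build guard to call just before optixLaunch.
inline void assertSbtLooksValid(const SbtView& sbt)
{
    assert(validateSbt(sbt).empty());
}
```

An empty result does not prove the SBT is correct (the addresses could still point at freed or too-small buffers), but a non-empty one points at a concrete field to look at.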

CUDA Exception: Warp Illegal Address
The exception was triggered at PC 0x5555580ccd30

Thread 1 "godray" received signal CUDA_EXCEPTION_14, Warp Illegal Address.
[Switching focus to CUDA kernel 0, grid 48, block (0,0,0), thread (32,0,0), device 0, sm 0, warp 2, lane 0]
0x00005555580ccd60 in ??<<<(128,128,1),(64,1,1)>>> ()

Hi Mattivc,

It’s hard to say from the error messages, but if you have a small, complete sample I could try to run it.

The first error said “Invalid range on access by cudaMemcopy source.” so I would start by looking at the host code for that one. It might be a problem that occurs before launch but doesn’t get caught until launch time.

Be aware that older versions of cuda-memcheck don’t actually work with OptiX; they will report incorrect memory access errors because OptiX is not always ABI compatible with CUDA. I heard that OptiX support was added to cuda-memcheck recently, but this morning I’m having trouble getting an answer on which versions are expected to work. You might be able to tell by running cuda-memcheck on a working OptiX SDK sample – if it fails, then you know your cuda-memcheck doesn’t support OptiX.

cuda-gdb does have OptiX support, but you do have to make sure you’re on a very recent driver with the latest version of cuda-gdb. I don’t immediately know all the reasons you can get a Warp Illegal Address. I’d recommend first reviewing all of your module and pipeline options and comparing them to a working OptiX SDK sample. If you don’t catch it, feel free to send me a reproducer.


David.
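While reviewing module and pipeline options, it may also help to have OptiX print its own diagnostics through a log callback. Below is a minimal sketch, assuming the `OptixLogCallback` signature from the OptiX 7 headers (`unsigned int level, const char* tag, const char* message, void* cbdata`). The names `formatOptixLog` and `contextLogCallback` are made up for this illustration, and the registration is shown only as a comment because it needs the OptiX SDK to compile.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Formats one log message as "[level][tag]: message".
inline std::string formatOptixLog(unsigned int level, const char* tag,
                                  const char* message)
{
    std::ostringstream out;
    out << "[" << level << "][" << (tag ? tag : "") << "]: "
        << (message ? message : "");
    return out.str();
}

// Has the same parameter list as OptixLogCallback (assumption, see above).
// cbdata is interpreted as a std::ostream*; null falls back to std::cerr.
inline void contextLogCallback(unsigned int level, const char* tag,
                               const char* message, void* cbdata)
{
    std::ostream* sink =
        cbdata ? static_cast<std::ostream*>(cbdata) : &std::cerr;
    *sink << formatOptixLog(level, tag, message) << '\n';
}

// Registration sketch (requires optix.h, so left as a comment):
//   OptixDeviceContextOptions options = {};
//   options.logCallbackFunction = &contextLogCallback;
//   options.logCallbackData     = nullptr;  // callback then uses std::cerr
//   options.logCallbackLevel    = 4;        // most verbose level
//   optixDeviceContextCreate(cuContext, &options, &optixContext);
```

With the most verbose level set, messages emitted during module and pipeline creation can be compared against what a working SDK sample prints.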

Maybe check whether it’s related to the bug I found in the OptiX 7 SDK:

https://devtalk.nvidia.com/default/topic/1072808/optix/-bugreport-amp-fix-optix-7-corrupts-cudeviceptr-in-the-sbt-due-to-truncation-hardcore-/?offset=2#5434966
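For readers unfamiliar with this class of bug, here is a generic illustration of how a 64-bit device address gets corrupted when it passes through a 32-bit integer. This is not the code from the linked report; `DevicePtr`, `fitsIn32Bits`, `narrowChecked`, and `alignUp` are hypothetical names, and the sample address is borrowed from the cuda-memcheck output above purely as an example of a 64-bit value.

```cpp
#include <cassert>
#include <cstdint>

// Same width as CUdeviceptr on 64-bit platforms (assumption for this sketch).
using DevicePtr = std::uint64_t;

// True if the address survives a round trip through a 32-bit integer.
constexpr bool fitsIn32Bits(DevicePtr p)
{
    return static_cast<DevicePtr>(static_cast<std::uint32_t>(p)) == p;
}

// Narrowing that fails loudly in debug builds instead of silently
// dropping the upper 32 bits.
inline std::uint32_t narrowChecked(DevicePtr p)
{
    assert(fitsIn32Bits(p));
    return static_cast<std::uint32_t>(p);
}

// Rounds an address up to a multiple of alignment using only 64-bit
// arithmetic, so no bits are lost along the way.
constexpr DevicePtr alignUp(DevicePtr p, DevicePtr alignment)
{
    return (p + alignment - 1) / alignment * alignment;
}

// Sample 64-bit address; a plain 32-bit cast keeps only the low half.
constexpr DevicePtr kSample = 0x7f6885e09cf8ULL;
static_assert(!fitsIn32Bits(kSample), "upper bits would be lost");
static_assert(static_cast<std::uint32_t>(kSample) == 0x85e09cf8u,
              "truncated value differs from the original");
static_assert(alignUp(kSample, 16) == 0x7f6885e09d00ULL,
              "64-bit alignment keeps the upper bits");
```

If any helper in the SBT setup path takes or returns `unsigned int` or `int` where an address or a large byte offset flows through, that is a place worth checking for this kind of truncation.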