Unable to disassemble Optix GPU code

I’m trying to debug a program which uses CUDA 11.5 and Optix 7.4 on a Fedora 35 Linux system with a RTX3060 card.
The program crashes and when I run it in cuda-gdb the error is

CUDA Exception: Warp Illegal Address
The exception was triggered at PC 0x7fffce30dd70

Thread 4 "GPUThread" received signal CUDA_EXCEPTION_14, Warp Illegal Address.
[Switching focus to CUDA kernel 0, grid 90, block (2714,0,0), thread (0,0,0), device 0, sm 0, warp 1, lane 0]
0x00007fffce30e0e0 in ??<<<(7680,9,1),(128,1,1)>>> ()

I tried disassembling the GPU code, and got the following output where disassembly does not work

(cuda-gdb) x/4i $pc-32
(cuda-gdb) p $pc
$1 = (void (*)()) 0x7fffce30e0e0
(cuda-gdb) x/4x $pc-32
0x7fffce30e0c0: 0xff4b7224      0x000000ff      0x078e0035      0x000fe200
(cuda-gdb) where
#0  0x00007fffce30e0e0 in ??<<<(7680,9,1),(128,1,1)>>> ()

How do I get the debugger to show me a disassembly of memory at the instruction counter?

Hi @drwootton1,

I assume this post is regarding the crash fixed in the OptiX thread? (Linux Optix 7.4 Debugging problem)

For posterity, OptiX internal engine code is proprietary and cannot be disassembled. A crash due to a bad or deleted BVH input pointer would certainly land in this category.

A good way to disassemble your shader programs and double-check whether it’s working is to use Nsight Compute, because it will automatically show you the disassembly of all the shader programs in your OptiX launch. Nsight Compute will also indicate the internal code as “NVIDIA Internal”, so this avenue is a bit more self-explanatory and less mysterious as to when/where/why disassembly will appear or not.


This was the current instance where I was trying to get useful info from cuda-gdb. I’ve had other problems where I was trying to use cuda-gdb and couldn’t get it to tell me anything useful.

I think my original case here when I couldn’t disassemble anything or get a function name was somewhere in the Optix Runtime and not my code. So I understand the proprietary code problem to some extent, but providing at least some info about where program execution was would be helpful. As it is I am debugging blind and limited to printfs and bisecting code.

I put an intentional null pointer dereference in my closest hit program, and I can disassemble at least code for the few instructions close to the actual trap. I can sometimes match that up to the source code in Nsight Compute if I can find a matching SASS code segment. The instruction addresses do not match up between the cuda-gdb disassembly and what I see in NSight Compute.

The other odd thing is when I disassemble code I seem to be only shown some of the code in the closest hit program. For instance, I put the null pointer dereference at the very end of my closest hit program and the disassembly only shows me the trap code sequence

Thread 4 "GPUThread" received signal SIGTRAP, Trace/breakpoint trap.
[Switching focus to CUDA kernel 0, grid 201, block (2923,1,0), thread (32,0,0), device 0, sm 0, warp 0, lane 0]
0x00007fffcdf0f310 in __closesthit__mirror_ptID_0xb6f80f34cd1d81d3_ss_3<<<(7680,9,1),(128,1,1)>>> ()
(cuda-gdb) disassemble
Dump of assembler code for function __closesthit__mirror_ptID_0xb6f80f34cd1d81d3_ss_3:
   0x00007fffcdf0f300 <+0>:     BPT.TRAP 0x1 
=> 0x00007fffcdf0f310 <+16>:    EXIT 
   0x00007fffcdf0f320 <+32>:    BRA 0x20
   0x00007fffcdf0f330 <+48>:    NOP

Also, if I have a line with a variable declaration and initialization, like

    const uint3 dim = optixGetLaunchDimensions();

then that generated code does not seem to show up in the cuda-gdb disassembly or the NSight Compute SASS code window.

FWIW, we agree 100%. Both the CUDA and OptiX teams are actively working to cover more and more of these cases in our tools. Stay tuned!

So I’m not sure about the closest-hit trap code, we’d maybe need to look at the source to understand this. The “_ss_3” suffix is indicating this disassembly is pointing at the block of code after the 3rd optixTrace() or continuation callable call, so this disassembly might be the end block of your closest-hit shader. (The numbered optixTrace()/callable calls can be the result of loop unrolling or function inlining, so it doesn’t necessarily correspond to a 3rd call in the source code…)

At least in Nsight Compute, you can examine your entire disassembly without needing to start from a numbered memory address. In the NSight Compute UI’s function drop-down you will find multiple “closesthit” functions including the blocks for “_ss_0”, “_ss_1”, and “_ss_2”. The OptiX compiler breaks apart your shaders in order to put everything together into a single larger rendering kernel, so that may be why it feels like you aren’t seeing the entire closest hit program. ​I find it much easier to explore the SASS using NCU than when using cuda-gdb, and when your symbols are turned on, the source-SASS association is very convenient and often quite good, given that inlining and multiple compilation passes is always making the association messy and difficult.

As for whether a function call like optixGetLaunchDimensions() will show up in the disassembly, that depends very much on the function and compiler and debug settings. Quite a few OptiX API device calls are compiled into a register access for a value that is already live, and so the only compiled code that could be associated might be when using/reading dim, which already gets associated with a different line of source. There is no reason to expect that when declaring a variable like dim that it necessarily compiles into any code at all. If the compiler can inline the call and avoid explicitly storing a variable on the stack, it most definitely will, and the result is that some lines of source code do not have any associated lines of SASS assembly. I can’t be certain without reproducing your environment, but I might expect this specific code to be one such example of code the compiler produces no direct output for.

I hope that at least, if nothing else, gives you a little sense of why there are quite a few cases where disassembly might be missing or confusing, and a little hope knowing the tooling is still improving. I know from experience it can be frustrating to debug when this happens. We are always happy to help & dive into a reproducer if you get stuck or the tools aren’t giving you what you need.


1 Like

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.