nvrtc + cuobjdump/nvdisasm

i can easily export CUDA and PTX when using nvrtc (i.e. the inputs and outputs of the nvrtc compilation), but i can’t seem to figure out a nice way to get the .cubin and/or view the disassembly for nvrtc-compiled functions after loading them with cuModuleLoadDataEx().

i can see the SASS using nvvp, but that seems a bit slow/cumbersome. i’d guess maybe the debugger could work similarly, but i didn’t try it.

i can also use ptxas on the .ptx emitted from nvrtc:

ptxas out.ptx -arch sm_52 -o out.cubin ; nvdisasm out.cubin

this seems like a decent approach; however, i worry that this might not consistently yield the same code as when i use cuModuleLoadDataEx().
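for reference, a minimal python sketch of that ptxas/nvdisasm round trip. the helper names (sass_commands, disassemble_ptx) are made up for illustration, and it assumes ptxas and nvdisasm are on PATH; the flags are the same ones as in the command line above.

```python
import subprocess

def sass_commands(ptx_path, arch="sm_52", cubin_path="out.cubin"):
    """Build the two command lines: PTX -> cubin, then cubin -> SASS listing."""
    ptxas = ["ptxas", ptx_path, "-arch", arch, "-o", cubin_path]
    nvdisasm = ["nvdisasm", cubin_path]
    return ptxas, nvdisasm

def disassemble_ptx(ptx_path, arch="sm_52", cubin_path="out.cubin"):
    """Run offline ptxas on the PTX, then return nvdisasm's text output.

    Note: this uses the offline ptxas from the toolkit, not the driver's JIT,
    so the SASS is not guaranteed to match what cuModuleLoadDataEx() produces.
    """
    ptxas, nvdisasm = sass_commands(ptx_path, arch, cubin_path)
    subprocess.run(ptxas, check=True)
    result = subprocess.run(nvdisasm, check=True, capture_output=True, text=True)
    return result.stdout
```

e.g. print(disassemble_ptx("out.ptx")) would print the listing for the PTX written out after the nvrtc compile.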

further, it’s perhaps somewhat against the spirit of using nvrtc; otherwise i could be os.system()'ing nvcc in the first place :/ … although admittedly i don’t really see a non-devel/debugging use-case for needing to alter/inspect the .cubin/disassembly at run-time.

anyone have any clues on other official/unofficial methods for getting access to .cubin after calling cuModuleLoadDataEx(), or other general flows for this sort of debugging/inspection of SASS?


Unofficial method: Clear the JIT cache directory. Load and JIT compile your code. Now grab the JIT cache content, strip everything before the ELF header (the magic bytes 0x7F “ELF”) from each file, then send the resulting binary through cuobjdump. I have used this method in the past when code generation bugs were reproducible only with the JIT compiler and not with the offline compiler (at the time the PTXAS in the driver wasn’t always in full sync with the PTXAS in the offline compiler). I have not used it recently, but I would think it still works.
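The stripping step could be scripted roughly as follows. This is only a sketch under assumptions: the function names and the jit_N.cubin output naming are made up for illustration, the cache directory is passed in by the caller, and it assumes each cached entry holds an uncompressed ELF image somewhere after a driver-specific header, which is undocumented and may change between driver versions.

```python
from pathlib import Path
from typing import List, Optional

ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF file

def extract_elf(blob: bytes) -> Optional[bytes]:
    """Return the blob from the first ELF magic onward, or None if there is none."""
    start = blob.find(ELF_MAGIC)
    return None if start < 0 else blob[start:]

def dump_cache_cubins(cache_dir: str, out_dir: str) -> List[Path]:
    """Write the ELF part of every file under cache_dir to out_dir/jit_<n>.cubin.

    out_dir should live outside cache_dir so that a later run does not
    pick up its own output. Files without an ELF magic (e.g. an index
    file) are skipped.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Materialize the file list before writing anything.
    files = sorted(p for p in Path(cache_dir).rglob("*") if p.is_file())
    written = []
    for n, path in enumerate(files):
        elf = extract_elf(path.read_bytes())
        if elf is None:
            continue
        target = out / f"jit_{n}.cubin"
        target.write_bytes(elf)
        written.append(target)
    return written
```

As far as I know, the cache lives in ~/.nv/ComputeCache by default on Linux and can be moved with the CUDA_CACHE_PATH environment variable, so that path would be the cache_dir argument. Each resulting file can then be inspected with something like cuobjdump -sass jit_0.cubin.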