I am encountering errors such as “an illegal memory access was encountered”. By setting CUDA_DEVICE_WAITS_ON_EXCEPTION=1, as described in (XID Errors :: GPU Deployment and Management Documentation), and attaching cuda-gdb, I am able to get more context on the issue.
I have two questions:
- Is there a way to programmatically get more information about the actual error context from within the crashing process? For example, the kind of detail cuda-gdb reports:
"CUDA Exception: Warp Illegal Address
The exception was triggered at PC 0x2b8d78a555e0 (sikJNI.cu:5205)
0x00002b8d78a555f0 in _INTERNAL_53_tmpxft_00006ce0_00000000_10_sikJNI_compute_75_cpp1_ii_39a64ee6::xy_single (result=0x2b8fb6de21c0, ilcl=…, iz=0, h=0x2b935db49538, fan=2, c=0x2b8ec44000e8, closest_fan=0)"
- Is there a way to save a crash dump in a form that cuda-gdb can load for post-mortem debugging?