If you compile your code with -G and use the autostep feature, the debugger should be able to identify both the line of CUDA C/C++ source code and the machine instruction that caused the fault. Also be sure to compile for the architecture you are running on. To use this capability, refer to autostep in the cuda-gdb manual.
As far as I know, there’s no way to get PC information from compiler output: the program counter is not known at compile time.
If you want to quickly localize a fault to a particular line of kernel code, I recommend the method described here, which uses cuda-memcheck rather than cuda-gdb:
https://stackoverflow.com/questions/27277365/unspecified-launch-failure-on-memcpy/27278218#27278218
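As a rough sketch of that cuda-memcheck approach, applied to the t271.cu test case used in the cuda-gdb session in this post: compile with -lineinfo so the tool can map a faulting instruction back to a source line, then run the executable under cuda-memcheck. The linked answer is the authority on the exact steps. The SRC and APP variables and the availability checks are only there for illustration and are my own assumptions.

```shell
#!/bin/sh
# Sketch only: SRC and APP are illustrative names, taken from the t271 example.
SRC=t271.cu
APP=./t271

# Only attempt the build and run if the CUDA tools and the source file are present.
if command -v nvcc >/dev/null 2>&1 \
   && command -v cuda-memcheck >/dev/null 2>&1 \
   && [ -f "$SRC" ]; then
    # -lineinfo embeds source line information without disabling device optimizations
    # (unlike -G). Adjust -arch to match the GPU you are running on.
    nvcc -lineinfo -arch=sm_60 -o "$APP" "$SRC"
    # Run the application under cuda-memcheck to report the invalid access.
    cuda-memcheck "$APP"
else
    echo "nvcc, cuda-memcheck, or $SRC not found; skipping"
fi
```

On newer CUDA toolkits that no longer ship cuda-memcheck, compute-sanitizer fills the same role.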
Here’s an example session in cuda-gdb:
$ cat t271.cu
#include <stdio.h>

__global__ void kernel(int *data) {
  int val = *(data);
  printf("thread[%d].val = %d\n", threadIdx.x, val);
}
int main(){
  kernel<<<1,32>>>(NULL);
  cudaDeviceSynchronize();
}
$ nvcc -G -arch=sm_60 -o t271 t271.cu
$ cuda-gdb ./t271
NVIDIA (R) CUDA Debugger
10.0 release
Portions Copyright (C) 2007-2018 NVIDIA Corporation
GNU gdb (GDB) 7.12
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./t271...(no debugging symbols found)...done.
(cuda-gdb) set cuda memcheck on
(cuda-gdb) autostep t271.cu:4 for 5
No symbol table is loaded. Use the "file" command.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (t271.cu:4) pending.
Created autostep of length 5 lines
(cuda-gdb) run
Starting program: /home/user2/misc/t271
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7fffef882700 (LWP 4460)]
[New Thread 0x7fffef081700 (LWP 4461)]
Thread 1 "t271" received signal CUDA_EXCEPTION_1, Lane Illegal Address.
[Current focus set to CUDA kernel 0, grid 1, block (0,0,0), thread (0,0,0), device 0, sm 0, warp 0, lane 0]
Autostep precisely caught exception at t271.cu:4 (0xccf768)
0x0000000000ccf770 in kernel<<<(1,1,1),(32,1,1)>>> (data=0x0) at t271.cu:4
4 int val = *(data);
(cuda-gdb) x/4i $pc-4
0xccf76c <kernel(int*)+364>: Cannot disassemble instruction
=> 0xccf770 <kernel(int*)+368>: LD.E R2, [R2], P0
0xccf778 <kernel(int*)+376>: MOV R2, R2
0xccf780 <kernel(int*)+384>:
(cuda-gdb) quit
A debugging session is active.
Inferior 1 [process 4449] will be killed.
Quit anyway? (y or n) y
$