Making use of an error message

When I run my CUDA executable in cuda-gdb, I get this error:

CUDA Exception: Warp Illegal Address
The exception was triggered at PC 0x2416df0

My interpretation of the error message is that the instruction at PC 0x2416df0 caused some kind of error.

So to find out what the instruction is, I created an assembly file of the executable:

cuobjdump test -sass &> test.ptx

But nowhere in the assembly file could I find the address 0x2416df0. For example,

/*0008*/                   MOV R1, c[0x0][0x20] ;                  /* 0x5c98078000770001 */
/*0010*/                   S2R R2, SR_CTAID.X ;                    /* 0xf0c9000002570002 */
/*0018*/                   S2R R0, SR_TID.X ;                      /* 0xf0c9000002170000 */

The leftmost column seems to be the relative address of each instruction, starting from 0 at the start of the file. The rightmost column seems to be the absolute address of each instruction. So where is 0x2416df0? How can I find the instruction and make use of the error message I am getting? Thanks.

  • The address values are modified slightly, because who knows what you geniuses will do with the slightest of hints about my system.

The right-most column in the dump-sass listing is actually the 64-bit instruction encoding, not an address. My guess is that the starting address of the code at run time is aligned to a 4 KB page boundary, e.g. 0x2416000, so you would probably want to look at the instruction at offset 0xdf0 in your listing.
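To make the arithmetic behind that guess concrete, here is a minimal sketch. It assumes the 4 KB alignment guessed above, which is not confirmed anywhere in this thread, and the helper names page_offset and page_base are made up for illustration. The PC value is the one from the question.

```cpp
#include <cstdint>

// Assumption (the guess above, not a documented guarantee): device code is
// loaded at a 4 KB page-aligned base address, so the low 12 bits of the
// run-time PC match the offset in the leftmost column of the SASS listing.
constexpr std::uint64_t kPageSize = 0x1000;  // 4 KB, assumed

// Hypothetical helper: offset of a PC within its 4 KB page.
constexpr std::uint64_t page_offset(std::uint64_t pc) {
    return pc & (kPageSize - 1);
}

// Hypothetical helper: the page-aligned base address the PC falls in.
constexpr std::uint64_t page_base(std::uint64_t pc) {
    return pc & ~(kPageSize - 1);
}

// PC reported by cuda-gdb in the question.
static_assert(page_offset(0x2416df0) == 0xdf0, "offset to look up in the listing");
static_assert(page_base(0x2416df0) == 0x2416000, "assumed load address");
```

With that offset you would search the listing for the line marked /*0df0*/. If the alignment guess is wrong, or the code spans more than one 4 KB page, the offset will not line up with the listing.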

I haven’t used the debugger in a long time, but I would assume there is a way for it to identify the line in the CUDA source code that triggers the exception, provided you use a debug build for debugging.

If you compile your code with -G and use the autostep feature, the debugger should be able to identify both the line of CUDA C/C++ source code and the machine instruction that caused the fault. Also be sure to compile for the architecture you are running on. To use this capability, refer to autostep in the cuda-gdb manual.

As far as I know, there’s no way to get run-time PC information from compiler output: the program counter is not known at compile time.

If you want to quickly localize a fault to a particular line of kernel code, I recommend the method described here, which uses cuda-memcheck rather than cuda-gdb:

https://stackoverflow.com/questions/27277365/unspecified-launch-failure-on-memcpy/27278218#27278218

Here’s an example session in cuda-gdb:

$ cat t271.cu
#include <stdio.h>

__global__ void kernel(int *data) {
    int val  = *(data);
    printf("thread[%d].val = %d\n", threadIdx.x, val);
}

int main(){
  kernel<<<1,32>>>(NULL);
  cudaDeviceSynchronize();
}
$ nvcc -G -arch=sm_60 -o t271 t271.cu
$ cuda-gdb ./t271
NVIDIA (R) CUDA Debugger
10.0 release
Portions Copyright (C) 2007-2018 NVIDIA Corporation
GNU gdb (GDB) 7.12
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./t271...(no debugging symbols found)...done.
(cuda-gdb) set cuda memcheck on
(cuda-gdb) autostep t271.cu:4 for 5
No symbol table is loaded.  Use the "file" command.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (t271.cu:4) pending.
Created autostep of length 5 lines
(cuda-gdb) run
Starting program: /home/user2/misc/t271
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7fffef882700 (LWP 4460)]
[New Thread 0x7fffef081700 (LWP 4461)]

Thread 1 "t271" received signal CUDA_EXCEPTION_1, Lane Illegal Address.
[Current focus set to CUDA kernel 0, grid 1, block (0,0,0), thread (0,0,0), device 0, sm 0, warp 0, lane 0]
Autostep precisely caught exception at t271.cu:4 (0xccf768)
0x0000000000ccf770 in kernel<<<(1,1,1),(32,1,1)>>> (data=0x0) at t271.cu:4
4           int val  = *(data);
(cuda-gdb) x/4i $pc-4
   0xccf76c <kernel(int*)+364>: Cannot disassemble instruction
=> 0xccf770 <kernel(int*)+368>: LD.E R2, [R2], P0
   0xccf778 <kernel(int*)+376>: MOV R2, R2
   0xccf780 <kernel(int*)+384>:
(cuda-gdb) quit
A debugging session is active.

        Inferior 1 [process 4449] will be killed.

Quit anyway? (y or n) y
$

Does compiling with -lineinfo help in any way in this situation?

-lineinfo and -G are mutually exclusive; use one or the other.

Compiling with -lineinfo instead of -G means you won’t have a full set of device debug symbols when debugging your code. You can try it if you wish, but IMO that would generally be a step backwards.

Thanks for the clarification.