Hi,
Background
When I enable debugging (-G) and some extra code on my end (think extra runtime checks), I’m able to reliably make my program hit a “CUDA Exception: Warp Illegal Instruction”. Thing is, I don’t understand how the instruction being executed can be illegal. I searched for combinations of “malloc”, “call.abs.noinc”, “warp illegal instruction”, “cuda”, etc., but haven’t found anything helpful yet. I ran cuda-memcheck ./my_program, and no errors were reported (not that I’m implying my code is bug-free; I’m actually trying to find the source of a particular bug, but I was already hitting the illegal instruction exception before I introduced that bug).
Questions
First off, (1) which instruction is causing the error? Is it the one pointed to by the arrow in the disassembly (MOV R6, R4), or is it the instruction before it (CALL.ABS.NOINC 0x0)?
Secondly, (2) if it’s the call instruction, what does that form mean? I’d think that “abs” would be short for absolute value, and “inc” would be short for increment, but I don’t know how to combine those two concepts. If it’s the mov instruction, then I’m really lost, as the kernel has somewhere between 64 and 135 registers allocated to it.
Details
Below I provide a log. I can also generate the same error and equivalent information using Eclipse Nsight. This is on Oracle Linux x86_64, kernel 4.14.35-1844.3.2.el7uek.x86_64, with CUDA 10.1.
An elided version of the offending line of code is:
an_array_of_pointers[0] = (Foo *) malloc(sizeof(Foo) * (THRESHOLD_1 - THRESHOLD_0));
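A minimal, self-contained sketch of the same allocation pattern is below, so the in-kernel malloc can be built with -G and stepped through in cuda-gdb on its own. It makes no claim about what causes the illegal instruction. The Foo struct, the threshold values, the kernel name alloc_kernel, the status flag, the launch configuration, and the 64 MB heap size are all hypothetical placeholders and are not taken from the real program.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-ins for the elided type and constants.
struct Foo { int value; };
constexpr int THRESHOLD_0 = 0;
constexpr int THRESHOLD_1 = 16;

__global__ void alloc_kernel(int *status)
{
    Foo *an_array_of_pointers[1];

    // Device-side malloc draws from a fixed-size heap and returns
    // a null pointer when the request cannot be satisfied.
    an_array_of_pointers[0] =
        (Foo *) malloc(sizeof(Foo) * (THRESHOLD_1 - THRESHOLD_0));

    if (an_array_of_pointers[0] == nullptr) {
        atomicExch(status, 1);   // record the failure for the host
        return;
    }

    an_array_of_pointers[0][0].value = (int) threadIdx.x;
    free(an_array_of_pointers[0]);
}

int main()
{
    // The device heap size has to be set before the first launch of a
    // kernel that calls malloc. 64 MB is an arbitrary choice here.
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 64 * 1024 * 1024);

    int *d_status = nullptr;
    cudaMalloc(&d_status, sizeof(int));
    cudaMemset(d_status, 0, sizeof(int));

    alloc_kernel<<<4, 128>>>(d_status);
    cudaError_t err = cudaDeviceSynchronize();

    int h_status = 0;
    cudaMemcpy(&h_status, d_status, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_status);

    printf("sync: %s, malloc failed in some thread: %s\n",
           cudaGetErrorString(err), h_status ? "yes" : "no");
    return (err == cudaSuccess && h_status == 0) ? 0 : 1;
}
```

Building this with nvcc -G and running it under cuda-gdb would show whether the same CALL.ABS.NOINC sequence appears around the malloc call in a program that contains nothing else.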
Thanks,
David
Log
[build]$ cuda-gdb ./my_program
NVIDIA (R) CUDA Debugger
10.1 release
Portions Copyright (C) 2007-2018 NVIDIA Corporation
GNU gdb (GDB) 7.12
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type “show copying”
and “show warranty” for details.
This GDB was configured as “x86_64-pc-linux-gnu”.
Type “show configuration” for configuration details.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.
For help, type “help”.
Type “apropos word” to search for commands related to “word”…
Reading symbols from ./my_program…done.
(cuda-gdb) run
Starting program: /home/…/my_program
[Thread debugging using libthread_db enabled]
Using host libthread_db library “/lib64/libthread_db.so.1”.
[New Thread 0x7fffef822700 (LWP 93948)]
[New Thread 0x7fffeefd0700 (LWP 93949)]
… program output …
CUDA Exception: Warp Illegal Instruction
Thread 4 “my_program” received signal CUDA_EXCEPTION_4, Warp Illegal Instruction.
[Switching focus to CUDA kernel 11, grid 42, block (22,0,0), thread (80,0,0), device 0, sm 44, warp 3, lane 16]
0x00000000016439c0 in Bar::insert (this=0xfffeffff62f9a620, cat_id=30730, site_id=0) at /home/…/bar.h:114
114
(cuda-gdb) info registers
R0 0x10000 65536
R1 0xffd708 16766728
R2 0xfffeffff -65537
R3 0x10000 65536
R4 0xfffeffff -65537
R5 0x10 16
R6 0x7128aa00 1898490368
R7 0x10000 65536
R8 0x0 0
R9 0x7ffc 32764
pc 0x16439c0 0x16439c0 <Bar::insert(int, int)+7104>
(cuda-gdb) disassemble 0x16439c0
Dump of assembler code for function Bar::insert(int, int):
…
0x00000000016439b0 <+7088>: CALL.ABS.NOINC 0x0
=> 0x00000000016439c0 <+7104>: MOV R6, R4
…
End of assembler dump.
(cuda-gdb)