Illegal instruction error, but the instruction seems to be okay?

Hi,

Background

When I enable debugging (-G) along with some extra code on my end (think extra runtime checks), my program reliably hits a “CUDA Exception: Warp Illegal Instruction”. The thing is, I don’t see how the instruction being executed is illegal. I searched for combinations of “malloc”, “call.abs.noinc”, “warp illegal instruction”, “cuda”, etc. but haven’t found anything helpful. I ran cuda-memcheck ./my_program and it reported no errors. I’m not implying my code is bug-free: I’m actually trying to find the source of a particular bug, but I was already hitting the illegal instruction exception before I introduced that bug.

Questions
First off, (1) which instruction is causing the error? Is it the one pointed to by the arrow (call.abs.noinc), or is it the instruction before it (mov r6, r4)?

Secondly, (2) if it’s the call instruction, what do its suffixes mean? I’d guess that “abs” is short for absolute value and “inc” is short for increment, but I don’t see how those two concepts would combine in a call. If it’s the mov instruction, then I’m really lost, as the kernel has somewhere between 64 and 135 registers allocated to it.

Details

Below I provide a log. I can also generate the same error and equivalent information using Eclipse Nsight. This is on Oracle Linux x86_64, kernel 4.14.35-1844.3.2.el7uek.x86_64, with CUDA 10.1.

An elided version of the offending line of code is:

an_array_of_pointers[0] = (Foo *) malloc(sizeof(Foo) * (THRESHOLD_1 - THRESHOLD_0));
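
For anyone trying to reproduce this, here is a minimal sketch of a device-side malloc of the same shape as the line above. It is not the original code and is not claimed to trigger the exception: the layout of Foo, the element count kCount (standing in for THRESHOLD_1 - THRESHOLD_0), the kernel name, and the launch configuration are all hypothetical. The one documented behavior it relies on is that device-side malloc returns NULL when the device heap is exhausted, so the result is checked before use.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

struct Foo { int value; };      // hypothetical stand-in for the real Foo
constexpr int kCount = 1024;    // hypothetical stand-in for THRESHOLD_1 - THRESHOLD_0

__global__ void alloc_kernel(int *failures)
{
    // Device-side malloc draws from the device heap and returns NULL on exhaustion.
    Foo *p = static_cast<Foo *>(malloc(sizeof(Foo) * kCount));
    if (p == nullptr) {
        atomicAdd(failures, 1);  // count the failure instead of dereferencing NULL
        return;
    }
    p[0].value = threadIdx.x;
    free(p);
}

// Launches the kernel and returns the number of failed allocations,
// or -1 if a CUDA call or the kernel itself fails.
int run_alloc_check(int blocks, int threads)
{
    int *d_failures = nullptr;
    int h_failures = -1;
    if (cudaMalloc(&d_failures, sizeof(int)) != cudaSuccess) return -1;
    cudaMemset(d_failures, 0, sizeof(int));
    alloc_kernel<<<blocks, threads>>>(d_failures);
    cudaError_t err = cudaDeviceSynchronize();
    if (err == cudaSuccess) {
        cudaMemcpy(&h_failures, d_failures, sizeof(int), cudaMemcpyDeviceToHost);
    } else {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
    }
    cudaFree(d_failures);
    return h_failures;
}

int main()
{
    // The heap size has to be set before the first kernel that uses device malloc runs.
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, 64u * 1024 * 1024);
    int failures = run_alloc_check(22, 128);
    printf("failed allocations: %d\n", failures);
    return failures == 0 ? 0 : 1;
}
```

Building this once with -G and once without gives a small baseline to compare against the full program under cuda-gdb.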

Thanks,
David

Log

[build]$ cuda-gdb ./my_program
NVIDIA (R) CUDA Debugger
10.1 release
Portions Copyright (C) 2007-2018 NVIDIA Corporation
GNU gdb (GDB) 7.12
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type “show copying”
and “show warranty” for details.
This GDB was configured as “x86_64-pc-linux-gnu”.
Type “show configuration” for configuration details.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.
For help, type “help”.
Type “apropos word” to search for commands related to “word”…
Reading symbols from ./my_program…done.
(cuda-gdb) run
Starting program: /home/…/my_program
[Thread debugging using libthread_db enabled]
Using host libthread_db library “/lib64/libthread_db.so.1”.
[New Thread 0x7fffef822700 (LWP 93948)]
[New Thread 0x7fffeefd0700 (LWP 93949)]
… program output …
CUDA Exception: Warp Illegal Instruction

Thread 4 “my_program” received signal CUDA_EXCEPTION_4, Warp Illegal Instruction.
[Switching focus to CUDA kernel 11, grid 42, block (22,0,0), thread (80,0,0), device 0, sm 44, warp 3, lane 16]
0x00000000016439c0 in Bar::insert (this=0xfffeffff62f9a620, cat_id=30730, site_id=0) at /home/…/bar.h:114
114

(cuda-gdb) info registers
R0 0x10000 65536
R1 0xffd708 16766728
R2 0xfffeffff -65537
R3 0x10000 65536
R4 0xfffeffff -65537
R5 0x10 16
R6 0x7128aa00 1898490368
R7 0x10000 65536
R8 0x0 0
R9 0x7ffc 32764

pc 0x16439c0 0x16439c0 <Bar::insert(int, int)+7104>

(cuda-gdb) disassemble 0x16439c0
Dump of assembler code for function Bar::insert(int, int):

0x00000000016439b0 <+7088>: CALL.ABS.NOINC 0x0
=> 0x00000000016439c0 <+7104>: MOV R6, R4

End of assembler dump.
(cuda-gdb)

It may be worth noting that if the instruction pointer points to the next instruction to execute rather than the last instruction executed, then it’s possible (though unlikely) that an instruction somewhere else jumped to 0x16439c0. I think that’s unlikely because the stack backtrace looks appropriate.

I’ve just started receiving:

========= Illegal instruction
========= at __cuda_sm70_votesync_ballot+0x50
========= by thread (0,0,0) in block (36,0,0)
========= Device Frame:__ballot_sync(unsigned int, int)+0x150 in /usr/local/cuda/include/sm_30_intrinsics.hpp:134

This happens on a __ballot_sync call when building with -G in CUDA 12.4. It did not occur in CUDA 12.3 with -G, and the code works fine in either version when -G is not specified. I’ve checked that all threads in the warp call it and that the mask is correct. To me this looks like a bug in CUDA 12.4.

Hi @andrewrobbins
Thank you very much for reporting the issue. Could you please confirm a few details?

  • Do you see this error only when running the application under debugger (cuda-gdb) or is it reproducible without the debugger as well?
  • Could you share the output of the following commands (for the setup, where you see this issue):
    • nvcc --version
    • nvidia-smi

Hi, @andrewrobbins

Could you provide a repro and the exact nvcc and driver versions you used?
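
A repro of the kind requested could be as small as the sketch below. It is an assumption about what a minimal case might look like, not the code from the report: the kernel name, the predicate, and the single-warp launch are hypothetical. All 32 lanes call __ballot_sync with the full mask, and the program also prints the runtime and driver versions through the CUDA runtime API so they can be posted alongside the nvcc --version and nvidia-smi output.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: every lane of one full warp takes part in the vote.
__global__ void ballot_kernel(unsigned *out)
{
    const unsigned full_mask = 0xffffffffu;         // all 32 lanes participate
    const int predicate = (threadIdx.x % 2) == 0;   // even lanes vote true
    const unsigned votes = __ballot_sync(full_mask, predicate);
    if (threadIdx.x == 0) *out = votes;
}

// Launches one warp and returns the ballot result; returns 0 if any CUDA call fails.
unsigned run_ballot()
{
    unsigned *d_out = nullptr;
    unsigned h_out = 0;
    if (cudaMalloc(&d_out, sizeof(unsigned)) != cudaSuccess) return 0;
    ballot_kernel<<<1, 32>>>(d_out);
    // A device-side exception such as an illegal instruction surfaces here.
    cudaError_t err = cudaDeviceSynchronize();
    if (err == cudaSuccess) {
        err = cudaMemcpy(&h_out, d_out, sizeof(unsigned), cudaMemcpyDeviceToHost);
    }
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        h_out = 0;
    }
    cudaFree(d_out);
    return h_out;
}

int main()
{
    int runtime_version = 0, driver_version = 0;
    cudaRuntimeGetVersion(&runtime_version);
    cudaDriverGetVersion(&driver_version);
    printf("runtime %d, driver %d\n", runtime_version, driver_version);
    printf("ballot = 0x%08x\n", run_ballot());
    return 0;
}
```

Compiling the same file with and without -G, under both toolkit versions, should show whether the failure depends only on the debug build or also on something specific to the original application.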