Debugging a "Warp Illegal Instruction" in OpenACC Fortran

I have a large Fortran code that uses OpenACC for GPU support. I compiled my code with the -g option; -G doesn’t seem to work with nvfortran. If I run it in cuda-gdb, it terminates with:

CUDA Exception: Warp Illegal Instruction
The exception was triggered at PC 0x55e08f0

Thread 1 "fleur" received signal CUDA_EXCEPTION_4, Warp Illegal Instruction.
[Switching focus to CUDA kernel 19, grid 1082100, block (986,0,0), thread (0,0,0), device 0, sm 0, warp 19, lane 0]
0x00000000085a6650 in m_ylm_ylm4_ ()
(cuda-gdb) frame
#0  0x00000000085a6650 in m_ylm_ylm4_ ()

I am not sure how this can help me. I can’t get a line number; the frame just points to a whole routine, which doesn’t contain any OpenACC directives.

How am I supposed to use this information? Is there any way of accessing the CUDA kernels OpenACC generates? Can I get more detailed information on where the error occurs or which instruction is invalid? How do I properly debug issues within OpenACC? Can I get more detailed line information?

Correct, “-g” (lower-case) enables debugging symbols for both the host and device code. “-G” (upper-case) is an nvcc-only flag.
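For reference, a compile line along those lines might look like this (the source and binary names are placeholders, not taken from your build):

```shell
# Placeholder names; -g adds host and device debug symbols,
# -O0 keeps the generated code closer to the source lines.
nvfortran -acc=gpu -g -O0 fleur.f90 -o fleur
```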

Using cuda-gdb is somewhat hit-or-miss in its usefulness. Unfortunately it doesn’t know Fortran, and OpenACC code undergoes a heavy transformation, so the output can be difficult to interpret. Still, I’ll often use it as a first step in understanding what’s going on.

Here you’ve got a “Warp Illegal Instruction” (CUDA_EXCEPTION_4), which means a thread within the warp has executed an illegal instruction, typically because execution jumped to a corrupted or invalid address. My best guess is that it’s a problem with the arguments you’re passing into the routine, or possibly how you’re accessing a module variable. If you can provide the source for this routine, as well as the section of code that calls it, that might give more clues.
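To illustrate the kind of argument problem I mean, here’s a made-up sketch (the routine name and signature are hypothetical, not from your code): an !$acc routine that trusts a size argument larger than the array the caller actually allocated will write past the end of its device memory.

```fortran
subroutine fill(n, a)          ! hypothetical routine, not your code
  !$acc routine seq
  integer, intent(in) :: n
  real, intent(inout) :: a(n)  ! contract: the caller must pass n <= size of a
  integer :: i
  do i = 1, n                  ! if the caller passes too large an n,
     a(i) = 0.0                ! these device writes land outside the array
  end do
end subroutine fill
```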

If you don’t know where the routine is being called from, you can try setting the environment variable “NV_ACC_NOTIFY=1”. This will have the OpenACC runtime print the names of the kernels as they’re being launched. Assuming the error occurs outside of cuda-gdb, the last one printed will be the one that’s causing the error. The compiler names the kernels “subroutine_name_line_no”, so it should be easy to find.
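In practice (the binary name here is assumed from your backtrace):

```shell
# Each kernel launch is reported as it happens; the last line printed
# before the crash identifies the failing kernel.
NV_ACC_NOTIFY=1 ./fleur
```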

Is there any way of accessing the CUDA kernels OpenACC generates?

The flag “-acc=gpu,keep” will keep the intermediary device LLVM code in a file named “filename.n001.gpu”. For the CUDA code, you can use “-acc=gpu,keep,nonvvm”; however, the “nonvvm” sub-option is unsupported.
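For example (the source file name is a placeholder):

```shell
# "keep" leaves the intermediate device code next to the object file
nvfortran -acc=gpu,keep -g ylm.f90 -c
ls ylm.n001.gpu          # device LLVM code retained by the "keep" sub-option
```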