persistent kernel causes driver to complain "cannot idle engine 0" and then cease functioning

This happens with L4T R24.1 64-bit. I’m including a degenerate example of the problem, in which the kernel simply runs an infinite loop:

#include <unistd.h>
#include <sys/types.h>
#include <stdio.h>
#include <stdlib.h>

__device__ int done = 0;

__global__ void Kernel()
{
   // Spin forever; nothing ever sets 'done', so this kernel never exits.
   while (!done) {
   }
}

int main(int argc, char** argv)
{
   // Print the pid so a debugger can attach to this process later.
   printf("My pid is %d\n", getpid());
   Kernel<<<1, 1>>>();
   cudaError_t err = cudaGetLastError();
   if (err != cudaSuccess) {
      fprintf(stderr, "kernel launch failure: %s (%d)\n",
              cudaGetErrorString(err), err);
      exit(-1);
   }
   cudaThreadSynchronize();   // never returns: the kernel runs forever
   cudaThreadExit();
   return 0;
}

The Makefile:

NVCC = /usr/local/cuda/bin/nvcc
NVCCGEN = -gencode=arch=compute_53,code=\"sm_53,compute_53\"
NVCCFLAGS = $(NVCCGEN) --compiler-options -fno-strict-aliasing -DUNIX -g -G -rdc=true

COMPILE.cu = $(NVCC) $(NVCCFLAGS) $(TARGET_ARCH) -c
NVLINK = $(NVCC) $(NVCCGEN) -rdc=true

it: it.o
	$(NVLINK) $^ $(LD35LIBS) -o $@

it.o: it.cu
	$(COMPILE.cu) $<

clean:
	rm -f it it.o
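
As an aside (not part of the bug report itself): in a real persistent-kernel application the host would eventually signal the loop to exit. A minimal sketch of that pattern is below; it polls a flag in zero-copy mapped host memory rather than using cudaMemcpyToSymbol, since a default-stream copy would itself queue up behind the still-running kernel. The names (PersistentKernel, h_done) and the 5-second sleep are purely illustrative.

#include <unistd.h>
#include <stdio.h>

// The kernel polls a flag in host-mapped (zero-copy) memory, so the host
// can flip it without submitting any new work to the GPU.
__global__ void PersistentKernel(volatile int *done)
{
   while (!*done) {
      // persistent-kernel work would go here
   }
}

int main()
{
   int *h_done = NULL;
   int *d_done = NULL;

   cudaSetDeviceFlags(cudaDeviceMapHost);            // allow mapped host memory
   cudaHostAlloc((void**)&h_done, sizeof(int), cudaHostAllocMapped);
   cudaHostGetDevicePointer((void**)&d_done, h_done, 0);
   *h_done = 0;

   PersistentKernel<<<1, 1>>>(d_done);

   sleep(5);              // stand-in for "run until we decide to stop"

   *h_done = 1;           // host-side write; the polling kernel sees it and exits

   cudaDeviceSynchronize();   // now returns, because the kernel terminates
   cudaFreeHost(h_done);
   return 0;
}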

If you build and run the “it” program, the driver will start throwing out errors like this:

gk20a gpu.0: gr_gk20a_ctx_wait_ucode: timeout waiting on ucode response
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: gr_fecs_os_r : 0
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: gr_fecs_cpuctl_r : 0x40
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: gr_fecs_idlestate_r : 0x1
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: gr_fecs_mailbox0_r : 0x0
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: gr_fecs_mailbox1_r : 0x0
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: gr_fecs_irqstat_r : 0x0
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: gr_fecs_irqmode_r : 0x4
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: gr_fecs_irqmask_r : 0x8704
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: gr_fecs_irqdest_r : 0x0
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: gr_fecs_debug1_r : 0x40
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: gr_fecs_debuginfo_r : 0x0
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(0) : 0x0
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(1) : 0x0
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(2) : 0x89
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(3) : 0x0
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(4) : 0x2000a0
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(5) : 0x0
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(6) : 0x0
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(7) : 0x0
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: gr_fecs_engctl_r : 0x0
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: gr_fecs_curctx_r : 0x0
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: gr_fecs_nxtctx_r : 0x0
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_IMB : 0xbadfbadf
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_DMB : 0xbadfbadf
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_CSW : 0xbadfbadf
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_CTX : 0xbadfbadf
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_EXCI : 0xbadfbadf
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_PC : 0xbadfbadf
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_SP : 0xbadfbadf
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_PC : 0xbadfbadf
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_SP : 0xbadfbadf
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_PC : 0xbadfbadf
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_SP : 0xbadfbadf
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_PC : 0xbadfbadf
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_SP : 0xbadfbadf
... delay ...
gk20a gpu.0: gk20a_fifo_wait_engine_idle: cannot idle engine 0

If you then attempt to attach to the process using cuda-gdb’s attach command, the driver will go further into a death spiral, continuing with the periodic dumps, but now also throwing out errors like this:

gk20a gpu.0: gr_gk20a_exec_ctx_ops: unable to stop gr ctxsw
gk20a gpu.0: nvgpu_ioctl_channel_reg_ops: dbg regops failed
gk20a gpu.0: __locked_fifo_preempt: preempt TSG 0 timeout
...
gk20a gpu.0: gk20a_set_error_notifier: error notifier set to 8 for ch 504
gk20a gpu.0: gk20a_set_error_notifier: error notifier set to 8 for ch 503
gk20a gpu.0: gk20a_set_error_notifier: error notifier set to 8 for ch 502
gk20a gpu.0: gk20a_set_error_notifier: error notifier set to 8 for ch 501
gk20a gpu.0: gr_gk20a_ctx_wait_ucode: timeout waiting on ucode response
...
gk20a gpu.0: gk20a_fifo_handle_mmu_fault: mmu fault on engine 0, engine subid 0 (gpc), client 1 (t1 0), addr 0x00000005:0x01481000, type 2 (pte), info 0x00000182,inst_ptr 0x15ed1d000
...
gk20a gpu.0: gk20a_fifo_handle_mmu_fault: gr_status_r : 0x1242081
...
gk20a gpu.0: gk20a_fifo_reset_engine: failed to HALT gr pipe
gk20a gpu.0: gk20a_fifo_set_ctx_mmu_error_tsg: TSG 0 generated a mmu fault
gk20a gpu.0: gk20a_fifo_handle_sched_error: fifo sched error : 0x0000000a, failed to find engine

At some point along the way, the problem becomes unrecoverable without a reboot.

Hi Todd,

To clarify your issue further, could you please confirm that you do not have CUDA_WAITS_ON_EXCEPTION=1 set in your environment?

Also, note that "cuda-gdb --pid", aka "late attach", is not supported on mobile environments; this includes L4T, Android, V4L, and V4A.

Thanks

No, CUDA_WAITS_ON_EXCEPTION=1 is not set.
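
(For reference, a trivial check that could be dropped into the test program to confirm this; this snippet is an illustrative addition, not part of the original report:)

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
   /* Report whether CUDA_WAITS_ON_EXCEPTION is set in the environment. */
   const char *v = getenv("CUDA_WAITS_ON_EXCEPTION");
   printf("CUDA_WAITS_ON_EXCEPTION=%s\n", v ? v : "(unset)");
   return 0;
}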

Hi Todd Allen,

After checking further with the team, this appears to be an issue in nvgpu: the channel watchdog is not initiating RC recovery properly, which hangs the board. We will fix this issue in the coming release, R24.2.

Thanks

In L4T R24.2, this issue was obscured by a different bug that prevented CUDA attach entirely. With L4T R24.2.1, that problem is corrected, and this one resurfaces, but it is less severe.

If you run the test program, the driver will issue this error on the console:

gk20a gpu.0: gk20a_fifo_wait_engine_idle: cannot idle engine 0

If you attempt to attach to the test program, it fails like this:

Error: Failed to suspend device for CUDA device 0, error=CUDBG_ERROR_INTERNAL(0xa).

If you exit cuda-gdb, and then kill the test program, you get errors like these:

__locked_fifo_preempt: preempt TSG 0 timeout

gk20a gpu.0: gk20a_set_error_notifier: error notifier set to 8 for ch 503
gk20a gpu.0: gk20a_set_error_notifier: error notifier set to 8 for ch 502
gk20a gpu.0: gk20a_set_error_notifier: error notifier set to 8 for ch 507
gk20a gpu.0: gk20a_set_error_notifier: error notifier set to 8 for ch 505
gk20a gpu.0: gk20a_fifo_handle_mmu_fault: mmu fault on engine 0, engine subid 0 (gpc), client 2 (pe 0), addr 0x00000015:0x44447000, type 11 (compression failure), info 0x0100220b,inst_ptr 0x7b0800000

gk20a gpu.0: gk20a_fecs_dump_falcon_stats: gr_fecs_os_r : 0
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: gr_fecs_cpuctl_r : 0x40
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: gr_fecs_idlestate_r : 0x1
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: gr_fecs_mailbox0_r : 0x0
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: gr_fecs_mailbox1_r : 0x0
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: gr_fecs_irqstat_r : 0x0
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: gr_fecs_irqmode_r : 0x4
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: gr_fecs_irqmask_r : 0x8704
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: gr_fecs_irqdest_r : 0x0
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: gr_fecs_debug1_r : 0x40
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: gr_fecs_debuginfo_r : 0x0
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(0) : 0x0
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(1) : 0x1
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(2) : 0x50009
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(3) : 0x20
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(4) : 0x2000a0
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(5) : 0x0
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(6) : 0x0
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(7) : 0x0
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: gr_fecs_engctl_r : 0x0
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: gr_fecs_curctx_r : 0x0
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: gr_fecs_nxtctx_r : 0x0
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_IMB : 0xbadfbadf
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_DMB : 0xbadfbadf
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_CSW : 0xbadfbadf
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_CTX : 0xbadfbadf
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_EXCI : 0xbadfbadf
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_PC : 0xbadfbadf
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_SP : 0xbadfbadf
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_PC : 0xbadfbadf
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_SP : 0xbadfbadf
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_PC : 0xbadfbadf
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_SP : 0xbadfbadf
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_PC : 0xbadfbadf
gk20a gpu.0: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_SP : 0xbadfbadf
gk20a gpu.0: gk20a_fifo_handle_mmu_fault: gr_status_r : 0x1000081
gk20a gpu.0: gk20a_fifo_set_ctx_mmu_error_tsg: TSG 0 generated a mmu fault
gk20a gpu.0: gk20a_fifo_handle_sched_error: fifo sched error : 0x0000000a, failed to find engine

So that much is similar to what is described above. But, on the plus side, the driver seems to recover and a reboot is not necessary.

Hi Todd Allen,

cuda-gdb does not support late attach on mobile; we plan to remedy this in a future release.

Thanks