Sometimes after a “sudo reboot”, nvgpu fails with the following logs:
[Tue Mar 15 09:58:50 2022] nvgpu: 17000000.gv11b gk20a_ptimer_isr:50 [ERR] PRI timeout: ADR 0x0040a224 WRITE DATA 0x00000000
[Tue Mar 15 09:58:50 2022] nvgpu: 17000000.gv11b gk20a_ptimer_isr:56 [ERR] FECS_ERRCODE 0xbadf1020
[Tue Mar 15 09:58:50 2022] nvgpu: 17000000.gv11b gp10b_priv_ring_decode_error_code:79 [ERR] client timeout
[Tue Mar 15 09:58:50 2022] nvgpu: 17000000.gv11b gk20a_ptimer_isr:50 [ERR] PRI timeout: ADR 0x0010a548 READ DATA 0x00000000
[Tue Mar 15 09:58:50 2022] nvgpu: 17000000.gv11b gk20a_ptimer_isr:50 [ERR] PRI timeout: ADR 0x005032e4 WRITE DATA 0x00000800
[Tue Mar 15 09:58:50 2022] nvgpu: 17000000.gv11b gk20a_ptimer_isr:56 [ERR] FECS_ERRCODE 0xbadf1020
[Tue Mar 15 09:58:50 2022] nvgpu: 17000000.gv11b gp10b_priv_ring_decode_error_code:79 [ERR] client timeout
…
[Tue Mar 15 09:58:54 2022] nvgpu: 17000000.gv11b __nvgpu_timeout_expired_msg_cpu:94 [ERR] Timeout detected @ gr_gk20a_ctx_wait_ucode+0xa4/0x570 [nvgpu]
[Tue Mar 15 09:58:54 2022] nvgpu: 17000000.gv11b gr_gk20a_ctx_wait_ucode:528 [ERR] timeout waiting on mailbox=0 value=0x00000010
[Tue Mar 15 09:58:54 2022] nvgpu: 17000000.gv11b gk20a_fecs_dump_falcon_stats:129 [ERR] gr_fecs_os_r : 0
[Tue Mar 15 09:58:54 2022] nvgpu: 17000000.gv11b gk20a_fecs_dump_falcon_stats:131 [ERR] gr_fecs_cpuctl_r : 0x40
[Tue Mar 15 09:58:54 2022] nvgpu: 17000000.gv11b gk20a_fecs_dump_falcon_stats:133 [ERR] gr_fecs_idlestate_r : 0x1
[Tue Mar 15 09:58:54 2022] nvgpu: 17000000.gv11b gk20a_fecs_dump_falcon_stats:135 [ERR] gr_fecs_mailbox0_r : 0x0
[Tue Mar 15 09:58:54 2022] nvgpu: 17000000.gv11b gk20a_fecs_dump_falcon_stats:137 [ERR] gr_fecs_mailbox1_r : 0x0
[Tue Mar 15 09:58:54 2022] nvgpu: 17000000.gv11b gk20a_fecs_dump_falcon_stats:139 [ERR] gr_fecs_irqstat_r : 0x0
[Tue Mar 15 09:58:54 2022] nvgpu: 17000000.gv11b gk20a_fecs_dump_falcon_stats:141 [ERR] gr_fecs_irqmode_r : 0x4
[Tue Mar 15 09:58:54 2022] nvgpu: 17000000.gv11b gk20a_fecs_dump_falcon_stats:143 [ERR] gr_fecs_irqmask_r : 0x8705
[Tue Mar 15 09:58:54 2022] nvgpu: 17000000.gv11b gk20a_fecs_dump_falcon_stats:145 [ERR] gr_fecs_irqdest_r : 0x0
[Tue Mar 15 09:58:54 2022] nvgpu: 17000000.gv11b gk20a_fecs_dump_falcon_stats:147 [ERR] gr_fecs_debug1_r : 0x40
[Tue Mar 15 09:58:54 2022] nvgpu: 17000000.gv11b gk20a_fecs_dump_falcon_stats:149 [ERR] gr_fecs_debuginfo_r : 0x0
[Tue Mar 15 09:58:54 2022] nvgpu: 17000000.gv11b gk20a_fecs_dump_falcon_stats:151 [ERR] gr_fecs_ctxsw_status_1_r : 0x140
[Tue Mar 15 09:58:54 2022] nvgpu: 17000000.gv11b gk20a_fecs_dump_falcon_stats:155 [ERR] gr_fecs_ctxsw_mailbox_r(0) : 0x10
…
I don’t see this behaviour after a cold boot, that is, after unplugging and replugging the power supply.
I’m using a Xavier NX with a custom carrier board and eMMC. Some information:
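To tell affected boots apart from clean ones, the kernel log can be scanned for the two signatures shown above. This is only a minimal sketch, assuming the log lines keep exactly the format in this post; the function names `find_pri_timeouts` and `fecs_ucode_timed_out` are hypothetical helpers, not part of nvgpu or any NVIDIA tool.

```python
import re

# Matches "PRI timeout" lines in the format shown in the log excerpt above.
PRI_TIMEOUT = re.compile(
    r"PRI timeout: ADR (0x[0-9a-fA-F]+) (READ|WRITE) DATA (0x[0-9a-fA-F]+)"
)

# Matches the FECS ucode wait timeout reported by the driver in the excerpt above.
UCODE_TIMEOUT = re.compile(r"Timeout detected @ gr_gk20a_ctx_wait_ucode")


def find_pri_timeouts(log_text):
    """Return one (address, access, data) tuple per PRI timeout line."""
    return [m.groups() for m in PRI_TIMEOUT.finditer(log_text)]


def fecs_ucode_timed_out(log_text):
    """Return True if the log contains the gr_gk20a_ctx_wait_ucode timeout."""
    return UCODE_TIMEOUT.search(log_text) is not None
```

Both functions take the log as a single string, for example the captured output of `dmesg`, so the result can be compared between a “sudo reboot” and a power cycle.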
$ uname -r
4.9.253-tegra
$ cat /etc/nv_tegra_release
# R32 (release), REVISION: 6.1, GCID: 27863751, BOARD: t186ref, EABI: aarch64, DATE: Mon Jul 26 19:36:31 UTC 2021
Thanks for your help.