nvgpu fails at startup: FECS_ERRCODE 0xbadf1020

Sometimes after a “sudo reboot”, nvgpu fails at startup with these logs:

[Tue Mar 15 09:58:50 2022] nvgpu: 17000000.gv11b gk20a_ptimer_isr:50 [ERR] PRI timeout: ADR 0x0040a224 WRITE DATA 0x00000000
[Tue Mar 15 09:58:50 2022] nvgpu: 17000000.gv11b gk20a_ptimer_isr:56 [ERR] FECS_ERRCODE 0xbadf1020
[Tue Mar 15 09:58:50 2022] nvgpu: 17000000.gv11b gp10b_priv_ring_decode_error_code:79 [ERR] client timeout
[Tue Mar 15 09:58:50 2022] nvgpu: 17000000.gv11b gk20a_ptimer_isr:50 [ERR] PRI timeout: ADR 0x0010a548 READ DATA 0x00000000
[Tue Mar 15 09:58:50 2022] nvgpu: 17000000.gv11b gk20a_ptimer_isr:50 [ERR] PRI timeout: ADR 0x005032e4 WRITE DATA 0x00000800
[Tue Mar 15 09:58:50 2022] nvgpu: 17000000.gv11b gk20a_ptimer_isr:56 [ERR] FECS_ERRCODE 0xbadf1020
[Tue Mar 15 09:58:50 2022] nvgpu: 17000000.gv11b gp10b_priv_ring_decode_error_code:79 [ERR] client timeout

[Tue Mar 15 09:58:54 2022] nvgpu: 17000000.gv11b __nvgpu_timeout_expired_msg_cpu:94 [ERR] Timeout detected @ gr_gk20a_ctx_wait_ucode+0xa4/0x570 [nvgpu]
[Tue Mar 15 09:58:54 2022] nvgpu: 17000000.gv11b gr_gk20a_ctx_wait_ucode:528 [ERR] timeout waiting on mailbox=0 value=0x00000010
[Tue Mar 15 09:58:54 2022] nvgpu: 17000000.gv11b gk20a_fecs_dump_falcon_stats:129 [ERR] gr_fecs_os_r : 0
[Tue Mar 15 09:58:54 2022] nvgpu: 17000000.gv11b gk20a_fecs_dump_falcon_stats:131 [ERR] gr_fecs_cpuctl_r : 0x40
[Tue Mar 15 09:58:54 2022] nvgpu: 17000000.gv11b gk20a_fecs_dump_falcon_stats:133 [ERR] gr_fecs_idlestate_r : 0x1
[Tue Mar 15 09:58:54 2022] nvgpu: 17000000.gv11b gk20a_fecs_dump_falcon_stats:135 [ERR] gr_fecs_mailbox0_r : 0x0
[Tue Mar 15 09:58:54 2022] nvgpu: 17000000.gv11b gk20a_fecs_dump_falcon_stats:137 [ERR] gr_fecs_mailbox1_r : 0x0
[Tue Mar 15 09:58:54 2022] nvgpu: 17000000.gv11b gk20a_fecs_dump_falcon_stats:139 [ERR] gr_fecs_irqstat_r : 0x0
[Tue Mar 15 09:58:54 2022] nvgpu: 17000000.gv11b gk20a_fecs_dump_falcon_stats:141 [ERR] gr_fecs_irqmode_r : 0x4
[Tue Mar 15 09:58:54 2022] nvgpu: 17000000.gv11b gk20a_fecs_dump_falcon_stats:143 [ERR] gr_fecs_irqmask_r : 0x8705
[Tue Mar 15 09:58:54 2022] nvgpu: 17000000.gv11b gk20a_fecs_dump_falcon_stats:145 [ERR] gr_fecs_irqdest_r : 0x0
[Tue Mar 15 09:58:54 2022] nvgpu: 17000000.gv11b gk20a_fecs_dump_falcon_stats:147 [ERR] gr_fecs_debug1_r : 0x40
[Tue Mar 15 09:58:54 2022] nvgpu: 17000000.gv11b gk20a_fecs_dump_falcon_stats:149 [ERR] gr_fecs_debuginfo_r : 0x0
[Tue Mar 15 09:58:54 2022] nvgpu: 17000000.gv11b gk20a_fecs_dump_falcon_stats:151 [ERR] gr_fecs_ctxsw_status_1_r : 0x140
[Tue Mar 15 09:58:54 2022] nvgpu: 17000000.gv11b gk20a_fecs_dump_falcon_stats:155 [ERR] gr_fecs_ctxsw_mailbox_r(0) : 0x10

I don’t see this behaviour after a power cycle, i.e. after unplugging and replugging the power supply.

I’m using a Xavier NX with a custom carrier board and eMMC. Some information:

$ uname -r
4.9.253-tegra

$ cat /etc/nv_tegra_release
# R32 (release), REVISION: 6.1, GCID: 27863751, BOARD: t186ref, EABI: aarch64, DATE: Mon Jul 26 19:36:31 UTC 2021

Thanks for your help.

Do you have any application running before hitting this issue?

Do you have a devkit to verify/reproduce this issue?

Do you have any application running before hitting this issue?
I have two Python applications running as services, both started with “After=network-online.target”.

Do you have a devkit to verify/reproduce this issue?
Yes, I have tested on a devkit and there is no issue.

Could the failure after a software reboot (sudo reboot) be caused by our custom carrier board?

Thanks for your help

Not sure if it’s caused by the applications you installed; please try with a clean build first.

It could be, but we need more clues to confirm.
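
One way to gather more clues is to record, after every boot, whether the nvgpu errors quoted above appear, and then compare software reboots against power cycles. The script below is only a minimal sketch under a few assumptions: the function names are hypothetical, the patterns are based solely on the log lines posted in this thread, and reading the kernel log with dmesg may require root on some systems. It does not diagnose the cause; it only extracts the PRI timeout accesses and FECS error codes so that boots can be compared.

```python
import re
import subprocess

# Patterns derived from the nvgpu messages quoted in this thread.
PRI_TIMEOUT = re.compile(
    r"PRI timeout: ADR (0x[0-9a-fA-F]+) (READ|WRITE) DATA (0x[0-9a-fA-F]+)"
)
FECS_ERRCODE = re.compile(r"FECS_ERRCODE (0x[0-9a-fA-F]+)")


def summarize_nvgpu_errors(log_text):
    """Collect PRI timeout accesses and FECS error codes from kernel log text."""
    pri_timeouts = []  # (address, access type, data) tuples
    fecs_codes = []    # error code strings, in order of appearance
    for line in log_text.splitlines():
        # Skip anything not logged by the nvgpu driver.
        if "nvgpu" not in line:
            continue
        match = PRI_TIMEOUT.search(line)
        if match:
            pri_timeouts.append(match.groups())
        match = FECS_ERRCODE.search(line)
        if match:
            fecs_codes.append(match.group(1))
    return {"pri_timeouts": pri_timeouts, "fecs_codes": fecs_codes}


def read_kernel_log():
    """Return the kernel ring buffer as text (hypothetical helper, uses dmesg)."""
    # universal_newlines keeps this compatible with older Python 3 releases.
    return subprocess.check_output(["dmesg"], universal_newlines=True)
```

Calling summarize_nvgpu_errors(read_kernel_log()) once per boot and saving the result together with the type of reboot (software reboot or power cycle) would show whether the failing register addresses are always the same, which is the kind of detail that could help confirm or rule out the carrier board.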
