Gk20a and Jetson Nano crash

Many thanks for this summary.

So only the B01 board has the PCIe error, right?
Does every board on the list hit the GPU error?

Hi @WayneWWW,

I took some time to get back to you because I wanted to be relatively sure that the problem was solved.
I have tried running the following configurations:

| Date | Seconds | Hours | Model | Boot | Power supply | PCIe error | What's new |
|---|---|---|---|---|---|---|---|
| 17/09/20 | 82655 | 22 | A02 | SD | 5V, 4A | No | jetson_clocks activated, NO CRASH |
| 21/09/20 | 110970 | 31 | A02 | USB 3.0 SSD | 5V, 4A | No | jetson_clocks activated, NO CRASH |

I have recorded no crashes with jetson_clocks activated!
Apparently, that solved the problem.
The temperature of the AO sensor is also lower (35°C on average now).

Also, in the previous table, NO CRASH means that I had no problems. Unfortunately, before I discovered that jetson_clocks solved the problem, I only got crash-free runs when no inference was running.

Do you think that the dynamic management of the GPU voltage by Dynamic Voltage and Frequency Scaling (DVFS) could cause such a malfunction after long periods of use?
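For anyone who wants to check whether DVFS is actually moving the GPU clock, here is a minimal Python sketch that reads the GPU's devfreq node in sysfs. It assumes the standard Linux devfreq attributes (`governor`, `cur_freq`, `min_freq`, `max_freq`, frequencies in Hz) and assumes the node is `/sys/class/devfreq/57000000.gpu` (57000000.gpu is the GPU device name in the nvgpu logs later in this thread; verify the path on your L4T release). `gpu_dvfs_state` is a hypothetical helper, not part of any NVIDIA tool.

```python
from pathlib import Path
from typing import Dict

# Assumed location of the Jetson Nano GPU devfreq node; verify on your L4T release.
DEFAULT_GPU_DEVFREQ = Path("/sys/class/devfreq/57000000.gpu")


def read_attr(node: Path, name: str) -> str:
    """Read one sysfs attribute and strip the trailing newline."""
    return (node / name).read_text().strip()


def gpu_dvfs_state(node: Path = DEFAULT_GPU_DEVFREQ) -> Dict[str, object]:
    """Return the devfreq governor and frequencies (Hz) of the GPU node."""
    state: Dict[str, object] = {
        "governor": read_attr(node, "governor"),
        "cur_freq": int(read_attr(node, "cur_freq")),
        "min_freq": int(read_attr(node, "min_freq")),
        "max_freq": int(read_attr(node, "max_freq")),
    }
    # When min_freq == max_freq the governor has no range to scale within,
    # so DVFS is effectively disabled for the GPU clock.
    state["pinned"] = state["min_freq"] == state["max_freq"]
    return state
```

On the Nano, `print(gpu_dvfs_state())` should show `pinned: True` after jetson_clocks has run, since, as far as I know, jetson_clocks raises the GPU `min_freq` to `max_freq`; with default settings the governor is free to scale between the two.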

Thanks!!


Hi @WayneWWW,

We encountered the same problem. The PCIe error:

[  343.992555] nvmap_alloc_handle: PID 8709: deepstream-app: WARNING: All NvMap Allocations must have a tag to identify the subsystem allocating memory.Please pass the tag to the API call NvRmMemHanldeAllocAttr() or relevant.
[183917.087822] pcieport 0000:00:02.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0010(Receiver ID)
[183917.098132] pcieport 0000:00:02.0:   device [10de:0faf] error status/mask=00000001/00002000
[183917.106600] pcieport 0000:00:02.0:    [ 0] Receiver Error         (First)
[281785.725034] pcieport 0000:00:02.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0010(Receiver ID)
[281785.735398] pcieport 0000:00:02.0:   device [10de:0faf] error status/mask=00000001/00002000
[281785.743889] pcieport 0000:00:02.0:    [ 0] Receiver Error         (First)
[282799.166960] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
[282799.174265] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G         C      4.9.140-l4t-r32.4.2 #1
[282799.182597] Hardware name: NVIDIA Jetson Nano Developer Kit (DT)
[282799.188675] Call trace:
[282799.191206] [<ffffff800808c490>] dump_backtrace+0x0/0x1b0
[282799.196680] [<ffffff800808c664>] show_stack+0x24/0x30
[282799.201807] [<ffffff800843cd40>] dump_stack+0x98/0xc0
[282799.206935] [<ffffff80081b7dcc>] panic+0x12c/0x290
[282799.211801] [<ffffff8008178d50>] watchdog_nmi_enable+0x0/0x60
[282799.217620] [<ffffff8008177e84>] watchdog_timer_fn+0x8c/0x2a0
[282799.223439] [<ffffff800813b038>] __hrtimer_run_queues+0x120/0x388
[282799.229603] [<ffffff800813b9c8>] hrtimer_interrupt+0xa8/0x1d8
[282799.235423] [<ffffff8008bbc888>] tegra210_timer_isr+0x38/0x48
[282799.241243] [<ffffff80081233d8>] __handle_irq_event_percpu+0x60/0x280
[282799.247755] [<ffffff8008123638>] handle_irq_event_percpu+0x40/0x98
[282799.254007] [<ffffff80081236e0>] handle_irq_event+0x50/0x80
[282799.259653] [<ffffff800812768c>] handle_fasteoi_irq+0xc4/0x1a0
[282799.265558] [<ffffff800812240c>] generic_handle_irq+0x34/0x50
[282799.271376] [<ffffff8008122adc>] __handle_domain_irq+0x6c/0xc0
[282799.277280] [<ffffff8008080db4>] gic_handle_irq+0x54/0xa8
[282799.282752] [<ffffff8008082c28>] el1_irq+0xe8/0x194
[282799.287704] [<ffffff8008b62bf0>] cpuidle_enter_state+0xb8/0x380
[282799.293696] [<ffffff8008b62f2c>] cpuidle_enter+0x34/0x48
[282799.299082] [<ffffff8008113034>] call_cpuidle+0x44/0x68
[282799.304380] [<ffffff8008113374>] cpu_startup_entry+0x18c/0x210
[282799.310286] [<ffffff8008092a24>] secondary_start_kernel+0x13c/0x160
[282799.316624] [<00000000841511a4>] 0x841511a4
[282799.320884] SMP: stopping secondary CPUs
[282800.383788] SMP: failed to stop secondary CPUs 0,3
[282800.388658] Kernel Offset: disabled
[282800.392223] Memory Limit: none
[282800.404857] Rebooting in 1 seconds..
[282801.408567] SMP: stopping secondary CPUs
[282802.469903] SMP: failed to stop secondary CPUs 0,3
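The `error status/mask=00000001/00002000` field in the PCIe lines above is, as far as I can tell from the Linux AER driver, the device's AER Correctable Error Status and Correctable Error Mask registers. The sketch below decodes such a pair using the bit assignments from the PCI Express Base Specification; the helper names are hypothetical. For the values above it reports a Receiver Error (bit 0, a Physical Layer error, consistent with `type=Physical Layer`) and shows that, of the named bits, only Advisory Non-Fatal Error (bit 13) is masked.

```python
from typing import Dict, List

# Correctable Error Status/Mask bit positions (PCIe Base Specification, AER capability).
COR_ERROR_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}


def decode_cor_bits(value: int) -> List[str]:
    """Return the names of the correctable-error bits set in a register value."""
    return [name for bit, name in sorted(COR_ERROR_BITS.items()) if value & (1 << bit)]


def decode_status_mask(field: str) -> Dict[str, List[str]]:
    """Decode the 'status/mask' hex pair printed by the kernel, e.g. '00000001/00002000'."""
    status_hex, mask_hex = field.split("/")
    status, mask = int(status_hex, 16), int(mask_hex, 16)
    return {
        "status": decode_cor_bits(status),
        "masked": decode_cor_bits(mask),
        # Errors set in status whose mask bit is clear are the ones signaled to the OS.
        "unmasked": decode_cor_bits(status & ~mask),
    }
```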

I don’t always get the kernel panic. When the device keeps running and I restart the DeepStream application, I see the following error:

[500510.539999] nvmap_alloc_handle: PID 31476: deepstream-app: WARNING: All NvMap Allocations must have a tag to identify the subsystem allocating memory.Please pass the tag to the API call NvRmMemHanldeAllocAttr() or relevant. 
[500512.260472] nvgpu: 57000000.gpu        gk20a_gr_handle_fecs_error:5281 [ERR]  fecs watchdog triggered for channel 507, cannot ctxsw anymore !!
[500512.273460] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:129  [ERR]  gr_fecs_os_r : 0
[500512.282369] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:131  [ERR]  gr_fecs_cpuctl_r : 0x40
[500512.291864] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:133  [ERR]  gr_fecs_idlestate_r : 0x1
[500512.301491] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:135  [ERR]  gr_fecs_mailbox0_r : 0x1
[500512.310943] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:137  [ERR]  gr_fecs_mailbox1_r : 0x0
[500512.320428] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:139  [ERR]  gr_fecs_irqstat_r : 0x0
[500512.329806] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:141  [ERR]  gr_fecs_irqmode_r : 0x4
[500512.339228] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:143  [ERR]  gr_fecs_irqmask_r : 0x8704
[500512.348900] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:145  [ERR]  gr_fecs_irqdest_r : 0x0
[500512.358284] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:147  [ERR]  gr_fecs_debug1_r : 0x40
[500512.367812] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:149  [ERR]  gr_fecs_debuginfo_r : 0x0
[500512.377414] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:151  [ERR]  gr_fecs_ctxsw_status_1_r : 0xb04
[500512.387700] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:155  [ERR]  gr_fecs_ctxsw_mailbox_r(0) : 0x4
[500512.397862] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:155  [ERR]  gr_fecs_ctxsw_mailbox_r(1) : 0x0
[500512.407998] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:155  [ERR]  gr_fecs_ctxsw_mailbox_r(2) : 0x50009
[500512.418467] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:155  [ERR]  gr_fecs_ctxsw_mailbox_r(3) : 0x4000
[500512.428875] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:155  [ERR]  gr_fecs_ctxsw_mailbox_r(4) : 0x1ffda0
[500512.439463] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:155  [ERR]  gr_fecs_ctxsw_mailbox_r(5) : 0x0
[500512.449597] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:155  [ERR]  gr_fecs_ctxsw_mailbox_r(6) : 0x0
[500512.459707] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:155  [ERR]  gr_fecs_ctxsw_mailbox_r(7) : 0x0
[500512.469823] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:155  [ERR]  gr_fecs_ctxsw_mailbox_r(8) : 0x0
[500512.479988] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:155  [ERR]  gr_fecs_ctxsw_mailbox_r(9) : 0x0
[500512.490101] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:155  [ERR]  gr_fecs_ctxsw_mailbox_r(10) : 0x0
[500512.500305] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:155  [ERR]  gr_fecs_ctxsw_mailbox_r(11) : 0x3
[500512.510565] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:155  [ERR]  gr_fecs_ctxsw_mailbox_r(12) : 0x0
[500512.520772] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:155  [ERR]  gr_fecs_ctxsw_mailbox_r(13) : 0x0
[500512.531006] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:155  [ERR]  gr_fecs_ctxsw_mailbox_r(14) : 0x0
[500512.541219] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:155  [ERR]  gr_fecs_ctxsw_mailbox_r(15) : 0x0
[500512.551508] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:159  [ERR]  gr_fecs_engctl_r : 0x0
[500512.560770] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:161  [ERR]  gr_fecs_curctx_r : 0x0
[500512.570019] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:163  [ERR]  gr_fecs_nxtctx_r : 0x0
[500512.579262] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:169  [ERR]  FECS_FALCON_REG_IMB : 0xbadfbadf
[500512.589375] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:175  [ERR]  FECS_FALCON_REG_DMB : 0xbadfbadf
[500512.599480] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:181  [ERR]  FECS_FALCON_REG_CSW : 0xbadfbadf
[500512.609590] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:187  [ERR]  FECS_FALCON_REG_CTX : 0xbadfbadf
[500512.619708] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:193  [ERR]  FECS_FALCON_REG_EXCI : 0xbadfbadf
[500512.629905] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:200  [ERR]  FECS_FALCON_REG_PC : 0xbadfbadf
[500512.639958] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:206  [ERR]  FECS_FALCON_REG_SP : 0xbadfbadf
[500512.650032] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:200  [ERR]  FECS_FALCON_REG_PC : 0xbadfbadf
[500512.660061] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:206  [ERR]  FECS_FALCON_REG_SP : 0xbadfbadf
[500512.670123] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:200  [ERR]  FECS_FALCON_REG_PC : 0xbadfbadf
[500512.680146] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:206  [ERR]  FECS_FALCON_REG_SP : 0xbadfbadf
[500512.690177] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:200  [ERR]  FECS_FALCON_REG_PC : 0xbadfbadf
[500512.700214] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:206  [ERR]  FECS_FALCON_REG_SP : 0xbadfbadf
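This FECS register dump is printed again after the next restart. To check whether the FECS falcon state changed between two occurrences, a minimal sketch can parse the `gk20a_fecs_dump_falcon_stats` lines into a register-to-values map and list the registers that differ. The regular expression is written against the line format shown in this thread and may need adjusting for other nvgpu versions; the function names are hypothetical. Registers printed more than once (such as `FECS_FALCON_REG_PC`) keep all their values in log order.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Matches lines such as:
# "... gk20a_fecs_dump_falcon_stats:155  [ERR]  gr_fecs_ctxsw_mailbox_r(2) : 0x50009"
FECS_LINE = re.compile(r"gk20a_fecs_dump_falcon_stats:\d+\s+\[ERR\]\s+(\S+)\s*:\s*(\S+)")


def parse_fecs_dump(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Map each FECS register name to the values it had, in log order."""
    regs: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        m = FECS_LINE.search(line)
        if m:
            # int(x, 0) accepts both decimal "0" and hex "0x8704".
            regs[m.group(1)].append(int(m.group(2), 0))
    return dict(regs)


def diff_dumps(
    a: Dict[str, List[int]], b: Dict[str, List[int]]
) -> Dict[str, Tuple[Optional[List[int]], Optional[List[int]]]]:
    """Return {register: (values_in_a, values_in_b)} for registers that differ."""
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }
```

Feed it each dump section of a saved dmesg (for example `parse_fecs_dump(open("dump1.txt"))`, with a hypothetical file name) and pass both maps to `diff_dumps`; an empty result means the two dumps contain the same values.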

After restarting the application again:

[500515.004001] nvgpu: 57000000.gpu   nvgpu_set_error_notifier_locked:137  [ERR]  error notifier set to 8 for ch 507
[500515.014522] nvgpu: 57000000.gpu   nvgpu_set_error_notifier_locked:137  [ERR]  error notifier set to 8 for ch 506
[500515.024836] nvgpu: 57000000.gpu   nvgpu_set_error_notifier_locked:137  [ERR]  error notifier set to 8 for ch 505
[500515.035135] nvgpu: 57000000.gpu   nvgpu_set_error_notifier_locked:137  [ERR]  error notifier set to 8 for ch 504
[500515.045440] nvgpu: 57000000.gpu     gk20a_fifo_handle_sched_error:2531 [ERR]  fifo sched ctxsw timeout error: engine=0, tsg=4, ms=3100
[500515.057902] ---- mlocks ----

[500515.062513] ---- syncpts ----
[500515.065610] id 1 (disp0_a) min 1 max 1 refs 1 (previous client : )
[500515.071915] id 2 (disp0_b) min 1 max 1 refs 1 (previous client : )
[500515.078223] id 3 (disp0_c) min 1 max 1 refs 1 (previous client : )
[500515.084530] id 7 (54340000.vic_0) min 38558471 max 38558471 refs 1 (previous client : 54340000.vic_0)
[500515.093880] id 8 (gm20b_507) min 263544 max 263550 refs 1 (previous client : gm20b_507)
[500515.102010] id 9 (gm20b_506) min 284660 max 284662 refs 1 (previous client : gm20b_506)
[500515.110146] id 11 (gm20b_505) min 190842 max 190842 refs 1 (previous client : gm20b_505)
[500515.118358] id 12 (gm20b_504) min 30917106 max 30917106 refs 1 (previous client : gm20b_504)
[500515.126917] id 13 (gm20b_503) min 576264 max 576264 refs 1 (previous client : gm20b_503)
[500515.135127] id 26 (vblank0) min 30024005 max -2 refs 1 (previous client : )

[500515.143947] ---- channels ----
[500515.147127] 
                channel 0 - 54340000.vic

[500515.153958] 0-54340000.vic (0): 
[500515.157132] active class 01, offset 0000, val 20000000
[500515.162378] DMAPUT 00000bb8, DMAGET 00000bb8, DMACTL 00000000
[500515.168236] CBREAD 20000000, CBSTAT 00010000
[500515.172638] The CDMA sync queue is empty.

[500515.178348] 
                channel 1 - 544c0000.nvenc

[500515.185343] 1-544c0000.nvenc (0): 
[500515.188683] inactive

[500515.192549] 
                ---- host general irq ----

[500515.199545] sync_hintmask_ext = 0xc0000000
[500515.203756] sync_hintmask = 0x80000000
[500515.207617] sync_intc0mask = 0x00000001
[500515.211564] sync_intmask = 0x00000011
[500515.215338] 
                ---- host syncpt irq mask ----

[500515.222683] syncpt_thresh_int_mask(0) = 0x00050001
[500515.227600] syncpt_thresh_int_mask(1) = 0x00000000
[500515.233072] syncpt_thresh_int_mask(2) = 0x00000000
[500515.238095] syncpt_thresh_int_mask(3) = 0x00000000
[500515.243053] syncpt_thresh_int_mask(4) = 0x00000000
[500515.248023] syncpt_thresh_int_mask(5) = 0x00000000
[500515.252982] syncpt_thresh_int_mask(6) = 0x00000000
[500515.257909] syncpt_thresh_int_mask(7) = 0x00000000
[500515.262856] syncpt_thresh_int_mask(8) = 0x00000000
[500515.267851] syncpt_thresh_int_mask(9) = 0x00000000
[500515.272788] syncpt_thresh_int_mask(10) = 0x00000000
[500515.277799] syncpt_thresh_int_mask(11) = 0x00000000
[500515.282826] 
                ---- host syncpt irq status ----

[500515.290369] syncpt_thresh_cpu0_int_status(0) = 0x00000000
[500515.295909] syncpt_thresh_cpu0_int_status(1) = 0x00000000
[500515.301446] syncpt_thresh_cpu0_int_status(2) = 0x00000000
[500515.306987] syncpt_thresh_cpu0_int_status(3) = 0x00000000
[500515.312516] syncpt_thresh_cpu0_int_status(4) = 0x00000000
[500515.318071] syncpt_thresh_cpu0_int_status(5) = 0x00000000
[500515.323601] 
                ---- host syncpt thresh ----

[500515.330818] syncpt_int_thresh_thresh_0(0) = 1
[500515.335732] syncpt_int_thresh_thresh_0(8) = 263546
[500515.340684] syncpt_int_thresh_thresh_0(9) = 284662
[500515.345738] gm20b pbdma 0: 
[500515.348505] id: 4 (tsg), next_id: 4 (tsg) chan status: invalid
[500515.354514] PBDMA_PUT: 0000001f0004a308 PBDMA_GET: 0000001f0004a308 GP_PUT: 00000d96 GP_GET: 00000d96 FETCH: 00000d96 HEADER: 60400000
                HDR: 00000000 SHADOW0: 0004a2f0 SHADOW1: 0000181f

[500515.372571] gm20b eng 0: 
[500515.375152] id: 4 (tsg), next_id: 4 (tsg), ctx status: save 
[500515.380949] busy 

[500515.383060] gm20b eng 1: 
[500515.385622] id: 5 (tsg), next_id: 5 (tsg), ctx status: valid 


[500515.395281] 503-gm20b, pid 31476, refs 2: 
[500515.399352] channel status:  in use idle not busy
[500515.404202] RAMFC : TOP: 8000001f00280078 PUT: 0000001f00280078 GET: 0000001f00280078 FETCH: 0000001f00280078
                HEADER: 60400000 COUNT: 80000000
                SYNCPOINT 00000000 00000d01 SEMAPHORE 00000000 00000000 00000000 00000000

[500515.428064] 504-gm20b, pid 31476, refs 2: 
[500515.432339] channel status:  in use idle not busy
[500515.437168] RAMFC : TOP: 8000001f00240018 PUT: 0000001f00240018 GET: 0000001f00240018 FETCH: 0000001f00240018
                HEADER: 60400000 COUNT: 80000000
                SYNCPOINT 00000000 00000c01 SEMAPHORE 00000000 00000000 00000000 00000000

[500515.461002] 505-gm20b, pid 31476, refs 2: 
[500515.465025] channel status:  in use idle not busy
[500515.469839] RAMFC : TOP: 8000001f00200018 PUT: 0000001f00200018 GET: 0000001f00200018 FETCH: 0000001f00200018
                HEADER: 60400000 COUNT: 80000000
                SYNCPOINT 00000000 00000b01 SEMAPHORE 00000000 00000000 00000000 00000000

[500515.493811] 506-gm20b, pid 31476, refs 4: 
[500515.497842] channel status:  in use pending busy
[500515.502582] RAMFC : TOP: 8000001f00140018 PUT: 0000001f00140018 GET: 0000001f00140018 FETCH: 0000001f00140018
                HEADER: 60400000 COUNT: 80000000
                SYNCPOINT 00000000 00000901 SEMAPHORE 00000000 00000000 00000000 00000000

[500515.526437] 507-gm20b, pid 31476, refs 8: 
[500515.530645] channel status:  in use pending busy
[500515.536146] RAMFC : TOP: 8000001f0004a308 PUT: 0000001f0004a308 GET: 0000001f0004a308 FETCH: 0000001f0004a308
                HEADER: 60400000 COUNT: 80000000
                SYNCPOINT 00000000 00000801 SEMAPHORE 00000000 00000000 00000000 00000000

[500515.560156] 508-gm20b, pid 4158, refs 2: 
[500515.564123] channel status:  in use idle not busy
[500515.568962] RAMFC : TOP: 0000000000000000 PUT: 0000000000000000 GET: 0000000000000000 FETCH: 0000000000000000
                HEADER: 60400000 COUNT: 00000000
                SYNCPOINT 00000000 00000000 SEMAPHORE 00000000 00000000 00000000 00000000

[500515.592813] 509-gm20b, pid 4158, refs 2: 
[500515.596858] channel status:  in use idle not busy
[500515.601705] RAMFC : TOP: 0000000000000000 PUT: 0000000000000000 GET: 0000000000000000 FETCH: 0000000000000000
                HEADER: 60400000 COUNT: 00000000
                SYNCPOINT 00000000 00000000 SEMAPHORE 00000000 00000000 00000000 00000000

[500515.625558] 510-gm20b, pid 4158, refs 2: 
[500515.629510] channel status:  in use idle not busy
[500515.634474] RAMFC : TOP: 0000000000000000 PUT: 0000000000000000 GET: 0000000000000000 FETCH: 0000000000000000
                HEADER: 60400000 COUNT: 00000000
                SYNCPOINT 00000000 00000000 SEMAPHORE 00000000 00000000 00000000 00000000

[500515.658345] 511-gm20b, pid 4158, refs 2: 
[500515.662292] channel status:  in use idle not busy
[500515.667119] RAMFC : TOP: 0000000000000000 PUT: 0000000000000000 GET: 0000000000000000 FETCH: 0000000000000000
                HEADER: 60400000 COUNT: 00000000
                SYNCPOINT 00000000 00000000 SEMAPHORE 00000000 00000000 00000000 00000000

[500515.691213] nvgpu: 57000000.gpu gk20a_fifo_handle_mmu_fault_locked:1721 [ERR]  fake mmu fault on engine 0, engine subid 1 (hub), client 11 (mspdec), addr 0x6e6b147000, type 2 (pte), access_type 0x00000000,inst_ptr 0xac8f4000
[500515.711201] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:129  [ERR]  gr_fecs_os_r : 0
[500515.720000] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:131  [ERR]  gr_fecs_cpuctl_r : 0x40
[500515.729320] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:133  [ERR]  gr_fecs_idlestate_r : 0x1
[500515.738828] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:135  [ERR]  gr_fecs_mailbox0_r : 0x1
[500515.748311] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:137  [ERR]  gr_fecs_mailbox1_r : 0x0
[500515.757745] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:139  [ERR]  gr_fecs_irqstat_r : 0x0
[500515.767169] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:141  [ERR]  gr_fecs_irqmode_r : 0x4
[500515.776516] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:143  [ERR]  gr_fecs_irqmask_r : 0x8704
[500515.786090] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:145  [ERR]  gr_fecs_irqdest_r : 0x0
[500515.795532] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:147  [ERR]  gr_fecs_debug1_r : 0x40
[500515.805000] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:149  [ERR]  gr_fecs_debuginfo_r : 0x0
[500515.814500] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:151  [ERR]  gr_fecs_ctxsw_status_1_r : 0xb04
[500515.824598] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:155  [ERR]  gr_fecs_ctxsw_mailbox_r(0) : 0x4
[500515.834703] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:155  [ERR]  gr_fecs_ctxsw_mailbox_r(1) : 0x0
[500515.844798] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:155  [ERR]  gr_fecs_ctxsw_mailbox_r(2) : 0x50009
[500515.855246] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:155  [ERR]  gr_fecs_ctxsw_mailbox_r(3) : 0x4000
[500515.865606] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:155  [ERR]  gr_fecs_ctxsw_mailbox_r(4) : 0x1ffda0
[500515.876137] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:155  [ERR]  gr_fecs_ctxsw_mailbox_r(5) : 0x0
[500515.886230] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:155  [ERR]  gr_fecs_ctxsw_mailbox_r(6) : 0x0
[500515.896328] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:155  [ERR]  gr_fecs_ctxsw_mailbox_r(7) : 0x0
[500515.906426] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:155  [ERR]  gr_fecs_ctxsw_mailbox_r(8) : 0x0
[500515.916530] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:155  [ERR]  gr_fecs_ctxsw_mailbox_r(9) : 0x0
[500515.926622] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:155  [ERR]  gr_fecs_ctxsw_mailbox_r(10) : 0x0
[500515.936823] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:155  [ERR]  gr_fecs_ctxsw_mailbox_r(11) : 0x3
[500515.947016] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:155  [ERR]  gr_fecs_ctxsw_mailbox_r(12) : 0x0
[500515.957204] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:155  [ERR]  gr_fecs_ctxsw_mailbox_r(13) : 0x0
[500515.967381] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:155  [ERR]  gr_fecs_ctxsw_mailbox_r(14) : 0x0
[500515.977568] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:155  [ERR]  gr_fecs_ctxsw_mailbox_r(15) : 0x0
[500515.987752] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:159  [ERR]  gr_fecs_engctl_r : 0x0
[500515.996983] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:161  [ERR]  gr_fecs_curctx_r : 0x0
[500516.006216] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:163  [ERR]  gr_fecs_nxtctx_r : 0x0
[500516.015528] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:169  [ERR]  FECS_FALCON_REG_IMB : 0xbadfbadf
[500516.025635] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:175  [ERR]  FECS_FALCON_REG_DMB : 0xbadfbadf
[500516.035738] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:181  [ERR]  FECS_FALCON_REG_CSW : 0xbadfbadf
[500516.045832] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:187  [ERR]  FECS_FALCON_REG_CTX : 0xbadfbadf
[500516.055932] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:193  [ERR]  FECS_FALCON_REG_EXCI : 0xbadfbadf
[500516.066112] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:200  [ERR]  FECS_FALCON_REG_PC : 0xbadfbadf
[500516.076124] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:206  [ERR]  FECS_FALCON_REG_SP : 0xbadfbadf
[500516.086135] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:200  [ERR]  FECS_FALCON_REG_PC : 0xbadfbadf
[500516.096140] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:206  [ERR]  FECS_FALCON_REG_SP : 0xbadfbadf
[500516.106149] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:200  [ERR]  FECS_FALCON_REG_PC : 0xbadfbadf
[500516.116158] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:206  [ERR]  FECS_FALCON_REG_SP : 0xbadfbadf
[500516.126169] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:200  [ERR]  FECS_FALCON_REG_PC : 0xbadfbadf
[500516.136176] nvgpu: 57000000.gpu      gk20a_fecs_dump_falcon_stats:206  [ERR]  FECS_FALCON_REG_SP : 0xbadfbadf
[500516.146181] nvgpu: 57000000.gpu gk20a_fifo_handle_mmu_fault_locked:1726 [ERR]  gr_status_r : 0x81
[500516.156404] nvgpu: 57000000.gpu                    fifo_error_isr:2605 [ERR]  channel reset initiated from fifo_error_isr; intr=0x00000100
[500532.880044]

Could you please look into this issue? Have you found any problems in the frequency scaling governor?

Could you confirm whether running jetson_clocks also resolves this error?

The device has been running for 16 hours, and I ran jetson_clocks at boot. As the previous logs show, the first error occurred after about 51 hours (183917 seconds), so I can confirm on Monday.
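The hour figures here are plain `dmesg` timestamps (seconds since boot) divided by 3600, for example 183917 s / 3600 ≈ 51.1 h. To check several saved logs at once, a small sketch can scan `dmesg` output (not `dmesg -T`) for the first occurrence of each error signature seen in this thread and report the uptime in hours. The signature strings are copied from the logs above and the list is illustrative, not exhaustive; `first_occurrences` is a hypothetical helper.

```python
import re
from typing import Dict, Iterable

# Error signatures copied from the logs in this thread (illustrative, not exhaustive).
SIGNATURES = {
    "pcie_corrected": "PCIe Bus Error: severity=Corrected",
    "fecs_watchdog": "fecs watchdog triggered",
    "ctxsw_timeout": "fifo sched ctxsw timeout error",
    "mmu_fault": "fake mmu fault",
    "hard_lockup": "Watchdog detected hard LOCKUP",
}

# dmesg prefix such as "[  343.992555]" -> seconds since boot.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def first_occurrences(lines: Iterable[str]) -> Dict[str, float]:
    """Return {signature: uptime in hours} for the first line matching each signature."""
    found: Dict[str, float] = {}
    for line in lines:
        m = TIMESTAMP.match(line)
        if not m:
            continue  # continuation lines without a timestamp are skipped
        seconds = float(m.group(1))
        for name, text in SIGNATURES.items():
            if name not in found and text in line:
                found[name] = seconds / 3600.0
    return found
```

Run on the first log in this thread, it would report the first corrected PCIe error at about 51.1 h and the hard lockup at about 78.6 h.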

Are there any side effects on temperature? At the moment I don’t see an issue, but the ambient temperature is only 11 °C, and in summer it can be a lot higher.

Have you already looked into the DVFS governor?

Hi,

Are you using a fan to cool the Nano?
While running the tests, I logged the temperatures with a modified version of this Python script: tsutof/jetson-thermal-monitor on GitHub (real-time plot of temperatures from the NVIDIA Jetson on-module thermal sensors).
It might help you understand whether the temperature is reaching high levels.
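If you only need numbers for a log rather than a live plot, the standard Linux thermal sysfs interface is enough: each `/sys/class/thermal/thermal_zoneN` directory has a `type` file with the sensor name and a `temp` file in millidegrees Celsius. Below is a minimal sketch that reads all zones; it is not the tsutof script, `read_thermal_zones` is a hypothetical helper, and the zone names you get (on the Nano, names such as AO-therm and GPU-therm) depend on the board.

```python
from pathlib import Path
from typing import Dict

THERMAL_ROOT = Path("/sys/class/thermal")


def read_thermal_zones(root: Path = THERMAL_ROOT) -> Dict[str, float]:
    """Return {zone type: temperature in deg C} for every readable thermal zone."""
    temps: Dict[str, float] = {}
    for zone in sorted(root.glob("thermal_zone*")):
        try:
            name = (zone / "type").read_text().strip()
            millideg = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            # Some zones can be unreadable or report non-numeric values; skip them.
            continue
        temps[name] = millideg / 1000.0
    return temps
```

Calling it every few seconds (for example with `time.sleep`) and appending the result to a CSV file gives a long-run temperature log that can be lined up with the dmesg timestamps.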

However, I have never encountered your errors.