Gk20a_channel_timeout_handler in Jetson TX2

Hi, in one of our projects with a Jetson TX2 running L4T 32.5, we see the problem below after a few days of running a CUDA-based graphics application continuously.

[27005.019288] nvgpu: 17000000.gp10b gk20a_channel_timeout_handler:1573 [ERR] Job on channel 502 timed out
[27005.030424] nvgpu: 17000000.gp10b nvgpu_set_error_notifier_locked:137 [ERR] error notifier set to 8 for ch 502
[27005.147316] nvgpu: 17000000.gp10b gk20a_channel_timeout_handler:1573 [ERR] Job on channel 503 timed out
[27005.158616] nvgpu: 17000000.gp10b nvgpu_set_error_notifier_locked:137 [ERR] error notifier set to 8 for ch 503
[27011.235220] nvgpu: 17000000.gp10b gk20a_channel_timeout_handler:1573 [ERR] Job on channel 505 timed out
[27011.246574] nvgpu: 17000000.gp10b nvgpu_set_error_notifier_locked:137 [ERR] error notifier set to 8 for ch 505
[27014.874996] INFO: rcu_preempt detected stalls on CPUs/tasks:
[27014.880693] 0-…: (4 GPs behind) idle=72b/140000000000002/0 softirq=685700/685711 fqs=2530
[27014.889212] (detected by 4, t=5255 jiffies, g=505571, c=505570, q=1271)
[27024.350999] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
[27024.358234] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.9.201-tegra #100
[27024.364934] Hardware name: quill (DT)
[27024.368599] Call trace:
[27024.371062] dump_backtrace+0x0/0x198
[27024.376468] show_stack+0x24/0x30
[27024.381525] dump_stack+0xa0/0xc8
[27024.386580] panic+0x12c/0x2a8
[27024.391378] watchdog_check_hardlockup_other_cpu+0x11c/0x120
[27024.398773] watchdog_timer_fn+0x98/0x2c0
[27024.404521] __hrtimer_run_queues5y+0xd8/0x360
[27024.410527] hrtimer_interrupt+0xa8/0x1e0
[27024.416277] tegra186_timer_isr+0x34/0x48
[27024.422025] __handle_irq_event_percpu+0x68/0x288
[27024.428463] handle_irq_event_percpu+0x28/0x60
[27024.434640] handle_irq_event+0x50/0x80
[27024.440213] handle_fasteoi_irq+0xd4/0x1c0
[27024.446044] generic_handle_irq+0x34/0x50
[27024.451789] __handle_domain_irq+0x68/0xc0
[27024.457619] gic_handle_irq+0x5c/0xb0
[27024.463016] el1_irq+0xe8/0x194
[27024.467895] cpuidle_enter_state+0xb8/0x380
[27024.473813] cpuidle_enter+0x34/0x48
[27024.479125] call_cpuidle+0x44/0x70
[27024.484348] cpu_startup_entry+0x1b0/0x200
[27024.490182] secondary_start_kernel+0x190/0x1f8
[27024.496445] [<000000008122b1a4>] 0x8122b1a4
[27024.500632] SMP: stopping secondary CPUs
[27025.707803] SMP: failed to stop secondary CPUs 0,5
[27025.712594] Kernel Offset: disabled
[27025.716084] Memory Limit: none
[27025.719142] trusty-log panic notifier - trusty version Built: 14:49:57 Jan 15 2021
[27025.753433] Rebooting in 5 seconds…
[27030.758140] SMP: stopping secondary CPUs
[27031.965311] SMP: failed to stop secondary CPUs 0,5
[0000.175] I> Welcome to MB2(TBoot-BPMP)(version: 01.00.160913-t186-M-00.00-mobile-03715cad)
[0000.184] I> Boot-device: eMMC
[0000.191] I> sdmmc bdev is already initialized
[0000.196] I> pmic: reset reason (nverc) : 0x0
[0000.229] I> Found 19 partitions in SDMMC_BOOT (instance 3)
[0000.249] I> Found 34 partitions in SDMMC_USER (instance 3)
[0000.255] I> A/B: bin_type (16) slot 1
[0000.258] I> Loading partition bpmp-fw_b at 0xd7800000
[0000.263] I> Reading two headers - addr:0xd7800000 blocks:1
[0000.269] I> Addr: 0xd7800000, start-block: 44098752, num_blocks: 1
[0000.294] I> Binary(16) of size 534416 is loaded @ 0xd7800000
[0000.299] I> A/B: bin_type (17) slot 1
[0000.303] I> Loading partition bpmp-fw-dtb_b at 0xd79f0000
[0000.308] I> Reading two headers - addr:0xd79f0000 blocks:1
[0000.314] I> Addr: 0xd79f0000, start-block: 44102008, num_blocks: 1
[0000.340] I> Binary(17) of size 604720 is loaded @ 0xd796c400

What could be causing this, and how can we handle it?

Hello,

This looks like it belongs in the Jetson TX2 forum. I have moved it over for you.

Please try upgrading to the latest rel-32 L4T first.

Hi @WayneWWW, I am already on L4T 32.5, and the kernel version is below:

VERSION = 4
PATCHLEVEL = 9
SUBLEVEL = 201
EXTRAVERSION =

Does "latest rel-32" mean I should upgrade to 32.7.3?
Are there any fixes in 32.7.3 related to gk20a?
Is there a way to trace the root cause of this issue on 32.5? That would be really helpful to us.
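
In the meantime, one thing that can be done on 32.5 is to collect the channel timeout events from a saved kernel log so they can be lined up against what the application was doing at the time. Below is a minimal sketch of that, assuming the log has been captured as plain text and that the timeout lines keep the same format as in the log above. The function name `find_channel_timeouts` is hypothetical and not part of any NVIDIA tool; this only extracts timestamps and channel numbers, it does not diagnose the cause.

```python
import re

# Matches lines shaped like the ones in the log above, e.g.:
# [27005.019288] nvgpu: 17000000.gp10b gk20a_channel_timeout_handler:1573 [ERR] Job on channel 502 timed out
# The pattern is an assumption based on that sample only.
TIMEOUT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+nvgpu:.*gk20a_channel_timeout_handler:\d+\s+"
    r"\[ERR\]\s+Job on channel (?P<ch>\d+) timed out"
)


def find_channel_timeouts(log_text):
    """Return a list of (timestamp_seconds, channel_id) for each timeout line."""
    return [
        (float(m.group("ts")), int(m.group("ch")))
        for m in TIMEOUT_RE.finditer(log_text)
    ]


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python3 find_timeouts.py saved_dmesg.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for ts, ch in find_channel_timeouts(f.read()):
            print(f"{ts:.6f} channel {ch}")
```

Running it over logs from several failures would show whether the same channels time out each time and how long before the panic the first timeout appears.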

Thanks in advance.

I don’t know whether the issue is fixed. But we don’t debug on old releases, so please upgrade your JetPack version first.
