Here is one set of logs, with a short stack trace and a dump of the CPU registers.
I have 9 identical devices, and this is the only one that shows this warning. Do I need to replace it?
Thank you.
Alexis
[Sat Jan 23 15:29:44 2021] ------------[ cut here ]------------
[Sat Jan 23 15:29:44 2021] WARNING: CPU: 0 PID: 2548 at /dvs/git/dirty/git-master_linux/kernel/nvgpu/drivers/gpu/nvgpu/common/pmu/pmu_pg.c:275 nvgpu_pmu_disable_elpg+0xf4/0x348 [nvgpu]
[Sat Jan 23 15:29:44 2021] Modules linked in: fuse bnep zram overlay nf_log_ipv6 ip6table_filter ip6_tables xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 nf_log_ipv4 nf_log_common xt_LOG xt_multiport spidev xt_conntrack nf_conntrack iptable_filter userspace_alert nvgpu bluedroid_pm ip_tables x_tables
[Sat Jan 23 15:29:44 2021] CPU: 0 PID: 2548 Comm: irq/476-gk20a_s Tainted: G W 4.9.140-tegra #1
[Sat Jan 23 15:29:44 2021] Hardware name: Jetson-AGX (DT)
[Sat Jan 23 15:29:44 2021] task: ffffffc7da096200 task.stack: ffffffc7c155c000
[Sat Jan 23 15:29:44 2021] PC is at nvgpu_pmu_disable_elpg+0xf4/0x348 [nvgpu]
[Sat Jan 23 15:29:44 2021] LR is at nvgpu_pmu_disable_elpg+0xf4/0x348 [nvgpu]
[Sat Jan 23 15:29:44 2021] pc : [<ffffff8000fdcd1c>] lr : [<ffffff8000fdcd1c>] pstate: 20c00045
[Sat Jan 23 15:29:44 2021] sp : ffffffc7c155fbe0
[Sat Jan 23 15:29:44 2021] x29: ffffffc7c155fbf0 x28: 0000000000000000
[Sat Jan 23 15:29:44 2021] x27: 0000000000000001 x26: 0000000000000000
[Sat Jan 23 15:29:44 2021] x25: ffffff800105b470 x24: ffffffc7c28d26a8
[Sat Jan 23 15:29:44 2021] x23: ffffffc7c28d2d28 x22: 0000000000000000
[Sat Jan 23 15:29:44 2021] x21: ffffffc7c28d8000 x20: ffffff8001062b38
[Sat Jan 23 15:29:44 2021] x19: ffffffc7c28d0000 x18: 0000000000000003
[Sat Jan 23 15:29:44 2021] x17: 0000007f8812e258 x16: 00000000001ca875
[Sat Jan 23 15:29:44 2021] x15: ffffffffffffffff x14: 5f756d705f757067
[Sat Jan 23 15:29:44 2021] x13: 766e20205d4e5257 x12: 5b20203437323a67
[Sat Jan 23 15:29:44 2021] x11: 706c655f656c6261 x10: 7369645f756d705f
[Sat Jan 23 15:29:44 2021] x9 : 757067766e202020 x8 : ffffffc7ffc1a6d4
[Sat Jan 23 15:29:44 2021] x7 : 0000000000000000 x6 : 00000000133a5967
[Sat Jan 23 15:29:44 2021] x5 : 0000000000000000 x4 : 0000000000000000
[Sat Jan 23 15:29:44 2021] x3 : ffffffffffffffff x2 : 00000047f642b000
[Sat Jan 23 15:29:44 2021] x1 : ffffffc7da096200 x0 : 000000000000008a
[Sat Jan 23 15:29:44 2021] ---[ end trace a5f50b22b422d710 ]---
[Sat Jan 23 15:25:07 2021] Call trace:
[Sat Jan 23 15:25:07 2021] [<ffffff8000fdcd1c>] nvgpu_pmu_disable_elpg+0xf4/0x348 [nvgpu]
[Sat Jan 23 15:25:07 2021] [<ffffff8000fdd064>] nvgpu_pmu_pg_global_enable+0xf4/0x108 [nvgpu]
[Sat Jan 23 15:25:07 2021] [<ffffff8000f9a4b8>] nvgpu_pg_elpg_disable+0xb0/0xc8 [nvgpu]
[Sat Jan 23 15:25:07 2021] [<ffffff8000f9736c>] mc_gp10b_isr_stall+0xac/0x218 [nvgpu]
[Sat Jan 23 15:25:07 2021] [<ffffff8000fa9728>] nvgpu_intr_thread_stall+0x50/0x1d8 [nvgpu]
[Sat Jan 23 15:25:07 2021] [<ffffff8000fb9940>] nvgpu_fecs_trace_init_debugfs+0x30f8/0x3198 [nvgpu]
[Sat Jan 23 15:25:07 2021] [<ffffff8008123980>] irq_thread_fn+0x30/0x80
[Sat Jan 23 15:25:07 2021] [<ffffff8008123cbc>] irq_thread+0x11c/0x1a8
[Sat Jan 23 15:25:07 2021] [<ffffff80080dbe64>] kthread+0xec/0xf0
[Sat Jan 23 15:25:07 2021] [<ffffff80080838a0>] ret_from_fork+0x10/0x30
[Sat Jan 23 15:25:07 2021] nvgpu: 17000000.gv11b nvgpu_pmu_enable_elpg:208 [WRN] nvgpu_pmu_enable_elpg(): possible elpg refcnt mismatch. elpg refcnt=2