We are using a Xavier NX with R35.2.1 and are hitting the kernel Oops below. The panic happens intermittently.
We applied the patch from this thread, but with no luck; the problem is still there.
This thread also describes a similar problem, but in our case the ‘Address accessed’ is 0x15b4045c, which differs from the address reported there.
Here is our log.
Please advise.
[12:51:15:314] I/TC: Secondary CPU 1 initializing
[12:51:15:332] I/TC: Secondary CPU 1 switching to normal world boot
[12:51:15:349] I/TC: Secondary CPU 2 initializing
[12:51:15:369] I/TC: Secondary CPU 2 switching to normal world boot
[12:51:15:379] I/TC: Secondary CPU 3 initializing
[12:51:15:395] I/TC: Secondary CPU 3 switching to normal world boot
[12:51:15:413] I/TC: Secondary CPU 4 initializing
[12:51:15:426] I/TC: Secondary CPU 4 switching to normal world boot
[12:51:15:445] I/TC: Secondary CPU 5 initializing
[12:51:15:457] I/TC: Secondary CPU 5 switching to normal world boot
[12:51:16:260] [ 0.926850] tegra186-dpaux-pinctrl 155f0000.dpaux: can not get clock
[12:51:16:701] rm_rail_debugfs_init: /rm/vdd_cpu: failed
[12:51:16:701] rm_rail_debugfs_init: /rm/vdd_cpu: failed
[12:51:16:702] debugfs initialized
[12:51:17:268] [ 20.086360] Camera-FW on t194-rce-safe started
[12:51:17:268] TCU early console enabled.
[12:51:17:339] [ 20.154826] Camera-FW on t194-rce-safe ready SHA1=01b72e3c (crt 0.775 ms, total boot 69.271 ms)
[12:51:19:954] [ 4.642109] tegradc 15200000.display: hdmi: can’t get adpater for ddc bus 3
[12:51:20:388] [ 4.745551] CPU:0, Error: cbb-noc@2300000, irq=15
[12:51:20:391] [ 4.745694] **************************************
[12:51:20:393] [ 4.745825] CPU:0, Error:cbb-noc
[12:51:20:408] [ 4.745937] Error Logger : 0
[12:51:20:421] [ 4.746024] ErrLog0 : 0x80030000
[12:51:20:451] [ 4.746129] Transaction Type : RD - Read, Incrementing
[12:51:20:455] [ 4.746277] Error Code : SLV
[12:51:20:461] [ 4.746366] Error Source : Target
[12:51:20:480] [ 4.746458] Error Description : Target error detected by CBB slave
[12:51:20:492] [ 4.746623] AXI2APB_5 bridge error: RDFIFOF - Read Response FIFO Full interrupt
[12:51:20:496] [ 4.746806] Packet header Lock : 0
[12:51:20:498] [ 4.746897] Packet header Len1 : 3
[12:51:20:502] [ 4.746990] NOC protocol version : version >= 2.7
[12:51:20:513] [ 4.747111] ErrLog1 : 0x35242e
[12:51:20:518] [ 4.747196] ErrLog2 : 0x0
[12:51:20:522] [ 4.747270] RouteId : 0x35242e
[12:51:20:524] [ 4.747356] InitFlow : ccroc_p2ps/I/ccroc_p2ps
[12:51:20:527] [ 4.747493] Targflow : host1x_p2pm/T/host1x_p2pm
[12:51:20:530] [ 4.747641] TargSubRange : 18
[12:51:20:539] [ 4.747727] SeqId : 0
[12:51:20:541] [ 4.747797] ErrLog3 : 0x4045c
[12:51:20:542] [ 4.747895] ErrLog4 : 0x0
[12:51:20:542] [ 4.748033] Address accessed : 0x15b4045c
[12:51:20:543] [ 4.748638] ErrLog5 : 0xb89f851
[12:51:20:544] [ 4.751809] Non-Modify : 0x1
[12:51:20:546] [ 4.755463] AXI ID : 0x17
[12:51:20:549] [ 4.758351] Master ID : CCPLEX
[12:51:20:551] [ 4.761936] Security Group(GRPSEC): 0x7e
[12:51:20:553] [ 4.765881] Cache : 0x1 -- Bufferable
[12:51:20:559] [ 4.770084] Protection : 0x2 -- Unprivileged, Non-Secure, Data Access
[12:51:20:568] [ 4.776907] FALCONSEC : 0x0
[12:51:20:570] [ 4.780048] Virtual Queuing Channel(VQC): 0x0
[12:51:20:573] [ 4.784947] **************************************
[12:51:20:575] [ 4.789720] kernel BUG at drivers/soc/tegra/cbb/tegra194-cbb.c:2057!
[12:51:20:577] [ 4.796064] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[12:51:20:578] [ 4.801745] Modules linked in:
[12:51:20:579] [ 4.804727] CPU: 0 PID: 7 Comm: kworker/u12:0 Not tainted 5.10.104-tegra #2
[12:51:20:584] [ 4.811376] Hardware name: Unknown NVIDIA Jetson Xavier NX Developer Kit/NVIDIA Jetson Xavier NX Developer Kit, BIOS 202210.2-26fc186-dirty 08/18/2023
[12:51:20:586] [ 4.824770] Workqueue: events_unbound async_run_entry_fn
[12:51:20:589] [ 4.830019] pstate: 60400089 (nZCv daIf +PAN -UAO -TCO BTYPE=--)
[12:51:20:591] [ 4.836314] pc : tegra194_cbb_err_isr+0x19c/0x1b0
[12:51:20:592] [ 4.840775] lr : tegra194_cbb_err_isr+0x11c/0x1b0
[12:51:20:601] [ 4.845756] sp : ffff800010003b40
[12:51:20:604] [ 4.849182] x29: ffff800010003b40 x28: 0000000000000001
[12:51:20:605] [ 4.854599] x27: 0000000000000080 x26: ffffac70a71f85b0
[12:51:20:607] [ 4.859766] x25: ffffac70a7b5be10 x24: 0000000000000001
[12:51:20:608] [ 4.865361] x23: ffffac70a74e7000 x22: ffffac70a797ea00
[12:51:20:611] [ 4.870529] x21: 000000000000000f x20: 0000000000000005
[12:51:20:614] [ 4.876041] x19: ffffac70a797e9f0 x18: 0000000000000010
[12:51:20:621] [ 4.881467] x17: ffffac70a72a0f88 x16: 00000000fc351cbd
[12:51:20:624] [ 4.887236] x15: ffff0014c01630f0 x14: 0720072007200720
[12:51:20:634] [ 4.892422] x13: 0720072007200720 x12: 0720072007200720
[12:51:20:635] [ 4.898002] x11: 0720072007200720 x10: 0720072007200720
[12:51:20:636] [ 4.903259] x9 : 0720072007200720 x8 : 07200720072a072a
[12:51:20:639] [ 4.908764] x7 : 072a072a072a072a x6 : c0000000ffffefff
[12:51:20:642] [ 4.914277] x5 : 0000000000057fa8 x4 : ffffac70a7807968
[12:51:20:645] [ 4.919700] x3 : 00000000ffffffff x2 : ffffac70a5c8e170
[12:51:20:646] [ 4.925295] x1 : ffff0014c0162b80 x0 : 0000000100010100
[12:51:20:647] [ 4.930374] Call trace:
[12:51:20:649] [ 4.933086] tegra194_cbb_err_isr+0x19c/0x1b0
[12:51:20:652] [ 4.937374] __handle_irq_event_percpu+0x68/0x2a0
[12:51:20:669] [ 4.941840] handle_irq_event_percpu+0x40/0xa0
[12:51:20:670] [ 4.946126] handle_irq_event+0x50/0xf0
[12:51:20:671] [ 4.950149] handle_fasteoi_irq+0xc0/0x170
[12:51:20:672] [ 4.953917] generic_handle_irq+0x40/0x60
[12:51:20:673] [ 4.958198] __handle_domain_irq+0x70/0xd0
[12:51:20:674] [ 4.962398] efi_header_end+0xb0/0xf0
[12:51:20:675] [ 4.965897] el1_irq+0xd0/0x180
[12:51:20:676] [ 4.968614] __do_softirq+0xb4/0x3e8
[12:51:20:677] [ 4.972548] irq_exit+0xc0/0xe0
[12:51:20:679] [ 4.975281] __handle_domain_irq+0x74/0xd0
[12:51:20:680] [ 4.979545] efi_header_end+0xb0/0xf0
[12:51:20:681] [ 4.982963] el1_irq+0xd0/0x180
[12:51:20:682] [ 4.986199] tegra_hda_get_dev_id+0x6c/0x2b0
[12:51:20:684] [ 4.990246] tegra_hda_init+0x234/0x440
[12:51:20:686] [ 4.994423] tegra_dc_hdmi_init+0x820/0xb50
[12:51:20:686] [ 4.998015] tegra_dc_set_out+0x2cc/0x480
[12:51:20:695] [ 5.002126] tegra_dc_probe+0xa90/0x1570
[12:51:20:696] [ 5.006322] platform_drv_probe+0x5c/0xb0
[12:51:20:698] [ 5.010266] really_probe+0xf8/0x3d0
[12:51:20:699] [ 5.013940] driver_probe_device+0x60/0xc0
[12:51:20:700] [ 5.018309] __device_attach_driver+0x8c/0xd0
[12:51:20:701] [ 5.022427] bus_for_each_drv+0x8c/0xe0
[12:51:20:702] [ 5.026447] __device_attach_async_helper+0xc4/0xf0
[12:51:20:703] [ 5.031353] async_run_entry_fn+0x4c/0x150
[12:51:20:704] [ 5.035549] process_one_work+0x1c4/0x4a0
[12:51:20:705] [ 5.039660] worker_thread+0x54/0x430
[12:51:20:707] [ 5.043333] kthread+0x148/0x170
[12:51:20:708] [ 5.046420] ret_from_fork+0x10/0x24
[12:51:20:709] [ 5.050342] Code: a9446bf9 a94573fb a8c67bfd d65f03c0 (d4210000)
[12:51:20:711] [ 5.056466] ---[ end trace 5e759a9d1ec26c9e ]---
[12:51:20:717] [ 5.061112] Kernel panic - not syncing: Oops - BUG: Fatal exception in interrupt
[12:51:20:718] [ 5.068533] SMP: stopping secondary CPUs
[12:51:20:720] [ 5.072495] Kernel Offset: 0x2c7095ad0000 from 0xffff800010000000
[12:51:20:734] [ 5.078337] PHYS_OFFSET: 0xffffffec40000000
[12:51:20:735] [ 5.082707] CPU features: 0x8240002,03802a30
[12:51:20:737] [ 5.087080] Memory Limit: none
[12:51:20:739] [ 5.090148] ---[ end Kernel panic - not syncing: Oops - BUG: Fatal exception in interrupt ]---
Please change the tegra_dc driver registration from module_init to late_initcall in the kernel and see whether that works around the error.
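For code built into the kernel, module_init() is an alias for device_initcall() (initcall level 6), while late_initcall() runs at level 7, after every level-6 initcall has finished; when the driver is built as a loadable module, both simply become the module's init function. The Python sketch below is only a userspace model of that ordering, not kernel code, and the function names in it (including "tegra_dc_init") are hypothetical stand-ins.

```python
# Userspace model only (not kernel code): built-in initcalls run level by
# level, and within one level they keep their link order.
DEVICE_INITCALL = 6  # what module_init() becomes for built-in code
LATE_INITCALL = 7    # late_initcall()


def run_initcalls(registered):
    """Return init function names in the order this model runs them.

    `registered` is a list of (level, name) tuples in link order.
    sorted() is stable, so entries at the same level keep link order.
    """
    return [name for level, name in sorted(registered, key=lambda entry: entry[0])]


# Hypothetical registrations: the display init is linked first, but because
# it uses late_initcall it runs after every level-6 initcall.
order = run_initcalls([
    (LATE_INITCALL, "tegra_dc_init"),
    (DEVICE_INITCALL, "other_driver_init"),
])
print(order)  # ['other_driver_init', 'tegra_dc_init']
```

In this model the change only delays when the driver is registered; it says nothing about when an asynchronous probe actually runs.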
Thanks for this. We will try it and let you know.
We changed the tegra_dc driver from module_init to late_initcall, but the problem is still there.
The file we modified is /nvidia/drivers/video/tegra/dc/dc.c.
============================================================
I/TC: Secondary CPU 1 initializing
I/TC: Secondary CPU 1 switching to normal world boot
I/TC: Secondary CPU 2 initializing
I/TC: Secondary CPU 2 switching to normal world boot
I/TC: Secondary CPU 3 initializing
I/TC: Secondary CPU 3 switching to normal world boot
I/TC: Secondary CPU 4 initializing
I/TC: Secondary CPU 4 switching to normal world boot
I/TC: Secondary CPU 5 initializing
I/TC: Secondary CPU 5 switching to normal world boot
[ 0.942803] tegra186-dpaux-pinctrl 155f0000.dpaux: can not get clock
rm_rail_debugfs_init: /rm/vdd_cpu: failed
rm_rail_debugfs_init: /rm/vdd_cpu: failed
debugfs initialized
[ 20.112132] Camera-FW on t194-rce-safe started
TCU early console enabled.
[ 20.180615] Camera-FW on t194-rce-safe ready SHA1=01b72e3c (crt 0.775 ms, total boot 69.288 ms)
[ 4.738501] tegradc 15200000.display: hdmi: can’t get adpater for ddc bus 3
[ 4.844383] CPU:0, Error: cbb-noc@2300000, irq=15
[ 4.844534] **************************************
[ 4.844683] CPU:0, Error:cbb-noc
[ 4.844768] Error Logger : 0
[ 4.844872] ErrLog0 : 0x80030000
[ 4.844963] Transaction Type : RD - Read, Incrementing
[ 4.845104] Error Code : SLV
[ 4.845189] Error Source : Target
[ 4.845279] Error Description : Target error detected by CBB slave
[ 4.845457] AXI2APB_5 bridge error: RDFIFOF - Read Response FIFO Full interrupt
[ 4.845645] Packet header Lock : 0
[ 4.845734] Packet header Len1 : 3
[ 4.845825] NOC protocol version : version >= 2.7
[ 4.845944] ErrLog1 : 0x352424
[ 4.846027] ErrLog2 : 0x0
[ 4.846099] RouteId : 0x352424
[ 4.846199] InitFlow : ccroc_p2ps/I/ccroc_p2ps
[ 4.846328] Targflow : host1x_p2pm/T/host1x_p2pm
[ 4.846454] TargSubRange : 18
[ 4.846551] SeqId : 0
[ 4.846623] ErrLog3 : 0x4045c
[ 4.846707] ErrLog4 : 0x0
[ 4.846850] Address accessed : 0x15b4045c
[ 4.847467] ErrLog5 : 0x909f851
[ 4.850622] Non-Modify : 0x1
[ 4.854291] AXI ID : 0x12
[ 4.857439] Master ID : CCPLEX
[ 4.857451] Security Group(GRPSEC): 0x7e
[ 4.864713] Cache : 0x1 -- Bufferable
[ 4.864718] Protection : 0x2 -- Unprivileged, Non-Secure, Data Access
[ 4.864722] FALCONSEC : 0x0
[ 4.864725] Virtual Queuing Channel(VQC): 0x0
[ 4.864729] **************************************
[ 4.864766] ------------[ cut here ]------------
[ 4.868934] nvethernet 2490000.ethernet: failed to get eqos_tx_divider clk
[ 4.875735] kernel BUG at drivers/soc/tegra/cbb/tegra194-cbb.c:2057!
[ 4.875745] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[ 4.875761] Modules linked in:
[ 4.915028] CPU: 0 PID: 103 Comm: kworker/u12:2 Not tainted 5.10.104-tegra #2
[ 4.922189] Hardware name: Unknown NVIDIA Jetson Xavier NX Developer Kit/NVIDIA Jetson Xavier NX Developer Kit, BIOS 202210.2-26fc186-dirty 08/18/2023
[ 4.935687] Workqueue: events_unbound async_run_entry_fn
[ 4.940932] pstate: 60400089 (nZCv daIf +PAN -UAO -TCO BTYPE=--)
[ 4.947222] pc : tegra194_cbb_err_isr+0x19c/0x1b0
[ 4.951872] lr : tegra194_cbb_err_isr+0x11c/0x1b0
[ 4.956489] sp : ffff800010003b40
[ 4.959563] x29: ffff800010003b40 x28: 0000000000000001
[ 4.965335] x27: 0000000000000080 x26: ffffb4f72aae85b0
[ 4.970665] x25: ffffb4f72b44be10 x24: 0000000000000001
[ 4.975745] x23: ffffb4f72add7000 x22: ffffb4f72b26ea00
[ 4.981427] x21: 000000000000000f x20: 0000000000000005
[ 4.986686] x19: ffffb4f72b26e9f0 x18: 0000000000000060
[ 4.992278] x17: ffffb4f72ab90f88 x16: 0000000000000068
[ 4.997533] x15: ffff3bd780fb6af0 x14: ffffffffffffffff
[ 5.003216] x13: ffffb4f72b3f9de8 x12: ffffb4f72b3f9a2d
[ 5.008295] x11: 0720072007200720 x10: 0720072007200720
[ 5.014067] x9 : ffff800010003a50 x8 : 2a2a2a2a2a2a2a2a
[ 5.019578] x7 : 2a2a2a2a2a2a2a2a x6 : 00000000219e00fa
[ 5.025090] x5 : 000000000000000c x4 : 00000000fffff294
[ 5.030514] x3 : 00000000ffffffff x2 : ffffb4f72957e170
[ 5.035596] x1 : ffff3bd780fb6580 x0 : 0000000000010100
[ 5.041190] Call trace:
[ 5.043645] tegra194_cbb_err_isr+0x19c/0x1b0
[ 5.047675] __handle_irq_event_percpu+0x68/0x2a0
[ 5.052398] handle_irq_event_percpu+0x40/0xa0
[ 5.056959] handle_irq_event+0x50/0xf0
[ 5.060449] handle_fasteoi_irq+0xc0/0x170
[ 5.064731] generic_handle_irq+0x40/0x60
[ 5.068755] __handle_domain_irq+0x70/0xd0
[ 5.072957] efi_header_end+0xb0/0xf0
[ 5.076196] el1_irq+0xd0/0x180
[ 5.079171] __do_softirq+0xb4/0x3e8
[ 5.082852] irq_exit+0xc0/0xe0
[ 5.086080] __handle_domain_irq+0x74/0xd0
[ 5.090107] efi_header_end+0xb0/0xf0
[ 5.093780] el1_irq+0xd0/0x180
[ 5.096500] tegra_hda_get_dev_id+0x6c/0x2b0
[ 5.101043] tegra_hda_init+0x234/0x440
[ 5.104981] tegra_dc_hdmi_init+0x820/0xb50
[ 5.108832] tegra_dc_set_out+0x2cc/0x480
[ 5.112944] tegra_dc_probe+0xa90/0x1570
[ 5.116624] platform_drv_probe+0x5c/0xb0
[ 5.120840] really_probe+0xf8/0x3d0
[ 5.124498] driver_probe_device+0x60/0xc0
[ 5.128611] __device_attach_driver+0x8c/0xd0
[ 5.133242] bus_for_each_drv+0x8c/0xe0
[ 5.136765] __device_attach_async_helper+0xc4/0xf0
[ 5.141651] async_run_entry_fn+0x4c/0x150
[ 5.145854] process_one_work+0x1c4/0x4a0
[ 5.149960] worker_thread+0x54/0x430
[ 5.153637] kthread+0x148/0x170
[ 5.157217] ret_from_fork+0x10/0x24
[ 5.160902] Code: a9446bf9 a94573fb a8c67bfd d65f03c0 (d4210000)
[ 5.167025] ---[ end trace aeffe061f49abe1f ]---
[ 5.171653] Kernel panic - not syncing: Oops - BUG: Fatal exception in interrupt
[ 5.179097] SMP: stopping secondary CPUs
[ 5.182796] Kernel Offset: 0x34f7193c0000 from 0xffff800010000000
[ 5.188897] PHYS_OFFSET: 0xffffc42980000000
[ 5.193272] CPU features: 0x8240002,03802a30
[ 5.197385] Memory Limit: none
[ 5.200451] ---[ end Kernel panic - not syncing: Oops - BUG: Fatal exception in interrupt ]---
How frequently are you able to reproduce this issue?
Also, is this on a custom board or the NVIDIA devkit?
Do you have the boot logo enabled on your board?
Could you share the full UART log and dmesg instead of just the partial kernel panic log?
Please also share a normal boot log.
I’ve attached the normal and panic logs. Thanks for your help.
log_kernel_panic.txt (18.7 KB)
log_normal.txt (17.7 KB)
FYI, this kernel panic log was captured before we applied late_initcall.
Have you actually read the log you shared? Does it contain the full dmesg?
sw0128
October 25, 2023, 3:06am
Hi. I have attached a kernel panic log captured after applying late_initcall.
late_initcall was applied to dc.c in sources/kernel/kernel/nvidia/drivers/video/tegra/dc.
kernel_panci_after_late_initcall.txt (6.5 KB)
Hi,
I am not sure you know what dmesg means. I need to see the full kernel boot log.
None of your logs show that part: you have the bootloader log, but no kernel log.
sw0128
October 25, 2023, 4:57am
Hi
Here is the normal boot log with late_initcall.
However, I couldn't get a log through dmesg when the kernel panic occurs, because the system never reaches a shell prompt.
normal_bootup_log.txt (71.7 KB)
That is because you have “quiet” in your kernel command line. Remove it so that your UART log includes the dmesg output.
[ 0.000000] Kernel command line: quiet root=/dev/mmcblk0p1 rw rootwait rootfstype=ext4 console=ttyTCU0,115200n8 console=tty1 fbcon=map:1 net.ifnames=0
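The thread does not say where this command line is set. On L4T releases that boot through extlinux, it usually comes from the APPEND line in /boot/extlinux/extlinux.conf; treat that location as an assumption and confirm it for your own setup. A minimal sketch of the edit itself, applied to the command line quoted above (the helper name is hypothetical):

```python
def remove_cmdline_token(cmdline, token="quiet"):
    """Return the command line with every exact `token` argument removed.

    Other arguments keep their original order; runs of whitespace
    collapse to single spaces.
    """
    return " ".join(arg for arg in cmdline.split() if arg != token)


# Command line copied from the boot log in this thread.
cmdline = ("quiet root=/dev/mmcblk0p1 rw rootwait rootfstype=ext4 "
           "console=ttyTCU0,115200n8 console=tty1 fbcon=map:1 net.ifnames=0")
print(remove_cmdline_token(cmdline))
# root=/dev/mmcblk0p1 rw rootwait rootfstype=ext4 console=ttyTCU0,115200n8 console=tty1 fbcon=map:1 net.ifnames=0
```

Matching whole arguments rather than substrings avoids touching any other argument that merely contains the text "quiet".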
I need a full log that contains both the bootloader log and the kernel log in a single capture, for both the successful case and the NG case.
So far no such log has been provided: you have shared either the UEFI log without the kernel log, or the kernel log without the bootloader log. Both need to be in the same log.
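Before posting, a capture can be sanity-checked for both stages. The sketch below is a heuristic only: the marker strings are copied from the logs in this thread (the "I/TC:" lines and the "I>" UEFI lines for the early stages, the "Kernel command line:" line for the kernel), and other firmware or kernel versions may print different text. The function name is hypothetical.

```python
import re

# Heuristic markers copied from this thread's logs (assumption: your
# firmware and kernel print the same strings).
BOOT_STAGE_MARKERS = ("I/TC:", "I> ")
KERNEL_CMDLINE = re.compile(r"Kernel command line: (.*)")


def check_capture(log_text):
    """Report what one UART capture contains.

    Returns (has_boot_stage, has_kernel_log, quiet_set).
    """
    has_boot_stage = any(marker in log_text for marker in BOOT_STAGE_MARKERS)
    match = KERNEL_CMDLINE.search(log_text)
    # "." does not match newlines, so group(1) is just the rest of that line.
    quiet_set = bool(match) and "quiet" in match.group(1).split()
    return has_boot_stage, match is not None, quiet_set


sample = ("I/TC: Secondary CPU 1 initializing\n"
          "[ 0.000000] Kernel command line: quiet root=/dev/mmcblk0p1 rw\n")
print(check_capture(sample))  # (True, True, True)
```

A capture that is ready to share should report (True, True, False).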
sw0128
October 25, 2023, 6:09am
Hi
Sorry for sending logs with the quiet flag.
I’ve attached new logs captured after removing it.
late_initcall_normal_log.txt (76.7 KB)
late_initcall_panic_log.txt (62.9 KB)
Hi,
In a previous comment you told me you had already disabled the boot logo.
But this log clearly shows the opposite. Which is correct?
[0003.823] I> edid read success
[0003.835] I> edid read success
[0003.835] I> width = 640, height = 480, frequency = 25174825
[0003.836] I> width = 1024, height = 768, frequency = 65000000
[0003.836] I> width = 1920, height = 1080, frequency = 148500000
[0003.837] I> width = 1360, height = 768, frequency = 85500000
[0003.837] I> width = 720, height = 576, frequency = 27000000
[0003.841] I> width = 720, height = 480, frequency = 26973026
[0003.847] I> width = 720, height = 480, frequency = 26973026
[0003.852] I> width = 720, height = 576, frequency = 26973026
[0003.858] I> width = 720, height = 576, frequency = 26973026
[0003.863] I> width = 1280, height = 720, frequency = 74175824
[0003.869] I> width = 1280, height = 720, frequency = 74175824
[0003.874] I> width = 1920, height = 1080, frequency = 148351648
[0003.880] I> width = 1920, height = 1080, frequency = 148351648
[0003.886] I> Best mode Width = 1920, Height = 1080, freq = 148351648
[0003.897] I> hdmi_enable, starting HDMI initialisation
[0003.902] I> hdmi_enable, HDMI initialisation complete
[0003.911] initializing target
sw0128
October 25, 2023, 7:39am
Hi.
After testing with bootloader-status disabled and return false, I reverted those changes.
I’ll apply them again (bootloader-status disabled, return false, late_initcall) and then send you the log.
Thank you so much for your support.
I just want to clarify whether this issue is related to the bootloader logo being enabled or disabled.
Also, since bootloader-status is read by the bootloader, you cannot just flash the kernel DTB partition.
You have to fully flash the board, or flash the bootloader DTB partition, for it to take effect.
sw0128
October 26, 2023, 8:27am
If setting bootloader-status = “disabled” in the kernel DTS has taken effect, should there be no message like “I> hdmi_enable, starting HDMI initialisation” in the UEFI log?
How can I tell whether setting bootloader-status = “disabled” has been applied correctly?
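One rough check follows the reasoning in the earlier reply, which treated the edid/hdmi_enable lines in the UEFI output as evidence that the boot logo path is active: search the UEFI part of a full capture for the HDMI initialisation messages. This is a heuristic based only on the strings quoted in this thread, not a documented interface, and the function name is hypothetical.

```python
# Marker strings copied from the UEFI log quoted earlier in this thread.
UEFI_HDMI_MARKERS = (
    "hdmi_enable, starting HDMI initialisation",
    "hdmi_enable, HDMI initialisation complete",
)


def uefi_brought_up_hdmi(log_text):
    """Return True if the capture shows UEFI initialising HDMI."""
    return any(marker in log_text for marker in UEFI_HDMI_MARKERS)


line = "[0003.897] I> hdmi_enable, starting HDMI initialisation"
print(uefi_brought_up_hdmi(line))  # True
```

Absence of these lines in a complete capture is consistent with the bootloader skipping display bring-up, but it does not by itself prove which DTB the bootloader loaded.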