L4T 35.4.1 Kernel Panic when using DP on custom carrier

Hello,

We designed a carrier board that is meant to be used with the Xavier NX, Orin NX and Orin Nano.

With the Xavier NX, everything works properly.
I have now adapted the device tree for the Orin Nano. If I plug in the DisplayPort cable after the system has booted, the display works correctly.

If the DisplayPort cable is plugged in before boot, the system runs into a kernel panic.
Any ideas what could cause this issue? Is there anything I have to adapt in the pinmux? I am using the dp-a03 configuration.

[   13.172671] systemd[1]: Started Journal Service.
[   13.403313] urandom_read_iter: 49 callbacks suppressed
[   13.403316] random: systemd: uninitialized urandom read (16 bytes read)
[   13.403682] random: systemd-journal: uninitialized urandom read (16 bytes read)
[   13.423462] random: systemd-journal: uninitialized urandom read (16 bytes read)
[   13.474664] systemd-journald[243]: Received client request to flush runtime journal.
[   13.758631] random: crng init done
[   13.762169] random: 20 urandom warning(s) missed due to ratelimiting
[   13.856023] nvidia: loading out-of-tree module taints kernel.
[   13.866867] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[   13.972227] at24 0-0064: supply vcc not found, using dummy regulator
[   14.015249] at24 0-0065: supply vcc not found, using dummy regulator
[   14.054002] at24 0-0066: supply vcc not found, using dummy regulator
[   14.093848] at24 0-0067: supply vcc not found, using dummy regulator
[   14.135190] at24 0-0068: supply vcc not found, using dummy regulator
[   14.192263] at24 2-0064: supply vcc not found, using dummy regulator
[   14.245555] at24 2-0065: supply vcc not found, using dummy regulator
[   14.283846] at24 2-0066: supply vcc not found, using dummy regulator
[   14.329292] at24 2-0067: supply vcc not found, using dummy regulator
[   14.373686] at24 2-0068: supply vcc not found, using dummy regulator
[   15.980333] CPU:0, Error: dce-fabric@0xde00000, irq=28
[   15.985634] **************************************
[   15.990563] CPU:0, Error:dce-fabric, Errmon:2
[   15.995048]    Multiple type of errors reported
[   15.999708]    Error Code            : FIREWALL_ERR
[   16.003837]    Error Code            : TIMEOUT_ERR
[   16.007873]    Overflow              : Multiple FIREWALL_ERR
[   16.012626]
[   16.014156]    Error Code            : TIMEOUT_ERR
[   16.018189]    MASTER_ID             : DCE
[   16.021420]    Address               : 0x1380c01c
[   16.025099]    Cache                 : 0x1 -- Bufferable
[   16.029400]    Protection            : 0x3 -- Privileged, Non-Secure, Data Access
[   16.036211]    Access_Type           : Read
[   16.039698]    Access_ID             : 0x0
[   16.039699]    Fabric                : dce-fabric
[   16.046502]    Slave_Id              : 0x37
[   16.049727]    Burst_length          : 0x0
[   16.053222]    Burst_type            : 0x1
[   16.056541]    Beat_size             : 0x2
[   16.059766]    VQC                   : 0x0
[   16.062548]    GRPSEC                : 0x3f
[   16.065599]    FALCONSEC             : 0x0
[   16.068837] Unable to handle kernel paging request at virtual address 0000000000081000
[   16.076983] Mem abort info:
[   16.079852]   ESR = 0x96000005
[   16.082987]   EC = 0x25: DABT (current EL), IL = 32 bits
[   16.088445]   SET = 0, FnV = 0
[   16.091581]   EA = 0, S1PTW = 0
[   16.094805] Data abort info:
[   16.097757]   ISV = 0, ISS = 0x00000005
[   16.101697]   CM = 0, WnR = 0
[   16.104743] user pgtable: 4k pages, 48-bit VAs, pgdp=000000010ef8b000
[   16.111372] [0000000000081000] pgd=000000010ee57003, p4d=000000010ee57003, pud=0000000000000000
[   16.120324] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[   16.126048] Modules linked in: lzo_rle lzo_compress zram nvidia_modeset(OE) ramoops reed_solomon binfmt_misc aes_ce_blk crypto_simd cryptd aes_ce_cipher ghash_ce sha2_ce sha256_arm64 sha1_ce userspace_alert tegra_bpmp_thermal tegra210_adma input_leds at24 spi_tegra114 r8168 overlay nvidia(OE) ina3221 pwm_fan nvgpu nvmap
[   16.155217] CPU: 0 PID: 683 Comm: C1 CompilerThre Tainted: G           OE     5.10.120-tegra #115
[   16.164327] Hardware name:  Carrier Board, BIOS 4.1-33958178 08/01/2023
[   16.176732] pstate: 40400089 (nZcv daIf +PAN -UAO -TCO BTYPE=--)
[   16.182900] pc : __pi_strlen+0x10/0x84
[   16.186758] lr : strstr+0x30/0x90
[   16.190162] sp : ffff800010003cc0
[   16.193564] x29: ffff800010003cc0 x28: ffff37d000f34780
[   16.199017] x27: ffffa1e11357f700 x26: 0000000000081000
[   16.204480] x25: 0000000000000370 x24: ffff80002444d858
[   16.209942] x23: ffffa1e11394d4c0 x22: 0000000000000001
[   16.215393] x21: ffffa1e11394d5f0 x20: 0000000000000007
[   16.220847] x19: 0000000000081000 x18: 0000000000000010
[   16.226300] x17: 0000000000000000 x16: ffffa1e1123b5220
[   16.231752] x15: ffff37d00d7ea2f0 x14: ffffffffffffffff
[   16.237207] x13: ffff800090003917 x12: 0000000000000038
[   16.242661] x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f
[   16.248111] x9 : 535e554b525e7224 x8 : 7f7f7f7f7f7f7f7f
[   16.253570] x7 : 0000000000000000 x6 : 0000000000000080
[   16.259030] x5 : 8000000000000000 x4 : 8080808080000000
[   16.264480] x3 : 545f564c535f7325 x2 : 0042504132495841
[   16.269934] x1 : 0000000000081000 x0 : 0000000000081000
[   16.275387] Call trace:
[   16.277901]  __pi_strlen+0x10/0x84
[   16.281392]  print_err_notifier+0x610/0x734
[   16.285695]  tegra234_cbb_isr+0xd0/0x170
[   16.289727]  __handle_irq_event_percpu+0x68/0x2a0
[   16.294553]  handle_irq_event_percpu+0x40/0xa0
[   16.299114]  handle_irq_event+0x50/0xf0
[   16.303052]  handle_fasteoi_irq+0xc0/0x170
[   16.307259]  generic_handle_irq+0x40/0x60
[   16.311376]  __handle_domain_irq+0x70/0xd0
[   16.315585]  gic_handle_irq+0x68/0x134
[   16.319431]  el0_irq_naked+0x4c/0x54
[   16.323103] Code: b200c3eb 927cec01 f2400c07 54000261 (a8c10c22)
[   16.329375] ---[ end trace 0ab62dfb322fd687 ]---
[   16.339578] Kernel panic - not syncing: Oops: Fatal exception in interrupt
[   16.346644] SMP: stopping secondary CPUs
[   16.350675] Kernel Offset: 0x21e102390000 from 0xffff800010000000
[   16.356927] PHYS_OFFSET: 0xffffc83100000000
[   16.361224] CPU features: 0x08040006,4a80aa38
[   16.365693] Memory Limit: none
[   16.374255] ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---
  1. Could you put the same module directly onto an Orin Nano devkit and see if you hit the same issue?

  2. Could you reflash the same module with SDK Manager on the Orin Nano devkit and see if you still see the issue?

I suspect the issue comes from your customization, so I would expect (2) not to hit the issue while (1) would.

A recent report related to this issue is this one.

However, that case involves HDMI, and the user never replied to the thread.

I tried the patch even though it refers to HDMI; it has no effect.

  1. With the module from our custom board and the same SSD, the issue also occurs on the devkit.
  2. I didn't try SDK Manager, but with L4T and my own flash scripts I get the same issue on the devkit with the display plugged in, using the devkit device tree. Only the rootfs and a very few device-specific files are customized there.

I think your assumption is correct.

Which customization might cause that issue? I am using the same rootfs on the Xavier NX devkit and the Orin Nano devkit; on Orin I get the issue, on Xavier everything works.

I’d suspect device tree changes.

Could this patch for the Xavier cause the issue on the Orin?
All other patches affect only tegra194 files, which should not influence Orin.

--- a/nvidia/drivers/video/tegra/dc/dc.c
+++ b/nvidia/drivers/video/tegra/dc/dc.c
@@ -6374,7 +6374,7 @@
 		pr_debug("dc->fb_mem not initialized\n");
 		return false;
 	}
-	return (dc->fb_mem->start != 0);
+	return false;
 }
 EXPORT_SYMBOL(tegra_is_bl_display_initialized);

No, that Xavier patch won't affect Orin, because Orin does not use that driver at all.

I would suggest putting the default kernel image from JetPack rel-35.4.1 on your board first.
Or even flash the SDK Manager image with only the minimal changes needed to boot from your custom board. For example, disable the CVB EEPROM read size and get NVMe or the USB drive working first so that you can boot.

This issue cannot be reproduced on the devkit, so you should change things one at a time to find what is causing the problem.
As I said, this issue is new, so I cannot immediately tell you what is causing it.

Hey @WayneWWW

My image works when it uses the kernel and the modules from the reference image.

So I assume there is an issue in my current kernel config. Or are there any other files that affect the kernel Image and the modules?

Any idea which missing module could cause the issue?

Hi,

Could you check whether CONFIG_FB_SIMPLE is enabled in the default image but disabled in yours?

That would have been my next test. I noticed that CONFIG_FB_EFI and CONFIG_FB_SIMPLE are not set in my config but are enabled in the defconfig.

I will test that.
My problem is that I maintain a modified config with everything we have enabled.
When a new JetPack comes out, I can't easily apply all my changes, since the defconfig is missing many of the options.


Yup, CONFIG_FB_SIMPLE was not used in the previous JetPack; enabling it in 35.4.1 solves the issue. Thanks!

