Kernel panic on resume

Hi NV,
We hit a kernel panic during suspend/resume stress testing.
The log looks like a GPU or HDMI issue.
Could you help clarify it?
Thanks.

[ 5683.889505] nvidia-modeset: ERROR: GPU:0: Failed detecting connected display devices
[ 5683.889777] nvidia-modeset: ERROR: GPU:0: Failure reading maximum pixel clock value for display device HDMI-0.

panic.log (46.2 KB)
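Stress runs produce long logs, so a small script can help count how often these nvidia-modeset errors recur across iterations. The sketch below is a hypothetical helper (not an NVIDIA tool). It assumes dmesg-style lines in the format shown above, and the names `modeset_errors` and `summarize` are illustrative only.

```python
import re
from collections import Counter

# Assumed dmesg-style format, e.g.:
# [ 5683.889505] nvidia-modeset: ERROR: GPU:0: Failed detecting connected display devices
# re.search (not match) tolerates junk bytes before the timestamp bracket.
ERROR_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+nvidia-modeset: ERROR: (.*)$")


def modeset_errors(log_text):
    """Return (timestamp_seconds, message) tuples for nvidia-modeset ERROR lines."""
    hits = []
    for line in log_text.splitlines():
        m = ERROR_RE.search(line.strip())
        if m:
            hits.append((float(m.group(1)), m.group(2)))
    return hits


def summarize(log_text):
    """Count identical error messages so recurring failures stand out."""
    return Counter(msg for _, msg in modeset_errors(log_text))


if __name__ == "__main__":
    sample = (
        "[ 5683.889505] nvidia-modeset: ERROR: GPU:0: Failed detecting connected display devices\n"
        "[ 5683.889777] nvidia-modeset: ERROR: GPU:0: Failure reading maximum pixel clock value for display device HDMI-0.\n"
    )
    for msg, count in summarize(sample).items():
        print(count, msg)
```

Running it over the full stress-test log (for example, the attached panic.log) would show whether the HPD-related errors appear on every resume or only right before the panic.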

Hello,

Can this issue be reproduced on an NVIDIA devkit? For HDMI, you need to use the p3509 (Xavier NX devkit carrier board). For DP, you need to use the Orin Nano devkit.

Hi,

I have encountered the same issue.

I've tried the p3509 and it still showed the dce-fabric error, but no kernel panic.

Suspend and resume work fine there.

suspend.txt (9.9 KB)

This topic only discusses the panic case.

Yes, I mean I also encounter the panic on our custom board.

kernelpanic.txt (5.1 KB)

I also tried to reproduce it on the Xavier NX devkit carrier board, but it doesn't reproduce there.

Do you have a full log from your custom board to share? By full log I mean starting from power-up.

Here's the full log:
log.txt (92.9 KB)

What does your device tree look like? Is it based on the p3768 device tree? What did you change there?

Yes, it is based on the p3768 device tree.

  1. I changed the USB-related nodes because we have some USB 3 ports and use micro-USB instead of Type-C for the flashing port.

  2. Added a new COM port for serial@3110000.

  3. I also modified the display node for HPD:
    display@13800000 {
        os_gpio_hotplug_a = <&tegra_main_gpio TEGRA234_MAIN_GPIO(M, 0) GPIO_ACTIVE_HIGH>;
        status = "okay";
    };
    
  4. I changed the display source to HDMI in tegra234-p3767-0000-p3768-0000-a0.dts (I know this file is for the Orin NX 16GB, but the Orin Nano build includes it, so I modified it there):

+//#include "tegra234-dcb-p3767-0000-dp.dtsi"
+#include "tegra234-dcb-p3767-0000-hdmi.dtsi"

I also changed the pinmux in p3768-0000+p3767-0000.conf from DP to HDMI.

What is your method to update the device tree?

I modify the DTS files and build them, copy the resulting DTBs to l4t/kernel/dtb, and then flash with the command from the documentation:
https://docs.nvidia.com/jetson/archives/r35.3.1/DeveloperGuide/text/IN/QuickStart.html

Does this issue happen with a specific monitor, or with every monitor?

Hi,

I switched to another monitor. It didn't show a kernel panic, but it got stuck at:

[ 202.468270] Camera-FW on t234-rce-safe started
TCU early console enabled.
[ 202.579495] Camera-FW on t234-rce-safe ready SHA1=97e50cbf (crt 1.260 ms, total boot 112.585 ms)

and it rebooted a few minutes later.

Is this also a "resume" case?

No, it is a suspend case. It seems it cannot suspend normally.

So it can boot up if no monitor is connected?

Yes, it can.
log_without_hdmi.txt (86.6 KB)

There is not much more to check on the Orin display side.

If you are sure the pinmux, DCB, and device tree are all correct, then the only remaining thing to check is the hardware.

For example, the error "Failed detecting connected display devices" means the connection state of the HDMI monitor cannot be detected correctly.

Hi,

Thanks, I will check the hardware with our EE.

What kinds of issues might be related?

(Update)
Sorry for my mistake.

It is a resume case.

The cause is that we use a B-key NVMe SSD as boot storage.

But B-key storage has no suspend clock pin for waking up.

So it gets stuck, and it only rarely resumes.
