Display causes kernel panic during suspend

Hi Nvidia experts,

A kernel panic occurs during suspend/resume at “pc : nvInitFlipEvoHwState+0x30/0xf8 [nvidia_modeset]”.
wake-up by LTE:
Suspend_Panic_lte.txt (70.8 KB)
wake-up by USB:
Suspend_Panic_mouse.txt (91.2 KB)

1. We are using a customer board with l4t-r35.4.1.
This issue only happens on the HDMI device; it cannot be reproduced on the Devkit (DP).

2. We have a panic log similar to the one discussed in this topic:

3. I found an easy way to reproduce this panic:
a. sudo systemctl suspend
b. wait for 3-4 s
c. wake up the system
I wrote a script to reproduce this issue:

test_period=4
for i in {1..500}
do
    echo "Iteration: $i"
    sudo /usr/sbin/rtcwake -m no -s $test_period
    sudo systemctl suspend
    sleep $test_period
done

With “test_period=30”, the system did not panic in 1000 iterations.
With “test_period=4”, the panic rate was 5/25.
I suspect that a wake-up triggered at a certain point during the suspend process (3-4 s after suspend starts) causes this panic.
The issue occurs whether the wake-up source is LTE, USB, or RTC.
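
Because a panic reboots the board, it can help to record each iteration on disk before suspending, so that the last log line shows how far the loop got. The following is a minimal sketch of that idea, not part of the original script; the names `log_iteration` and `run_repro` and the log path are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: append one line per iteration and flush it to disk,
# so the record survives a panic during the following suspend.
log_iteration() {
    log_file=$1
    iteration=$2
    echo "iteration=$iteration time=$(date +%s)" >> "$log_file"
    sync
}

# Same steps as the reproduction loop above, with logging added.
# Arguments: wake-up delay in seconds, iteration count, log file path.
run_repro() {
    test_period=$1
    count=$2
    log_file=$3
    i=1
    while [ "$i" -le "$count" ]; do
        log_iteration "$log_file" "$i"
        sudo /usr/sbin/rtcwake -m no -s "$test_period"
        sudo systemctl suspend
        sleep "$test_period"
        i=$((i + 1))
    done
}
```

A run could be started with `run_repro 4 500 "$HOME/suspend_repro.log"`; after a panic and reboot, `tail -n 1` on the log file shows the last iteration that was started.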

Best regards,
Andy

Please refer to this post.

This issue will not be fixed on rel-35, only on rel-36.

As a workaround, modify register 0x2212000 to disable the interrupt bit before suspend or shutdown.
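
A read-modify-write keeps the other bits of the register unchanged. The sketch below assumes the interrupt enable bit is bit 6 (mask 0x40); that mask is only inferred from the 0x4D and 0x0D values used in the follow-up test script in this thread and is not confirmed by this reply. The helper names are hypothetical. `busybox devmem ADDRESS w` reads a 32-bit value and `busybox devmem ADDRESS w VALUE` writes one.

```shell
#!/bin/sh
REG=0x2212000
IRQ_MASK=0x40   # assumption: the interrupt enable bit is bit 6

# Pure helpers: compute the new register value from the current one.
clear_bits() { printf '0x%02X\n' $(( $1 & ~$2 )); }
set_bits()   { printf '0x%02X\n' $(( $1 | $2 )); }

# Read the register, clear the interrupt bit, write it back (needs root).
disable_irq() {
    cur=$(sudo busybox devmem "$REG" w)
    sudo busybox devmem "$REG" w "$(clear_bits "$cur" "$IRQ_MASK")"
}

# Read the register, set the interrupt bit again, write it back (needs root).
enable_irq() {
    cur=$(sudo busybox devmem "$REG" w)
    sudo busybox devmem "$REG" w "$(set_bits "$cur" "$IRQ_MASK")"
}
```

In a test loop, `disable_irq` would be called just before `sudo systemctl suspend` and `enable_irq` after resume.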

Hi WayneWWW,

  1. I modified the test script, but the panic still occurs:
test_period=4
for i in {1..500}
do
    echo "Iteration: $i"
    sudo /usr/sbin/rtcwake -m no -s $test_period
    sudo busybox devmem 0x2212000 w 0x0D
    sudo systemctl suspend
    sleep $test_period
    sudo busybox devmem 0x2212000 w 0x4D
done
  2. About the post: there, the panic occurs at “pc : tegra186_gpio_irq+0x1a0/0x1e0”.
    In our case, the panic occurs at “pc : nvInitFlipEvoHwState+0x30/0xf8 [nvidia_modeset]”.
    I don’t think these two issues have the same root cause.

  3. I noticed that nvidia-modeset logs some errors before the panic, so I printed the return value of “nvRmApiControl()”.
    When the panic occurs, the value is 26. What does it mean?

--- a/sources/nvdisplay/src/nvidia-modeset/src/nvkms-rm.c
+++ b/sources/nvdisplay/src/nvidia-modeset/src/nvkms-rm.c
@@ -1960,7 +1960,6 @@ NVDpyIdList nvRmGetConnectedDpys(const NVDispEvoRec *pDispEvo,
         (DRF_DEF(0073_CTRL_SYSTEM,_GET_CONNECT_STATE_FLAGS,_METHOD,_DEFAULT) |
          DRF_DEF(0073_CTRL_SYSTEM,_GET_CONNECT_STATE_FLAGS,_DDC,_DEFAULT) |
          DRF_DEF(0073_CTRL_SYSTEM,_GET_CONNECT_STATE_FLAGS,_LOAD,_DEFAULT));
     do {
         params.retryTimeMs = 0;
         ret = nvRmApiControl(nvEvoGlobal.clientHandle,
@@ -1984,6 +1983,7 @@ NVDpyIdList nvRmGetConnectedDpys(const NVDispEvoRec *pDispEvo,
         }
     } while(params.retryTimeMs > 0);
 
+        nvEvoLogDisp(pDispEvo, EVO_LOG_ERROR,"ret = %d", ret);
     if (ret == NVOS_STATUS_SUCCESS) {
         return nvNvU32ToDpyIdList(params.displayMask);
     } else {

[ 356.210679] nvidia-modeset: ERROR: GPU:0: ret = 26
[ 356.215659] nvidia-modeset: ERROR: GPU:0: Failed detecting connected display devices
[ 356.223696] nvidia-modeset: ERROR: GPU:0: Failure reading maximum pixel clock value for

suspend_panic_rtcScript.txt (9.2 KB)

Oh, sorry. Then they are different issues.

Could you upgrade to rel-36 to check whether this issue is fixed? Rel-35 may not receive any further display driver updates.

Hi WayneWWW,

The issue cannot be reproduced in version R36, but it does not seem to be related to the Nvidia display driver.
Previous experiments indicate that this issue only occurs when the suspend process is interrupted at a specific timing.
In version R36, the system cannot be awakened using USB, RTC, power key, or other methods before the suspend process is complete.

Do you know which patch implemented this feature?

I am not sure what you are talking about as nvidia_modeset is the display driver too.

RTC wake-up on rel-36 is a known issue that will be fixed in a later release. Power key wake-up should work at this moment.

What I mean is that the reason R36 does not encounter the nvidia_modeset panic is not because of any changes made to the Nvidia display driver. Instead, during my testing on version R36, I performed the following steps:

    sudo systemctl suspend
    Wait for 4 seconds
    Press the power key to wake up the system

Result: The system didn’t wake up.

From this test result, it appears that in R36, the system cannot be awakened using the power key before the suspend process is complete.
Therefore, the “nvidia_modeset panic that occurs when waking up the system after 4 seconds of suspend” cannot be reproduced in R36.

Are you sure your rel-36 is rel-36.3 GA and not rel-36.2 DP?

The version is R36.2

l@l-desktop:~$ dpkg-query --showformat='${Version}\n' --show nvidia-l4t-core
36.2.0-20231130105725
l@l-desktop:~$ cat /etc/nv_tegra_release 
# R36 (release), REVISION: 2.0, GCID: 34956989, BOARD: generic, EABI: aarch64, DATE: Thu Nov 30 19:03:58 UTC 2023
# KERNEL_VARIANT: oot
TARGET_USERSPACE_LIB_DIR=nvidia
TARGET_USERSPACE_LIB_DIR_PATH=usr/lib/aarch64-linux-gnu/nvidia
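
For scripts that need to tell these releases apart, the version can be parsed from the first line of `/etc/nv_tegra_release`. This is a minimal sketch that assumes the header keeps the format shown in the output above; the function name `l4t_version` is hypothetical.

```shell
#!/bin/sh
# Print "MAJOR.REVISION" (for example 36.2.0) parsed from the header line of
# an nv_tegra_release file. The path argument defaults to the real file.
l4t_version() {
    sed -n 's/^# R\([0-9][0-9]*\) (release), REVISION: \([0-9.][0-9.]*\),.*/\1.\2/p' \
        "${1:-/etc/nv_tegra_release}"
}
```

On the system above, `l4t_version` would print `36.2.0`, matching the `nvidia-l4t-core` package version prefix.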

Please upgrade the version. 36.3 is the GA version.
36.2 is the developer preview.

We only have environments for R35 and R36.2, and we have already decided to develop our product on R35.4.1.
Creating a new environment for R36.3 and porting all our modifications to it would take a lot of time, which is not very feasible.

I want to confirm one thing: the Nvidia display driver for R35 will no longer be updated, yet this version of the driver carries a risk of panic.
So, from Nvidia’s perspective, is suspend not supported on R35 with HDMI?

  1. Actually, you shouldn’t use a developer preview to check for any errors at this point, which means your 36.2 results do not really matter to the current situation.

  2. It is just that rel-35 is already in maintenance mode. New errors found on it may not get fixed. It is better to upgrade to rel-36. We may only check new issues on rel-36.

I checked the release notes of R36.2 and R36.3.

Issue 4185596
Both R36.2 and R36.3: Jetson AGX Orin Developer Kit and Jetson AGX Industrial modules could intermittently fail to resume after suspend.
R36.3: Waking up from Deep Sleep state (SC7) by USB events is not supported in the NVIDIA JetPack 6.0 GA release for the Jetson Orin Series of products. This functionality will be added in a future release.

It seems that “disable USB wake” is the workaround for “resume failed”.
Would you mind telling me how to disable the USB wake feature in r35.4.1?
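
For reference, generic Linux exposes a per-device `power/wakeup` attribute in sysfs that accepts `enabled` or `disabled`. The sketch below toggles it for USB devices; whether this actually prevents a wake from SC7 on r35.4.1 is an untested assumption, and the function names are hypothetical.

```shell
#!/bin/sh
# Write "disabled" to every power/wakeup attribute under the given USB
# device directory. Generic Linux sysfs interface; run as root on a target.
disable_usb_wakeup() {
    root=${1:-/sys/bus/usb/devices}
    for f in "$root"/*/power/wakeup; do
        [ -e "$f" ] || continue
        echo disabled > "$f"
    done
}

# Print each wakeup attribute together with its current value.
show_usb_wakeup() {
    root=${1:-/sys/bus/usb/devices}
    for f in "$root"/*/power/wakeup; do
        [ -e "$f" ] || continue
        echo "$f: $(cat "$f")"
    done
}
```

Running `show_usb_wakeup` before and after `disable_usb_wakeup` shows which devices were changed; the setting does not persist across reboots.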

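Let me just explain this in Chinese…

What you are saying does not seem very logical… I don’t know what your last comment is trying to express.
What the rel-36 release notes say has nothing to do with the conclusion you drew… I don’t know how you arrived at it…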

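Starting with R36.2, issue 4185596 is “fail to resume after suspend”.
In R36.3, the same issue 4185596 says that USB wake-up is no longer supported.
Because both fall under the same issue 4185596, I infer that “dropping USB wake-up support” is the workaround for “fail to resume after suspend”.
So I would like to know how R36.3 “drops USB wake-up support”, so that I can apply the same change to r35.4.1 for testing.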

No… it is not at all what you inferred…

The release note only says that USB wake is not currently supported on rel-36… It is not a workaround for any other issue… It is simply not supported.

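1. Then can you tell me why both fall under the same issue 4185596?
2. Logically, R36.3 must have made some change to turn off USB wake-up.
We also want to turn off USB wake-up at the moment. Can you provide a patch for this?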

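…No, we did not turn off USB wake in r36.3…

This feature has simply not been supported yet in either rel-36.2 or rel-36.3. That is why we specifically mentioned it… We did not deliberately turn off USB wake in order to solve some problem…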

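Your logic does not hold up.
In R35.4, USB wake-up works; from R36 on, it does not. Doesn’t that mean the feature was turned off?

Also, you still have not told me why they are the same issue. If the two are unrelated, why do they share the same issue number? Is the release note wrong?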