WARNING occurs when plugging/unplugging HDMI on Orin NX

Hi @weileng

Sorry, but I can only discuss this issue with devkit users. I have seen this issue on some custom boards before, but it turned out to be a hardware design issue on their side.

Thus, I need you to check whether the devkit can reproduce this issue with the same monitor.

Also, could you check if the “power save” of the monitor is related to this issue?

This is something other end users have observed too.

For example, your issue happened 200 seconds after boot-up. It looks related to something being triggered, maybe the power save as well?

Hi @WayneWWW

Thanks for your reply. Could you please elaborate on the “power save” related issue?
I have two questions; please give your comments, thanks.
#1 HDMI hotplug triggers the kernel warning “tegra186_gpio_irq+0x1ac/0x1f0”. Is this normal?
#2 What cases will trigger the kernel warning “tegra186_gpio_irq+0x1ac/0x1f0” on Orin NX?
The same monitor and the same motherboard with a Xavier NX 8G/16G SOM have NO such issue.

Hi @weileng

  1. If the hotplug case didn’t lead to kernel panic but just 1~2 times spew, then it is not an error.

  2. From what I have seen so far, there are a few cases.
    → One user said that when the HDMI monitor enters power save (screen goes blank due to inactivity), it causes this error.
    → Another user’s custom board hits the issue when resuming from sc7 suspend mode.
    → Another user said some of his/her monitors hit this issue when running sudo reboot multiple times in a row.
    Some of them said this issue could be reproduced with the NV devkit (p3509 XNX devkit), but only with specific monitors. Not every monitor can reproduce it. That is why I cannot check this for now.

  3. There is no need to compare the XNX case with Orin. The driver structure is totally different, and none of the previous Jetson platforms (TX1/TX2/Xavier) ever used the current display driver. I am not saying it has no issue. I mean the comparison does not tell us anything. What we need to do here is focus on how to reproduce this issue on devkit. What setup… How to trigger.

Thank you Wayne!

If the hotplug case didn’t lead to kernel panic but just 1~2 times spew, then it is not an error.

My test results on the Orin NX 16G SOM:

  1. HDMI hotplug triggers the “tegra186_gpio_irq+0x1ac/0x1f0” kernel warning 1~2 times, which does not lead to a kernel panic.
  2. When the system automatically blanks the screen for power saving (5 minutes by default), it triggers the “tegra186_gpio_irq+0x1ac/0x1f0” kernel warning 1~2 times, which does not lead to a kernel panic.
  3. Rebooting the system from the UI does not trigger the “tegra186_gpio_irq+0x1ac/0x1f0” kernel warning.
  4. The corner case that triggers a flood of GPIO warnings is:
    After re-flashing the Orin NX image, the first time the system UI is entered after system configuration, it sometimes happens.

After I get a devkit, I will retest on Orin NX with this type of 4K monitor.

Hi,
We get the same kernel dump message on AGX Orin HDMI hot-plug.
We modified the pinmux dtsi and gpio dtsi in the related Linux_for_Tegra/bootloader files, changed the dcb file to hdmi, and also have os_gpio_hotplug in the device tree file.
display@13800000 {
    status = "okay";
    os_gpio_hotplug_a = <&tegra_main_gpio TEGRA234_MAIN_GPIO(M, 0) GPIO_ACTIVE_HIGH>;
};

Also, after system boot-up, we cannot find any usage status for gpio M,0 in /sys/kernel/debug/gpio.
When we export gpio M,0 through /sys/class/gpio/export, the kernel dump message disappears.

By tracing the NVIDIA display driver, we found that the gpio request may not be used? (We added debug messages to the NVIDIA display driver.)

Do the NVIDIA folks have any ideas?
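
For anyone who wants to repeat the /sys/kernel/debug/gpio check described above, here is a minimal Python sketch that looks up one line by name and reports whether it has a consumer. It assumes the usual debugfs layout of " gpio-<num> (<name>|<consumer>) <dir> <level>" and assumes that TEGRA234_MAIN_GPIO(M, 0) shows up under the name PM.00; both may differ on your kernel. The sample text and its GPIO numbers are made up for illustration and are not taken from a real board.

```python
import re

# Assumed line shape in /sys/kernel/debug/gpio (may differ between kernels):
#   " gpio-<num> (<name>              |<consumer>          ) <dir> <level>"
# Lines that nobody has requested carry a name but no "|<consumer>" part.
LINE_RE = re.compile(r"^\s*gpio-(\d+)\s+\(([^|)]*)\|?([^)]*)\)\s*(.*)$")


def find_gpio_line(debug_text, name):
    """Return number, consumer and state for the line called `name`, or None."""
    for line in debug_text.splitlines():
        m = LINE_RE.match(line)
        if m and m.group(2).strip() == name:
            return {
                "number": int(m.group(1)),
                "consumer": m.group(3).strip() or None,
                "state": m.group(4).strip(),
            }
    return None


# Hypothetical sample; the chip range and GPIO numbers are illustrative only.
SAMPLE_DEBUGFS = """\
gpiochip0: GPIOs 348-511, parent: platform/2200000.gpio, tegra234-gpio:
 gpio-400 (PM.00               )
 gpio-401 (PM.01               |sysfs               ) in  lo
"""

if __name__ == "__main__":
    # On a target you would read the real file as root instead:
    #   debug_text = open("/sys/kernel/debug/gpio").read()
    print(find_gpio_line(SAMPLE_DEBUGFS, "PM.00"))
```

A result with consumer None would match the "no usage status" observation above. A line exported through sysfs is expected to list sysfs as its consumer.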

Hi,

Which display driver are you tracing here?

Also, after you export M,0 in gpio sysfs, is your HDMI hotplug still working fine?

Hi WayneWWW,

Yes, after exporting M,0 in gpio sysfs, HDMI hotplug still works fine.
The display driver is NVIDIA-kernel-module-source-TempVersion.

PS: If we export M,0 and then unexport it, the kernel dump and error message disappear too, and HDMI hot-plug works fine.

Regards

@WayneWWW

But the warning will not cause restart or hang on devkit. Please file a new topic for your issue.

Hi, after a long investigation and aging tests,
we reproduced the warning causing a system hang/kernel panic in an OrinNX 8GB + XavierNX EVK environment.
Reproduce rate is 100%.

Step 1. Connect [OrinNX 8GB + XavierNX EVK] and the display with an HDMI cable.
Step 2. Connect [OrinNX 8GB + XavierNX EVK] and the HostPC with a UART cable for console login.
Step 3. Power on the display and [OrinNX 8GB + XavierNX EVK].
Step 4. Confirm that the Ubuntu login prompt appears on the display.
Step 5. Log in from the console and stop gdm with the following command.

# systemctl stop gdm

Step 6. Unplug the HDMI cable from the XavierNX EVK HDMI port.

Step 7. The kernel WARNING is reproduced repeatedly and causes a kernel panic.

[   61.878407] WARNING: CPU: 0 PID: 0 at drivers/gpio/gpio-tegra186.c:937 tegra186_gpio_irq+0x1ac/0x1f0
[   61.888295] ---[ end trace b6b8c2c9494bf98d ]---
[   61.893129] WARNING: CPU: 0 PID: 0 at drivers/gpio/gpio-tegra186.c:937 tegra186_gpio_irq+0x1ac/0x1f0
[   61.902842] ---[ end trace b6b8c2c9494bf98e ]---
[   61.907635] WARNING: CPU: 0 PID: 0 at drivers/gpio/gpio-tegra186.c:937 tegra186_gpio_irq+0x1ac/0x1f0
[   61.917320] ---[ end trace b6b8c2c9494bf98f ]---
...
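
To tell a 1~2 line spew apart from a repeated flood like the one above, a captured console log can be summarized with a small script. The following is a minimal Python sketch, assuming the log lines keep the "[ seconds] WARNING: ... tegra186_gpio_irq" shape shown above. The function names and the 1-second window are illustrative choices, not anything defined by NVIDIA.

```python
import re

# Matches WARNING lines in the format shown in the log excerpt above.
WARN_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+WARNING:.*tegra186_gpio_irq")


def warning_timestamps(log_text):
    """Return the uptime (in seconds) of every tegra186_gpio_irq WARNING line."""
    stamps = []
    for line in log_text.splitlines():
        m = WARN_RE.match(line)
        if m:
            stamps.append(float(m.group(1)))
    return stamps


def max_burst(stamps, window_s=1.0):
    """Largest number of warnings inside any window of `window_s` seconds.

    Assumes `stamps` is sorted in ascending order, as kernel logs normally are.
    """
    best = 0
    start = 0
    for end, t in enumerate(stamps):
        # Slide the window start forward until it is within window_s of t.
        while t - stamps[start] > window_s:
            start += 1
        best = max(best, end - start + 1)
    return best


# The three WARNING lines quoted in this post.
SAMPLE = """\
[   61.878407] WARNING: CPU: 0 PID: 0 at drivers/gpio/gpio-tegra186.c:937 tegra186_gpio_irq+0x1ac/0x1f0
[   61.888295] ---[ end trace b6b8c2c9494bf98d ]---
[   61.893129] WARNING: CPU: 0 PID: 0 at drivers/gpio/gpio-tegra186.c:937 tegra186_gpio_irq+0x1ac/0x1f0
[   61.902842] ---[ end trace b6b8c2c9494bf98e ]---
[   61.907635] WARNING: CPU: 0 PID: 0 at drivers/gpio/gpio-tegra186.c:937 tegra186_gpio_irq+0x1ac/0x1f0
[   61.917320] ---[ end trace b6b8c2c9494bf98f ]---
"""

if __name__ == "__main__":
    stamps = warning_timestamps(SAMPLE)
    print(len(stamps), "warnings, largest 1-second burst:", max_burst(stamps))
```

Running the same functions over a full attached log gives the total count and the densest burst, which makes it easier to compare different monitors or boards.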

We checked the following 3 displays and the result is the same.

・LG electronics 23M45D-B
・ViewSonic VA2719-2K-SMHD-7
・ViewSonic VX2882-4KP

The same situation is reproduced on our CustomBoard just by plugging/unplugging the HDMI cable.
I think the root cause is the same as this problem.

This is not a HW design issue.
Please have your internal team reproduce and fix it.

20230601_CPU_OrinNX8GB_Xavier_EVK_HDMI_warning_hang_log_LG_23M45.txt (625.2 KB)

For all who are reading this post.
I just want to share some points and explain something here.

  1. This tegra186_gpio_irq is a common warning that can be generated when the hotplug interrupt is handled abnormally. When too many abnormal interrupts come in, it leads to a kernel panic.

  2. Based on (1), you cannot treat everything that generates “tegra186_gpio_irq” as the same issue. My point here is that the root cause of “tegra186_gpio_irq” can vary. Fixing one case does not mean you will be fine in another case.

For example, we have seen that some customers’ boards indeed have a hardware design problem that leads to tegra186_gpio_irq when entering suspend mode. The root cause of that issue is the power sequence, and this case does not happen on the devkit.

Another example here is that @shinichiro.adachi tried to reproduce this issue by disabling gdm.
Honestly, I don’t think disabling gdm is a useful case here, as Orin does not support framebuffer console yet. I don’t think your real usecase will follow the same steps as what you shared either.

Thus, back to what I want to say here. Please do not waste your time just trying to create a scenario that can print tegra186_gpio_irq on the devkit and use that to persuade me this is an NV issue. If your usecase does not make sense, then it is not worth checking. Our internal team will still ignore what you want us to check.
More importantly, it may not be the same issue as what you saw on your custom board.

As for how to check this issue

  1. If you can reproduce this issue on the devkit with a standard usecase, please check whether it can only be reproduced on specific monitors. If so, please probe the HDMI_EN, HPD, DDC DAT and CLK signals, as I don’t have the same monitor as you.

  2. If this issue cannot be reproduced on the devkit and only happens on your custom board, then again, please probe the HDMI_EN, HPD, DDC DAT and CLK signals, as I don’t have your board.

@shinichiro.adachi. Sorry that what you are doing here is not logical. Our internal team will not check what you reported here.
If you still don’t understand what I mean, I can explain more.

I understand that tegra186_gpio_irq can be caused by many factors.
I understand that you want to separate the various issues.

This plug-in/out issue may not be related to the irq issue that occurs when entering suspend mode.

However, I think this plug-in/out issue is related to the irq issue seen when stopping gdm and unplugging the HDMI cable.

The reason is that the issue is ultimately triggered by unplugging the HDMI cable.
Also, this problem has already been reproduced on the EVK, not a CustomBoard.

I understand that it is difficult to check and debug each customer’s HW design individually.

I think xavier nx carrier board may not hit this issue. But I need double confirm on your side, so please do this test on this side as well.

What we need to do here is focus on how to reproduce this issue on devkit. What setup… How to trigger.

Sorry that what you are doing here is not logical.

You said, “What we need to do here is focus on how to reproduce this issue on devkit.”

Therefore, at your request, we purchased an EVK, spent time and money, and finally found a way to reproduce it on the EVK.

And now that the EVK turns out to have the problem, will NVIDIA ask the user to debug it further?
Won’t you even try to see whether it occurs with your display?

If it occurs on the EVK, NVIDIA should be a little more proactive and take the initiative to resolve it.

I don’t think disabling gdm is an abnormal action.

For example, a user who wants to use the EVK in clamshell mode would disable gdm and unplug the HDMI cable.

Hi,

I am okay with checking the gdm case. I hope you did not misunderstand me.

But there are a few things you need to know.

  1. Even if we resolve something on our side, it does not mean the case on your board will be resolved.
    My point is, I don’t want to waste your time waiting for me to resolve a usecase that could be totally useless on your side. For example, we could spend another month resolving this issue with gdm, only to find it does not fix the issue on your board at all.
    We would still need you to dump the hardware signals on your board then. If so, why not just dump the hardware signals from your board right now?

  2. Also, we are currently working on enabling the framebuffer console feature. Thus, lots of changes will be merged at this time, and they are related to your “disable gdm” case. It is pointless to check the gdm issue for now. We may check it in the next release, but again, your issue may make no progress while you wait for us to enable the framebuffer console.

Hope you really understand what I am talking about. I am trying to resolve an issue that really matters to your case. Not some pointless usecase.

I was able to reproduce the panic loop on the EVK under the same environment as before.

The difference is that this uses the EVK, not a CustomBoard.

We use the Xavier NX devkit as the carrier board + Orin NX 8GB.
The reproduction rate is 100%.

1. Reflash the EVK+SSD firmware from the HostPC through the microUSB port.

2. Boot EVK without connecting HDMI cable.

3. Insert HDMI cable when the following message appears on the serial console.

[   17.783565] Please complete system configuration setup on the serial port provided 
by Jetson's USB device mode connection. e.g. /dev/ttyACMx where x can 0, 1, 2 etc.

4. The kernel WARNING is reproduced repeatedly and causes a kernel panic.

This is identical situation I saw when I first posted this POST.
Therefore, we have not found any new cases of WARNING.

I wonder whether a panic loop may occur if an HPD interrupt arrives while no GUI process such as gdm is running.

20230607_CPU_OrinNX8GB_Xavier_EVK_HDMI_warning_hang_log_before_setup.txt (590.7 KB)

Hi @shinichiro.adachi ,

Yes, I can help you check this issue. But just a reminder, fixing this on our side may not fix any case on yours if your usecase is not the same.

This is identical situation I saw when I first posted this POST.

Another reminder: none of the logs from you has ever proved that they are the same issue…
I already explained this multiple times. Hope you really understand it.

Hi,
We will try to replicate the issue and check further. The timing looks tricky. As a workaround, you can connect the cable when the system is in recovery mode before flashing.

@WayneWWW @DaneLLL
We have encountered the same problem with some bad quality monitors (though the monitor manufacturers do not acknowledge the bad quality). This issue easily leads to a crash, which eventually triggers the watchdog and causes the system to reboot. Hopefully this will be resolved soon.

@baozhu.zuo we are still checking this issue internally.

Btw, what is your scenario to trigger this case? I mean, does it get triggered just by doing a hotplug, or does it need specific timing to trigger (e.g. shutdown/reboot/suspend)?

@WayneWWW It’s easier to reproduce by not plugging in HDMI during startup, and plugging in HDMI after startup, but it’s not 100%. The following steps remain the most effective if reproduction is to be stabilized. The kernel error log is the same as this one when we have an exception.

Hi @baozhu.zuo

Just to clarify: that method does not need a bad quality monitor to reproduce.

I just need to know how you reproduce the issue with the bad quality monitor. You don’t need to tell me how other people reproduce it. I only need to know your case.

So your case is just hotplugging the monitor after boot-up? I mean, even after you configure the user account?

@WayneWWW Yes, after configuring the user. We power up without HDMI plugged in, then plug in HDMI about 2 minutes after power-up.