Hi NVIDIA Support Team,
We encountered an issue where the system freezes for a few seconds during a field trial with our customer.
Here’s how we replicated the issue:
• The screen blank and lock timeouts are both set to 5 minutes.
• When the timeout is reached, the screen blanks and locks automatically.
• The applications keep running, but the user does not interact with the GUI.
When we interact with the system, the screen turns back on. At that moment the system freezes for a few seconds, which disrupts our application and temporarily stops the machine. The issue occurs specifically when the screen is unlocked, and the following error appears in the kernel log:
Mar 14 08:39:06 EAC6k-OrinNX kernel: ------------[ cut here ]------------
Mar 14 08:39:06 EAC6k-OrinNX kernel: hwirq = 76
Mar 14 08:39:06 EAC6k-OrinNX kernel: WARNING: CPU: 0 PID: 0 at drivers/gpio/gpio-tegra186.c:632 tegra186_gpio_irq+0x1ec/0x250
Mar 14 08:39:06 EAC6k-OrinNX kernel: Modules linked in: xt_conntrack(E) xt_MASQUERADE(E) ip6table_nat(E) ip6table_filter(E) ip6_tables(E) iptable_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) libcrc32c(E) xt_addrtype(E) iptable_filter(E) nfnetlink(E) nvidia_modeset(OE) lzo_rle(E) lzo_compress(E) zram(E) zsmalloc(E) snd_soc_tegra186_asrc(OE) snd_soc_tegra186_arad(OE) snd_soc_tegra210_admaif(OE) snd_soc_tegra_pcm(E) snd_soc_tegra210_mixer(OE) snd_soc_tegra210_ope(OE) snd_soc_tegra210_mvc(OE) snd_soc_tegra210_afc(OE) snd_soc_tegra186_dspk(OE) snd_soc_tegra210_dmic(OE) snd_soc_tegra210_adx(OE) snd_soc_tegra210_sfc(OE) snd_soc_tegra210_amx(OE) snd_soc_tegra210_i2s(OE) tegra210_adma(E) snd_soc_tegra210_ahub(OE) spidev(E) nvvrs_pseq_rtc(OE) joydev(E) crct10dif_ce(E) tegra234_oc_event(OE) snd_soc_tegra_machine_driver(OE) snd_soc_tegra_utils(OE) snd_soc_simple_card_utils(E) rtc_ds1307(E) nvpmodel_clk_cap(OE) thermal_trip_event(OE) mttcan(OE) nvpps(OE) tegra_cactmon_mc_all(OE) tegra_aconnect(E)
Mar 14 08:39:06 EAC6k-OrinNX kernel: tegra234_aon(OE) can_dev(E) pwm_tegra_tachometer(OE) snd_hda_codec_hdmi(E) ramoops(E) snd_hda_tegra(E) reed_solomon(E) snd_hda_codec(E) snd_hda_core(E) spi_tegra114(E) mc_hwpm(OE) r8168(OE) nvidia(OE) tegra_pcie_dma_test(OE) nvidia_vrs_pseq(OE) tegra_pcie_edma(OE) host1x_fence(OE) tegra_dce(OE) tsecriscv(OE) rfkill(E) bridge(E) stp(E) llc(E) usb_f_ncm(E) nvhost_nvcsi_t194(OE) nvhost_isp5(OE) nvhost_vi5(OE) usb_f_mass_storage(E) vfat(E) fat(E) usb_f_acm(E) u_serial(E) usb_f_rndis(E) u_ether(E) libcomposite(E) tegra_camera(OE) nvhost_nvcsi(OE) tegra_camera_platform(OE) capture_ivc(OE) tegra_camera_rtcpu(OE) governor_userspace(E) ivc_bus(OE) hsp_mailbox_client(OE) ivc_ext(OE) v4l2_fwnode(E) v4l2_async(E) videobuf2_dma_contig(E) tegra_drm(OE) nvhost_pva(OE) videobuf2_memops(E) videobuf2_v4l2(E) tegra_wmark(OE) nvhost_capture(OE) videobuf2_common(E) nvhost_nvdla(OE) cec(E) nvhwpm(OE) drm_kms_helper(E) host1x_nvhost(OE) nvidia_p2p(OE) ina3221(E) nvgpu(OE) governor_pod_scaling(OE)
Mar 14 08:39:06 EAC6k-OrinNX kernel: host1x(OE) mc_utils(OE) nvmap(OE) nvsciipc(OE) drm(E) fuse(E) ip_tables(E) x_tables(E) ipv6(E) pwm_fan(E) pwm_tegra(E) tegra_bpmp_thermal(E) tegra_xudc(E) ucsi_ccg(E) typec_ucsi(E) typec(E) nvme(E) nvme_core(E) phy_tegra194_p2u(E) pcie_tegra194(E)
Mar 14 08:39:06 EAC6k-OrinNX kernel: CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W OE 5.15.136-tegra #1
Mar 14 08:39:06 EAC6k-OrinNX kernel: Hardware name: NVIDIA Vecow EAC-6000 Platform - NVIDIA Jetson Orin NX/Jetson, BIOS 36.3.0-gcid-36191598 05/06/2024
Mar 14 08:39:06 EAC6k-OrinNX kernel: pstate: 604000c9 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
Mar 14 08:39:06 EAC6k-OrinNX kernel: pc : tegra186_gpio_irq+0x1ec/0x250
Mar 14 08:39:06 EAC6k-OrinNX kernel: lr : tegra186_gpio_irq+0x1ec/0x250
Mar 14 08:39:06 EAC6k-OrinNX kernel: sp : ffff800008003f00
Mar 14 08:39:06 EAC6k-OrinNX kernel: x29: ffff800008003f00 x28: 000000000000004c x27: 0000000000000000
Mar 14 08:39:06 EAC6k-OrinNX kernel: x26: 0000000000000018 x25: ffffbfbfea405000 x24: ffff000083388880
Mar 14 08:39:06 EAC6k-OrinNX kernel: x23: ffff00008338d800 x22: 000000000000000c x21: 000000000000004c
Mar 14 08:39:06 EAC6k-OrinNX kernel: x20: 000000000000002e x19: ffffbfbfea405b80 x18: ffffffffffffffff
Mar 14 08:39:06 EAC6k-OrinNX kernel: x17: ffff404405468000 x16: ffff800008000000 x15: ffffbfbfeb7bd163
Mar 14 08:39:06 EAC6k-OrinNX kernel: x14: ffffffffffffffff x13: ffffbfbfeb7bd160 x12: 2d2d2d2d5d206572
Mar 14 08:39:06 EAC6k-OrinNX kernel: x11: 656820747563205b x10: ffffbfbfeb418fb0 x9 : 000000000000004c
Mar 14 08:39:06 EAC6k-OrinNX kernel: x8 : ffff800008003f00 x7 : 203d207172697768 x6 : 0000000000000030
Mar 14 08:39:06 EAC6k-OrinNX kernel: x5 : ffff0003f01f69f0 x4 : 00000000fffffe05 x3 : ffffbfbfeb438e70
Mar 14 08:39:06 EAC6k-OrinNX kernel: x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffffbfbfeb3b2ec0
Mar 14 08:39:06 EAC6k-OrinNX kernel: Call trace:
Mar 14 08:39:06 EAC6k-OrinNX kernel: tegra186_gpio_irq+0x1ec/0x250
Mar 14 08:39:06 EAC6k-OrinNX kernel: handle_domain_irq+0x74/0xb0
Mar 14 08:39:06 EAC6k-OrinNX kernel: gic_handle_irq+0x68/0x150
Mar 14 08:39:06 EAC6k-OrinNX kernel: call_on_irq_stack+0x20/0x50
Mar 14 08:39:06 EAC6k-OrinNX kernel: do_interrupt_handler+0x70/0x80
Mar 14 08:39:06 EAC6k-OrinNX kernel: el1_interrupt+0x30/0x80
Mar 14 08:39:06 EAC6k-OrinNX kernel: el1h_64_irq_handler+0x18/0x30
Mar 14 08:39:06 EAC6k-OrinNX kernel: el1h_64_irq+0x7c/0x80
Mar 14 08:39:06 EAC6k-OrinNX kernel: cpuidle_enter_state+0xbc/0x3f0
Mar 14 08:39:06 EAC6k-OrinNX kernel: cpuidle_enter+0x44/0x60
Mar 14 08:39:06 EAC6k-OrinNX kernel: do_idle+0x220/0x2b0
Mar 14 08:39:06 EAC6k-OrinNX kernel: cpu_startup_entry+0x34/0x70
Mar 14 08:39:06 EAC6k-OrinNX kernel: rest_init+0xf0/0x100
Mar 14 08:39:06 EAC6k-OrinNX kernel: arch_call_rest_init+0x1c/0x28
Mar 14 08:39:06 EAC6k-OrinNX kernel: start_kernel+0x6d0/0x710
Mar 14 08:39:06 EAC6k-OrinNX kernel: __primary_switched+0xbc/0xc4
Mar 14 08:39:06 EAC6k-OrinNX kernel: ---[ end trace a19b11b4917b8bad ]---
Mar 14 08:39:06 EAC6k-OrinNX /usr/libexec/gdm-x-session[2169]: (--) NVIDIA(GPU-0): Microstep MSI MP273 (DFP-0): connected
Mar 14 08:39:06 EAC6k-OrinNX /usr/libexec/gdm-x-session[2169]: (--) NVIDIA(GPU-0): Microstep MSI MP273 (DFP-0): Internal TMDS
Mar 14 08:39:06 EAC6k-OrinNX /usr/libexec/gdm-x-session[2169]: (--) NVIDIA(GPU-0): Microstep MSI MP273 (DFP-0): 600.0 MHz maximum pixel clock
Mar 14 08:39:06 EAC6k-OrinNX /usr/libexec/gdm-x-session[2169]: (--) NVIDIA(GPU-0):
Mar 14 08:39:06 EAC6k-OrinNX /usr/libexec/gdm-x-session[2169]: (--) NVIDIA(GPU-0): Microstep MSI MP273 (DFP-0): connected
Mar 14 08:39:06 EAC6k-OrinNX /usr/libexec/gdm-x-session[2169]: (--) NVIDIA(GPU-0): Microstep MSI MP273 (DFP-0): Internal TMDS
Mar 14 08:39:06 EAC6k-OrinNX /usr/libexec/gdm-x-session[2169]: (--) NVIDIA(GPU-0): Microstep MSI MP273 (DFP-0): 600.0 MHz maximum pixel clock
Mar 14 08:39:06 EAC6k-OrinNX /usr/libexec/gdm-x-session[2169]: (--) NVIDIA(GPU-0):
Is there any way to resolve this issue?
Note:
• Jetson Orin NX
• JetPack 6.0
Thanks!
• The applications keep running, but the user does not interact with the GUI.
What kind of application is this? If nothing is running, does this issue still reproduce?
Our customer runs an application in ROS that performs continuous computation based on real-time sensor data. When the system freezes, the algorithm’s output may become inaccurate or disrupted.
They have deployed 500 to 1,000 units. Please assist in finding a solution.
Please provide a proper setup that reproduces this issue on an NV devkit. A fix will not come just from posting a comment; we need a local reproduction so that we can do the analysis.
Hi Wayne,
We can share the application and setup files with you, but the packages were provided by our customers and cannot be made public. Could you help set up an email thread so that we can share them with you? My e-mail: roy.tsai@vecow.com
Thanks
Hi,
Just to clarify: is it possible for you to narrow this down to a simplified method that reproduces it on an NV devkit?
Display issues are quite sensitive to the drivers in use.
For example, if you just give me a large setup, that may not be a good idea, because we would never know what you changed on your side.
Also, are you sure that what you are going to share here can actually run on an NV devkit?
The application launches in a Docker container, and nothing has been changed on the host side.
Please answer these questions first:
What kind of application is it? You mentioned “continuous computation based on real-time sensor data”, so will it work if there is no sensor on my board?
Does that application render anything to the display or not?
How frequently does this issue reproduce? Does it always happen, or is it intermittent?
OK, that sounds good.
Please send me the steps to set this up on an NV devkit through the forum's private message system.
Hi @roy.tsai
I saw this step in your private message:
Check the logs stored in /proj/ipc_test/logs. If a file named {timestamp}_freeze.log is generated, it confirms that a system freeze occurred.
Why is this step needed? Are you saying the system recovers from this error without a reboot?
Or are you checking this after the next reboot caused by the freeze/hang?
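Switching to Chinese in case that is more efficient: your last step seems to be going back to check the log that your test application saved itself.
What I don't understand is this: if the system has already frozen, I should know that it rebooted or hung without looking at these files. Is this step meant for going back afterward to confirm what happened to the system? Or is the freeze brief, and the system recovers on its own?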
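The freeze is brief, and the system then recovers and continues automatically. The log records when the freeze occurred and captures the kernel messages at that moment.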
We were not able to reproduce this issue after testing for one day.
Please let us know if you have anything that could easily trigger the error.
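This happens randomly, so it is not easy to reproduce. Judging from the earlier kernel log, it is caused by hwirq = 76. Is it possible to disable this IRQ? What would the impact be?
Kernel log: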
Mar 14 08:39:06 EAC6k-OrinNX kernel: hwirq = 76
Mar 14 08:39:06 EAC6k-OrinNX kernel: Call trace:
Mar 14 08:39:06 EAC6k-OrinNX kernel: tegra186_gpio_irq+0x1ec/0x250
Mar 14 08:39:06 EAC6k-OrinNX kernel: handle_domain_irq+0x74/0xb0
Mar 14 08:39:06 EAC6k-OrinNX kernel: gic_handle_irq+0x68/0x150
Mar 14 08:39:06 EAC6k-OrinNX kernel: call_on_irq_stack+0x20/0x50
Mar 14 08:39:06 EAC6k-OrinNX kernel: do_interrupt_handler+0x70/0x80
Mar 14 08:39:06 EAC6k-OrinNX kernel: el1_interrupt+0x30/0x80
Mar 14 08:39:06 EAC6k-OrinNX kernel: el1h_64_irq_handler+0x18/0x30
Mar 14 08:39:06 EAC6k-OrinNX kernel: el1h_64_irq+0x7c/0x80
Mar 14 08:39:06 EAC6k-OrinNX kernel: cpuidle_enter_state+0xbc/0x3f0
Mar 14 08:39:06 EAC6k-OrinNX kernel: cpuidle_enter+0x44/0x60
Mar 14 08:39:06 EAC6k-OrinNX kernel: do_idle+0x220/0x2b0
Mar 14 08:39:06 EAC6k-OrinNX kernel: cpu_startup_entry+0x34/0x70
Mar 14 08:39:06 EAC6k-OrinNX kernel: rest_init+0xf0/0x100
Mar 14 08:39:06 EAC6k-OrinNX kernel: arch_call_rest_init+0x1c/0x28
Mar 14 08:39:06 EAC6k-OrinNX kernel: start_kernel+0x6d0/0x710
Mar 14 08:39:06 EAC6k-OrinNX kernel: __primary_switched+0xbc/0xc4
Mar 14 08:39:06 EAC6k-OrinNX kernel: ---[ end trace a19b11b4917b8bad ]---
The corresponding node in tegra234.dtsi:
gpcdma: dma-controller@2600000 {
compatible = "nvidia,tegra234-gpcdma",
"nvidia,tegra186-gpcdma";
reg = <0x0 0x2600000 0x0 0x210000>;
resets = <&bpmp TEGRA234_RESET_GPCDMA>;
reset-names = "gpcdma";
interrupts = <GIC_SPI 75 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 76 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 77 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 78 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 79 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 80 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 81 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 82 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 83 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 84 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 85 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 86 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 87 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 88 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 89 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 90 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 91 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 92 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 93 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 94 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 95 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 96 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 97 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 98 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 99 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 100 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 101 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 102 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 103 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 104 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 105 IRQ_TYPE_LEVEL_HIGH>,
<GIC_SPI 106 IRQ_TYPE_LEVEL_HIGH>;
#dma-cells = <1>;
iommus = <&smmu_niso0 TEGRA234_SID_GPCDMA>;
dma-channel-mask = <0xfffffffe>;
dma-coherent;
};
Could you apply this patch to confirm where this hwirq comes from?
Hi,
Sorry, I just noticed that you can print the port->name variable directly.
--- a/drivers/gpio/gpio-tegra186.c
+++ b/drivers/gpio/gpio-tegra186.c
@@ -652,6 +652,7 @@
 		value = readl(base + TEGRA186_GPIO_INTERRUPT_STATUS(1));
 
 		for_each_set_bit(pin, &value, port->pins) {
 			int ret = generic_handle_domain_irq(domain, offset + pin);
+			pr_err("GPIO IRQ triggered: Port %s, Pin %u\n", port->name, pin);
 			WARN_RATELIMIT(ret, "hwirq = %d", offset + pin);
 		}
My guess is HDMI HPD, but let's confirm anyway.
Hi Wayne,
The results are as follows:
Mar 25 03:41:27 OrinNX kernel: GPIO IRQ triggered: Port M, Pin 0
Mar 25 03:41:27 OrinNX kernel: ------------[ cut here ]------------
Mar 25 03:41:27 OrinNX kernel: hwirq = 76
Mar 25 03:41:27 OrinNX kernel: WARNING: CPU: 0 PID: 0 at drivers/gpio/gpio-tegra186.c:633 tegra186_gpio_irq+0x224/0x260
Mar 25 03:41:27 OrinNX kernel: Modules linked in: xt_conntrack(E) xt_MASQUERADE(E) ip6table_nat(E) ip6table_filter(E) ip6_tables(E) iptable_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) libcrc32c(E) xt_addrtype(E) iptable_filter(E) nfnetlink(E) nvidia_modeset(OE) lzo_rle(E) lzo_compress(E) zram(E) zsmalloc(E) snd_soc_tegra186_asrc(OE) snd_soc_tegra210_admaif(OE) snd_soc_tegra210_mixer(OE) snd_soc_tegra_pcm(E) snd_soc_tegra186_arad(OE) snd_soc_tegra210_ope(OE) snd_soc_tegra210_afc(OE) snd_soc_tegra210_mvc(OE) snd_soc_tegra186_dspk(OE) snd_soc_tegra210_dmic(OE) snd_soc_tegra210_adx(OE) snd_soc_tegra210_amx(OE) snd_soc_tegra210_sfc(OE) snd_soc_tegra210_i2s(OE) snd_soc_tegra210_ahub(OE) tegra210_adma(E) spidev(E) nvvrs_pseq_rtc(OE) joydev(E) snd_soc_tegra_machine_driver(OE) crct10dif_ce(E) snd_soc_tegra_utils(OE) snd_soc_simple_card_utils(E) tegra234_oc_event(OE) nvpmodel_clk_cap(OE) tegra_cactmon_mc_all(OE) thermal_trip_event(OE) mttcan(OE) nvpps(OE) tegra234_aon(OE) rtc_ds1307(E)
Mar 25 03:41:27 OrinNX kernel: tegra_aconnect(E) can_dev(E) ramoops(E) reed_solomon(E) snd_hda_codec_hdmi(E) pwm_tegra_tachometer(OE) snd_hda_tegra(E) snd_hda_codec(E) snd_hda_core(E) r8168(OE) spi_tegra114(E) nvidia(OE) tegra_pcie_dma_test(OE) tegra_pcie_edma(OE) mc_hwpm(OE) nvidia_vrs_pseq(OE) host1x_fence(OE) tegra_dce(OE) tsecriscv(OE) rfkill(E) bridge(E) stp(E) llc(E) usb_f_ncm(E) usb_f_mass_storage(E) vfat(E) nvhost_isp5(OE) nvhost_vi5(OE) nvhost_nvcsi_t194(OE) fat(E) usb_f_acm(E) u_serial(E) usb_f_rndis(E) u_ether(E) libcomposite(E) tegra_camera(OE) nvhost_nvcsi(OE) tegra_camera_platform(OE) capture_ivc(OE) tegra_camera_rtcpu(OE) ivc_bus(OE) hsp_mailbox_client(OE) governor_userspace(E) ivc_ext(OE) v4l2_fwnode(E) tegra_drm(OE) v4l2_async(E) videobuf2_dma_contig(E) videobuf2_memops(E) nvhost_pva(OE) videobuf2_v4l2(E) nvhost_nvdla(OE) tegra_wmark(OE) nvhwpm(OE) nvhost_capture(OE) cec(E) videobuf2_common(E) host1x_nvhost(OE) drm_kms_helper(E) nvidia_p2p(OE) ina3221(E) nvgpu(OE)
Mar 25 03:41:27 OrinNX kernel: governor_pod_scaling(OE) host1x(OE) mc_utils(OE) nvmap(OE) nvsciipc(OE) drm(E) fuse(E) ip_tables(E) x_tables(E) ipv6(E) pwm_fan(E) pwm_tegra(E) tegra_bpmp_thermal(E) tegra_xudc(E) ucsi_ccg(E) typec_ucsi(E) typec(E) nvme(E) nvme_core(E) phy_tegra194_p2u(E) pcie_tegra194(E)
Mar 25 03:41:27 OrinNX kernel: CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W OE 5.15.136-tegra #4
Mar 25 03:41:27 OrinNX kernel: Hardware name: NVIDIA Vecow EAC-6000 Platform - NVIDIA Jetson Orin NX/Jetson, BIOS 36.3.0-gcid-36191598 05/06/2024
Mar 25 03:41:27 OrinNX kernel: pstate: 604000c9 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
Mar 25 03:41:27 OrinNX kernel: pc : tegra186_gpio_irq+0x224/0x260
Mar 25 03:41:27 OrinNX kernel: lr : tegra186_gpio_irq+0x224/0x260
Mar 25 03:41:27 OrinNX kernel: sp : ffff800008003f00
Mar 25 03:41:27 OrinNX kernel: x29: ffff800008003f00 x28: 0000000000000018 x27: 000000000000004c
Mar 25 03:41:27 OrinNX kernel: x26: ffff000084ed7080 x25: 000000000000000c x24: ffffdadb03595b80
Mar 25 03:41:27 OrinNX kernel: x23: 000000000000002e x22: ffff000084ed3000 x21: ffff000080af5660
Mar 25 03:41:27 OrinNX kernel: x20: ffffdadb046e50a0 x19: ffffdadb0465d470 x18: ffffffffffffffff
Mar 25 03:41:27 OrinNX kernel: x17: ffff2528ec2d6000 x16: ffff800008000000 x15: ffffdadb049582b3
Mar 25 03:41:27 OrinNX kernel: x14: ffffffffffffffff x13: ffffdadb049582b0 x12: 2d2d2d2d5d206572
Mar 25 03:41:27 OrinNX kernel: x11: 656820747563205b x10: ffffdadb04578570 x9 : 000000000000004c
Mar 25 03:41:27 OrinNX kernel: x8 : ffff800008003f00 x7 : 203d207172697768 x6 : 0000000000000030
Mar 25 03:41:27 OrinNX kernel: x5 : ffff0003f01f49f0 x4 : 00000000fffff52d x3 : ffffdadb045bba30
Mar 25 03:41:27 OrinNX kernel: x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffffdadb04542ec0
Mar 25 03:41:27 OrinNX kernel: Call trace:
Mar 25 03:41:27 OrinNX kernel: tegra186_gpio_irq+0x224/0x260
Mar 25 03:41:27 OrinNX kernel: handle_domain_irq+0x74/0xb0
Mar 25 03:41:27 OrinNX kernel: gic_handle_irq+0x68/0x150
Mar 25 03:41:27 OrinNX kernel: call_on_irq_stack+0x20/0x50
Mar 25 03:41:27 OrinNX kernel: do_interrupt_handler+0x70/0x80
Mar 25 03:41:27 OrinNX kernel: el1_interrupt+0x30/0x80
Mar 25 03:41:27 OrinNX kernel: el1h_64_irq_handler+0x18/0x30
Mar 25 03:41:27 OrinNX kernel: el1h_64_irq+0x7c/0x80
Mar 25 03:41:27 OrinNX kernel: cpuidle_enter_state+0xbc/0x3f0
Mar 25 03:41:27 OrinNX kernel: cpuidle_enter+0x44/0x60
Mar 25 03:41:27 OrinNX kernel: do_idle+0x220/0x2b0
Mar 25 03:41:27 OrinNX kernel: cpu_startup_entry+0x34/0x70
Mar 25 03:41:27 OrinNX kernel: rest_init+0xf0/0x100
Mar 25 03:41:27 OrinNX kernel: arch_call_rest_init+0x1c/0x28
Mar 25 03:41:27 OrinNX kernel: start_kernel+0x6d0/0x710
Mar 25 03:41:27 OrinNX kernel: __primary_switched+0xbc/0xc4
Mar 25 03:41:27 OrinNX kernel: ---[ end trace 26587eb165203149 ]---
Mar 25 03:41:27 OrinNX /usr/libexec/gdm-x-session[2141]: (EE) client bug: timer event6 debounce: scheduled expiry is in the past (-1184ms), your system is too slow
Mar 25 03:41:27 OrinNX /usr/libexec/gdm-x-session[2141]: (EE) client bug: timer event6 debounce: scheduled expiry is in the past (-1006ms), your system is too slow
Mar 25 03:41:27 OrinNX /usr/libexec/gdm-x-session[2141]: (EE) client bug: timer event6 debounce short: scheduled expiry is in the past (-1019ms), your system is too slow
Mar 25 03:41:27 OrinNX /usr/libexec/gdm-x-session[2141]: (EE) WARNING: log rate limit exceeded (5 msgs per 3600000ms). Discarding future messages.
Mar 25 03:41:27 OrinNX /usr/libexec/gdm-x-session[2141]: (--) NVIDIA(GPU-0): DFP-0: disconnected
Mar 25 03:41:27 OrinNX /usr/libexec/gdm-x-session[2141]: (--) NVIDIA(GPU-0): DFP-0: Internal TMDS
Mar 25 03:41:27 OrinNX /usr/libexec/gdm-x-session[2141]: (--) NVIDIA(GPU-0): DFP-0: 165.0 MHz maximum pixel clock
Mar 25 03:41:27 OrinNX /usr/libexec/gdm-x-session[2141]: (--) NVIDIA(GPU-0):
Mar 25 03:41:27 OrinNX /usr/libexec/gdm-x-session[2141]: (--) NVIDIA(GPU-0): DFP-0: disconnected
Mar 25 03:41:27 OrinNX /usr/libexec/gdm-x-session[2141]: (--) NVIDIA(GPU-0): DFP-0: Internal TMDS
Mar 25 03:41:27 OrinNX /usr/libexec/gdm-x-session[2141]: (--) NVIDIA(GPU-0): DFP-0: 165.0 MHz maximum pixel clock
Mar 25 03:41:27 OrinNX /usr/libexec/gdm-x-session[2141]: (--) NVIDIA(GPU-0):
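This GPIO is defined in the node below. So is the problem that detecting the monitor when the screen turns back on causes the brief system freeze?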
display@13800000 {
/* os_gpio_hotplug_a is used for hotplug */
os_gpio_hotplug_a = <&gpio TEGRA234_MAIN_GPIO(M, 0) GPIO_ACTIVE_HIGH>;
status = "okay";
};
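I'd also like to confirm this part with you.
Based on the method you used to reproduce the issue, when this error occurs, has the screen not come back on yet? Or is it still in the power-saving state?
We can only tell whether a freeze occurred by checking the log, so the exact timing is hard to pin down; I will try to observe it remotely. Would it help to verify with the HDMI cable removed?
With the HDMI cable removed, this issue should not occur at all.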
What looks odd to me is this section in your 3/25 03:41 log:
Mar 25 03:41:27 EAC6k-OrinNX /usr/libexec/gdm-x-session[2141]: (--) NVIDIA(GPU-0): DFP-0: disconnected
Mar 25 03:41:27 EAC6k-OrinNX /usr/libexec/gdm-x-session[2141]: (--) NVIDIA(GPU-0): DFP-0: Internal TMDS
Mar 25 03:41:27 EAC6k-OrinNX /usr/libexec/gdm-x-session[2141]: (--) NVIDIA(GPU-0): DFP-0: 165.0 MHz maximum pixel clock
Mar 25 03:41:27 EAC6k-OrinNX /usr/libexec/gdm-x-session[2141]: (--) NVIDIA(GPU-0):
Mar 25 03:41:27 EAC6k-OrinNX /usr/libexec/gdm-x-session[2141]: (--) NVIDIA(GPU-0): DFP-0: disconnected
Mar 25 03:41:27 EAC6k-OrinNX /usr/libexec/gdm-x-session[2141]: (--) NVIDIA(GPU-0): DFP-0: Internal TMDS
Mar 25 03:41:27 EAC6k-OrinNX /usr/libexec/gdm-x-session[2141]: (--) NVIDIA(GPU-0): DFP-0: 165.0 MHz maximum pixel clock
Mar 25 03:41:27 EAC6k-OrinNX /usr/libexec/gdm-x-session[2141]: (--) NVIDIA(GPU-0):
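I have not previously analyzed whether the HDMI link is connected or disconnected after the monitor enters power saving.
From the log you provided, HDMI appears to be disconnected.
The question then becomes whether only this monitor's power saving causes this state, or whether any monitor's power saving does.
In the past, this tegra186_gpio_irq warning usually appeared when a GPIO still had interrupts coming in but no corresponding driver was clearing the interrupt.
Common scenarios include the system entering suspend or powering down.
But we have not run into this on rel-36 yet, so we need to confirm the exact conditions at the moment the issue is hit.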