Issue with PCIe Driver Crash During Soft Reboot (Xavier AGX)

Hi,

We are encountering the same issue discussed in the thread "PCie Driver crash issue while soft reboot".

After checking the FPGA side, we confirmed that there are no pending interrupts. We connected the reset signal (H10 pin - PCIe reset on Xavier AGX) directly to the PCIe controller in the FPGA and used it to reset the core without any additional logic.

However, during a soft reboot (sudo reboot), the Xavier AGX hangs, and we need a full power cycle each time to recover.

Any support or guidance would be greatly appreciated, as it would help us proceed with our future implementation.

Do you have a serial console log you can share first?

Hi @WayneWWW,

Thanks for the response. We are getting the crash log below.


[ 28.452347] Call trace:
[ 28.452397] [] dump_backtrace+0x0/0x198
[ 28.452489] [] show_stack+0x24/0x30
[ 28.452577] [] dump_stack+0xa0/0xc4
[ 28.452667] [] panic+0x12c/0x2a8
[ 28.452754] [] watchdog_unpark_threads+0x0/0x98
[ 28.452859] [] __hrtimer_run_queues+0xd8/0x360
[ 28.452959] [] hrtimer_interrupt+0xa8/0x1e0
[ 28.453167] [] arch_timer_handler_phys+0x38/0x58
[ 28.453645] [] handle_percpu_devid_irq+0x90/0x2b0
[ 28.454118] [] generic_handle_irq+0x34/0x50
[ 28.454539] [] __handle_domain_irq+0x68/0xc0
[ 28.457167] [] gic_handle_irq+0x5c/0xb0
[ 28.462505] [] el1_irq+0xe8/0x194
[ 28.467150] [] irq_exit+0xd0/0x118
[ 28.471961] [] __handle_domain_irq+0x6c/0xc0
[ 28.478080] [] gic_handle_irq+0x5c/0xb0
[ 28.483161] [] el1_irq+0xe8/0x194
[ 28.488235] [] dma_alloc_from_coherent_attr+0x0/0x168
[ 28.494801] [] tegra_pcie_dw_host_init+0x858/0xb28
[ 28.501359] [] dw_pcie_host_init+0x230/0x530
[ 28.507308] [] tegra_pcie_dw_runtime_resume+0x1bc/0x370
[ 28.514396] [] pm_generic_runtime_resume+0x3c/0x58
[ 28.520697] [] __genpd_runtime_resume+0x38/0xa0
[ 28.527083] [] genpd_runtime_resume+0xa4/0x210
[ 28.533031] [] __rpm_callback+0x74/0xa0
[ 28.538456] [] rpm_callback+0x34/0x98
[ 28.543705] [] rpm_resume+0x470/0x710
[ 28.548780] [] __pm_runtime_resume+0x4c/0x70
[ 28.554644] [] tegra_pcie_dw_probe+0x8d8/0xbb0
[ 28.560428] [] platform_drv_probe+0x60/0xc0
[ 28.566024] [] driver_probe_device+0x298/0x448
[ 28.572143] [] __driver_attach+0xdc/0x128
[ 28.577743] [] bus_for_each_dev+0x5c/0xa8
[ 28.583518] [] driver_attach+0x30/0x40
[ 28.588601] [] bus_add_driver+0x20c/0x2a8
[ 28.594198] [] driver_register+0x6c/0x110
[ 28.600232] [] __platform_driver_register+0x5c/0x68
[ 28.606450] [] tegra_pcie_rp_init+0x18/0x20
[ 28.612221] [] do_one_initcall+0x44/0x130
[ 28.618257] [] kernel_init_freeable+0x1a0/0x244
[ 28.624121] [] kernel_init+0x18/0x108
[ 28.629373] [] ret_from_fork+0x10/0x30
[ 28.635147] SMP: stopping secondary CPUs
[ 28.639087] Kernel Offset: disabled
[ 28.642753] Memory Limit: none

Have you tried removing the line val |= APPL_INTR_EN_L1_8_INTX_EN from the PCIe driver?
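
To make the suggestion concrete, here is a minimal standalone sketch of what dropping that line means at the register level. It only models the read-modify-write on a plain 32-bit value and does not touch hardware. The bit position for APPL_INTR_EN_L1_8_INTX_EN is an assumption (check the definition in your L4T kernel source), and build_intr_en_l1_8() is a hypothetical helper, not a function from the Tegra PCIe driver.

```c
#include <stdint.h>

/* Assumed bit position for illustration only; verify against the
 * definition in the kernel source you are actually building. */
#define APPL_INTR_EN_L1_8_INTX_EN (1u << 11)

/*
 * Hypothetical helper: returns the value that would be written back to
 * the interrupt-enable register.
 *   enable_intx != 0 : same effect as "val |= APPL_INTR_EN_L1_8_INTX_EN"
 *   enable_intx == 0 : legacy INTx enable bit is left cleared, which is
 *                      what removing that line is meant to achieve
 * All other bits of val are passed through unchanged.
 */
static uint32_t build_intr_en_l1_8(uint32_t val, int enable_intx)
{
    if (enable_intx)
        val |= APPL_INTR_EN_L1_8_INTX_EN;
    else
        val &= ~APPL_INTR_EN_L1_8_INTX_EN;
    return val;
}
```

With enable_intx set to 0, the controller should no longer forward legacy INTx interrupts from the endpoint at this point in initialization, so comparing a boot with and without the bit can help show whether an INTx line left asserted by the FPGA across the soft reboot is involved in the hang.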

Hello @WayneWWW ,

We have successfully applied the patch and conducted tests. Below are our observations:

With the primary kernel image (which includes the patch), the system boots without any issues. With the secondary kernel image (which does not include the patch), the boot process hangs, even though the application is not running and the FPGA is therefore not generating any interrupts.

Could you explain in detail what the flag APPL_INTR_EN_L1_8_INTX_EN is used for?
