PCIe driver crash on soft reboot

Hi,

We are currently working on an NVIDIA Jetson AGX Xavier with an MPSoC connected via PCIe. We have developed a V4L2-based video capture driver, and the input video is fed from the MPSoC.

For testing purposes, we developed a video capture application based on OpenGL, and it was running fine.

After running the OpenGL application, we rebooted the board and got the error below:

[ 28.451940] Kernel panic - not syncing: softlockup: hung tasks
[ 28.452049] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G L 4.9.253-tegra #1
[ 28.452178] Hardware name: DTS V4.6.05, * JetPack_4.6 Emb * Ethernet with Fixed Link 1Gbps * card Jetson-AGX (DT)
[ 28.452347] Call trace:
[ 28.452397] [] dump_backtrace+0x0/0x198
[ 28.452489] [] show_stack+0x24/0x30
[ 28.452577] [] dump_stack+0xa0/0xc4
[ 28.452667] [] panic+0x12c/0x2a8
[ 28.452754] [] watchdog_unpark_threads+0x0/0x98
[ 28.452859] [] __hrtimer_run_queues+0xd8/0x360
[ 28.452959] [] hrtimer_interrupt+0xa8/0x1e0
[ 28.453167] [] arch_timer_handler_phys+0x38/0x58
[ 28.453645] [] handle_percpu_devid_irq+0x90/0x2b0
[ 28.454118] [] generic_handle_irq+0x34/0x50
[ 28.454539] [] __handle_domain_irq+0x68/0xc0
[ 28.457167] [] gic_handle_irq+0x5c/0xb0
[ 28.462505] [] el1_irq+0xe8/0x194
[ 28.467150] [] irq_exit+0xd0/0x118
[ 28.471961] [] __handle_domain_irq+0x6c/0xc0
[ 28.478080] [] gic_handle_irq+0x5c/0xb0
[ 28.483161] [] el1_irq+0xe8/0x194
[ 28.488235] [] dma_alloc_from_coherent_attr+0x0/0x168
[ 28.494801] [] tegra_pcie_dw_host_init+0x858/0xb28
[ 28.501359] [] dw_pcie_host_init+0x230/0x530
[ 28.507308] [] tegra_pcie_dw_runtime_resume+0x1bc/0x370
[ 28.514396] [] pm_generic_runtime_resume+0x3c/0x58
[ 28.520697] [] __genpd_runtime_resume+0x38/0xa0
[ 28.527083] [] genpd_runtime_resume+0xa4/0x210
[ 28.533031] [] __rpm_callback+0x74/0xa0
[ 28.538456] [] rpm_callback+0x34/0x98
[ 28.543705] [] rpm_resume+0x470/0x710
[ 28.548780] [] __pm_runtime_resume+0x4c/0x70
[ 28.554644] [] tegra_pcie_dw_probe+0x8d8/0xbb0
[ 28.560428] [] platform_drv_probe+0x60/0xc0
[ 28.566024] [] driver_probe_device+0x298/0x448
[ 28.572143] [] __driver_attach+0xdc/0x128
[ 28.577743] [] bus_for_each_dev+0x5c/0xa8
[ 28.583518] [] driver_attach+0x30/0x40
[ 28.588601] [] bus_add_driver+0x20c/0x2a8
[ 28.594198] [] driver_register+0x6c/0x110
[ 28.600232] [] __platform_driver_register+0x5c/0x68
[ 28.606450] [] tegra_pcie_rp_init+0x18/0x20
[ 28.612221] [] do_one_initcall+0x44/0x130
[ 28.618257] [] kernel_init_freeable+0x1a0/0x244
[ 28.624121] [] kernel_init+0x18/0x108
[ 28.629373] [] ret_from_fork+0x10/0x30
[ 28.635147] SMP: stopping secondary CPUs
[ 28.639087] Kernel Offset: disabled
[ 28.642753] Memory Limit: none

It works fine after a hard reset. We are using PCI MSI interrupts in our driver (pci_alloc_irq_vectors and pci_irq_vector). Should we free them on driver removal? Can you please tell us what we are missing? Thanks in advance.

Sorry for the late response. Have you managed to get the issue resolved? Thanks

Hi Kayccc, no, the issue is not fixed yet. After we start and stop the video channel, we do not get any interrupts from the MPSoC, so there is no issue there.

Is there any patch that I need to add for PCIe? Please help us fix it.

Hi,

I think these spurious interrupts are caused by the PCIe legacy IRQ. On the next boot, as soon as the controller is initialized, the endpoint may send an Assert INTA message because PCIe MSI is not yet enabled. Since the endpoint driver is not yet bound, this causes spurious interrupts. In the hard reset case, power to the endpoint is cut off, so you do not see this issue.
Please debug from this point of view and check whether the endpoint is sending Assert INTA.
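
One way to check this from user space, before the soft reboot, is to look at the endpoint's PCI configuration header. The sketch below is only an illustration: it assumes the standard header layout (Command at 0x04, Status at 0x06, Interrupt Pin at 0x3D) and the standard sysfs file /sys/bus/pci/devices/<BDF>/config. The function names are made up for this example and are not part of any NVIDIA or kernel API. It reports whether the endpoint has an INTx interrupt pending that is not masked by the INTx Disable bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Standard PCI configuration header offsets and bits. */
#define CFG_COMMAND       0x04
#define CFG_STATUS        0x06
#define CFG_INTERRUPT_PIN 0x3D
#define CFG_HEADER_LEN    64
#define CMD_INTX_DISABLE  (1u << 10) /* Command: INTx Disable */
#define STS_INTX_PENDING  (1u << 3)  /* Status: Interrupt Status */

/* Configuration space is little-endian. */
static uint16_t cfg_read16(const uint8_t *cfg, size_t off)
{
    return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

/*
 * Hypothetical helper: returns 1 if the function has a legacy interrupt
 * pending and not masked, 0 if not, -1 if the buffer is too short.
 */
int intx_is_asserted(const uint8_t *cfg, size_t len)
{
    assert(cfg != NULL);
    if (len < CFG_HEADER_LEN)
        return -1;
    if (cfg[CFG_INTERRUPT_PIN] == 0) /* 0 means no INTx pin is used */
        return 0;
    uint16_t cmd = cfg_read16(cfg, CFG_COMMAND);
    uint16_t sts = cfg_read16(cfg, CFG_STATUS);
    return (sts & STS_INTX_PENDING) && !(cmd & CMD_INTX_DISABLE);
}

/*
 * Hypothetical helper: reads the first 64 bytes of a sysfs config file.
 * Returns 0 on success, -1 on failure.
 */
int read_cfg_header(const char *path, uint8_t *buf)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, CFG_HEADER_LEN, f);
    fclose(f);
    return n == CFG_HEADER_LEN ? 0 : -1;
}
```

Calling read_cfg_header() on the endpoint's config file after stopping the capture application, and then passing the buffer to intx_is_asserted(), gives a quick indication of whether the MPSoC is still signalling INTA at the point where the board is about to reboot. A PCIe analyzer or the endpoint's own debug registers would be needed to confirm the actual Assert INTA message.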

From the Tegra side, you can forcefully disable INTx completely since you are using MSI, but this is not a recommended solution. You can clear this bit as part of PCIe controller initialization in file kernel/nvidia/drivers/pci/dwc/pcie-tegra.c, function tegra_pcie_enable_legacy_interrupts():
val |= APPL_INTR_EN_L1_8_INTX_EN; → remove this line
Note: This change affects all controllers. If you have other endpoints that use legacy interrupts, they will not work. In that case you have to add conditional logic based on the domain number or a DT entry.
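
The conditional logic mentioned in the note could look roughly like the following. This is a user-space model of the register update only, not the driver code: the bit position given for APPL_INTR_EN_L1_8_INTX_EN is an assumption for illustration (use the definition in pcie-tegra.c), and the msi_only_mask parameter and both function names are hypothetical stand-ins for whatever the domain number or DT entry check ends up being.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit position, for illustration only; take the real one from pcie-tegra.c. */
#define APPL_INTR_EN_L1_8_INTX_EN (1u << 11)

/*
 * Hypothetical policy: bit n set in msi_only_mask means the controller
 * with domain number n has only MSI-capable endpoints behind it.
 */
static bool controller_uses_intx(unsigned int domain, uint32_t msi_only_mask)
{
    if (domain >= 32) /* outside the mask: keep the default behaviour */
        return true;
    return (msi_only_mask & (1u << domain)) == 0;
}

/*
 * Returns the new register value: INTx stays enabled for controllers
 * that may need legacy interrupts and is cleared for MSI-only ones.
 */
uint32_t apply_intx_policy(uint32_t val, unsigned int domain,
                           uint32_t msi_only_mask)
{
    if (controller_uses_intx(domain, msi_only_mask))
        val |= APPL_INTR_EN_L1_8_INTX_EN;
    else
        val &= ~APPL_INTR_EN_L1_8_INTX_EN;
    return val;
}
```

In the driver, the same decision would sit where val |= APPL_INTR_EN_L1_8_INTX_EN; is today, so that only the controller hosting the MPSoC loses INTx while the others keep their current behaviour.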

Thanks,
Manikanta
