PCIe legacy interrupt hangs TX1

mrbmcg · October 31, 2016, 1:59pm

I have a topic Zynq FPGA development board plugged into the TX1. It contains a sample FPGA PCIe project based on the Bus Mastered DMA application note (xapp1052) from Xilinx.

This is a very simple design: the FPGA is configured via PCIe writes to the BAR area of the FPGA PCIe device, and then a start bit is written in the configuration space to begin a read/write transfer between the TX1 and the FPGA.

The transfer is happening correctly but there is an option bit in one of the configuration registers to generate a PCIe interrupt on the completion of the transfer.

If I enable the interrupt generation, the whole kernel hangs once the transfer is complete. From debugging with the Lauterbach, it appears that a legacy interrupt is generated just before the whole system hangs.

Has anybody successfully been able to handle legacy PCI interrupts on the TX1?

Cheers
Robert

mrbmcg · November 1, 2016, 8:59am

I've made some progress on this, but I think there is a kernel issue on the TX1.

The PCIe interrupt handler in the TX1 kernel (3.10.96) is shown below:

static irqreturn_t tegra_pcie_isr(int irq, void *arg)
{
	const char *err_msg[] = {
		"Unknown",
		"AXI slave error",
		"AXI decode error",
		"Target abort",
		"Master abort",
		"Invalid write",
		"",
		"Response decoding error",
		"AXI response decoding error",
		"Transcation timeout",
		"",
		"Slot Clock request change",
		"TMS Clock clamp change",
		"TMS power down",
		"Peer to Peer error",
	};
	struct tegra_pcie *pcie = arg;
	u32 code, signature;

	PR_FUNC_LINE;
	code = afi_readl(pcie, AFI_INTR_CODE) & AFI_INTR_CODE_MASK;
	signature = afi_readl(pcie, AFI_INTR_SIGNATURE);

	if (code == AFI_INTR_LEGACY)
		handle_sb_intr(pcie);
	afi_writel(pcie, 0, AFI_INTR_CODE);

	if (code >= ARRAY_SIZE(err_msg))
		code = 0;

	/*
	 * do not pollute kernel log with master abort reports since they
	 * happen a lot during enumeration
	 */
	if (code == AFI_INTR_MASTER_ABORT)
		pr_debug("PCIE: %s, signature: %08x\n",
				err_msg[code], signature);
	else if ((code != AFI_INTR_LEGACY) && (code != AFI_INTR_PRSNT_SENSE))
		dev_err(pcie->dev, "PCIE: %s, signature: %08x\n",
				err_msg[code], signature);

	return IRQ_HANDLED;
}

When an AFI_INTR_LEGACY interrupt is detected, it calls the handle_sb_intr() function to handle sideband messages:

static void handle_sb_intr(struct tegra_pcie *pcie)
{
	u32 mesg;

	PR_FUNC_LINE;
	mesg = afi_readl(pcie, AFI_MSG_0);
	printk(KERN_ERR "AFI_MSG 0x%08x\n",mesg);
	if (mesg & AFI_MSG_INTX_MASK)
		/* notify device isr for INTx messages from pcie devices */
		dev_dbg(pcie->dev,
			"Legacy INTx interrupt occurred %x\n", mesg);
	else if (mesg & AFI_MSG_PM_PME_MASK) {
		struct tegra_pcie_port *port, *tmp;
		/* handle PME messages */
		list_for_each_entry_safe(port, tmp, &pcie->ports, list)
			if (port->index == (mesg & AFI_MSG_PM_PME0))
				break;
		mesg = rp_readl(port, NV_PCIE2_RP_RSR);
		mesg |= NV_PCIE2_RP_RSR_PMESTAT;
		rp_writel(port, mesg, NV_PCIE2_RP_RSR);
	} else
		afi_writel(pcie, mesg, AFI_MSG_0);
}

For a legacy interrupt this just emits a dynamic debug message and then returns to the original handler.

The handler clears the interrupt but always returns IRQ_HANDLED. As far as I am aware, this prevents any other handlers registered for this interrupt from being called. As a result, my handler, which has to signal to the FPGA that the transfer is complete, is never called. The FPGA therefore never deasserts the legacy interrupt, and it is continually retriggered, causing the kernel to hang.
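The consequence described above (a level-triggered INTx that nobody tells the device to deassert) can be sketched with a small userspace model. This is only an illustration, not kernel code: `fpga_model`, `driver_ack()` and `service_level_irq()` are hypothetical names, and the entry cap stands in for what would be an endless loop on real hardware.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the endpoint: just the level of its INTA line. */
struct fpga_model {
	bool intx_asserted;
};

/* Hypothetical driver action: tell the device the transfer was seen,
 * so that it deasserts INTA. */
static void driver_ack(struct fpga_model *dev)
{
	dev->intx_asserted = false;
}

/*
 * Re-enter the interrupt path for as long as the line stays asserted,
 * up to max_entries (a cap used only so the model terminates).
 * Returns the number of times the interrupt path was entered.
 */
static int service_level_irq(struct fpga_model *dev, bool driver_handler_runs,
			     int max_entries)
{
	int entries = 0;

	while (dev->intx_asserted && entries < max_entries) {
		entries++;
		/* The host controller clears its own status here, which
		 * does not change the level of the device's line. */
		if (driver_handler_runs)
			driver_ack(dev);
	}
	return entries;
}

/* One interrupt, driver handler reached: the line drops after one entry. */
static int entries_with_driver(void)
{
	struct fpga_model dev = { true };
	return service_level_irq(&dev, true, 1000);
}

/* One interrupt, driver handler never reached: the cap is hit. */
static int entries_without_driver(void)
{
	struct fpga_model dev = { true };
	return service_level_irq(&dev, false, 1000);
}
```

In this model `entries_with_driver()` stops after a single entry, while `entries_without_driver()` runs until the cap. The model says nothing about why the driver's handler is not reached on the TX1; it only shows why an unacknowledged level interrupt looks like a hang.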

I notice that in later versions of the kernel (e.g. 4.8) the ISR is as below:

static irqreturn_t tegra_pcie_isr(int irq, void *arg)
{
         const char *err_msg[] = {
                 "Unknown",
                 "AXI slave error",
                 "AXI decode error",
                 "Target abort",
                 "Master abort",
                 "Invalid write",
                 "Legacy interrupt",
                 "Response decoding error",
                 "AXI response decoding error",
                 "Transaction timeout",
                 "Slot present pin change",
                 "Slot clock request change",
                 "TMS clock ramp change",
                 "TMS ready for power down",
                 "Peer2Peer error",
         };
         struct tegra_pcie *pcie = arg;
         u32 code, signature;
 
         code = afi_readl(pcie, AFI_INTR_CODE) & AFI_INTR_CODE_MASK;
         signature = afi_readl(pcie, AFI_INTR_SIGNATURE);
         afi_writel(pcie, 0, AFI_INTR_CODE);
 
         if (code == AFI_INTR_LEGACY)
                 return IRQ_NONE;
 
         if (code >= ARRAY_SIZE(err_msg))
                 code = 0;
 
         /*
          * do not pollute kernel log with master abort reports since they
          * happen a lot during enumeration
          */
         if (code == AFI_INTR_MASTER_ABORT)
                 dev_dbg(pcie->dev, "%s, signature: %08x\n", err_msg[code],
                         signature);
         else
                 dev_err(pcie->dev, "%s, signature: %08x\n", err_msg[code],
                         signature);
 
         if (code == AFI_INTR_TARGET_ABORT || code == AFI_INTR_MASTER_ABORT ||
             code == AFI_INTR_FPCI_DECODE_ERROR) {
                 u32 fpci = afi_readl(pcie, AFI_UPPER_FPCI_ADDRESS) & 0xff;
                 u64 address = (u64)fpci << 32 | (signature & 0xfffffffc);
 
                 if (code == AFI_INTR_MASTER_ABORT)
                         dev_dbg(pcie->dev, "  FPCI address: %10llx\n", address);
                 else
                         dev_err(pcie->dev, "  FPCI address: %10llx\n", address);
         }
 
         return IRQ_HANDLED;
 }

So for legacy interrupts, IRQ_NONE is returned immediately after the status is cleared, which should allow the interrupt to propagate to my handler, yes?

Any comments from you NVIDIA guys?

Robert

CHuang1 · November 2, 2016, 2:42pm

"If I enable the interrupt generation the whole kernel hangs once the transfer is complete."

mrbmcg,
Just to confirm: which bit/register do you enable for this?

mrbmcg · November 2, 2016, 5:10pm

Hi

I’m not enabling them in the TX1; they are enabled by default. There is an option in the configuration registers of my FPGA implementation to send a legacy IRQ (INTA) when a transfer is complete.

I’ve modified the kernel so that the interrupt handler returns IRQ_NONE instead. If I do this, my registered handler is called, I tell the FPGA to deassert INTA, and everything works OK. Basically, the way the kernel is at the moment, you cannot use legacy interrupts because the pci-tegra.c handler just swallows them.

vidyas · November 3, 2016, 4:55am

When registering the ISR in your driver, do you use IRQF_SHARED? If not, please use it. Legacy interrupts are shared, and every handler on the line has to be registered with IRQF_SHARED. That way, the interrupt subsystem in the kernel makes sure that all ISRs get called, and whether a particular ISR handles the interrupt (IRQ_HANDLED) or ignores it (IRQ_NONE) is up to that ISR. One ISR returning IRQ_HANDLED or IRQ_NONE shouldn’t affect the calling of other ISRs registered for the same interrupt.
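The behaviour described here can be sketched as a userspace model of shared-line dispatch. This is an assumption-laden illustration rather than the kernel's actual code: the `irqreturn_t` and `irq_handler_t` types are redefined locally, and `irqaction_model`, `dispatch_shared_irq()` and the two `*_isr_model()` handlers are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for the kernel types; this is a userspace model. */
typedef enum { IRQ_NONE = 0, IRQ_HANDLED = 1 } irqreturn_t;
typedef irqreturn_t (*irq_handler_t)(int irq, void *dev_id);

/* Hypothetical record of one handler registered on a shared line. */
struct irqaction_model {
	irq_handler_t handler;
	void *dev_id;
};

/*
 * Call every handler registered on the line, whatever the earlier ones
 * returned. The overall result is IRQ_HANDLED if any handler claimed it.
 */
static irqreturn_t dispatch_shared_irq(int irq, struct irqaction_model *actions,
				       size_t n)
{
	irqreturn_t ret = IRQ_NONE;

	for (size_t i = 0; i < n; i++)
		if (actions[i].handler(irq, actions[i].dev_id) == IRQ_HANDLED)
			ret = IRQ_HANDLED;
	return ret;
}

/* Stand-in for the host controller ISR: counts the call, always claims. */
static irqreturn_t host_isr_model(int irq, void *dev_id)
{
	(void)irq;
	++*(int *)dev_id;
	return IRQ_HANDLED;
}

/* Stand-in for the endpoint driver ISR: counts the call and claims. */
static irqreturn_t endpoint_isr_model(int irq, void *dev_id)
{
	(void)irq;
	++*(int *)dev_id;
	return IRQ_HANDLED;
}

/* Deliver one interrupt; return how many times the endpoint ISR ran. */
static int endpoint_calls_for_one_irq(void)
{
	int host_calls = 0, endpoint_calls = 0;
	struct irqaction_model actions[] = {
		{ host_isr_model, &host_calls },
		{ endpoint_isr_model, &endpoint_calls },
	};

	dispatch_shared_irq(42, actions, 2);
	return endpoint_calls;
}
```

In this model the endpoint handler still runs once even though the host controller handler, registered first, returns IRQ_HANDLED. Whether the TX1's 3.10 kernel behaves this way for this particular interrupt is exactly what the thread is trying to establish.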

mrbmcg · November 3, 2016, 10:04am

Here is how I am installing the IRQ:

if (request_irq(gIrq, &XPCIe_IRQHandler, IRQF_SHARED, gDrvrName, gDev) < 0) {
    printk(KERN_WARNING "%s: Init: Unable to allocate IRQ\n", gDrvrName);
    return (CRIT_ERR);
}

My interrupt handler does not get called.

Note that later versions of the pci-tegra.c interrupt handler return IRQ_NONE.

vidyas · November 11, 2016, 6:44am

I’ve verified locally that, even with the PCIe host controller driver (pci-tegra.c) returning IRQ_HANDLED, all other ISRs still get called. I’m not sure why the behavior is different on your side. Anyway, if returning IRQ_NONE is working for you, you can continue to use it, and I’ll check further from our side.

kayccc · January 18, 2017, 8:59am

Hi Robert,

Is this still an issue on your side?
Or has this problem been clarified and resolved?

Thanks

mrbmcg · January 18, 2017, 9:46am

No, I still need to return IRQ_NONE or my interrupt is not called. I’m currently debugging other PCIe problems introduced by the R24.2.1 kernel.

I will get back to you shortly on this, within a few days hopefully :-)
