PCIe legacy interrupt hangs TX1

I have a Topic Zynq FPGA development board plugged into the TX1. It contains a sample FPGA PCIe project based on the Bus Master DMA application note (xapp1052) from Xilinx.

This is a very simple design: the FPGA is configured via PCIe writes to the BAR area of the FPGA PCIe device, and then a start bit is written in the configuration space to begin a read/write transfer between the TX1 and the FPGA.

The transfer happens correctly, but one of the configuration registers has an option bit that generates a PCIe interrupt when the transfer completes.

If I enable interrupt generation, the whole kernel hangs once the transfer is complete. From debugging with the Lauterbach, it appears that a legacy interrupt is generated just before the whole system hangs.

Has anybody successfully been able to handle legacy PCI interrupts on the TX1?

Cheers
Robert

So I made some progress on this, but I think there is a kernel issue with the TX1.

The PCIe interrupt handler in the TX1 kernel (3.10.96) is shown below:

static irqreturn_t tegra_pcie_isr(int irq, void *arg)
{
	const char *err_msg[] = {
		"Unknown",
		"AXI slave error",
		"AXI decode error",
		"Target abort",
		"Master abort",
		"Invalid write",
		"",
		"Response decoding error",
		"AXI response decoding error",
		"Transcation timeout",
		"",
		"Slot Clock request change",
		"TMS Clock clamp change",
		"TMS power down",
		"Peer to Peer error",
	};
	struct tegra_pcie *pcie = arg;
	u32 code, signature;

	PR_FUNC_LINE;
	code = afi_readl(pcie, AFI_INTR_CODE) & AFI_INTR_CODE_MASK;
	signature = afi_readl(pcie, AFI_INTR_SIGNATURE);

	if (code == AFI_INTR_LEGACY)
		handle_sb_intr(pcie);
	afi_writel(pcie, 0, AFI_INTR_CODE);

	if (code >= ARRAY_SIZE(err_msg))
		code = 0;

	/*
	 * do not pollute kernel log with master abort reports since they
	 * happen a lot during enumeration
	 */
	if (code == AFI_INTR_MASTER_ABORT)
		pr_debug("PCIE: %s, signature: %08x\n",
				err_msg[code], signature);
	else if ((code != AFI_INTR_LEGACY) && (code != AFI_INTR_PRSNT_SENSE))
		dev_err(pcie->dev, "PCIE: %s, signature: %08x\n",
				err_msg[code], signature);

	return IRQ_HANDLED;
}

When an AFI_INTR_LEGACY interrupt is detected, it calls handle_sb_intr() to handle sideband messages:

static void handle_sb_intr(struct tegra_pcie *pcie)
{
	u32 mesg;

	PR_FUNC_LINE;
	mesg = afi_readl(pcie, AFI_MSG_0);
	printk(KERN_ERR "AFI_MSG 0x%08x\n",mesg);
	if (mesg & AFI_MSG_INTX_MASK)
		/* notify device isr for INTx messages from pcie devices */
		dev_dbg(pcie->dev,
			"Legacy INTx interrupt occurred %x\n", mesg);
	else if (mesg & AFI_MSG_PM_PME_MASK) {
		struct tegra_pcie_port *port, *tmp;
		/* handle PME messages */
		list_for_each_entry_safe(port, tmp, &pcie->ports, list)
			if (port->index == (mesg & AFI_MSG_PM_PME0))
				break;
		mesg = rp_readl(port, NV_PCIE2_RP_RSR);
		mesg |= NV_PCIE2_RP_RSR_PMESTAT;
		rp_writel(port, mesg, NV_PCIE2_RP_RSR);
	} else
		afi_writel(pcie, mesg, AFI_MSG_0);
}

For a legacy interrupt this just emits a dynamic debug message and then returns to the original handler.

The handler clears the interrupt but always returns IRQ_HANDLED. As far as I am aware, this prevents any other handlers registered for this interrupt from being called. As a result, my handler, which has to signal to the FPGA that the transfer is complete, is never called. The FPGA therefore never deasserts the legacy interrupt, which triggers continually and hangs the kernel.
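To illustrate why an unacknowledged legacy interrupt can lock the system up: INTx is level-signalled, so the line stays asserted until the endpoint itself is told to drop it. The snippet below is only a toy user-space model of that behaviour, not kernel code; count_interrupts(), device_isr_runs and max_fires are made-up names for the illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of a level-triggered INTx line. Returns how many times the
 * interrupt fires before the line goes quiet, giving up after max_fires
 * (which stands in for a hung system).
 */
unsigned int count_interrupts(bool device_isr_runs, unsigned int max_fires)
{
	bool line_asserted = true;	/* FPGA raises INTA at end of transfer */
	unsigned int fires = 0;

	assert(max_fires > 0);

	while (line_asserted && fires < max_fires) {
		fires++;
		/*
		 * The host controller ISR clears its own status here, but
		 * that alone does not make the endpoint deassert INTx.
		 */
		if (device_isr_runs)
			line_asserted = false;	/* device ISR acks the FPGA */
	}
	return fires;
}
```

With device_isr_runs set to true the model takes one interrupt; with it set to false it keeps firing until max_fires is reached.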

I notice that in later versions of the kernel, e.g. 4.8, the ISR is as below:

static irqreturn_t tegra_pcie_isr(int irq, void *arg)
{
	const char *err_msg[] = {
		"Unknown",
		"AXI slave error",
		"AXI decode error",
		"Target abort",
		"Master abort",
		"Invalid write",
		"Legacy interrupt",
		"Response decoding error",
		"AXI response decoding error",
		"Transaction timeout",
		"Slot present pin change",
		"Slot clock request change",
		"TMS clock ramp change",
		"TMS ready for power down",
		"Peer2Peer error",
	};
	struct tegra_pcie *pcie = arg;
	u32 code, signature;

	code = afi_readl(pcie, AFI_INTR_CODE) & AFI_INTR_CODE_MASK;
	signature = afi_readl(pcie, AFI_INTR_SIGNATURE);
	afi_writel(pcie, 0, AFI_INTR_CODE);

	if (code == AFI_INTR_LEGACY)
		return IRQ_NONE;

	if (code >= ARRAY_SIZE(err_msg))
		code = 0;

	/*
	 * do not pollute kernel log with master abort reports since they
	 * happen a lot during enumeration
	 */
	if (code == AFI_INTR_MASTER_ABORT)
		dev_dbg(pcie->dev, "%s, signature: %08x\n", err_msg[code],
			signature);
	else
		dev_err(pcie->dev, "%s, signature: %08x\n", err_msg[code],
			signature);

	if (code == AFI_INTR_TARGET_ABORT || code == AFI_INTR_MASTER_ABORT ||
	    code == AFI_INTR_FPCI_DECODE_ERROR) {
		u32 fpci = afi_readl(pcie, AFI_UPPER_FPCI_ADDRESS) & 0xff;
		u64 address = (u64)fpci << 32 | (signature & 0xfffffffc);

		if (code == AFI_INTR_MASTER_ABORT)
			dev_dbg(pcie->dev, "  FPCI address: %10llx\n", address);
		else
			dev_err(pcie->dev, "  FPCI address: %10llx\n", address);
	}

	return IRQ_HANDLED;
}

So for legacy interrupts, IRQ_NONE is returned immediately after the status is cleared, which should allow the interrupt to propagate to my handler, yes?

Any comments from you NVIDIA guys?

Robert

"If I enable the interrupt generation the whole kernel hangs once the transfer is complete."

mrbmcg,
Just to confirm, which bit/register do you enable for this?

Hi

I’m not enabling them in the TX1; they are enabled by default. There is an option in the configuration registers of my FPGA implementation to send a legacy IRQ (INTA) when a transfer is complete.

I’ve modified the kernel so that the interrupt handler returns IRQ_NONE instead. If I do this, my registered handler is called, I tell the FPGA to deassert INTA, and everything works OK. Basically, the way the kernel is at the moment, you cannot use legacy interrupts because the pci-tegra.c handler just swallows them.

While registering the ISR in your driver, do you use IRQF_SHARED? If not, please use it. The legacy interrupt is shared, and every handler on it has to be registered with IRQF_SHARED. That way, the kernel's interrupt subsystem makes sure that all ISRs get called, and it is up to each ISR whether to handle the interrupt (IRQ_HANDLED) or ignore it (IRQ_NONE). One ISR returning IRQ_HANDLED or IRQ_NONE shouldn’t affect whether the other ISRs registered for the same interrupt are called.
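The dispatch behaviour described above can be sketched in user space. This is only a simplified model of how the generic IRQ code walks the handlers on a shared line, not the kernel source; the types are cut-down stand-ins, and dispatch_shared_irq(), count_and_claim(), count_and_decline() and calls_to_second_handler() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types, for illustration only. */
typedef enum { IRQ_NONE = 0, IRQ_HANDLED = 1 } irqreturn_t;
typedef irqreturn_t (*irq_handler_t)(int irq, void *dev_id);

struct irqaction {
	irq_handler_t handler;
	void *dev_id;
	struct irqaction *next;
};

/*
 * Call every handler on the shared line. The loop does not stop early
 * when one handler returns IRQ_HANDLED; results are only OR-ed together.
 */
irqreturn_t dispatch_shared_irq(int irq, struct irqaction *chain)
{
	irqreturn_t ret = IRQ_NONE;
	struct irqaction *a;

	for (a = chain; a; a = a->next) {
		assert(a->handler != NULL);
		ret |= a->handler(irq, a->dev_id);
	}
	return ret;
}

/* Handler that counts its calls (via dev_id) and claims the interrupt. */
irqreturn_t count_and_claim(int irq, void *dev_id)
{
	(void)irq;
	++*(int *)dev_id;
	return IRQ_HANDLED;
}

/* Handler that counts its calls (via dev_id) and declines the interrupt. */
irqreturn_t count_and_decline(int irq, void *dev_id)
{
	(void)irq;
	++*(int *)dev_id;
	return IRQ_NONE;
}

/* Put `first` ahead of a second handler and report the second one's calls. */
int calls_to_second_handler(irq_handler_t first)
{
	int first_calls = 0, second_calls = 0;
	struct irqaction second = { count_and_claim, &second_calls, NULL };
	struct irqaction head = { first, &first_calls, &second };

	dispatch_shared_irq(0, &head);
	return second_calls;
}
```

In this model the second handler is called once whether the first handler claims or declines the interrupt, which is the behaviour the reply describes.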

Here is how I am installing the IRQ:

if (request_irq(gIrq, &XPCIe_IRQHandler, IRQF_SHARED, gDrvrName, gDev) < 0) {
    printk(KERN_WARNING "%s: Init: Unable to allocate IRQ\n", gDrvrName);
    return (CRIT_ERR);
}

My interrupt handler does not get called.

Note that later versions of the pci-tegra.c interrupt handler return IRQ_NONE.
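For the handler side of a shared line, the usual pattern is to check the device's own status first, return IRQ_NONE if the interrupt is not its own, and otherwise acknowledge the device so it deasserts INTx. Below is a user-space sketch of that pattern. struct example_fpga_regs, EXAMPLE_IRQ_PENDING and example_fpga_isr() are hypothetical, since the real xapp1052 register map is not shown in this thread, and the plain struct stands in for memory-mapped registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the kernel type, for illustration only. */
typedef enum { IRQ_NONE = 0, IRQ_HANDLED = 1 } irqreturn_t;

/* Hypothetical status bit: set while the device is asserting INTA. */
#define EXAMPLE_IRQ_PENDING 0x1u

/* Hypothetical register block; a real driver would use ioread32/iowrite32. */
struct example_fpga_regs {
	uint32_t irq_status;
};

irqreturn_t example_fpga_isr(int irq, void *dev_id)
{
	struct example_fpga_regs *regs = dev_id;

	(void)irq;
	assert(regs != NULL);

	/* Shared line: if our device did not raise it, let others look. */
	if (!(regs->irq_status & EXAMPLE_IRQ_PENDING))
		return IRQ_NONE;

	/* In this model, clearing the bit stands in for the device dropping INTA. */
	regs->irq_status &= ~EXAMPLE_IRQ_PENDING;
	return IRQ_HANDLED;
}
```

Called once with the pending bit set, the sketch acknowledges the device and returns IRQ_HANDLED; a second call with nothing pending returns IRQ_NONE.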

I’ve verified locally that even with the PCIe host controller driver (pci-tegra.c) returning IRQ_HANDLED, all other ISRs still get called. I’m not sure why the behavior is different on your side. Anyway, if returning IRQ_NONE works for you, you can continue to use it, and I’ll check further from our side.

Hi Robert,

Is this still an issue on your side?
Or has the problem been clarified and resolved?

Thanks

No, I still need to return IRQ_NONE or my interrupt is not called. I’m currently debugging other PCIe problems introduced by the R24.2.1 kernel.

I will get back to you shortly on this, within a few days hopefully :-)