PCIe MSI interrupt not caught on Xavier kernel

Hi folks,

I’m putting together an FPGA PCIe card and doing some prototyping by placing it into the main PCIe slot in the AGX Xavier carrier board. I have a simple driver that registers an MSI interrupt to a simple handler that just prints text to dmesg and returns. The FPGA by itself triggers an interrupt once per second.

My issue is that something (presumably in the kernel) is not catching those interrupts and dispatching them to my driver. I know this is specific to the Xavier because if I run the same driver with the same FPGA card on a different Linux machine, I see my interrupt-indicating dmesg text showing up once a second.

Is there some configuration that I’m missing? Appreciate any insight.

I’ve included the relevant part of the “lspci -vvv” output and my PCI probe function for reference:

0005:01:00.0 Unassigned class [ff00]: Pronto.ai Device ca77
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 819
	Region 0: Memory at 1f40000000 (32-bit, non-prefetchable) 
	Capabilities: [50] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [70] MSI: Enable+ Count=1/1 Maskable- 64bit+
		Address: 00000000fffff000  Data: 0000
	Capabilities: [90] Express (v1) Endpoint, MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x4, ASPM L1, Exit Latency L0s unlimited, L1 unlimited
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
	Capabilities: [100 v1] Device Serial Number 00-00-00-00-00-00-00-00
	Kernel driver in use: pcd

static int pcd_probe(struct pci_dev* dev, const struct pci_device_id* id)
{
  int result;
  printk(KERN_NOTICE "Probing pci devices\n");

  // Enable PCI device
  result = pci_enable_device(dev);
  if(result)
  {
    // Propagate the error code from pci_enable_device() instead of overwriting it
    goto out;
  }
  
  // Enable bus mastering
  pci_set_master(dev);

  // Request MSI interrupt vector
  // magic number params are min and max vector num
  result = pci_alloc_irq_vectors(dev, 1, 1, PCI_IRQ_MSI);
  if (result < 0)
  {
    printk(KERN_ERR "failed to alloc irq vectors\n");
    goto out_disable;
  }
  else
  {
    printk(KERN_NOTICE "allocated %d vectors\n", result);
    result = 0;
  }

  // Request interrupt assignment
  // pcd_irq is a global variable
  // pcd_int_handler just prints stuff to dmesg and returns
  pcd_irq = pci_irq_vector(dev, 0);
  printk(KERN_INFO "pcd: interrupt line 0 is %d\n", pcd_irq);
  if (request_irq(pcd_irq, pcd_int_handler, 0, "pcd", dev)) 
  {
    printk(KERN_ERR "pcd: cannot register irq %d\n", pcd_irq);
    pci_free_irq_vectors(dev);
    result = -EIO;
    goto out_disable;
  }

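  // Map BAR 0
  // Arguments are device, BAR number, maxlen (0 maps the whole BAR)
  pcds[0].bar0 = pci_iomap(dev, 0, 0);
  if (!pcds[0].bar0)
  {
    // Undo the IRQ setup before bailing out
    printk(KERN_ERR "pcd: cannot map BAR 0\n");
    free_irq(pcd_irq, dev);
    pci_free_irq_vectors(dev);
    result = -ENOMEM;
    goto out_disable;
  }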

  out:
  return result;

  out_disable:
  pci_disable_device(dev);
  goto out;
}

Thank you!

I don’t see any reason why interrupts are not received by Xavier.
BTW, is the code you posted all there is to the driver, or are you doing anything else apart from it?
Also, none of the API calls in your driver are failing, right?
What does the interrupt count in “/proc/interrupts” show for your handler?

Hi vidyas,

There is some other driver code, but those functions implement char driver read/write operations that perform iowrite32() calls to various addresses in BAR0 to interface with the device. Calling iowrite32() or ioread32() works just fine; the device receives the memory writes and can issue memory read completions. Bus mastering works too: the device is able to DMA to allocated buffers on the Xavier when requested. I can post the full code if need be, but all of the initialization and interrupt-related code is in the function I posted.

The interrupt handler is this:

static irqreturn_t pcd_int_handler(int irq, void *dev)
{
  printk(KERN_INFO "pcd: interrupt occurred!\n");
  return IRQ_HANDLED;
}

All the API calls during initialization are succeeding - the initialization dmesg results look like:

[   45.855687] Probing pci devices
[   45.855782] pcd 0005:01:00.0: enabling device (0000 -> 0002)
[   45.856020] allocated 1 vectors
[   45.856026] pcd: interrupt line 0 is 51
[   45.856381] bar 0 44f80000

Meanwhile, the interrupt count in /proc/interrupts shows nothing. I get:

51:          0          0          0          0          0          0          0          0     GICv2  145 Level     pcd
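For checking such a line in a script or test program rather than by eye, here is a minimal user-space sketch that sums the per-CPU counters on one /proc/interrupts line. The helper name irq_line_total is hypothetical, and the sketch assumes the usual layout in which the counter columns follow the "IRQ:" label and the interrupt controller name (GICv2 here) does not start with a digit.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sum the per-CPU counters on one /proc/interrupts line.
 * Returns the total, or -1 if the line has no ':' separator.
 * Hypothetical helper; not part of any kernel or libc API.
 */
long irq_line_total(const char *line)
{
    assert(line != NULL);

    /* Everything up to the first ':' is the IRQ label. */
    const char *p = strchr(line, ':');
    if (p == NULL)
        return -1;
    p++;

    long total = 0;
    for (;;) {
        while (*p == ' ' || *p == '\t')
            p++;
        /* Counter columns are purely numeric; the first token that does
         * not start with a digit (e.g. "GICv2") ends the per-CPU section. */
        if (!isdigit((unsigned char)*p))
            break;
        char *end;
        long n = strtol(p, &end, 10);
        /* A token such as "1abc" is not a counter column; stop there. */
        if (*end != ' ' && *end != '\t' && *end != '\0' && *end != '\n')
            break;
        total += n;
        p = end;
    }
    return total;
}
```

One way to use it is to read /proc/interrupts line by line with fgets(), pick the line whose text contains the handler name (for example with strstr(line, "pcd")), and compare the total from two reads taken a few seconds apart.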

This is very weird. I’m wondering where the upstream writes to the MSI target address are going. If I assume for the time being that the address your device writes to in order to raise the MSI is wrong, we should at least see SMMU faults on the console, because the SMMU catches accesses to such regions and raises a red flag.
I’m not sure how much this will help, but can you take any upstreamed driver, follow the same flow with the same APIs, and see whether the issue still reproduces? I somewhat doubt the overall flow and the APIs used.

Hi Vidyas,

I know that if I ask the FPGA to write to some arbitrary address, the kernel does complain depending on the memory space. If I ask it to write 0x0000 to 0xfffff000 (as specified in lspci), nothing happens, though I would expect an interrupt. If I ask it to write to an address that I allocated as DMA space using dma_alloc_coherent(), I can see the data that the FPGA writes showing up, so DMA and bus mastering do seem to work.
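The address and data that lspci prints come from the MSI capability in the device's config space, so they can be cross-checked against what the FPGA actually writes. Below is a minimal user-space sketch that walks the capability list in a raw config-space dump and extracts the MSI fields. The names decode_msi and struct msi_info are hypothetical; the register offsets follow the standard PCI capability layout, and the dump is assumed to be little-endian bytes as exposed by the sysfs "config" file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CFG_STATUS        0x06  /* Status register */
#define CFG_STATUS_CAPS   0x10  /* bit 4: capability list present */
#define CFG_CAP_PTR       0x34  /* Capabilities Pointer */
#define CAP_ID_MSI        0x05  /* capability ID of MSI */

/* Hypothetical result type for this sketch. */
struct msi_info {
    int      enabled;   /* MSI Enable bit (Message Control bit 0) */
    int      is_64bit;  /* 64-bit address capable (Message Control bit 7) */
    uint64_t address;   /* Message Address */
    uint16_t data;      /* Message Data */
};

static uint16_t rd16(const uint8_t *c, size_t off)
{
    return (uint16_t)(c[off] | ((unsigned)c[off + 1] << 8));
}

static uint32_t rd32(const uint8_t *c, size_t off)
{
    return (uint32_t)c[off] | ((uint32_t)c[off + 1] << 8) |
           ((uint32_t)c[off + 2] << 16) | ((uint32_t)c[off + 3] << 24);
}

/* Returns 0 and fills *out on success, -1 if no MSI capability is found. */
int decode_msi(const uint8_t *cfg, size_t len, struct msi_info *out)
{
    assert(cfg != NULL && out != NULL);
    if (len < 64 || !(rd16(cfg, CFG_STATUS) & CFG_STATUS_CAPS))
        return -1;

    size_t pos = cfg[CFG_CAP_PTR] & 0xFC;
    int guard = 48;  /* bound the walk in case the list is malformed */

    while (pos >= 0x40 && pos + 2 <= len && guard-- > 0) {
        if (cfg[pos] == CAP_ID_MSI) {
            if (pos + 10 > len)
                return -1;
            uint16_t ctrl = rd16(cfg, pos + 2);
            out->enabled  = ctrl & 0x0001;
            out->is_64bit = (ctrl >> 7) & 1;
            out->address  = rd32(cfg, pos + 4) & ~(uint32_t)0x3;
            if (out->is_64bit) {
                /* Upper address dword at +8, data moves to +12. */
                if (pos + 14 > len)
                    return -1;
                out->address |= (uint64_t)rd32(cfg, pos + 8) << 32;
                out->data = rd16(cfg, pos + 12);
            } else {
                out->data = rd16(cfg, pos + 8);
            }
            return 0;
        }
        pos = cfg[pos + 1] & 0xFC;  /* next capability pointer */
    }
    return -1;
}
```

The input buffer can be filled by reading the device's "config" file under /sys/bus/pci/devices/ (root is normally needed to read beyond the first 64 bytes). For the device above, the expected result would match the lspci line: enabled, 64-bit capable, address 0xfffff000, data 0x0000.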

Could you please clarify what you mean by an upstreamed driver? Do you mean someone else’s open source gateware for the FPGA?

Yeah… I meant taking any upstreamed driver and modifying it to get your job done. The reason is that we don’t observe any issues with off-the-shelf devices and their respective in-kernel drivers, so it has to be something this device is doing specifically.
BTW, has this been tested on any other system (like x86), and did it work there?

Yes, that is my point of confusion. This same device works and generates interrupts on an x86 system running the same kernel driver just fine.

I’ll try to find an open-source FPGA gateware that would be compatible.

Hi Vidyas,

Unfortunately, I could not find any open-source implementations of this PCIe gateware, as it is proprietary. However, I found a lead on where the MSI interrupt is going. I have my PCIe device issue an MSI interrupt on line 0 once every three seconds, and I observe that when it does so, interrupt line 39 in /proc/interrupts increments. The interrupt is registered to “tegra-pcie-msi”, and can be seen here:

anton@xavier-0:~# cat /proc/interrupts | grep tegra-pcie-msi
  33:          0          0          0          0          0          0          0          0     GICv2  105 Level     tegra-pcie-msi
  35:          0          0          0          0          0          0          0          0     GICv2   78 Level     tegra-pcie-msi
  37:          0          0          0          0          0          0          0          0     GICv2   82 Level     tegra-pcie-msi
  39:        879          0          0          0          0          0          0          0     GICv2   86 Level     tegra-pcie-msi

The interrupt had been triggered 879 times when this command was called.

Do you have any insight into how to properly map this interrupt to the kernel driver from the lower-level handler tegra-pcie-msi?

Thank you kindly,
Anton