Innova-2 FPGA Not Showing Up After PCIe Rescan

Hello there,
I’ve got an Innova-2 Flex SmartNIC that I’m trying to use as an FPGA development card. So far, I’ve been able to build and program the PCIe example project included with the bundle, and Linux recognizes the FPGA. Now I am trying to remove the FPGA as a PCIe endpoint and re-enumerate the PCIe bus so I don’t have to reboot every time I change the bitstream.
When I remove the XDMA device by running echo 1 > /sys/bus/pci/devices/0000\:03\:00.0/remove, everything behaves as I would expect: the Xilinx Serial Controller device no longer shows up in lspci. But when I then trigger a rescan of the PCIe bus with echo 1 > /sys/bus/pci/rescan, without changing the bitstream or doing anything else in between, the device does not come back in lspci. The same procedure works when I remove and re-enumerate other parts of the card, like the Ethernet controllers or the PCIe switches, just not the FPGA.
So far I have tried removing all PCIe devices associated with the Innova-2 Flex NIC before rescanning, as well as the procedure listed in the user’s manual for disabling and re-enabling the PCIe switch, to no avail. Only rebooting my computer makes the FPGA show up again. Can any of you help me diagnose what is going on, and is there any information I can provide that would help?
Thanks,
Ethan

I am able to remove and rescan an XDMA design implemented using Vivado ML 2023.1. Try a different Vivado version.

sudo su
lspci -nnd 10ee:
echo 1 > /sys/bus/pci/devices/0000\:03\:00.0/remove
lspci -nnd 10ee:
echo 1 > /sys/bus/pci/rescan
lspci -nnd 10ee:
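The same steps can be wrapped in a small script that also reports whether the device came back. This is a minimal sketch, not part of any Innova-2 tooling: the function names and the SYSFS_ROOT override (default /sys, useful for trying the logic against a fake directory tree) are hypothetical, and on real hardware it must run as root.

```bash
#!/usr/bin/env bash
# Sketch: remove a PCIe function via sysfs, rescan the bus, and report
# whether the function reappeared. SYSFS_ROOT is a hypothetical override
# (defaults to /sys); writing to the real /sys requires root.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# pci_present BDF -> exit status 0 if the device exists, e.g. BDF=0000:03:00.0
pci_present() {
    [[ -e "$SYSFS_ROOT/bus/pci/devices/$1" ]]
}

# pci_remove_rescan BDF [TIMEOUT_SECONDS]
#   returns 0 if BDF is present after the rescan,
#           1 if it did not reappear within the timeout,
#           2 if it was not present to begin with.
pci_remove_rescan() {
    local bdf="$1" timeout="${2:-5}" i
    if ! pci_present "$bdf"; then
        echo "$bdf is not present; nothing to remove" >&2
        return 2
    fi
    echo 1 > "$SYSFS_ROOT/bus/pci/devices/$bdf/remove"
    echo 1 > "$SYSFS_ROOT/bus/pci/rescan"
    # Poll once per second until the device node comes back.
    for (( i = 0; i < timeout; i++ )); do
        pci_present "$bdf" && return 0
        sleep 1
    done
    echo "$bdf did not reappear after rescan" >&2
    return 1
}
```

For example, after sourcing the functions in a root shell, pci_remove_rescan 0000:03:00.0 10 returns 1 in the failing case described above, which makes it easy to compare kernels or BIOS settings.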

[Screenshot: innova2_linux_remove_and_rescan]

lspci’s tree view (sudo lspci -tnnv) shows that the FPGA sits behind the ConnectX-5 PCIe switch, and you can disable the link to it.
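The bridge port in front of the FPGA can also be read from sysfs instead of the tree view, because each device’s sysfs path runs through its upstream bridges. A minimal sketch, assuming the standard /sys/bus/pci/devices symlink layout; parent_bridge and SYSFS_ROOT are hypothetical names. Run it while the FPGA is still enumerated.

```bash
#!/usr/bin/env bash
# Sketch: print the upstream bridge (switch downstream port) of a PCI device.
# SYSFS_ROOT is a hypothetical override; defaults to /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# parent_bridge BDF -> e.g. "0000:02:08.0" for BDF=0000:03:00.0
parent_bridge() {
    local link="$SYSFS_ROOT/bus/pci/devices/$1" path parent
    # The device must still be enumerated (i.e. not removed yet).
    [[ -e "$link" ]] || { echo "$1 not found in sysfs" >&2; return 1; }
    # bus/pci/devices/<BDF> links into devices/pci<domain:bus>/.../<bridge>/<BDF>
    path=$(readlink -f "$link") || return 1
    parent=$(basename "$(dirname "$path")")
    # A parent named pciDDDD:BB is the root bus, not a PCI-to-PCI bridge.
    if [[ "$parent" == pci* ]]; then
        echo "$1 sits directly on a root bus" >&2
        return 1
    fi
    echo "$parent"
}
```

On the system in this thread, parent_bridge 0000:03:00.0 should print 0000:02:08.0, the same port the tree view shows.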

avoid having to reboot when changing the bitstream

The innova2_flex_app communicates with the Flex Image to program the Dual Quad (x8) SPI Flash memory ICs. The Factory, Flex, and User images are stored in this Flash and one is selected at boot. innova2_flex_app is designed to program the initial bitstream.

Look into Dynamic Function eXchange (DFX) [UG909] and Partial Reconfiguration over PCIe. It can use any PCIe block and should allow you to load designs without rebooting. Also see AXI HWICAP [PG134].

Unfortunately, Tandem PROM and DFX over PCIe configuration are not available for XDMA, because the PCIe and OpenCAPI Transceiver Quads are not in the same column as the X1Y0 PCIE4 block.

Thank you so much, I’ll try to reimplement my design in Vivado 2023.1 and see if anything changes. In the meantime, would you mind sharing a project Tcl script or flash image so I can try to recreate your test setup?

I just tried flashing a bitstream containing a basic XDMA interface hooked up to DRAM, built with Vivado 2023.1, and the same thing happened: the device shows up in lspci when I first boot, but after I remove it and rescan, it doesn’t come back. Here’s what happened:

root@dumpsterfiresim:/home/eygao# lspci
<Removed for brevity>
01:00.0 PCI bridge: Mellanox Technologies MT28800 Family [ConnectX-5 PCIe Bridge]
02:08.0 PCI bridge: Mellanox Technologies MT28800 Family [ConnectX-5 PCIe Bridge]
02:10.0 PCI bridge: Mellanox Technologies MT28800 Family [ConnectX-5 PCIe Bridge]
03:00.0 Memory controller: Xilinx Corporation Device 9038
04:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
04:00.1 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
06:00.0 USB controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 03)
root@dumpsterfiresim:/home/eygao# echo 1 > /sys/bus/pci/devices/0000\:03\:00.0/remove
root@dumpsterfiresim:/home/eygao# lspci
<Removed for brevity>
01:00.0 PCI bridge: Mellanox Technologies MT28800 Family [ConnectX-5 PCIe Bridge]
02:08.0 PCI bridge: Mellanox Technologies MT28800 Family [ConnectX-5 PCIe Bridge]
02:10.0 PCI bridge: Mellanox Technologies MT28800 Family [ConnectX-5 PCIe Bridge]
04:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
04:00.1 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
06:00.0 USB controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 03)
root@dumpsterfiresim:/home/eygao# echo 1 > /sys/bus/pci/rescan
root@dumpsterfiresim:/home/eygao# lspci
<Removed for brevity>
01:00.0 PCI bridge: Mellanox Technologies MT28800 Family [ConnectX-5 PCIe Bridge]
02:08.0 PCI bridge: Mellanox Technologies MT28800 Family [ConnectX-5 PCIe Bridge]
02:10.0 PCI bridge: Mellanox Technologies MT28800 Family [ConnectX-5 PCIe Bridge]
04:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
04:00.1 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
06:00.0 USB controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 03)

And here’s the block diagram of the design I used:

Could this be an issue with the computer I’m using, and are any special features required? So far, I’ve tried this on two systems, a Dell Optiplex 7060 with an 8th-gen i5 and an Optiplex 9020 with a 4th-gen i5, and both behave the same way.

Thanks,
Ethan

Your block diagram looks correct. Does it work? Can you access all the DDR memory?

Could this be an issue with the computer I’m using

In the Optiplex 7060 BIOS, disable ASPM Support, disable Computrace, enable the PCIe Slot under Miscellaneous Devices, and, if you can find the options, enable Above 4G Memory Decoding and Resizable BAR. The 7060 uses the Intel Q370 chipset, which supports PCIe Hot-Plug (see 24.4.10). The system I use for testing is based on a B360.
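Whether the switch port in front of the FPGA advertises hot-plug, and which ASPM policy the kernel is using, can also be checked from Linux. A sketch, assuming the port’s lspci -vv output contains a SltCap: line (it only appears when the port implements a slot) and that the kernel exposes /sys/module/pcie_aspm/parameters/policy (CONFIG_PCIEASPM). The kernel policy is separate from the BIOS ASPM setting, and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# slot_hotplug_capable: reads `lspci -vv -s <bridge>` output on stdin and
# succeeds only if a SltCap line advertises HotPlug+.
# awk reads all input, so there is no early-exit SIGPIPE on the producer.
slot_hotplug_capable() {
    awk '/^[ \t]*SltCap:/ && /HotPlug[+]/ { found = 1 } END { exit !found }'
}

# aspm_policy [FILE]: print the active ASPM policy, which the kernel marks
# with [brackets], e.g. "[default] performance powersave powersupersave".
aspm_policy() {
    local f="${1:-/sys/module/pcie_aspm/parameters/policy}"
    [[ -r "$f" ]] || { echo "cannot read $f" >&2; return 1; }
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$f"
}
```

For example: sudo lspci -vv -s 02:08.0 | slot_hotplug_capable && echo "02:08.0 advertises hot-plug".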

I came across this GitHub issue, which suggests that PCIe hot-plug is hit-or-miss.

BTW, I updated my initial answer with a note about Partial DFX Reconfiguration.

I couldn’t find any option for Above 4G Decoding or Resizable BAR in the BIOS, but I was able to disable ASPM and Computrace. Unfortunately, the behavior is the same: the FPGA shows up at boot but not after removing it and rescanning the bus. I’ll see if I can find a friend with a more modern computer to check whether it’s just my system.
I was successfully able to DMA random data to and from the FPGA and the SHA256 checksums were identical so it looks like XDMA is at least partially working.
Also, thanks for the resources on partial reconfiguration. The software I’m trying to port, FireSim, loads full bitstreams and relies on this remove-and-rescan technique, but if that doesn’t work out for some reason, I’ll look into it.

Alright, I managed to fix my issue, though I’m still not entirely sure what was causing it. I experimented with Linux live USBs to see whether the Linux version or distro had any effect, and found that older distros like Ubuntu 18.04 worked while newer ones like Ubuntu 22.04 or Fedora 38 did not. So I installed the original Ubuntu 20.04 release with kernel version 5.4.0-26, and that fixed my issue. Thanks so much for all your help and your GitHub docs; those were invaluable when setting everything up!

Thanks @eygao!

Ubuntu 20.04 release with kernel version 5.4.0-26 and that fixed my issue

Does your Linux kernel support PCI Hotplug?

cat /boot/config-5.8.0-43-generic | grep -i acpi | grep -i pci

CONFIG_ACPI_PCI_SLOT=y
CONFIG_ACPI_APEI_PCIEAER=y
CONFIG_HOTPLUG_PCI_ACPI=y
CONFIG_HOTPLUG_PCI_ACPI_IBM=m

cat /boot/config-5.8.0-43-generic | grep -i hotplug

CONFIG_HOTPLUG_CPU=y
# CONFIG_BOOTPARAM_HOTPLUG_CPU0 is not set
# CONFIG_DEBUG_HOTPLUG_CPU0 is not set
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_ACPI_HOTPLUG_CPU=y
CONFIG_ACPI_HOTPLUG_MEMORY=y
CONFIG_ACPI_HOTPLUG_IOAPIC=y
CONFIG_HOTPLUG_SMT=y
CONFIG_MEMORY_HOTPLUG=y
CONFIG_MEMORY_HOTPLUG_SPARSE=y
CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y
CONFIG_HOTPLUG_PCI_PCIE=y
CONFIG_HOTPLUG_PCI=y
CONFIG_HOTPLUG_PCI_ACPI=y
CONFIG_HOTPLUG_PCI_ACPI_IBM=m
CONFIG_HOTPLUG_PCI_CPCI=y
CONFIG_HOTPLUG_PCI_CPCI_ZT5550=m
CONFIG_HOTPLUG_PCI_CPCI_GENERIC=m
CONFIG_HOTPLUG_PCI_SHPC=y
CONFIG_XEN_BALLOON_MEMORY_HOTPLUG=y
CONFIG_XEN_BALLOON_MEMORY_HOTPLUG_LIMIT=512
CONFIG_MLXREG_HOTPLUG=m
# CONFIG_CPU_HOTPLUG_STATE_CONTROL is not set
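To run the same check against the running kernel rather than one specific version, a small helper can look up individual options. A sketch assuming the distribution installs its config as /boot/config-$(uname -r), as Ubuntu does; kconfig_value and check_pci_hotplug_config are hypothetical names. It prints y, m, or the option’s value, and n when the option is missing or marked "is not set".

```bash
#!/usr/bin/env bash
# kconfig_value OPTION [CONFIG_FILE]
# Prints the value of a kernel config option (y, m, a number, ...), or "n"
# if the option is absent or commented out as "# OPTION is not set".
kconfig_value() {
    local opt="$1" cfg="${2:-/boot/config-$(uname -r)}" line
    [[ -r "$cfg" ]] || { echo "cannot read $cfg" >&2; return 2; }
    # The trailing '=' keeps CONFIG_HOTPLUG_PCI from matching CONFIG_HOTPLUG_PCI_PCIE.
    line=$(grep -m1 "^${opt}=" "$cfg") || { echo n; return 0; }
    echo "${line#*=}"
}

# check_pci_hotplug_config [CONFIG_FILE]: report the PCI hot-plug options listed above.
check_pci_hotplug_config() {
    local opt
    for opt in CONFIG_HOTPLUG_PCI CONFIG_HOTPLUG_PCI_PCIE CONFIG_HOTPLUG_PCI_ACPI; do
        printf '%s=%s\n' "$opt" "$(kconfig_value "$opt" "${1:-}")"
    done
}
```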

I just tried flashing a bitstream containing a basic XDMA

It is my understanding that the Flex Image implements Post-Configuration Access of SPI Flash Memory using STARTUPE3 (XAPP1280):

[Diagram: STARTUPE3]

The .xdc file in Verilog_VHDL_and_Xilinx_Design_Constrains.zip contains the following so that both flash ICs are accessible:

# secondary flash
set_property PACKAGE_PIN AM12 [get_ports {sec_flash_d[0]}]
set_property PACKAGE_PIN AN12 [get_ports {sec_flash_d[1]}]
set_property PACKAGE_PIN AR13 [get_ports {sec_flash_d[2]}]
set_property PACKAGE_PIN AR12 [get_ports {sec_flash_d[3]}]
set_property PACKAGE_PIN AV11 [get_ports sec_flash_cs]
set_property IOSTANDARD LVCMOS18 [get_ports {sec_flash_d[0]}]
set_property IOSTANDARD LVCMOS18 [get_ports {sec_flash_d[1]}]
set_property IOSTANDARD LVCMOS18 [get_ports {sec_flash_d[2]}]
set_property IOSTANDARD LVCMOS18 [get_ports {sec_flash_d[3]}]
set_property IOSTANDARD LVCMOS18 [get_ports sec_flash_cs]

The Flex Image is designed such that innova2_flex_app can write to the flash memory ICs.

Writing to SPI flash is quite slow, at about 1 MByte/minute. Running the FPGA Burning Flow takes about 10 minutes.
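At a fixed write rate, flash time scales linearly with image size, so a rough estimate is size divided by rate, rounded up. A sketch that assumes the quoted 1 MByte/minute means 2^20 bytes per minute and stays constant; flash_minutes is a hypothetical helper, and its result is only as good as that figure.

```bash
#!/usr/bin/env bash
# flash_minutes BYTES [BYTES_PER_MINUTE]
# Estimates SPI flash write time in whole minutes, rounded up.
# Default rate: the ~1 MByte/minute quoted above, assumed to be 2^20 bytes/minute.
flash_minutes() {
    local bytes="$1" rate="${2:-1048576}"
    (( rate > 0 )) || { echo "rate must be positive" >&2; return 1; }
    # Integer ceiling division: (a + b - 1) / b
    echo $(( (bytes + rate - 1) / rate ))
}
```

For example, flash_minutes "$(stat -c %s my_design.bin)", where my_design.bin is a placeholder for your flash image.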

For compatibility with FireSim, you could instead program the Innova-2 over JTAG, as with other boards, which takes less than 1 minute. However, the new design must have essentially the same PCIe configuration or it will not work, which is OK for FireSim. A cold reboot will reload the User Image.

Program and run your User Image. Then, enable JTAG access and disconnect the FPGA from the ConnectX-5 PCIe Switch.

sudo mst start
cd ~/Innova_2_Flex_Open_18_12/driver/
sudo ./make_device
cd ~
sudo insmod /usr/lib/modules/5.8.0-43-generic/updates/dkms/mlx5_fpga_tools.ko
sudo ~/Innova_2_Flex_Open_18_12/app/innova2_flex_app
echo Enable JTAG Access with Option 3

sudo su
echo 1 > /sys/bus/pci/devices/0000\:03\:00.0/remove
setpci  -s 02:08.0  0x70.w=0x50
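The two setpci writes appear to toggle a single bit. Assuming offset 0x70 is the Link Control register of the bridge’s PCI Express capability (capability at 0x60, Link Control at +0x10), 0x40 is Common Clock Configuration and the extra 0x10 is Link Disable; lspci -vv -s 02:08.0 should then show Disabled+ on its LnkCtl line while the link is down. The sketch below does a read-modify-write through setpci’s CAP_EXP register name, so only bit 4 changes and the capability offset is not hard-coded. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Bit 4 of the PCIe Link Control register is Link Disable.
LNKCTL_LINK_DISABLE=$(( 0x10 ))

# lnkctl_with_disable HEX on|off -> new 16-bit Link Control value as 4 hex digits
lnkctl_with_disable() {
    local cur=$(( 16#$1 ))
    case "$2" in
        on)  printf '%04x\n' $(( cur | LNKCTL_LINK_DISABLE )) ;;
        off) printf '%04x\n' $(( cur & ~LNKCTL_LINK_DISABLE & 0xffff )) ;;
        *)   echo "usage: lnkctl_with_disable HEX on|off" >&2; return 2 ;;
    esac
}

# set_link_disable BRIDGE_BDF on|off  (run as root, e.g. BRIDGE_BDF=02:08.0)
# CAP_EXP+10.w addresses Link Control inside the PCI Express capability.
set_link_disable() {
    local bdf="$1" state="$2" cur new
    cur=$(setpci -s "$bdf" CAP_EXP+10.w) || return 1
    new=$(lnkctl_with_disable "$cur" "$state") || return 1
    setpci -s "$bdf" CAP_EXP+10.w="$new"
}
```

With this, set_link_disable 02:08.0 on corresponds to the 0x50 write and set_link_disable 02:08.0 off to the 0x40 write, while leaving any other Link Control bits as they were.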

On the system with your JTAG debugger, run xsdb:

source /tools/Xilinx/Vivado/2021.1/settings64.sh
xsdb

The after 7000 command (a 7-second wait) is required on the first run after power-up of a Xilinx Platform Cable USB II or clone, because the firmware for its FX2 USB IC has to be loaded into SRAM. Then program the bitstream:

connect
after 7000
targets
target 1
targets
fpga -state
rst -srst
fpga xdma_wrapper.bit
fpga -state
exit
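For scripted reloads, for example from FireSim, the same xsdb session can be run non-interactively from a Tcl file. A sketch assuming your xsdb version accepts a script file argument, that xsdb is on PATH after sourcing settings64.sh, and that the FPGA is target 1 on your JTAG chain as above; write_xsdb_script and program_over_jtag are hypothetical names.

```bash
#!/usr/bin/env bash
# write_xsdb_script BITFILE OUTFILE: emit the xsdb commands shown above as a Tcl script.
write_xsdb_script() {
    local bit="$1" out="$2"
    cat > "$out" <<EOF
connect
# Needed on the first run after power-up of a Platform Cable USB II (or clone).
after 7000
targets
target 1
fpga -state
rst -srst
fpga {$bit}
fpga -state
exit
EOF
}

# program_over_jtag BITFILE: run the generated script through xsdb in batch mode.
program_over_jtag() {
    local script rc
    script=$(mktemp --suffix=.tcl) || return 1
    write_xsdb_script "$1" "$script"
    xsdb "$script"
    rc=$?
    rm -f "$script"
    return "$rc"
}
```

On the JTAG host, program_over_jtag xdma_wrapper.bit would replace the interactive session; the Innova-2 host still has to disconnect and reconnect the link around it.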

On the system hosting the Innova-2, reconnect and rescan the PCIe bus:

setpci  -s 02:08.0  0x70.w=0x40
echo 1 > /sys/bus/pci/rescan
lspci | grep "Xilinx\|Mellanox"
exit

sudo modprobe xdma
lspci -vnn -d 10ee:

The FPGA should now show up again in the output of lspci | grep "Xilinx\|Mellanox".
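Because the JTAG-loaded design must keep the same PCIe configuration, a quick check after the rescan is that the FPGA’s vendor and device IDs match what lspci showed at boot (10ee:9038 in this thread). A sketch that reads the vendor and device files under /sys/bus/pci/devices; pci_ids, wait_for_pci_ids and the SYSFS_ROOT override are hypothetical names.

```bash
#!/usr/bin/env bash
# SYSFS_ROOT is a hypothetical override; defaults to /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# pci_ids BDF -> "vvvv:dddd", e.g. "10ee:9038"
pci_ids() {
    local dev="$SYSFS_ROOT/bus/pci/devices/$1" v d
    [[ -r "$dev/vendor" && -r "$dev/device" ]] || return 1
    v=$(< "$dev/vendor")   # sysfs stores IDs as e.g. "0x10ee"
    d=$(< "$dev/device")
    echo "${v#0x}:${d#0x}"
}

# wait_for_pci_ids BDF VVVV:DDDD [TIMEOUT_SECONDS]
# Polls once per second until BDF reports the expected IDs.
wait_for_pci_ids() {
    local bdf="$1" want="$2" timeout="${3:-10}" i
    for (( i = 0; i < timeout; i++ )); do
        [[ "$(pci_ids "$bdf")" == "$want" ]] && return 0
        sleep 1
    done
    echo "$bdf did not report $want" >&2
    return 1
}
```

For example, wait_for_pci_ids 0000:03:00.0 10ee:9038 before loading the xdma module.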

A tree view should show that it is attached to the ConnectX-5 PCIe switch:

sudo lspci -tvnn | grep "0000\|10ee\|15b3"

-[0000:00]-+-00.0  Intel Corporation Device [8086:3e0f]
           +-1d.0-[01-04]----00.0-[02-04]--+-08.0-[03]----00.0  Xilinx Corporation Device [10ee:9038]
           |                               \-10.0-[04]--+-00.0  Mellanox Technologies MT27800 Family [ConnectX-5] [15b3:1017]
           |                                            \-00.1  Mellanox Technologies MT27800 Family [ConnectX-5] [15b3:1017]