I found a post (https://devtalk.nvidia.com/default/topic/982801/tx1-pcie-msi-interrupts-multi-vector-support-/?offset=2) and I have a similar question. I installed a Gigabyte 10GbE card (AQC107) in a TX2 and it works fine, but I noticed that all of its MSI interrupts go to CPU0. /proc/interrupts shows four MSI entries for the device, but they are never serviced by any CPU other than CPU0.
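For anyone who wants to check the same thing on their own board, here is a minimal sketch that parses /proc/interrupts-style text and reports which CPUs have a nonzero count for matching lines. The function names and the sample text (IRQ numbers, counts, device names) are made up for illustration and are not real TX2 output.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts-style text into {irq_label: (per_cpu_counts, description)}."""
    lines = text.strip("\n").splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        for field in fields[:ncpu]:
            if not field.isdigit():
                break  # some summary rows have fewer count columns
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        table[label.strip()] = (counts, description)
    return table


def busy_cpus(table, needle):
    """Map each IRQ whose description contains `needle` to the CPUs that serviced it."""
    return {
        irq: [cpu for cpu, count in enumerate(counts) if count]
        for irq, (counts, description) in table.items()
        if needle in description
    }


# Hypothetical sample, invented for this example (not captured from a TX2).
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
 33:      10234          0          0          0   PCI-MSI 524288 Edge  eth1-0
 34:       9876          0          0          0   PCI-MSI 524289 Edge  eth1-1
 42:        120        340         15          7   GICv2  226 Level  ether_qos.rx0
"""

table = parse_interrupts(SAMPLE)
msi_cpus = busy_cpus(table, "PCI-MSI")
```

On the board itself you would pass the contents of /proc/interrupts instead of SAMPLE. If every MSI line maps to CPU 0 only, that matches what I am describing.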
I can’t find a detailed hardware description of the TX2, so I looked for something similar and found the Xilinx Zynq UltraScale+, which has four ARM Cortex-A53 cores, and noticed something interesting. The document (https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf) says MSI vectors 0-31 are coalesced into one GIC interrupt bit.
So my guess is that the kernel creates multiple virtual vectors for the PCIe device, but in reality all of the vectors are merged into one interrupt bit at the GIC, and that’s why only CPU0 is busy. Reading the TX2 device tree, I assume it uses GICv2, and according to the ARM website (https://developer.arm.com/products/system-ip/system-controllers/interrupt-controllers) it looks like the GIC-400 does not support MSI SMP affinity.
Does anyone know TX2 PCIe and MSI well enough to confirm this? Thanks.
Hardware IRQs can only be serviced by CPU0…there is indeed an aggregator, and that aggregator does not have any kind of I/O APIC.
If some portion of the driver is not hardware dependent, and if that part is put in a separate driver component for ksoftirqd to manage, then you will have moved as much work as possible out of the hardware-dependent component and onto other CPU cores. You cannot offload to the other cores the parts which need physical wiring that does not exist. This is probably the most important issue when trying to approach RT behavior.
Thanks for your answer! But I have a question about “Hardware IRQs can only be serviced by CPU0…”, because I can change the smp_affinity of the onboard Ethernet to CPU1-5 and it is then actually serviced by CPUs other than CPU0. For PCIe MSI, I can’t do this: smp_affinity is set to 3f and I cannot change it. Do you know why this is happening?
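To make the 3f value concrete: smp_affinity is a hexadecimal CPU bitmask, so 3f means CPUs 0 through 5. Below is a rough sketch of decoding the mask and attempting to write a new one. The helper names and the proc_root parameter are my own, and the write needs root. It simply reports failure when the kernel rejects the write, which is what I see for the MSI lines.

```python
def mask_to_cpus(mask_hex):
    """Decode an smp_affinity hex mask (e.g. '3f') into a list of CPU indices."""
    # Wide masks are printed as comma-separated 32-bit groups; drop the commas.
    value = int(mask_hex.replace(",", ""), 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


def cpus_to_mask(cpus):
    """Encode a collection of CPU indices as an smp_affinity hex mask string."""
    value = 0
    for cpu in cpus:
        value |= 1 << cpu
    return format(value, "x")


def set_irq_affinity(irq, cpus, proc_root="/proc/irq"):
    """Try to write a new mask for `irq`; return False if the write is refused."""
    path = f"{proc_root}/{irq}/smp_affinity"
    try:
        with open(path, "w") as handle:
            handle.write(cpus_to_mask(cpus))
        return True
    except OSError:
        # Raised when the file is missing, we lack permission, or the
        # interrupt controller does not allow changing the affinity.
        return False


current = mask_to_cpus("3f")        # the value I see on the MSI lines
wanted = cpus_to_mask([1, 2, 3])    # mask that would exclude CPU0
```

A call such as set_irq_affinity(irq_number, [1, 2, 3]) returning False for the MSI interrupts but True for the onboard Ethernet queues would reproduce the difference I am asking about.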
Edit: Never mind. I think the onboard Ethernet works a little differently. I see three interrupts: ether_qos.common_irq, 2490000.ether_qos.rx0, and 2490000.ether_qos.tx0. I can change the affinity of the last two but not the first. Changing the last two correctly increases the NET_RX and NET_TX counts of the assigned cores.
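The NET_RX and NET_TX counts I mention come from /proc/softirqs. A small sketch of how to compare two snapshots and see which cores picked up the work is below. The function names and the snapshot numbers are invented for illustration, not measured on a TX2.

```python
def parse_softirqs(text):
    """Parse /proc/softirqs-style text into {softirq_name: per_cpu_counts}."""
    lines = text.strip("\n").splitlines()
    table = {}
    for line in lines[1:]:  # first line is the CPU header row
        name, _, rest = line.partition(":")
        table[name.strip()] = [int(count) for count in rest.split()]
    return table


def softirq_delta(before, after, name):
    """Per-CPU increase of one softirq counter between two snapshots."""
    return [new - old for old, new in zip(before[name], after[name])]


# Hypothetical snapshots taken before and after moving the rx0 interrupt to CPU1.
BEFORE = """\
                    CPU0       CPU1       CPU2       CPU3
      NET_TX:        100         10          0          0
      NET_RX:       5000         20          0          0
"""
AFTER = """\
                    CPU0       CPU1       CPU2       CPU3
      NET_TX:        100         10          0          0
      NET_RX:       5000       4020          0          0
"""

rx_growth = softirq_delta(parse_softirqs(BEFORE), parse_softirqs(AFTER), "NET_RX")
```

In this made-up case only the CPU1 entry grows, which is the pattern I see for the onboard Ethernet rx0 and tx0 interrupts after changing their affinity.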
The scheduler can migrate processes to different cores. I am going to guess that even when an attempt at SMP affinity is aimed at another core, the parts which are hardware-only will migrate back to CPU0. If you know which ksoftirqd PID is involved, then you should be able to move it to another core. This may or may not help performance: if the cache is suddenly no longer useful you will get a performance drop, but if CPU0’s time is being used up and there is IRQ starvation for other hardware, then migrating ksoftirqd to other cores will help overall even though it might hurt that one driver. Everything you see under ksoftirqd is a candidate for migrating to other cores in order to give CPU0 more time for hardware uses.
Some of the integrated hardware may be wired differently, but I’m not sure where…for example, the memory controller is hardware, but is available to all cores. These are special cases though, and do not apply to hardware which is not integrated.