Manually modifying smp_affinity

I’m currently working with a multi-port network card that I have plugged into the PCIe port of the Jetson AGX Orin Dev Kit. I’m running an early access of JetPack 7 (R38.1 EA) on this board and have ported the vendor’s drivers to the 6.8.12 kernel implementation used for this JP release.

This network card is capable of 25G+ speeds. In my early testing, I’ve noticed that all interrupts are running through CPU0. The vendor includes a tuning script that expects me to be able to modify smp_affinity (e.g. echo fff | sudo tee /proc/irq/278/smp_affinity). This fails. After some searching here on the forums, I discovered this post: [Jetson AGX Orin] Intel Network Card RX Interrupts Locked to CPU0 Despite Affinity Settings - #2 by WayneWWW

I manually patched what @delwyn provided to 6.8.12, but that similarly fails for me. I’ve noticed that dmesg reports “GICv2m: Failed to allocate v2m resource.” early in the boot process, so I apparently have some further work to do.

Has anyone else ported this same patch to Linux 6.8.12 successfully? If so, have you seen any performance improvements? I’m hoping that might save me some time and effort. @WayneWWW stated in the above thread that this is a “will not fix” issue, so it looks like it’s going to fall on us users to address this regression.

Hi DynamicDolphin,

Sounds like you haven’t applied the devicetree changes from the patch.

Regards,

Delwyn

That’d be my guess, but from all appearances they are there. I’m double-checking things right now.

@delwyn Thanks for the tip! I had #address-cells and #size-cells misconfigured. Just rebooted things with those corrections and I’m a little further.

The current issue that I’m working through now is that with the patch applied, the network card driver seems to be inducing an IOMMU fault. I see this during the driver’s probe:

[10111.047910] arm-smmu 12000000.iommu: Unhandled context fault: fsr=0x402, iova=0x0f410040, fsynr=0x490011, cbfrsynra=0x1014, cb=14
[10111.060030] tegra-mc 2c00000.memory-controller: pcie5w: secure write @0x00000003ffffff00: VPR violation ((null))

My next effort is to see what the driver is doing that is causing this.

Any ideas what this error is actually indicating? The “iova” value in the fault message is in the same address range as the new gic_v2m device tree entry:

			gic_v2m: v2m@f410000 {
				compatible = "arm,gic-v2m-frame";
				msi-controller;
				#msi-cells = <1>;
				reg = <0x0 0x0f410000 0x0 0x00010000	/* GICA */
				       0x0 0x54000000 0x0 0x04000000>;
				reg-names = "gic_base", "msi_base";
				arm,msi-base-spi = <GIC_SPI_MSI_BASE>;
				arm,msi-num-spis = <GIC_SPI_MSI_SIZE>;
			};

I’m guessing the network card driver is trying to access this area and the fault is simply indicating that the driver doesn’t have the proper access level.

Looking further upstream in the dmesg output, I see this entry:

[169994.992256] pci 0005:00:00.0: of_irq_parse_pci: failed with rc=-22

This occurs when I try to modprobe my network card driver.

I did not see this appear in the kernel log before I applied the patch from @delwyn. Not sure if it is a latent failure or not.

Maybe I need to take a step back here. @WayneWWW is there a way to manage CPU affinity using GICv3 without resorting to patching GICv2m back into the kernel for JP7?

What I’m seeing with my high-speed network card is that under iperf3 load, CPU0 is pegged at 100% utilization. I haven’t confirmed it, but I’m wondering if this is also why I’m seeing a huge number of retransmissions (the iperf3 “Retr” column) during the test when I would expect them to be at or near 0. If I can distribute things across CPUs (which is the intent of the GICv2m patch), hopefully I can get the performance I’m expecting.
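To quantify the CPU0 skew rather than eyeballing the raw counters, the per-CPU columns of /proc/interrupts can be summed for the interface. A rough sketch, assuming the standard /proc/interrupts layout (the `irq_by_cpu` helper name is illustrative):

```shell
# Sum interrupt counts per CPU column for lines matching a device pattern.
# Usage: irq_by_cpu <pattern> [file]   (file defaults to /proc/interrupts)
irq_by_cpu() {
  pat=$1; file=${2:-/proc/interrupts}
  awk -v pat="$pat" '
    NR == 1 { ncpu = NF; next }          # header row: CPU0 CPU1 ...
    $0 ~ pat {
      # $1 is the IRQ number; columns 2..ncpu+1 are per-CPU counts
      for (i = 2; i <= ncpu + 1; i++) sum[i-2] += $i
    }
    END { for (c = 0; c < ncpu; c++) printf "CPU%d %d\n", c, sum[c] }
  ' "$file"
}
```

Run before and after an iperf3 pass (e.g. `irq_by_cpu enP5p1s0f4d1`) and the delta shows exactly how lopsided the distribution is.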

Thanks.

I modified these from Thor to Orin, but adjust the interface names to match your Orin’s network devices.

First, Receive Packet Steering (RPS). This allows CPU0 (the one that physically receives the interrupt) to hand off packet processing to other idle cores in the system.

# Source: https://enterprise-support.nvidia.com/s/article/receive-packet-steering
# Following creates a bitmask (fff = cores 0-11) to distribute load.

for x in /sys/class/net/eno1/queues/rx-*/rps_cpus; do echo "fff" | sudo tee $x; done
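If your core count differs from 12, the bitmask can be derived instead of hard-coded. A minimal sketch (`rps_mask` is just an illustrative helper):

```shell
# Derive the RPS bitmask from a core count instead of hard-coding "fff".
# (1 << n) - 1 sets the low n bits; on a 12-core AGX Orin this yields fff.
rps_mask() {
  printf '%x\n' $(( (1 << $1) - 1 ))
}
rps_mask "$(nproc)"
```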

Second, spread the interrupts for the NIC interfaces across cores, rather than keeping them pinned primarily on one CPU.

for dev in eno1; do
    IRQS=$(grep $dev /proc/interrupts | awk '{print $1}' | tr -d ':')
    CORE=0
    for irq in $IRQS; do
        echo $CORE | sudo tee /proc/irq/$irq/smp_affinity_list
        CORE=$((CORE + 1)) # This moves each queue to a new core
    done
done

This doesn’t work for me. While I can update the /sys/class/net/*/rps_cpus settings, I cannot change the smp_affinity_list at all:

localrom@kirk:~/projects/chelsio-uwire$ ./scripts/tune.sh
fff
fff
fff
fff
fff
fff
fff
fff
0
tee: /proc/irq/278/smp_affinity_list: Invalid argument
1
tee: /proc/irq/279/smp_affinity_list: Invalid argument
2
tee: /proc/irq/280/smp_affinity_list: Invalid argument
3
tee: /proc/irq/281/smp_affinity_list: Invalid argument
4
tee: /proc/irq/282/smp_affinity_list: Invalid argument
5
tee: /proc/irq/283/smp_affinity_list: Invalid argument
6
tee: /proc/irq/284/smp_affinity_list: Invalid argument
7
tee: /proc/irq/285/smp_affinity_list: Invalid argument

Also gave things a manual try:

localrom@kirk:~/projects/chelsio-uwire$ echo 1 | sudo tee /proc/irq/278/smp_affinity_list
1
tee: /proc/irq/278/smp_affinity_list: Invalid argument
localrom@kirk:~/projects/chelsio-uwire$ cat /proc/irq/278/smp_affinity_list

Running a quick iperf3 test after executing this script shows everything still lands on CPU0.

Thanks for the response all the same!

For completeness, here’s a snapshot of the pertinent /proc/interrupts section. This is after doing both a default and --reverse using iperf3:

278:      89484          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI 671621130 Edge      enP5p1s0f4d1 (queue 0)
279:     428128          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI 671621131 Edge      enP5p1s0f4d1 (queue 1)
280:     116403          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI 671621132 Edge      enP5p1s0f4d1 (queue 2)
281:     184386          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI 671621133 Edge      enP5p1s0f4d1 (queue 3)
282:     233743          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI 671621134 Edge      enP5p1s0f4d1 (queue 4)
283:     152534          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI 671621135 Edge      enP5p1s0f4d1 (queue 5)
284:     239339          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI 671621136 Edge      enP5p1s0f4d1 (queue 6)
285:      60721          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI 671621137 Edge      enP5p1s0f4d1 (queue 7)

How about:

dev=eno1
start_cpu=2
ncpus=$(nproc)
core=$start_cpu

IRQS=$(grep -E "${dev}\.vm[0-9]+" /proc/interrupts | awk '{print $1}' | tr -d ':')

for irq in $IRQS; do
  cpu=$(( core % ncpus ))
  echo "$cpu" | sudo tee "/proc/irq/$irq/smp_affinity_list" >/dev/null
  core=$((core + 1))
done

Verify that it made the changes:

for irq in $IRQS; do
  echo -n "$irq: "
  cat /proc/irq/$irq/smp_affinity_list
done

for irq in $IRQS; do
  echo -n "$irq: "
  cat /proc/irq/$irq/smp_affinity
done
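Since smp_affinity is a hex bitmask while smp_affinity_list is a CPU list, a small decoder can make comparing the two files easier. A sketch (the `mask_to_cpus` name is illustrative):

```shell
# Decode a hex smp_affinity mask into the equivalent CPU list.
# Usage: mask_to_cpus fff  ->  0,1,2,3,4,5,6,7,8,9,10,11
mask_to_cpus() {
  local m=$((16#$1)) c=0 out=""
  while [ "$m" -ne 0 ]; do
    # Low bit set means this CPU is in the mask
    [ $(( m & 1 )) -eq 1 ] && out="${out:+$out,}$c"
    m=$(( m >> 1 )); c=$(( c + 1 ))
  done
  echo "$out"
}
```

For example, `mask_to_cpus "$(cat /proc/irq/278/smp_affinity)"` should agree with the contents of smp_affinity_list for that IRQ.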

Confirm traffic is actually hitting the CPUs:

watch -n1 "grep -E 'eno1\.vm[0-9]+' /proc/interrupts"

Unfortunately gic-v2m is the only way to spread PCIe MSI interrupts across all the CPU cores on Orin. Otherwise PCIe MSI interrupts are handled by the DesignWare PCIe host bridge core, which coalesces all interrupts into one SPI.

The of_irq_parse_pci error suggests problems in your device tree associated with the PCIe host bridge node. I would double check that the patch is correctly applied.
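One quick sanity check on that front is whether the v2m node actually survived into the flattened tree the kernel booted with. A sketch, assuming `dtc` (from the device-tree-compiler package) is available on the target; the `check_v2m` helper name is illustrative:

```shell
# Check whether the gic-v2m node made it into the booted device tree.
# First decompile the live tree on the Jetson itself:
#   dtc -I fs -O dts /proc/device-tree > live.dts
check_v2m() {
  # $1 = decompiled dts file
  if grep -q 'arm,gic-v2m-frame' "$1"; then
    echo "v2m node present"
  else
    echo "v2m node missing"
  fi
}
```

If the node is missing from the live tree, the patch’s devicetree half never made it onto the board, regardless of what the kernel sources say.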

Someone else successfully ported to Linux 6.6:

Delwyn

@delwyn,

Thanks for the link and the information. I really appreciate it!

I plan to check out the 6.6 patch and see where that leads.

I attempted to implement the 6.6 patch on 6.8 and encountered the following, which is similar in nature to the results I got when porting the original patch by @delwyn:

tegra-mc 2c00000.memory-controller: pcie5w: write @0x000000000f410040: EMEM address decode error (EMEM decode error)

From all appearances, this occurs when my PCIe network driver performs a writel() to the gic-v2m region.

If the patch is implemented correctly, shouldn’t I see the GICv2m addresses in /proc/iomem? Here’s what I do see in that area:

... snip ...

0e190000-0e19ffff : e100000.tegra_mce tegra_mce@e100000
0e1a0000-0e1affff : e100000.tegra_mce tegra_mce@e100000
0e1b0000-0e1bffff : e100000.tegra_mce tegra_mce@e100000
0f400000-0f40ffff : GICD
0f440000-0f63ffff : GICR
10000000-10ffffff : 10000000.iommu iommu@10000000
11000000-11ffffff : 12000000.iommu iommu@12000000
12000000-12ffffff : 12000000.iommu iommu@12000000

... snip ...

40070000-40070fff : 40000000.sram sram@70000
40071000-40071fff : 40000000.sram sram@71000
80000000-fffdffff : System RAM

... snip ...

There’s nothing defined at 0xf410000 or at 0x54000000 as listed in the patched device tree:

			gic_v2m: v2m@f410000 {
				compatible = "arm,gic-v2m-frame";
				msi-controller;
				#msi-cells = <1>;
				reg = <0x0 0x0f410000 0x0 0x00010000	/* GICA */
				       0x0 0x54000000 0x0 0x04000000>;
				reg-names = "gic_base", "msi_base";
				arm,msi-base-spi = <GIC_SPI_MSI_BASE>;
				arm,msi-num-spis = <GIC_SPI_MSI_SIZE>;
			};

I know that GICv2m is being initialized:

localrom@kirk:~$ sudo dmesg | grep -i gic
[    0.000000] CPU features: detected: GIC system register CPU interface
[    0.000000] GICv3: GIC: Using split EOI/Deactivate mode
[    0.000000] GICv3: 960 SPIs implemented
[    0.000000] GICv3: 0 Extended SPIs implemented
[    0.000000] Root IRQ handler: gic_handle_irq
[    0.000000] GICv3: GICv3 features: 16 PPIs
[    0.000000] GICv3: CPU0: found redistributor 0 region 0:0x000000000f440000
[    0.000000] GICv2m: DT overriding V2M MSI_TYPER (base:608, num:352)
[    0.000000] GICv2m: Tegra MSI region [mem 0x54000000-0x57ffffff]
[    0.000000] GICv2m: range[mem 0x0f410000-0x0f41ffff], SPI[608:959]
[    0.010731] GICv3: CPU1: found redistributor 100 region 0:0x000000000f460000
[    0.011569] GICv3: CPU2: found redistributor 200 region 0:0x000000000f480000
[    0.012248] GICv3: CPU3: found redistributor 300 region 0:0x000000000f4a0000
[    0.014989] GICv3: CPU4: found redistributor 10000 region 0:0x000000000f4c0000
[    0.015717] GICv3: CPU5: found redistributor 10100 region 0:0x000000000f4e0000
[    0.016482] GICv3: CPU6: found redistributor 10200 region 0:0x000000000f500000
[    0.017205] GICv3: CPU7: found redistributor 10300 region 0:0x000000000f520000
[    0.019969] GICv3: CPU8: found redistributor 20000 region 0:0x000000000f540000
[    0.020730] GICv3: CPU9: found redistributor 20100 region 0:0x000000000f560000
[    0.021446] GICv3: CPU10: found redistributor 20200 region 0:0x000000000f580000
[    0.022154] GICv3: CPU11: found redistributor 20300 region 0:0x000000000f5a0000
[    2.809156] kvm [1]: GICv3: no GICV resource entry
[    2.814020] kvm [1]: disabling GICv2 emulation
[    2.818525] kvm [1]: GIC system register CPU interface enabled
[    2.824419] kvm [1]: vgic interrupt IRQ9
[   12.864152] gic 2a41000.interrupt-controller: GIC IRQ controller registered

Hi,

You should also see this in your dmesg:

[ 7.147450] tegra194-pcie 14100000.pcie: Using GICv2m MSI allocator
[ 7.270603] tegra194-pcie 14160000.pcie: Using GICv2m MSI allocator
[ 7.396913] tegra194-pcie 141a0000.pcie: Using GICv2m MSI allocator

Have you rebuilt your initrd with the patched tegra194-pcie driver?

/proc/iomem on my kernel (5.15.136-tegra) doesn’t contain the GICD or GICR entries either:

.. snip ..

0e190000-0e19ffff : e100000.tegra_mce tegra_mce@e100000
0e1a0000-0e1affff : e100000.tegra_mce tegra_mce@e100000
0e1b0000-0e1bffff : e100000.tegra_mce tegra_mce@e100000
10000000-10ffffff : 10000000.iommu iommu@10000000
11000000-11ffffff : 12000000.iommu iommu@12000000
12000000-12ffffff : 12000000.iommu iommu@12000000

.. snip ..

If the problem isn’t simply the PCIe driver, it might be worth investigating how the GICD and GICR entries end up in /proc/iomem on your kernel. Perhaps some additional step is needed in gicv2m for GICA.

Regards,

Delwyn

Great point on initrd! I haven’t deployed an updated initrd to the dev kit. I’ve been using rmmod and modprobe on pcie_tegra194. When I do that, I do see the “Using GICv2m MSI allocator” message. I’ll rebuild and deploy initrd to the target and see if that changes anything.

Yeah, I’m wondering what additional steps (if any) might be needed. That’s today’s investigation.

Again, I very much appreciate your help!

Well, I believe I figured out the issue. I made one mistake in manually applying the patch file. After doing a before / after compare, I saw the problem. The /proc/interrupts entry no longer shows that things are only running on CPU0:

279:          0          0          0          0          0          0          0          0      28915          0          0          0       MSI 671621130 Edge      enP5p1s0f4d1 (queue 0)
280:          0          0          0          0          0          0          0          0          0      70891          0          0       MSI 671621131 Edge      enP5p1s0f4d1 (queue 1)
281:          0          0          0          0          0          0          0          0          0          0     119648          0       MSI 671621132 Edge      enP5p1s0f4d1 (queue 2)
282:          0          0          0          0          0          0          0          0          0          0          0      86504       MSI 671621133 Edge      enP5p1s0f4d1 (queue 3)
283:     370551          0          0          0          0          0          0          0          0          0          0          0       MSI 671621134 Edge      enP5p1s0f4d1 (queue 4)
284:          0      25860          0          0          0          0          0          0          0          0          0          0       MSI 671621135 Edge      enP5p1s0f4d1 (queue 5)
285:          0          0      85507          0          0          0          0          0          0          0          0          0       MSI 671621136 Edge      enP5p1s0f4d1 (queue 6)
286:          0          0          0      45956          0          0          0          0          0          0          0          0       MSI 671621137 Edge      enP5p1s0f4d1 (queue 7)

Thanks again, @delwyn for the assistance!

No problem, glad you got it working in the end!

Delwyn