Controlling IRQ Affinity: How to distribute IRQs on a NUMA machine with 8 GPUs?

Does anyone have experience setting IRQ affinity masks for NVIDIA GPUs? I have a NUMA system with two NUMA nodes of 6 cores each, and each node is connected to 4 GTX-470 GPUs. CPUs in one node can, of course, talk to GPUs attached to the other node, but at lower bandwidth.

I’d like to configure the IRQ affinity masks such that interrupts (IRQs) from the GPUs in a particular NUMA node are delivered to CPUs in the same node.

Example:

NUMA Node #1 has CPUs 0…5 and GPUs 0…3

NUMA Node #2 has CPUs 6…11 and GPUs 4…7

I want an interrupt raised by GPU 0 to be delivered to CPUs 0, 1, 2, 3, 4, or 5.

Generally speaking, any interrupts raised by GPUs 0…3 should be delivered to CPUs 0…5. Any interrupts raised by GPUs 4…7 should be delivered to CPUs 6…11.
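For anyone who wants to double-check the topology on their own box, something like the following should show which CPUs belong to which node and which NUMA node each NVIDIA PCI device hangs off of (just a sketch; it assumes numactl is installed and that your kernel exposes the numa_node attribute in sysfs):

numactl --hardware                   # lists the CPUs in each NUMA node
# 0x10de is NVIDIA's PCI vendor ID; numa_node may read -1 if the kernel doesn't know
for d in /sys/bus/pci/devices/*; do
    if [ "$(cat $d/vendor)" = "0x10de" ]; then
        echo "$d -> numa_node $(cat $d/numa_node)"
    fi
done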

Here’s a basic primer on how to set affinity masks:

I can set a CPU affinity mask in Linux by writing the proper bitmask to /proc/irq/<IRQ#>/smp_affinity. The trick is determining the right IRQ numbers and mapping them to GPUs. On my 8-GPU system, /proc/interrupts shows:

CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7       CPU8       CPU9       CPU10      CPU11      

  24:        680         50         79         76        465          0          0          0         28          0          0          0   IO-APIC-fasteoi   nvidia, nvidia

  30:        492         84         62         52          0          0          0          0         17        422          0          0   IO-APIC-fasteoi   nvidia, nvidia

  48:        630         99         68        480          0          5          0          0         18          0          0          0   IO-APIC-fasteoi   nvidia, nvidia

  54:        673         34         83         72          7          0         22          0        453          0          0          0   IO-APIC-fasteoi   nvidia, nvidia

Okay, so my 8 GPUs are using IRQs 24, 30, 48, and 54. Now how do I know which GPUs are using which IRQ lines? Does each GPU use more than one IRQ line? Is there a way I can tell the NVIDIA driver to disconnect GPUs selectively (and thus determine the IRQs by elimination)? (Also, is there a way to prevent the devices from sharing IRQ lines?)

Thanks!

I’ve discovered the needed information in /proc/driver/nvidia/gpus/<#>/information

The proc file lists which IRQ each GPU is using. Each GPU uses only one IRQ (good). Unfortunately, there is some sharing of IRQ lines, but it looks like that might be a limitation of PCI (there may be a limited number of interrupt lines to go around). Does anyone know?
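For reference, a quick loop like this pulls the IRQ out of each GPU's information file (a sketch; the exact field names in the file may vary between driver versions):

for f in /proc/driver/nvidia/gpus/*/information; do
    echo "== $f =="
    grep -E 'Model|IRQ' "$f"   # the file also lists the bus location, video BIOS, etc.
done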

Anyway, on my platform:

IRQ 24 = GPUs 0 and 1

IRQ 30 = GPUs 2 and 3

IRQ 48 = GPUs 4 and 5

IRQ 54 = GPUs 6 and 7

So, to distribute the IRQs as I described in my original post:

echo 03f > /proc/irq/24/smp_affinity

echo 03f > /proc/irq/30/smp_affinity

echo fc0 > /proc/irq/48/smp_affinity

echo fc0 > /proc/irq/54/smp_affinity
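The masks are just CPU bitmasks: bit N corresponds to CPU N, so CPUs 0…5 are binary 000000111111 = 0x03f and CPUs 6…11 are binary 111111000000 = 0xfc0. To double-check that the writes took effect:

for irq in 24 30 48 54; do
    echo -n "IRQ $irq: "
    cat /proc/irq/$irq/smp_affinity
done
# newer kernels also expose /proc/irq/<IRQ#>/smp_affinity_list, which prints a
# human-readable CPU list instead of a hex mask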

Yes, it appears that the IRQ sharing is a limitation of the PCI hardware. However, if you extract the driver source, make a small edit, and compile the kernel module manually, the GPUs can be configured to use MSI (Message Signaled Interrupts) instead.

/proc/interrupts now looks like:

CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7       CPU8       CPU9       CPU10      CPU11      

 101:         13          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      nvidia

 102:         17          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      nvidia

 103:         15          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      nvidia

 104:         11          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      nvidia

 105:          9          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      nvidia

 106:          6          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      nvidia

 107:          5          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      nvidia

 108:          9          0          0          0          0          0          0          0          0          0          0          0   PCI-MSI-edge      nvidia

No more IRQ sharing!!! I wonder how MSI will play with the APIC. I don’t know much about MSI. Will the APIC be bypassed?

Steps (condensed into a command sketch after the list):

  1. Boot into the kernel where you want to install your module.

  2. Download the driver. I am using devdriver_4.0_linux_64_270.41.19.run in this example.

  3. Do “sh devdriver_4.0_linux_64_270.41.19.run -x”

  4. Do “cd NVIDIA-Linux-x86_64-270.41.19/kernel”

  5. Edit the file nv-reg.h

  6. Find the line “NV_DEFINE_REG_ENTRY(__NV_ENABLE_MSI, 0);” and change it to “NV_DEFINE_REG_ENTRY(__NV_ENABLE_MSI, 1);” (Change the 0 to a 1.) Save your changes.

  7. Do “make module”

  8. Do “make install” (You may need to exit X11 for this to work.)
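Condensed into commands, the whole procedure looks roughly like this (the driver filename and paths are just the ones from this example; the sed edit is the change from step 6):

sh devdriver_4.0_linux_64_270.41.19.run -x
cd NVIDIA-Linux-x86_64-270.41.19/kernel
# flip the MSI registry default from 0 to 1 (step 6)
sed -i 's/NV_DEFINE_REG_ENTRY(__NV_ENABLE_MSI, 0)/NV_DEFINE_REG_ENTRY(__NV_ENABLE_MSI, 1)/' nv-reg.h
make module
make install                 # may need to be run as root, outside of X11
grep nvidia /proc/interrupts # should now show one PCI-MSI-edge line per GPU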