IRQ Balancing

It depends on what is running on CPU0. Anything requiring hardware access will have to remain there, but often the driver running on CPU0 will hand off side work to ksoftirqd. That work can then migrate to other cores, while the hardware-dependent code cannot. Having CPU0 at 100% is not wrong unless the load comes from processes which could run elsewhere.

@sumitg mentioned “/proc/interrupts”. You’ll notice that these are hardware IRQs. If you look at the name in the rightmost column, these are actual hardware devices, often named with a physical address if there are multiple copies of a controller. For example, each i2c controller is listed, and the address is part of the prefix. Every core has timers directly wired, and so timer interrupts will be seen on each core. Other than this, pretty much every IRQ count given is under CPU0. If this were an Intel CPU on a PC, then you’d see this distributed more because of the programmable IO-APIC.
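
To put a number on how lopsided the distribution is, here is a minimal sketch that adds up the count columns of /proc/interrupts for each core. The function name irq_totals_per_cpu is just something I made up for illustration, and it assumes the usual layout (a header row of CPU names, then one row per IRQ with one count per core).

```shell
#!/bin/sh
# Sum the per-CPU interrupt counts of a /proc/interrupts-style file.
# irq_totals_per_cpu is a hypothetical helper name, not a standard tool.
irq_totals_per_cpu() {
    awk '
        # Header row: one column name per core (CPU0, CPU1, ...).
        NR == 1 { ncpu = NF; for (i = 1; i <= NF; i++) name[i] = $i; next }
        {
            # Field 1 is the IRQ label ("41:", "IPI0:"); the next ncpu
            # fields are the per-core counts. Short rows such as "Err:"
            # simply have fewer numeric fields.
            for (i = 1; i <= ncpu; i++)
                if ($(i + 1) ~ /^[0-9]+$/) total[i] += $(i + 1)
        }
        END { for (i = 1; i <= ncpu; i++) printf "%s %.0f\n", name[i], total[i] }
    ' "${1:-/proc/interrupts}"
}

# Print the totals for the live system when /proc/interrupts is available.
if [ -r /proc/interrupts ]; then
    irq_totals_per_cpu
fi
```

The totals include the per-core timer and IPI rows, so the other cores will not show zero, but CPU0 should still stand out if nearly all device IRQs land there.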

What you really need, but what I do not have enough knowledge for, is to profile the time spent in hardware IRQs on CPU0 to know which ones use the most time, and then to look at whether the time-consuming IRQs really need to do all of their work on CPU0. My guess is that drivers for most of the hardware are already very highly tuned, and that only custom drivers might need to be optimized to offload some of the work to ksoftirqd instead of doing it all on CPU0.
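
This is not real per-IRQ profiling, but as a rough first check you can read how much time each core has spent in hardware IRQ and softirq context from /proc/stat. A minimal sketch follows; cpu_irq_ticks is a name I made up, the values are in clock ticks since boot, and how accurate the irq column is depends on how the kernel was configured.

```shell
#!/bin/sh
# Print the hardware-IRQ and softirq time of each core from a
# /proc/stat-style file. cpu_irq_ticks is a hypothetical helper name.
cpu_irq_ticks() {
    awk '
        # Per-core rows look like:
        #   cpuN user nice system idle iowait irq softirq ...
        # so irq is field 7 and softirq is field 8. The aggregate "cpu"
        # row has no digit after "cpu" and is skipped.
        /^cpu[0-9]+ / { printf "%s irq=%s softirq=%s\n", $1, $7, $8 }
    ' "${1:-/proc/stat}"
}

# Show the values for the live system when /proc/stat is available.
if [ -r /proc/stat ]; then
    cpu_irq_ticks
fi
```

Running it twice while your workload is active and comparing the cpu0 line against the others gives an idea of whether CPU0 is really spending its time in IRQ handling or in ordinary processes.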

The work on CPU0 which is not from a hardware IRQ will not be listed in /proc/interrupts. All of this work could be moved to other cores to reduce what hardware I/O has to compete with on CPU0, but this is not something you can easily just do and be done with. Doing this well would take a lot of time and experimentation.

If you look towards the bottom of /proc/interrupts you will also see rescheduling interrupts, which might be a case of higher priority interrupts preempting lower priority interrupts. You will normally see a lot of these, and there is no count you can directly call “too many”, but if you have a fully loaded system which is running well (i.e., before you start your own tasks), then you can watch this and get an idea of how fast the count normally goes up. A command like this:
watch -n 1 -x egrep '(CPU0|IPI[0-9][:])' /proc/interrupts

Now if you can look at this and have a feel for how fast things “normally” change (a bit like the guy in the Matrix movie reading the data directly from the glyphs on the screen), you could run your code and see if rescheduling goes up a lot. Not a very technical way of doing it, but if there is too much rescheduling this might be a case of IRQ starvation.
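
If eyeballing the numbers is too vague, a minimal sketch like the following turns it into a rate you can compare between "before my code" and "with my code running". The function names resched_total and resched_rate are made up for this example, and it assumes the row is labeled "Rescheduling interrupts" as it is in the watch output above.

```shell
#!/bin/sh
# Total rescheduling-interrupt count across all cores, read from a
# /proc/interrupts-style file. resched_total is a hypothetical name.
resched_total() {
    awk '
        /Rescheduling interrupts/ {
            # Field 1 is the label (e.g. "IPI0:"); add up the numeric
            # per-core counts that follow it.
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) sum += $i
        }
        END { printf "%.0f\n", sum }
    ' "${1:-/proc/interrupts}"
}

# Average rescheduling interrupts per second over an interval
# (in whole seconds, default 5). resched_rate is a hypothetical name.
resched_rate() {
    interval=${1:-5}
    before=$(resched_total)
    sleep "$interval"
    after=$(resched_total)
    echo $(( (after - before) / interval ))
}

# Take a one-second sample on the live system when possible.
if [ -r /proc/interrupts ]; then
    resched_rate 1
fi
```

Run resched_rate 10 a few times on the idle system to get a baseline, then again with your code running. There is no fixed threshold here either; it only makes the comparison less of a guess.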

If there is starvation, perhaps there is a way to give your driver a higher priority, but I don’t think I can help with that.

If you want to add the IRQ number for a set of IRQs to the watch command, it would go something like this (I’m randomly picking IRQs, one is for ethernet and another is for a USB controller):

watch -n 1 -x egrep '(CPU0|IPI[0-9][:]| 41[:]| 21[:])' /proc/interrupts

(the change is that I added " 41[:]| 21[:]" to see IRQs 21 and 41).

That particular sample is interesting because it is IRQ traffic from mmc0 and ethernet. Typically these can generate a lot of interrupts if traffic is heavy.

Non-hardware-IRQ traffic will be unrelated to “/proc/interrupts”. Intentional migration of affinity is almost always from purely software processes which do not require direct hardware access. These are the ones you have a lot of control over, and if these are on CPU0, then it might be a good idea to force these software processes somewhere other than CPU0 (this still wouldn’t matter if you are not approaching IRQ starvation, although any movement would probably reduce hardware driver latencies).

Note that ksoftirqd is relatively smart and tends to use those other cores fairly well. You probably don’t need to interfere with those processes unless you run into some specific odd condition. If you see the process in “htop” or “top” or “ps”, then you will be interested in looking at setting affinity to a non-CPU0 core. If you have a single critical process, then perhaps you might assign that and only that to a specific core. It’s a lot of art and experimentation.
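
For the purely software processes, a minimal sketch of moving one off CPU0 with taskset (from util-linux) could look like this. The helper names cpus_except_cpu0 and move_off_cpu0 are made up, and it assumes the cores are numbered contiguously from 0 and are all online, which may not hold in every power mode.

```shell
#!/bin/sh
# Build a CPU list that skips CPU0, e.g. "1-5" on a 6-core system.
# Takes the core count as an optional argument, defaulting to nproc.
# cpus_except_cpu0 is a hypothetical helper name.
cpus_except_cpu0() {
    ncpu=${1:-$(nproc)}
    if [ "$ncpu" -lt 2 ]; then
        echo "need at least 2 cores to avoid CPU0" >&2
        return 1
    fi
    if [ "$ncpu" -eq 2 ]; then
        echo 1
    else
        echo "1-$((ncpu - 1))"
    fi
}

# Pin an already-running process (given by PID) to every core except CPU0.
# move_off_cpu0 is a hypothetical helper name.
move_off_cpu0() {
    list=$(cpus_except_cpu0) || return 1
    taskset -cp "$list" "$1"
}
```

Usage would be something like move_off_cpu0 1234, where 1234 stands in for the PID you found in htop or ps. To dedicate one core to a single critical process you would call taskset -cp with just that core number instead of the whole list.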