I will add some information on this, but it won’t answer your question (maybe it will lead to different questions).
There are two kinds of interrupt: software and hardware. You will find hardware interrupts listed in /proc/interrupts. Hardware interrupts require an actual wire to trigger the scheduler, while software IRQs are more “virtual”. Either way, it is the scheduler which chooses when to send a process to a core, and a software IRQ can always go to any core. A hardware IRQ is more restricted: the scheduler can only send its driver to a core which the hardware can be “wired” to (with copper traces on the PCB), directly or indirectly.
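As a rough illustration of reading /proc/interrupts, here is a minimal Python sketch that parses the per-core counters and lists IRQs which have only ever been counted on CPU 0. The sample text is invented for the example (it is not real Jetson output), and the function names are my own. Keep in mind that an IRQ which has only fired on CPU 0 is a hint, not proof, that it cannot be routed elsewhere.

```python
# Invented sample in the layout of /proc/interrupts (not real Jetson output).
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
 13:     123456          0          0          0     GICv3  26 Level     arch_timer
 45:       9000       1200        300         10     GICv3  72 Level     eth0
IPI0:       500        600        700        800       Rescheduling interrupts
Err:          0
"""


def parse_interrupts(text):
    """Return (cpu_names, {irq_label: [count per CPU]})."""
    lines = text.strip().splitlines()
    cpus = lines[0].split()  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        counts = []
        # Leading numeric fields are per-CPU counts; the rest is description.
        for field in rest.split()[:len(cpus)]:
            if not field.isdigit():
                break
            counts.append(int(field))
        table[label.strip()] = counts
    return cpus, table


def cpu0_only(table):
    """Labels of IRQs counted on CPU 0 and on no other core."""
    return [
        label
        for label, counts in table.items()
        if len(counts) > 1 and counts[0] > 0 and sum(counts[1:]) == 0
    ]


cpus, table = parse_interrupts(SAMPLE)
print(cpu0_only(table))
```

On a real system you would pass in the contents of /proc/interrupts instead of SAMPLE, e.g. `parse_interrupts(open("/proc/interrupts").read())`.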
On a desktop PC, Intel and AMD CPUs do this in different ways, but Intel is a good example. Intel uses an “I/O Advanced Programmable Interrupt Controller” (an IO-APIC). This controller is essentially a switch which routes the wiring to the correct core. If one were to migrate some hardware (e.g., a GPU or network card) to a specific core, then when the scheduler finds the hardware IRQ it will set up the IO-APIC to route everything needed to that specific core, and will load the proper driver or other content onto the core.
Sometimes entire groups of hardware must migrate together, e.g., the GPIO.
On Jetsons there are many hardware IRQs which can only be routed to CPU 0. This is because there is no IO-APIC, and some hardware has no method to route to other cores. If that is the case, and someone requests a different core for such an IRQ, then when the time comes the scheduler will reschedule it to run on CPU 0.
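One way to check where a given IRQ is currently allowed to run is the file /proc/irq/<N>/smp_affinity_list, which holds a CPU list such as "0" or "0-3". The sketch below only reads and parses that file. The function names and the `proc_root` parameter are my own, and I am assuming the simple "ranges and single numbers" format that the kernel prints. Writing a new list to that file as root is how one asks for a move, but on hardware which can only reach CPU 0 the request may be rejected or have no effect.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Turn a CPU list such as '0-3,5' into [0, 1, 2, 3, 5]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.extend(range(int(low), int(high) + 1))
        else:
            cpus.append(int(part))
    return cpus


def irq_affinity(irq, proc_root="/proc"):
    """Read the cores a given IRQ number is currently allowed to run on."""
    path = Path(proc_root) / "irq" / str(irq) / "smp_affinity_list"
    return parse_cpu_list(path.read_text())


print(parse_cpu_list("0-3,5"))
```

For example, `irq_affinity(45)` would report the allowed cores for IRQ 45, where 45 is just a placeholder for a number taken from the first column of /proc/interrupts.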
When one designs a hardware driver it is “best practice” to perform only the very minimal work in that driver, and to then create a different driver for “soft” IRQ functions. For example, a network card driver might get data from the hardware, and then issue a soft interrupt to schedule computing a checksum. This is preferable (if the checksum is computed on the CPU) to performing the entire step in a single driver for the network card, since it allows context switching between hard and soft IRQs.
At some point your CPU core might reach 100%, and it won’t be due to error. Instead, it is simply due to the number of drivers which run on that CPU core in combination with the time required to service each hard IRQ. When there is no longer sufficient time to schedule everything needed, this becomes “IRQ starvation”.
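To get a feel for how much of a core's time goes to interrupts, one can look at the per-core lines in /proc/stat, where the sixth and seventh counters are time spent in hard IRQs and soft IRQs. This is a small sketch with invented sample numbers and a function name of my own. The counters are cumulative since boot, so for a current rate you would take two readings a few seconds apart and subtract.

```python
# Invented sample in the layout of /proc/stat (not real measurements).
# Per-CPU fields: user nice system idle iowait irq softirq steal guest guest_nice
SAMPLE = """\
cpu  500 0 200 1000 0 100 200 0 0 0
cpu0 100 0 100 500 0 100 200 0 0 0
cpu1 400 0 100 500 0 0 0 0 0 0
intr 12345 0 0 0
"""


def irq_share(stat_text):
    """Return {cpu_name: (hard_irq_fraction, soft_irq_fraction)} since boot."""
    shares = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Keep only per-core lines (cpu0, cpu1, ...), not the "cpu" total.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # First eight counters; guest time is already included in user time.
        ticks = [int(value) for value in fields[1:9]]
        total = sum(ticks)
        if len(ticks) < 7 or total == 0:
            continue
        shares[fields[0]] = (ticks[5] / total, ticks[6] / total)
    return shares


shares = irq_share(SAMPLE)
for cpu, (hard, soft) in shares.items():
    print(f"{cpu}: hard IRQ {hard:.0%}, soft IRQ {soft:.0%}")
```

A core where these two fractions add up to most of the total is a likely candidate for the starvation described above.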
One often runs everything from a given process on a single core for performance reasons. Each core has its own cache, and a cache miss is much slower than running with a cache hit. If the two interrupts (hard+soft) use the same data, then typically you get more cache hits by being on a single core. The scheduler tends to consider this the default case, and this is the reason why even if you have software IRQs which can be offloaded, they may still schedule on CPU 0. However, if there is IRQ starvation, then it might still be better to reschedule or offload a soft IRQ to another core. This other core will tend to mean you get more cache misses, but it is better than IRQ starvation.
You are correct to ask about too much load on CPU 0, but I can’t tell you which parts of your hardware can be scheduled on a different CPU core. If there is some hardware where the hardware IRQ itself can go to another core, then this would be the best situation. Quite often though, on a Jetson, you will find parts of the hardware can only reach CPU 0. If it turns out that this hardware IRQ and its driver later trigger a software IRQ, then the default is to run that soft IRQ on the same core, but one could help prevent IRQ starvation by rescheduling the soft IRQ to a different core (at the likely expense of cache misses).
I couldn’t tell you, though, which IRQs must stay on CPU 0, nor which soft IRQs on that core could be rescheduled. If you run top or htop, then you’ll notice ksoftirqd, the daemon which schedules software interrupts for a given core (there is one per core, named ksoftirqd/0, ksoftirqd/1, and so on). If you can identify a software IRQ running on CPU 0, then this would be a candidate to move to another core. If there isn’t a significant CPU 0 soft IRQ user, then you might be out of luck.
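Besides watching ksoftirqd in top, the file /proc/softirqs lists soft IRQ counts per type and per core. Here is a minimal sketch which flags soft IRQ types that run mostly on CPU 0. The sample numbers are invented, and the function name and the 80% threshold are arbitrary choices of mine. Whether a flagged type can actually be moved still depends on the driver involved.

```python
# Invented sample in the layout of /proc/softirqs (not real measurements).
SAMPLE = """\
                    CPU0       CPU1       CPU2       CPU3
          HI:          0          0          0          0
       TIMER:       1000       1000       1000       1000
      NET_RX:       9000        500        300        200
"""


def cpu0_heavy(text, threshold=0.8):
    """Return [(softirq_type, cpu0_share)] where CPU 0 handled >= threshold."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    heavy = []
    for line in lines[1:]:
        name, _, rest = line.partition(":")
        counts = [int(value) for value in rest.split()[:ncpus]]
        total = sum(counts)
        if total == 0:
            continue  # this soft IRQ type has never run
        share = counts[0] / total
        if share >= threshold:
            heavy.append((name.strip(), share))
    return heavy


print(cpu0_heavy(SAMPLE))
```

As with /proc/stat, these counts are totals since boot, so comparing two readings taken some seconds apart shows what is busy right now.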
I don’t know in your case what can be moved where. There might or might not be some tuning available.