Hello, I just wonder if anyone here has worked with a heavily interrupt-loaded system before? I am new to Jetson platforms, and interrupt handling seems to differ from other platforms: here all interrupts go to core 0. This creates problems in systems that generate interrupts very often. Right now, when I run cat /proc/interrupts, I see that the device called hsp generates a lot of interrupts. At certain intervals I reassign this interrupt to cores 4, 5, and 6 so that it does not block other processes. How can I manage this better?
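For reference, this is roughly how I reassign it; the IRQ number below is only a placeholder for whatever grep reports on your system, and I am not sure the controller always accepts the new mask:

```bash
# Find the IRQ number(s) the "hsp" device is using
grep -i hsp /proc/interrupts

# Suppose it reports IRQ 123 (placeholder -- use the real number).
# Request delivery on cores 4-6; the write can fail if the line cannot be moved.
echo 4-6 | sudo tee /proc/irq/123/smp_affinity_list

# Check what the kernel recorded
cat /proc/irq/123/smp_affinity_list
```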
Hi,
Please check the discussion in IRQ Balancing.
And please note certain hardware engines are fixed to CPU core 0.
Just some info surrounding this topic: there are hardware IRQs and software IRQs. “/proc/interrupts” shows only hardware IRQs. For a hardware IRQ to run on a core it has to have physical wiring, while a software IRQ can run on any core. One can mark a hardware IRQ to run on a core without the wiring, but when the IRQ occurs the scheduler will end up migrating it back to the core with the wiring (CPU0). In some cases a group of hardware devices can migrate to a different core, but not individual parts of that group (e.g., there is some ability to transfer an entire group of GPIOs to a new core, but not a single GPIO).
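A quick way to check whether such a request was actually honored is to compare the requested mask with the effective one (the effective_affinity files only exist on reasonably recent kernels, and the IRQ number here is again just a placeholder):

```bash
IRQ=123   # placeholder -- substitute the number shown in /proc/interrupts

# What user space requested
cat /proc/irq/$IRQ/smp_affinity_list

# What the interrupt controller is actually using; if this stays on CPU0 no
# matter what is requested, the line is effectively wired to CPU0.
cat /proc/irq/$IRQ/effective_affinity_list 2>/dev/null \
  || echo "effective_affinity not exposed by this kernel"
```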
Software IRQs are scheduled by ksoftirqd, and you can see some of those in htop or top. A good design for a hardware driver is to do the absolute minimum work while the hardware is being accessed, and then to spawn a soft IRQ to handle anything else needed by the driver (thus releasing the core for other hardware IRQs).
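If you want to see where that deferred work ends up, the per-CPU softirq counters and the ksoftirqd threads are visible from user space (nothing Jetson-specific here, just standard Linux):

```bash
# Per-CPU counts for each softirq class (TIMER, NET_RX, TASKLET, SCHED, ...)
cat /proc/softirqs

# One ksoftirqd kernel thread per core; high CPU time here means a lot of
# deferred (bottom-half) work is being pushed onto that core.
ps -eo pid,psr,pcpu,comm | grep ksoftirqd
```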
It isn’t unusual for the scheduler to resist moving a software IRQ (or a hardware IRQ, if it can be moved) as often as people might think is useful. The scheduler has knowledge of the cache and tries to take advantage of that. Even if something runs on CPU0 that does not have to run there, the reason might in part be that a cache hit is cheaper than a cache miss.
What becomes particularly important is that many hardware drivers have “atomic” code in them, during which the core cannot context switch until the atomic section is complete, whereas soft IRQ drivers can typically be interrupted and higher priority IRQs can run (one could set the hardware device to a slightly higher priority, or the software IRQ to a slightly lower priority). On the other hand, unless the cache is large, a context switch from a running soft IRQ to a hardware IRQ might wipe out the cache anyway, in which case keeping the soft IRQ on that core would not help; that might be a good time to migrate the soft IRQ to another core.
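If the driver in question uses threaded IRQ handlers (they show up as kernel threads named irq/<number>-<name>), their priority can be inspected and nudged from user space; whether the hsp handler is threaded on your L4T release is something you would need to confirm first, and the PID below is just a placeholder:

```bash
# Threaded IRQ handlers appear as kernel threads named irq/<number>-<name>
ps -eo pid,psr,rtprio,comm | grep 'irq/'

# Suppose PID 456 is the handler of interest (placeholder).
chrt -p 456                # show its policy/priority (IRQ threads default to SCHED_FIFO 50)
sudo chrt -f -p 49 456     # lower it slightly so other IRQ threads win when they collide
```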
Check the IRQ balancing URL above from @DaneLLL since moving around drivers and processes to different cores is not always as great as it seems it would be.
Hello DaneLLL,
The thread below is mine and I’ve been working on this for almost a month:
[> Jetson TX2 - CPU Core 0 Stuck at 100% Usage and System Crash Issues](Jetson TX2 - CPU Core 0 Stuck at 100% Usage and System Crash Issues - #8 by KevinFFF)
with help from @KevinFFF and @linuxdev.
Now I have found out that my “hsp” port creates most of the load on the CPU (I also couldn’t find much info on this interrupt device “hsp”), which I also deduced from the log below:
[ 125.110891] pcieport 0000:00:1d.0: AER: Corrected error received: id=0060
[ 125.110895] nvme 0000:04:00.0: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
[ 125.110898] nvme 0000:04:00.0: AER: device [10de:10e6] error status/mask=00001000/00002000
[ 125.110899] nvme 0000:04:00.0: AER: [ 12] Replay Timer Timeout
Currently, as a temporary measure to save time, I tried moving this port’s interrupts to other cores, but I guess this will only postpone the problem.
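For reference, this is how I rank which sources fire most often (it just sums the per-CPU columns of /proc/interrupts; device names such as hsp come from the Jetson device tree, so they may look different elsewhere):

```bash
# Sum the per-CPU counters of each interrupt line and list the busiest ones
awk 'NR>1 {s=0; for (i=2; i<=NF && $i ~ /^[0-9]+$/; i++) s+=$i; printf "%12d  %s\n", s, $0}' \
    /proc/interrupts | sort -rn | head -20
```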
What carrier board are you using? Is this an actual dev kit, or does it use a third party carrier board?
Note that the signal used for PCIe is quite fast and needs to be close to perfect. The signal quality is a combination of devices, which includes the end point device and any PCIe bridge. If the entire combination is not close to perfect you will start getting errors. I don’t know if this is the cause of the error you show, but it is “usually” nothing software can change other than trying a different pre-emphasis and de-emphasis.
The “pre” of pre-emphasis boosts the amplitude of the start of a square wave, and the “de” reduces the magnitude at the start of the wave by the same amount; one is at the source of the square wave, the other is at the sink, and the overall intended effect is that the two cancel each other out. However, the magnitude of the more vulnerable part of the square wave ends up higher relative to the rest of the square wave. This might seem odd, but it helps the forward wave and reduces interference from the reflected wave. If it works correctly, then the “eye diagram” is open. You probably are not really interested in this, but PCIe signals are one of the most complicated designs you’ll ever run into (at least so far as electrical is concerned). If you’re really bored:
- Basic: https://www.youtube.com/watch?v=cL7QsELuv_M&t=18s
- More advanced: https://www.youtube.com/watch?v=tZiKRfH2yZ4
- Related to “signal integrity”: https://www.youtube.com/watch?v=Nu6aXTMgksk
- More specific to topics like spread spectrum clocking and pre/de- emphasis: https://www.youtube.com/watch?v=fd3qaDP1C_o&t=325s
The test equipment itself usually far exceeds $100k USD. If you take a very high performance MSO, and merge it with a vector network analyzer with a lot of channels, and then provide digital data patterns to use instead of a sine wave sweep generator, then you get the ability to test PCIe.
Back to your problem, most such errors are related to signal integrity. Advanced Error Reporting (AER) is optional, but it stands a chance of making a correction when there is a detected error.
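If you want to look at the AER state for the NVMe end point from user space, something along these lines should work (the 0000:04:00.0 address is taken from your log, and the sysfs counter files only exist on newer kernels, so treat those as optional):

```bash
# Dump the AER capability and current error status bits for the NVMe device
sudo lspci -s 04:00.0 -vvv | grep -A8 'Advanced Error Reporting'

# Newer kernels also expose per-device corrected-error counters
cat /sys/bus/pci/devices/0000:04:00.0/aer_dev_correctable 2>/dev/null
```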
Sometimes the error is software-related, but usually not. If for example a checksum is computed wrong, or if a programmable pre-emphasis differs from the programmed de-emphasis, then you’d have a software error that looks like a hardware error. It is really worth noting that this complex signal exists only because of how the bus, source, and sink interact together. One can rarely ever blame just one component, and every single device will probably show as functioning correctly under many combinations, but then fail with a particular combination using otherwise working hardware.
Every time there is a data transfer on PCIe there will be a hardware IRQ triggered. Every time there is a correction, there will be at least one more IRQ to do the same thing again. It is possible for CPU0 (which handles those particular hardware IRQs so far as I know) to become saturated and no longer able to respond to the retry after an error simply because it is working on other things as well. That’s IRQ starvation, but there is a strong chance this is not yet going on. If it is, then moving to a new CPU core is unlikely to change anything since there would be a PCIe bridge source, a data bus, and a PCIe sink (the device) wherever the signal issue exists. It is true that if one had a method to move the IRQ to a new core, the signal path from bridge to CPU would change, and if that were the place of the error, then possibly the change would correct it.
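One way to tell whether CPU0 is actually approaching IRQ saturation is to watch its hardware and soft IRQ time directly (mpstat is part of the sysstat package, so it may need to be installed first):

```bash
# Per-core CPU breakdown every second; watch the %irq and %soft columns for CPU 0
mpstat -P ALL 1

# Or just watch the raw counters grow; -d highlights what changed between samples
watch -n1 -d "grep -i hsp /proc/interrupts"
```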
Just to emphasize a point, the AER message is a symptom, not a cause. The saturation of CPU0 is a side effect (in this case) of correcting errors. Knowing what carrier board this is would be mandatory as a starting point. Trying other, similar PCIe devices in that slot might be useful too, since the test equipment that could directly answer what’s wrong costs hundreds of thousands of dollars. It may be that a different end point device (e.g., replacing one NVMe with another) is all that is needed for it to work. Try swapping devices if you have more, and watch for the AER messages.