Some kernel design information might be useful…
The kernel scheduler determines which process gets how much time and which CPU core it runs on, and much of this applies to running drivers as well. The drivers themselves usually have certain pieces of code which talk directly to the hardware, and there is no avoiding locking a core to that task during that time. For example, I/O to a physical address of the hardware might only be valid at certain times on a shared bus. The kernel must use atomic/non-divisible operations during that window, and preemption will not be possible.
However, there are times when things are done in the kernel which do not require locking the CPU core to a given function. When that code runs, scheduling becomes much more like user space: the ksoftirqd kernel threads can migrate that code and run it under fair time sharing. Atomic operations are not required for this. Time showing up under ksoftirqd indicates purely software operations, or operations which can be preempted, are sharing cores (this is a "good thing").
A bad kernel driver will do both the mandatory atomic operations and all related operations without ever releasing the CPU core. A good kernel driver design will put the atomic code in one place, and the code which can be preempted somewhere else. Mandatory locking then occurs only for the part of the code which truly needs it (someone must know this and program accordingly), while code which can share time without corrupting or failing under multitasking is handled separately via ksoftirqd. During the time of mandatory locking nobody else can use that core and the code will not share it (the system becomes less responsive…it's good to have more CPU cores for that case, but cache misses mean extra cores are still not as good as sharing of a core).
Seeing a large amount of traffic result in ksoftirqd running says the authors had a good driver design for that hardware (someone separated atomic from sharable operations). However, ethernet hardware differs, and not all chipsets have the same features. In some cases the hardware actually offloads work to the ethernet chipset and the kernel does not need to do that work; other hardware may depend on the CPU for parts of it. An example would be hardware compression versus software compression: some (but not all) chipsets support hardware compression. Hardware compression would mean lower latency and lower CPU load, but the work would still be done…only in the ethernet hardware instead of on the CPU. This latter case would show no CPU load and would avoid ksoftirqd, but not because of bad design: the reason would be non-CPU hardware performing the same task, and this is a "good thing" if the compression hardware exists (I don't know what the existing hardware supports, this is just a contrived example).
If an ethernet driver were to do all of the work in an atomic/non-preemptible fashion without needing to, then you would also see a lack of load on ksoftirqd. However, this would be bad design: if code can be offloaded to ksoftirqd but is not, then the CPU core is locked longer than needed and unable to share or multitask for excessive lengths of time. The work would still be done, but it would be accounted to that driver and would not show up under ksoftirqd.
In your case there are these possibilities:
- The ethernet hardware is offloading the work, so ksoftirqd is simply not needed and shows no load.
- The driver without ksoftirqd load is failing good design and the work is being performed atomically within the driver. The side effect is a less responsive system, with the load appearing somewhere other than ksoftirqd.
- Something in the design or operation of the low-softirq case simply does not require the work which ksoftirqd was doing in the high-load case. This is possible and not unreasonable when different network settings are in use (the hardware and drivers could be the same, but the data and parameters being processed differ). A contrived example: one system is using jumbo frames (low load) and the other is not (high load)…it would appear to be the driver's or hardware's fault, but in reality it would be a case of different loads from different data handling, with the observer not knowing about the jumbo frame difference.