FYI, the RT kernel is not a magic bullet. The difference between an RT kernel and the regular kernel is in the scheduler. That scheduler determines what to run, and when to run it, on each core. The default scheduler tries to be “fair” (one reason priorities are changed with the commands “nice” and “renice”…how nice will a process be to another process?); the RT kernel also normally tries to be fair, but differs from the default kernel regarding preemption: when one process or thread has hogged a core for too long and something else needs to run, and the scheduler thinks the waiting process or thread is more important, it bumps the original off the core and puts the new one in its place.
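As a quick illustration of that fairness knob (the program name and PID here are made up), adjusting niceness from the command line looks like this:

nice -n 10 ./my_batch_job     # start a process “nicer” (lower priority) than the default of 0
sudo renice -n -5 -p 1234     # make already-running PID 1234 less nice (higher priority; negative values need root)

This only changes how the fair scheduler weighs the process; it does not make anything real time.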
Getting RT behavior implies you have placed some process in a cgroup and marked that cgroup as more important than some other cgroup. If your high priority cgroup needs to run, and a lower priority cgroup is running, then the lower priority one gets preempted no matter how “un-nice” it is to rip it away from the CPU.
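A minimal sketch of that idea, assuming the cgroup v1 “cpu” controller layout (typical of older L4T releases; paths and file names differ under cgroup v2) and a made-up PID:

sudo mkdir /sys/fs/cgroup/cpu/important                        # create a new cpu cgroup
echo 2048 | sudo tee /sys/fs/cgroup/cpu/important/cpu.shares   # double the default weight of 1024
echo 1234 | sudo tee /sys/fs/cgroup/cpu/important/tasks        # move PID 1234 into the group (one PID per write)

Everything left in the default group now competes for whatever CPU time the “important” group does not claim.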
Also, even when you have (A) installed the RT kernel, (B) set up a cgroup, and (C) tuned the priority of each cgroup, this is still only “soft” real time. If there is cache involved, then the cache makes it impossible to be “deterministic”, which is what hard real time requires. A cache miss makes execution take longer. Often a process is told to run on some specific CPU core and to stick to that core in order to have a higher probability of a cache hit, but you still get a cache miss if some other process has evicted your data from the cache, or if the process touches so much data that the cache can’t hold it all. You have not created cgroups, and you have not set their priority, so all you will see is slightly more “rounded” timing. What you might need is for some specific process to be real time, and let the others fight for time.
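If you do have one specific process that matters, a sketch of marking just that process as real time under the RT kernel (PID made up):

sudo chrt -f -p 50 1234   # give PID 1234 the SCHED_FIFO policy at priority 50
chrt -p 1234              # verify the policy and priority actually took effect

Be careful with this: anything running SCHED_FIFO at high priority can starve the things it depends on, which is exactly the disk-controller problem described below.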
For what follows, understand that when a driver or feature of the kernel runs, an interrupt is issued. The scheduler sees the interrupt and determines what to do, e.g., preempt something else, wait, or simply throw away the request. In the case of a hardware driver this is a hardware IRQ; in the case of a software feature which does not need physical hardware it is a software IRQ.
This gets more complicated compared to a desktop PC because a desktop PC has a mechanism to route a hardware IRQ to any CPU core (on an Intel CPU this is the programmable I/O APIC, the I/O Advanced Programmable Interrupt Controller). A hardware interrupt needs physical wiring to reach whichever CPU core it is scheduled to run on. Software IRQs can always run on any CPU core.
Jetsons do not have an I/O APIC, and much of the hardware can run only on CPU0 (the first CPU core). You can tell a hardware IRQ to run on a different core, but the scheduler will then have to reschedule it back onto the first CPU core, and so a rescheduled IRQ takes longer to run. Hardware such as the disk controller, along with a number of other devices, can only be serviced on CPU0. If CPU0 gets overwhelmed with requests, then you get IRQ starvation. There are also problems if some process on CPU0 gets a higher priority (lower “niceness”) than something that process depends on, e.g., if your program has higher priority than whatever reads the data from disk, then the disk controller might never run, and instead of behaving better you’ll get a system that stalls out. Magic isn’t free!
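You can check where a given hardware IRQ is even allowed to run before trying to move anything (the IRQ number 150 is made up; look up real numbers in /proc/interrupts):

cat /proc/irq/150/smp_affinity_list   # CPU cores this IRQ is currently allowed to run on
cat /proc/irq/default_smp_affinity    # hex mask handed to newly registered IRQs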
Hardware which is actually designed for hard real time tends to not have cache. This is the realm of the ARM Cortex-R series; Jetson CPU cores are ARM Cortex-A (there is some embedded hardware you cannot normally get at which uses low-end Cortex-R5 cores, e.g., the Image Signal Processor, ISP, and the Audio Processing Engine, APE).
If you do put software processing on a different core, then the odds of getting what you want go up, but this often results in more cache misses, depending on what is happening. You need to carefully decide which process you want to have higher priority, create a cgroup for it, and increase its priority. If possible, you might then also assign the process to a different (non-CPU0) core (see affinity).
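A sketch of that last step, with a made-up program name and PID:

taskset -c 2 ./my_worker   # launch the process pinned to CPU core 2
taskset -cp 2 1234         # or pin already-running PID 1234 to core 2

Pinning (affinity) and priority are separate knobs; you usually want to decide both.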
Incidentally, the files in “/proc” are not real files; they exist only in RAM and are the kernel “pretending” to be files. The file “/proc/interrupts” is continuously updated, and is a hardware IRQ table. You can run “less /proc/interrupts” to see what is going on; run the command again, and the interrupt counts will have incremented. You’ll notice that timers are available on every core, but most everything else is stuck on CPU0. If you were to set the affinity of one of those drivers to another core, and that core doesn’t have the wiring, then you’d see the new core’s IRQ count go up, but so too would the rescheduling count, and then the CPU0 count would go up. This is the effect of going to the wrong core, realizing the interrupt cannot run there, and then running on the core which actually can service it.
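If you want to try that experiment yourself (again, IRQ number 150 is made up; find a real one in /proc/interrupts first):

echo 2 | sudo tee /proc/irq/150/smp_affinity_list             # ask for the IRQ to be handled by CPU core 2; the write may simply fail if it cannot be routed there
watch -n 1 'grep -E "^ *150:|Rescheduling" /proc/interrupts'  # watch the per-core counts and the rescheduling interrupts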
Good drivers typically access hardware for the least possible time; any code which can run purely in software then gets reissued to a new function via a software interrupt. ksoftirqd schedules software interrupts, and if you run “ps aux | grep ksoftirqd” you’ll see the processes used for running those soft IRQs. An example: when a network device receives data, that must be handled by a hard IRQ; then, if a checksum is needed, the driver might issue a soft IRQ to complete the checksum (which could run on any core, even on a Jetson). Had the hardware IRQ also run the checksum, the core would be held longer, delaying other processes which have to use that core. A soft IRQ does often run on the same core anyway, because the chances of a cache hit go up (this would be mostly irrelevant on a Cortex-R series CPU, but there are also security registers involved, so it isn’t completely irrelevant). On the other hand, the hardware portion could be given a higher priority than the software portion, and this might be suitable.
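You can watch the soft IRQ side of this too, and see which core each ksoftirqd thread lives on:

cat /proc/softirqs                     # per-core counts for each soft IRQ type (TIMER, NET_RX, SCHED, ...)
ps -eo pid,psr,comm | grep ksoftirqd   # the psr column shows which core each ksoftirqd is currently on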
It all depends on what you need to run smoothly, and how you’ve told the scheduler to decide what “must” run now. You can’t completely say “must” with a Cortex-A; it is more a case of saying “should”.