FYI, the RT kernel is not a magic bullet. The difference between an RT kernel and the regular kernel is in the scheduler. That scheduler determines what to run, and when to run it, on each core. The default scheduler tries to be “fair” (one reason priorities are changed with the commands “nice” and “renice”…how nice will a process be to another process?); the RT kernel also normally tries to be fair, but differs from the default kernel regarding preemption: when one process or thread has hogged a core for too long and something else needs to run, and the scheduler thinks the waiting process or thread is more important, it bumps the original off the core and puts the new one in its place.
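As a quick illustration of that fairness knob (the program name and PID here are made up), adjusting niceness from the command line looks like this:

nice -n 10 ./my_batch_job     # start a process “nicer” (lower priority) than the default of 0
sudo renice -n -5 -p 1234     # make already-running PID 1234 less nice (higher priority; negative values need root)

This only changes how the fair scheduler weighs the process; it does not make anything real time.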
Getting RT behavior implies you have placed some process in a cgroup and marked that cgroup as more important than some other cgroup. If your high priority cgroup needs to run, and a lower priority cgroup is running, then the lower priority one gets preempted no matter how “un-nice” it is to rip it away from the CPU.
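A minimal sketch of that idea, assuming the cgroup v1 “cpu” controller layout (typical of older L4T releases; paths and file names differ under cgroup v2) and a made-up PID:

sudo mkdir /sys/fs/cgroup/cpu/important                        # create a new cpu cgroup
echo 2048 | sudo tee /sys/fs/cgroup/cpu/important/cpu.shares   # double the default weight of 1024
echo 1234 | sudo tee /sys/fs/cgroup/cpu/important/tasks        # move PID 1234 into the group (one PID per write)

Everything left in the default group now competes for whatever CPU time the “important” group does not claim.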
Also, even when you have (A) installed the RT kernel, (B) set up a cgroup, and (C) tuned the priority of each cgroup, this is still only “soft” real time. If there is cache involved, then the cache makes it impossible to be “deterministic”, which is what hard real time requires. A cache miss makes execution take longer. Often a process is told to run on some specific CPU core and to stick to that core in order to have a higher probability of a cache hit, but you still get a cache miss if some other process has evicted your data from the cache, or if the process touches so much data that the cache can’t hold it all. You have not created cgroups, and you have not set their priority, so all you will see is slightly more “rounded” timing. What you might need is for some specific process to be real time, and let the others fight for time.
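If you do have one specific process that matters, a sketch of marking just that process as real time under the RT kernel (PID made up):

sudo chrt -f -p 50 1234   # give PID 1234 the SCHED_FIFO policy at priority 50
chrt -p 1234              # verify the policy and priority actually took effect

Be careful with this: anything running SCHED_FIFO at high priority can starve the things it depends on, which is exactly the disk-controller problem described below.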
For what follows, understand that when a driver or feature of the kernel runs, an interrupt is issued. The scheduler sees the interrupt and determines what to do, e.g., preempt something else, wait, or simply throw away the request. In the case of a hardware driver this is a hardware IRQ; in the case of a software feature which does not need physical hardware it is a software IRQ.
This gets more complicated compared to a desktop PC because a desktop PC has a mechanism to route a hardware IRQ to any CPU core (on an Intel CPU this is the programmable I/O APIC, the I/O Advanced Programmable Interrupt Controller). A hardware interrupt needs physical wiring to reach whichever CPU core it is scheduled to run on. Software IRQs can always run on any CPU core.
Jetsons do not have an I/O APIC, and much of the hardware can run only on CPU0 (the first CPU core). You can tell a hardware IRQ to run on a different core, but the scheduler will then have to reschedule it back onto the first CPU core, and so a rescheduled IRQ takes longer to run. Hardware such as the disk controller, along with a number of other devices, can only be serviced on CPU0. If CPU0 gets overwhelmed with requests, then you get IRQ starvation. There are also problems if some process on CPU0 gets a higher priority (lower “niceness”) than something that process depends on, e.g., if your program has higher priority than whatever reads the data from disk, then the disk controller might never run, and instead of behaving better you’ll get a system that stalls out. Magic isn’t free!
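You can check where a given hardware IRQ is even allowed to run before trying to move anything (the IRQ number 150 is made up; look up real numbers in /proc/interrupts):

cat /proc/irq/150/smp_affinity_list   # CPU cores this IRQ is currently allowed to run on
cat /proc/irq/default_smp_affinity    # hex mask handed to newly registered IRQs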
Hardware which is actually designed for hard real time tends to not have cache. This is the realm of the ARM Cortex-R series; Jetson CPU cores are ARM Cortex-A (there is some embedded hardware you cannot normally get at which uses low-end Cortex-R5 cores, e.g., the Image Signal Processor, ISP, and the Audio Processing Engine, APE).
If you do put software processing on a different core, then the odds of getting what you want go up, but this often results in more cache misses, depending on what is happening. You need to carefully decide which process you want to have higher priority, create a cgroup for it, and increase its priority. If possible, you might then also assign the process to a different (non-CPU0) core (see affinity).
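A sketch of that last step, with a made-up program name and PID:

taskset -c 2 ./my_worker   # launch the process pinned to CPU core 2
taskset -cp 2 1234         # or pin already-running PID 1234 to core 2

Pinning (affinity) and priority are separate knobs; you usually want to decide both.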
Incidentally, the files in “/proc” are not real files; they exist only in RAM and are the kernel “pretending” to be files. The file “/proc/interrupts” is continuously updated, and is a hardware IRQ table. You can run “less /proc/interrupts” to see what is going on; run the command again, and the interrupt counts will have incremented. You’ll notice that timers are available on every core, but most everything else is stuck on CPU0. If you were to set the affinity of one of those drivers to another core, and that core doesn’t have the wiring, then you’d see the new core’s IRQ count go up, but so too would the rescheduling count, and then the CPU0 count would go up. This is the effect of going to the wrong core, realizing the interrupt cannot run there, and then running on the core which actually can service it.
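If you want to try that experiment yourself (again, IRQ number 150 is made up; find a real one in /proc/interrupts first):

echo 2 | sudo tee /proc/irq/150/smp_affinity_list             # ask for the IRQ to be handled by CPU core 2; the write may simply fail if it cannot be routed there
watch -n 1 'grep -E "^ *150:|Rescheduling" /proc/interrupts'  # watch the per-core counts and the rescheduling interrupts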
Good drivers typically access hardware for the least possible time; any code which can run purely in software then gets reissued to a new function via a software interrupt. ksoftirqd schedules software interrupts, and if you run “ps aux | grep ksoftirqd” you’ll see the processes used for running those soft IRQs. An example: when a network device receives data, that must be handled by a hard IRQ; then, if a checksum is needed, the driver might issue a soft IRQ to complete the checksum (which could run on any core, even on a Jetson). Had the hardware IRQ also run the checksum, the core would be held longer, delaying other processes which have to use that core. A soft IRQ does often run on the same core anyway, because the chances of a cache hit go up (this would be mostly irrelevant on a Cortex-R series CPU, but there are also security registers involved, so it isn’t completely irrelevant). On the other hand, the hardware portion could be given a higher priority than the software portion, and this might be suitable.
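You can watch the soft IRQ side of this too, and see which core each ksoftirqd thread lives on:

cat /proc/softirqs                     # per-core counts for each soft IRQ type (TIMER, NET_RX, SCHED, ...)
ps -eo pid,psr,comm | grep ksoftirqd   # the psr column shows which core each ksoftirqd is currently on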
It all depends on what you need to run smoothly, and how you’ve told the scheduler to decide what “must” run now. You can’t completely say “must” with a Cortex-A; it is more a case of saying “should”.