Is there any real-time OS for Jetson?

It’s not clear to me exactly what GPU interaction you’re trying to measure the times for, but I’ll take a stab at an answer.

A simple kernel launch/synchronize operation on the TX2 can take in the neighborhood of 50us. For example, we did some timings using a trivial vector increment (pretty much the smallest unit of work that actually requires passing data back and forth) and a single block, and we saw maximum times around 53us, regardless of block size. That was with all the usual real-time features enabled.

We also did a number of timings using the persistent threads technique, and saw significant improvements, with the numbers for complete “workloads” at around 18us for 1024-element blocks, and less for smaller block sizes.

There’s a whitepaper with more detail on the results here. It includes the real-time features that were used, conditions of the tests, and other considerations:

https://www.concurrent-rt.com/resources/whitepapers/

It’s the first one: Improving Real-Time Performance With CUDA Persistent Threads (CuPer) on Jetson TX2.

Hi Snarky,

RedHawk has two techniques to solve the problem of long mutual exclusion sections.

First, all RedHawk kernels offer CPU shielding (a.k.a. CPU isolation). With CPU shielding, per-CPU system services are moved off shielded CPUs, reserving all shielded CPU cycles for the application(s) bound to those CPUs. The only system services that will run on the shielded CPUs are those related to the applications running on them.

For example, if a real-time application queues up a high-resolution timer, RedHawk assumes that the timer is a critical part of the real-time application, and any system services directly related to the processing of that timer will happen on the shielded CPU that queued it.
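The shielding itself is a RedHawk feature, and its configuration isn't shown in this thread. The affinity half of the idea, binding a process to one CPU, can be sketched on any Linux system with Python's standard `os.sched_setaffinity`. This is only a minimal illustration: `pin_to_cpu` is a hypothetical helper name, and pinning alone does not move interrupts or kernel daemons off that CPU the way shielding is described above.

```python
import os

def pin_to_cpu(cpu):
    """Bind the calling process to a single CPU and return the resulting affinity set."""
    # pid 0 means "the calling process"; this API is Linux-only.
    os.sched_setaffinity(0, {cpu})
    return os.sched_getaffinity(0)

if __name__ == "__main__":
    # Pick a CPU this process is already allowed to run on (illustrative choice only).
    allowed = sorted(os.sched_getaffinity(0))
    target = allowed[-1]
    print("pinned to:", pin_to_cpu(target))
```

On a shielded system the target would presumably be one of the shielded CPUs; which CPU number that is depends on the local setup.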

Second, RedHawk provides kernels that also include the open source PREEMPT_RT patch. This patch converts most kernel spinlocks into sleeping locks, so code holding them can be preempted, which eliminates long non-preemptible sections.

Take care,
Jason

Thanks jbaietto!

On the TX2, there are two additional issues, and I wonder how they affect real-time performance:

  1. The TX2 is said to do all of its I/O through Core 0 (on the A57.) This means that any I/O done from a thread that might run on another core has to be marshaled across, and whatever driver I/O and atomic locks are currently in effect may affect the latency of that.

  2. The ARM multi-core architecture is said to require serializing code to take interrupts on all the CPUs in certain cases, and this would make even a “shielded” CPU interlock with the highest-latency CPU. (I imagine scheduling affinity is part of the “shielding.”)

Hi Snarky,

You are correct in that both the ARM64 architecture generally and the TX2 design specifically create some additional challenges to achieving optimal real-time performance. However, the RedHawk kernels include several optimizations that help to reduce latencies (for example, automatically moving timers and per-CPU kernel daemons off of shielded CPUs) and these allow us to achieve respectable real-time performance on the TX2 in spite of these challenges – less than 50us worst case interrupt response latencies with cyclictest, with an average latency of 8us.

Take care,
Jason

Is there an installation tutorial for the Jetson TX2 running Ubuntu 16.04 with the PREEMPT-RT patch?

What do you see now from:

zcat /proc/config.gz | grep PREEMPT

Are you looking for something in addition to the existing preempt? If so, do you have a URL for it?

Thanks Linuxdev,

The result is

Linux tegra-ubuntu 4.4.38-rt49-rt #1 SMP PREEMPT RT aarch64 GNU/Linux

I followed the instructions here: https://github.com/kozyilmaz/nvidia-jetson-rt

I'm asking because I am not sure whether it really works.
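One quick sanity check is to look for the RT marker in the kernel version string, as in the `uname` output quoted above. Here is a small sketch, assuming the marker is spelled either "PREEMPT RT" (as in this 4.4 kernel) or "PREEMPT_RT" (which, as far as I know, newer kernels print). `looks_like_rt_kernel` is a hypothetical helper name, and a matching string only shows the patch was built in, not that latencies are good.

```python
import platform

def looks_like_rt_kernel(version_string):
    """Return True if a uname-style string carries a PREEMPT RT marker."""
    # Treat "PREEMPT_RT" and "PREEMPT RT" as the same marker.
    normalized = version_string.upper().replace("_", " ")
    return "PREEMPT RT" in normalized

if __name__ == "__main__":
    # platform.version() corresponds to `uname -v` on Linux.
    print(looks_like_rt_kernel(platform.version()))
```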

I did a test

cyclictest -t 5 -p 80 -n

It gives me

Min:5 Act:21 Avg:38 Max:375
Min:5 Act:22 Avg:36 Max:293
Min:5 Act:19 Avg:37 Max:267
Min:5 Act:20 Avg:35 Max:234
Min:6 Act:16 Avg:34 Max:196

Is this normal?

I think the Avg value should be below 10 us on a real-time system.

By the way, I ran the same test on an Intel NUC5i7RYH, and the Avg value there is about 3 to 4 us.
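For comparing runs (before and after a tuning change, or TX2 versus NUC), the per-thread summary lines can be reduced to a few numbers. The sketch below parses lines in the shape quoted above. The `summarize` helper and the regular expression are illustrative assumptions, not part of cyclictest.

```python
import re

# Matches the summary fields as they appear in the output quoted in this thread.
LINE_RE = re.compile(r"Min:\s*(\d+)\s+Act:\s*(\d+)\s+Avg:\s*(\d+)\s+Max:\s*(\d+)")

def summarize(output):
    """Return (lowest Min, mean of Avg, highest Max) across all threads, in microseconds."""
    rows = [tuple(int(v) for v in m.groups()) for m in LINE_RE.finditer(output)]
    if not rows:
        raise ValueError("no cyclictest summary lines found")
    mins, _acts, avgs, maxes = zip(*rows)
    return min(mins), sum(avgs) / len(avgs), max(maxes)

# The five lines posted above.
SAMPLE = """\
Min:5 Act:21 Avg:38 Max:375
Min:5 Act:22 Avg:36 Max:293
Min:5 Act:19 Avg:37 Max:267
Min:5 Act:20 Avg:35 Max:234
Min:6 Act:16 Avg:34 Max:196
"""

if __name__ == "__main__":
    lo, avg, hi = summarize(SAMPLE)
    print(f"min={lo}us mean-avg={avg:.1f}us worst-max={hi}us")
    # prints: min=5us mean-avg=36.0us worst-max=375us
```

For real-time purposes the worst-case Max (375 us here) usually matters more than the average.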

I have not worked with those patches before, so I can’t say what to expect from them.

One thing to consider is that the hardware itself is not a hard realtime system, e.g., it isn’t a Cortex-R. Add to this that it is a general operating system, so you have a lot of drivers in there which you can’t control the behavior of. Whether or not the system can immediately switch to the other thread/process is somewhat dependent on the nature of what is currently running.

Be sure to first run:

sudo nvpmodel -m 0
sudo ~ubuntu/jetson_clocks.sh

Then try running your cyclictest like this:

sudo nice -n -2 cyclictest -t 5 -p 80 -n

(this gives a very slight priority increase by changing “nice” to -2…this would give your cyclictest a slight boost compared to the rest of the system, but not interfere with important processes)

The result from the above would be a better baseline to go by…whatever issues occur under those circumstances probably have more meaning than either the result in a power saving mode or the result from everything running at normal priority.
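To make the Min/Avg/Max columns concrete: cyclictest repeatedly sleeps until a deadline and records how late each wake-up was. The sketch below mimics that loop in plain Python. It is only an illustration of the measurement idea, assuming `time.sleep` as the timer. Interpreter overhead means its numbers are not comparable to cyclictest's, and `wakeup_latencies_us` is a hypothetical name.

```python
import time

def wakeup_latencies_us(interval_us=1000, loops=200):
    """Sleep until successive deadlines and record how late each wake-up was, in microseconds."""
    interval_ns = interval_us * 1000
    latencies = []
    deadline = time.monotonic_ns() + interval_ns
    for _ in range(loops):
        remaining = deadline - time.monotonic_ns()
        if remaining > 0:
            time.sleep(remaining / 1e9)
        # Lateness relative to the intended wake-up time.
        latencies.append((time.monotonic_ns() - deadline) / 1000)
        deadline += interval_ns
    return latencies

if __name__ == "__main__":
    lat = wakeup_latencies_us()
    print(f"Min:{min(lat):.0f} Avg:{sum(lat) / len(lat):.0f} Max:{max(lat):.0f}")
```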

Thanks Linuxdev,

You are right, I forgot that step:

sudo nvpmodel -m 0
sudo ~ubuntu/jetson_clocks.sh

Then it works well for me.

I also need real-time capabilities for my application, so I have some questions about the PREEMPT-RT patch.

  1. Did I understand everything correctly: I have to compile the kernel with the PREEMPT-RT patch and flash it via JetPack?
  2. Will I have access to the same drivers as with the standard kernel deployed by NVIDIA? (I need all the CAN drivers.)

They said: "at the Sea, Air Space show this week showing RedHawk on the Jetson GPU. Please stop by our booth if you can - 1554. We do support the Nano."