RTOS on Jetson Orin NX combined with Triton

Hello everyone

For an upcoming project, I’m looking for hardware that can support mission-critical AI inference software. It needs to serve multiple models concurrently, each with a different backend, while letting me bound the maximum allowed latency of certain parts of the software.

After some research, I have identified the following stack:

  • Hardware: NVIDIA Orin NX
  • Real-Time Operating System: RedHawk Linux (support for Orin is coming)
  • Inferencing software: NVIDIA Triton Inference Server

I still have some open questions that I would like to see answered:

  • I read on the NVIDIA forums that true hard real-time is not possible on CPUs with caches, because cache misses make timing inconsistent. So if an RTOS like RedHawk promises hard real-time on the Jetson platform, where exactly does it run? The Cortex-A78AE cores have caches, so does it run on the Cortex-R52 cores in the Orin safety island (I can’t find much information about this), acting as some kind of supervisor for a normal Linux?

  • I read that CUDA is not supported on an RTOS. Is this still the case? If I ran only an RTOS, would that mean I could not use the GPU’s compute capabilities (and what about the AI inference cores, NVDLA 2.0)? RedHawk does seem to support CUDA; do they have their own implementation?

  • Does it make sense to run Triton on top of an RTOS on top of the Orin?

Thank you in advance if you can shed some light on any of these questions or provide some feedback!

This is probably something the RedHawk people would have to answer. FYI, caches can be disabled. Performance suffers significantly, but timing becomes more deterministic. How successful this is depends in part on the number of processes that have hard real-time scheduling. The load of the scheduling routine itself can go up exponentially with the number of processes; the Cortex-R series has hardware for this, but a Cortex-A does not. So the main issues that determine how close it gets to deterministic are:

  • Cache hits and misses.
  • Scheduling ability at a given load (implying number of threads/processes being managed with priorities).
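The combined effect of those two factors can be observed as wake-up jitter: how late a periodic task actually runs compared with its deadline. Below is a minimal sketch, assuming a Linux system and Python 3.7+; the function names `measure_wakeup_jitter` and `summarize`, the 1 ms period, and the priority value are illustrative choices, not anything from RedHawk or NVIDIA. The Python interpreter adds jitter of its own, so the numbers are only a rough upper bound; a dedicated C tool such as cyclictest is the usual choice for serious measurements.

```python
import os
import time


def measure_wakeup_jitter(period_ns=1_000_000, iterations=200, rt_priority=None):
    """Sleep until periodic deadlines and record how late each wake-up is (ns)."""
    if rt_priority is not None and hasattr(os, "sched_setscheduler"):
        try:
            # SCHED_FIFO normally needs root or CAP_SYS_NICE.
            # If the request is refused, keep the default scheduler.
            os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(rt_priority))
        except OSError:
            pass

    lateness = []
    next_deadline = time.monotonic_ns() + period_ns
    for _ in range(iterations):
        remaining = next_deadline - time.monotonic_ns()
        if remaining > 0:
            time.sleep(remaining / 1e9)
        # Lateness = time elapsed past the deadline when we actually woke up.
        lateness.append(max(0, time.monotonic_ns() - next_deadline))
        next_deadline += period_ns
    return lateness


def summarize(lateness_ns):
    """Return min, 99th percentile, and max lateness in nanoseconds."""
    ordered = sorted(lateness_ns)
    p99_index = min(len(ordered) - 1, int(0.99 * len(ordered)))
    return {
        "min_ns": ordered[0],
        "p99_ns": ordered[p99_index],
        "max_ns": ordered[-1],
    }


if __name__ == "__main__":
    # Hypothetical priority 80; omit rt_priority to measure the default scheduler.
    print(summarize(measure_wakeup_jitter(rt_priority=80)))
```

Running it once idle and once with the system under load (and with more or fewer competing real-time threads) gives a feel for how far the maximum drifts from the typical case, which is the figure that matters for a hard deadline.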

I would be quite interested in hearing how the RedHawk people work to get this as close as possible to hard real-time, and what the limits are on scheduling. I don’t know if they monitor this forum, but maybe they could be directed here?

Note: I do not know if they support the NVIDIA GPU driver. I’m guessing they have to, or there would be much less demand for RedHawk on a Jetson.


I have seen them respond to posts in the past, so I hope that someone can shed some light on the matter. Or maybe an NVIDIA developer who knows more about Triton on RTOS systems could also give some insights?
