CUDA usable with RTOS


I wonder, do I need to run Linux on the TX1 or is it possible to use CUDA with a proprietary RTOS?

Hi 42BS,

CUDA supports Windows, Linux, and Mac OS; it does not support RTOSes.
Is your question about the TK1 or the TX1? This board is for the TK1.
We have JetPack to help you install all the software tools required for development onto your host PC; you can also refer to the JetPack page:



Actually, the request I got is for Xavier, but if there are no CUDA drivers for an RTOS on the TX1/TK1, then I guess the same will be true for Xavier.

The question is: What Linux resources/system-calls does the CUDA driver need?


Following is a related topic and the response may be helpful for you.

Well, it seems there is really only one “official” reply, the one by kayccc above: when it comes to embedded, only Linux is supported.

This is a real pity, especially considering that the new Xavier targets ISO 26262. But neither Linux nor the CUDA driver is certified for functional safety (at least Linux is not).

So how can an application which needs hard real-time (harder than what Linux offers) exploit the GPGPU?

A simple bare-metal CUDA driver that does not rely on any specific OS would be great.

Hi 42BS, please consult the Tegra K1 Technical Reference Manual (TRM) for the chip register information needed to program a custom OS. You can find a list of other Linux distributions ported to the TK1 here:

Note that NVIDIA does not plan to release CUDA drivers for other operating systems.

Using the TRM linked above (one also exists for the TX1), it may be possible to port your custom OS to an ARM core such as the shadow core or the SPE (Sensor Processing Engine), communicating over the memory bus with a Linux instance running CUDA.

Yes, I realize that would be the only way: running Linux in parallel.

Forgive my ignorance, but what about memory access from the GPU? Can it be restricted to a certain memory window (e.g. via ARM TrustZone) so that Linux plus the CUDA driver are locked in a sandbox?

For clarification: this is not an actual project, but I’d like to find out whether the GPU is usable when it comes to functional safety.

If you use the zero-copy method to allocate mapped memory shared between CPU and GPU, it may be possible. First allocate buffers with cudaHostAlloc() using the cudaHostAllocMapped flag. This buffer enables simultaneous access from across the system. Depending on which device other than the GPU you are using, you may find it necessary to translate the virtual address to a physical address in the kernel.
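A minimal sketch of that zero-copy flow (the kernel, buffer size, and launch configuration are illustrative, not from this thread):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel: increments each element of the mapped buffer in place.
__global__ void increment(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] += 1;
}

int main()
{
    const int n = 256;
    int *host_ptr = nullptr;
    int *dev_ptr  = nullptr;

    // Allocate mapped (zero-copy) host memory that the GPU can access directly.
    cudaHostAlloc(&host_ptr, n * sizeof(int), cudaHostAllocMapped);

    // Obtain the device-side pointer aliasing the same buffer.
    // On Tegra's unified physical memory this maps to the same storage.
    cudaHostGetDevicePointer(&dev_ptr, host_ptr, 0);

    for (int i = 0; i < n; ++i)
        host_ptr[i] = i;

    increment<<<(n + 127) / 128, 128>>>(dev_ptr, n);
    cudaDeviceSynchronize();  // after this, GPU writes are visible to the CPU

    printf("host_ptr[0] = %d\n", host_ptr[0]);

    cudaFreeHost(host_ptr);
    return 0;
}
```

No cudaMemcpy() is needed: both sides touch the same physical pages, which is what makes simultaneous access from another bus master conceivable (subject to the virtual-to-physical translation mentioned above).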