Real-time data transfer between CPU and GPU

Hi there.

We plan to use the DRIVE AGX (Pegasus) platform for real-time computing. By real-time we mean that the system always produces the same output when the starting conditions and inputs are the same.

When running programs on a GPU there is one fundamental problem: none of the GPU management frameworks we know of allows a running GPU kernel to be evicted before it finishes; that is, none of them supports preemptive GPU scheduling [1][2][3]. Preemptive scheduling is the bread and butter of determinism and real-time operation. Another important issue for determinism and real-time operation is memory management: ideally, all memory is statically pre-allocated and then simply reused at runtime.
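A minimal host-side sketch of the "pre-allocate once, reuse every cycle" pattern follows. The `StaticArena` class and its methods are hypothetical names for illustration, not part of any NVIDIA API; in a CUDA application the backing buffer would more likely come from a single `cudaMalloc` at start-up, which is an assumption not shown here. The point is that nothing is allocated after initialization, and because `reset()` rewinds the same buffer, each cycle receives the same addresses in the same order.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical fixed-capacity arena: one allocation at start-up, none at runtime.
class StaticArena {
public:
    explicit StaticArena(std::size_t capacity)
        : storage_(new unsigned char[capacity]), capacity_(capacity), offset_(0) {}

    // Bump-allocate `bytes` aligned to `align` (assumed to be a power of two no
    // larger than alignof(std::max_align_t)). Returns nullptr if the request does
    // not fit; it never falls back to the heap.
    void* allocate(std::size_t bytes, std::size_t align = alignof(std::max_align_t)) {
        std::size_t aligned = (offset_ + align - 1) & ~(align - 1);
        if (aligned > capacity_ || bytes > capacity_ - aligned) return nullptr;
        offset_ = aligned + bytes;
        return storage_.get() + aligned;
    }

    // Release everything at the end of a cycle; the buffer is reused, not freed.
    void reset() { offset_ = 0; }

    std::size_t used() const { return offset_; }

private:
    std::unique_ptr<unsigned char[]> storage_;
    std::size_t capacity_;
    std::size_t offset_;
};
```

A request that exceeds the remaining budget returns `nullptr` instead of allocating more memory, so running out of budget surfaces as an explicit error rather than as a non-deterministic allocation at runtime.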

We would like to know how NVIDIA plans to tackle these challenges on, e.g., NVIDIA DRIVE PX Pegasus.

If you have a working example, we would greatly appreciate it. When we talk about determinism, we mean two types:

  1. Timing determinism: the execution time of GPU kernels is NOT deterministic, if only because the CPU-GPU interaction itself is non-deterministic.

  2. Result determinism: floating-point arithmetic is not associative, and the same can hold for fixed-point or integer arithmetic once saturation, overflow, or rounding is involved. That is, (a+b)+c may not equal a+(b+c), so the result of a sequence of operations depends on the order in which they are performed. We are not sure whether this leads to run-to-run non-determinism in CUDA/GPUs (i.e., slightly different results every time a kernel is run), but every change of GPU or CUDA version risks producing different results.
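To illustrate point 2, the following C++ sketch sums the same values in two orders: left to right, and as a pairwise tree, similar to the order many parallel GPU reductions use. The function names are hypothetical and the code runs on the CPU; it only demonstrates that the order of floating-point additions changes the result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum left to right, as a single CPU thread would.
float sum_sequential(const std::vector<float>& v) {
    float s = 0.0f;
    for (float x : v) s += x;
    return s;
}

// Sum by pairwise (tree) reduction: add neighbours, then neighbours of the
// partial sums, and so on, until one value remains.
float sum_tree(std::vector<float> v) {
    if (v.empty()) return 0.0f;
    while (v.size() > 1) {
        std::vector<float> next;
        for (std::size_t i = 0; i + 1 < v.size(); i += 2)
            next.push_back(v[i] + v[i + 1]);
        if (v.size() % 2 != 0) next.push_back(v.back());  // carry the odd element
        v.swap(next);
    }
    return v[0];
}
```

With the input `{1e20f, 1.0f, -1e20f, 1.0f}`, `sum_sequential` returns 1 while `sum_tree` returns 0: a `1.0f` added to a value of magnitude 1e20 is lost to rounding, and the sequential order loses only the first one whereas the tree order loses both. A fixed reduction order is reproducible from run to run; if partial sums are instead combined with atomic additions, the order can depend on scheduling.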

[1] https://cdr.lib.unc.edu/indexablecontent/uuid:03fd97bb-86ac-44b5-8a71-c6eb73485ff8
[2] https://stackoverflow.com/questions/21996720/gpu-and-determinism
[3] https://cs.unc.edu/~tamert/papers/ecrts18b.pdf

Hi dejan0wt1b,

Regarding your questions, please find below comments from our CUDA team for your reference:

Scheduling: there are two layers of support for applications that need deterministic scheduling. At the CUDA level, we will expose APIs to express tasks and the dependencies between them; at the system level, NVIDIA will implement a time-triggered scheduling system that will help ensure determinism. Further details of this work are still in development; we suggest reaching out to your NVIDIA account team for more information.
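The reply does not describe NVIDIA's planned APIs, so the following is only a generic illustration of the two ideas it names: tasks with explicit dependencies, and a time-triggered (table-driven) schedule computed ahead of time. `Task`, `Slot`, and `build_schedule` are hypothetical names, not NVIDIA's scheduler or API. The sketch orders the dependency graph with Kahn's topological sort, always picking the lowest-numbered ready task so the same graph yields the same table every time, and gives each task a fixed start offset from its worst-case execution time (WCET), assuming tasks run back to back on a single execution engine and that dependency indices are valid.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical task: WCET budget in microseconds and indices of its predecessors.
struct Task {
    int wcet_us;
    std::vector<int> deps;
};

// One entry of a static, time-triggered dispatch table.
struct Slot {
    int task;
    int start_us;  // offset from the start of the cycle
};

// Kahn's topological sort with lowest-index tie-breaking, so the order is
// reproducible. Returns an empty table if the graph contains a cycle.
std::vector<Slot> build_schedule(const std::vector<Task>& tasks) {
    const int n = static_cast<int>(tasks.size());
    std::vector<int> indegree(n, 0);
    std::vector<std::vector<int>> successors(n);
    for (int t = 0; t < n; ++t) {
        for (int d : tasks[t].deps) {
            successors[d].push_back(t);
            ++indegree[t];
        }
    }
    // Min-heap: the smallest ready task index is always dispatched first.
    std::priority_queue<int, std::vector<int>, std::greater<int>> ready;
    for (int t = 0; t < n; ++t)
        if (indegree[t] == 0) ready.push(t);

    std::vector<Slot> table;
    int clock_us = 0;
    while (!ready.empty()) {
        int t = ready.top();
        ready.pop();
        table.push_back({t, clock_us});
        clock_us += tasks[t].wcet_us;  // next task starts after this one's WCET
        for (int s : successors[t])
            if (--indegree[s] == 0) ready.push(s);
    }
    if (static_cast<int>(table.size()) != n) return {};  // cycle detected
    return table;
}
```

For a graph where task 0 (100 µs) feeds tasks 1 (50 µs) and 2 (30 µs), which both feed task 3 (20 µs), the table is task 0 at 0 µs, task 1 at 100 µs, task 2 at 150 µs, and task 3 at 180 µs. A runtime dispatcher (not shown) would release each task at its offset in every cycle, rather than as soon as its inputs happen to be ready.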

Timing determinism: are there further open issues not addressed by the answer above?

Result determinism: NVIDIA GPUs are IEEE 754 compliant. This is not CUDA-specific, and we do not introduce any non-determinism beyond what would be expected from, for example, a compliant CPU. That said, the following whitepaper details floating-point accuracy considerations on NVIDIA GPUs from a CUDA standpoint: https://docs.nvidia.com/cuda/floating-point/index.html.
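As an example of the kind of consideration the whitepaper covers, the sketch below evaluates x*x - y twice: with a separately rounded product, and with a fused multiply-add (`std::fma`), which rounds only once. Both results are IEEE 754-conformant, yet they differ, which is one way results can change between compilers, compiler flags, or hardware; nvcc, for instance, contracts multiplies and adds into FMA instructions by default (`--fmad=true`). `FmaDemo` and `fma_vs_separate` are hypothetical names, and the code runs on the CPU.

```cpp
#include <cassert>
#include <cmath>

// Results of computing x*x - y in two IEEE 754-conformant ways.
struct FmaDemo {
    double separate;  // product rounded to double, then subtraction rounded
    double fused;     // single rounding of the exact x*x - y
};

FmaDemo fma_vs_separate() {
    const double x = 1.0 + std::ldexp(1.0, -30);  // 1 + 2^-30, exactly representable
    const double y = 1.0 + std::ldexp(1.0, -29);  // 1 + 2^-29
    // volatile forces the product to be rounded to double before the
    // subtraction, so the compiler cannot contract it into an FMA itself.
    volatile double product = x * x;  // exact 1 + 2^-29 + 2^-60 rounds to y
    FmaDemo r;
    r.separate = product - y;
    r.fused = std::fma(x, x, -y);
    return r;
}
```

Here `separate` is 0.0 while `fused` is 2^-60 (about 8.7e-19): the exact product 1 + 2^-29 + 2^-60 loses its last term when rounded to double, but the fused operation keeps it.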

Thanks