Best approaches for MPS on Xavier

I am working on a product where we have many processes (and different threads per process) that may want to access GPU resources (memory requests, computation, and access to other components like the DLA and PVA).

I understand that MPS is not available for the Xavier, but I wanted to know if a multi-process access pattern is a supported use case. Is there an MPS surrogate I can use to manage requests from different processes? Do I need to write a GPU arbiter to manage resources and requests for compute? What are the best practices for multi-process development on the Xavier (e.g., async streams for data transfer)?

Hi,

Jetson/ARM platforms do not support MPS (Multi-Process Service).
You will need to put the separate ML/CUDA tasks in one application and run them on different CUDA streams.

Each CPU process creates its own CUDA context, so CUDA tasks running on the GPU from different processes run in different CUDA contexts.
The GPU is time-sliced between different CUDA contexts, which means their kernels cannot run in parallel:
https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html#multiple-contexts
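
As a minimal sketch of the single-process, multi-stream pattern above (the kernel, buffer sizes, and scale factors are made up for illustration), each task gets its own stream so its async copies and kernel launch can overlap with the other task's work:

```cpp
// Minimal sketch: two independent CUDA tasks issued from ONE process on
// separate streams so their copies and kernels can overlap on the GPU.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void scale(float* data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main()
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *hostA, *hostB, *devA, *devB;
    cudaMallocHost(&hostA, bytes);   // pinned host memory so async copies can overlap
    cudaMallocHost(&hostB, bytes);
    cudaMalloc(&devA, bytes);
    cudaMalloc(&devB, bytes);
    for (int i = 0; i < n; ++i) { hostA[i] = 1.0f; hostB[i] = 1.0f; }

    cudaStream_t streamA, streamB;
    cudaStreamCreate(&streamA);
    cudaStreamCreate(&streamB);

    // Each "task" gets its own stream: copy in, compute, copy out.
    cudaMemcpyAsync(devA, hostA, bytes, cudaMemcpyHostToDevice, streamA);
    scale<<<(n + 255) / 256, 256, 0, streamA>>>(devA, 2.0f, n);
    cudaMemcpyAsync(hostA, devA, bytes, cudaMemcpyDeviceToHost, streamA);

    cudaMemcpyAsync(devB, hostB, bytes, cudaMemcpyHostToDevice, streamB);
    scale<<<(n + 255) / 256, 256, 0, streamB>>>(devB, 0.5f, n);
    cudaMemcpyAsync(hostB, devB, bytes, cudaMemcpyDeviceToHost, streamB);

    cudaStreamSynchronize(streamA);
    cudaStreamSynchronize(streamB);
    printf("both streams done\n");

    cudaStreamDestroy(streamA);
    cudaStreamDestroy(streamB);
    cudaFreeHost(hostA);
    cudaFreeHost(hostB);
    cudaFree(devA);
    cudaFree(devB);
    return 0;
}
```

Build it with nvcc and, if you want to confirm the overlap, inspect the timeline with Nsight Systems.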

Thanks.

Does the CUDA context control access to the DLAs / PVA / video encoder? Can we run operations on the DLA/GPU/any of the other chips in parallel?

Hi,

The DLAs/PVA/encoder can execute in parallel.
Usually the parallelism is controlled by the data flow.

You can check our DeepStream SDK for some information.

And yes, you can run operations on the different chips in parallel.
The challenging part is how to divide the input and collect the output among the different devices.
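
If it helps, here is a hedged sketch of how one process can direct a TensorRT engine to a DLA core while other engines stay on the GPU. The model path is a placeholder, and the exact builder API differs slightly between TensorRT versions (buildEngineWithConfig is deprecated in newer releases in favor of buildSerializedNetwork):

```cpp
// Sketch: build one TensorRT engine targeted at DLA core 0 with GPU fallback.
// "model.onnx" is a placeholder; adapt to your network and TensorRT version.
#include <NvInfer.h>
#include <NvOnnxParser.h>
#include <iostream>

class Logger : public nvinfer1::ILogger
{
    void log(Severity severity, const char* msg) noexcept override
    {
        if (severity <= Severity::kWARNING) std::cout << msg << std::endl;
    }
} gLogger;

int main()
{
    auto builder = nvinfer1::createInferBuilder(gLogger);
    auto network = builder->createNetworkV2(
        1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH));
    auto parser = nvonnxparser::createParser(*network, gLogger);
    parser->parseFromFile("model.onnx",
                          static_cast<int>(nvinfer1::ILogger::Severity::kWARNING));

    auto config = builder->createBuilderConfig();
    config->setFlag(nvinfer1::BuilderFlag::kFP16);            // DLA requires FP16 or INT8
    config->setDefaultDeviceType(nvinfer1::DeviceType::kDLA); // place layers on the DLA
    config->setDLACore(0);                                    // Xavier has two DLA cores
    config->setFlag(nvinfer1::BuilderFlag::kGPU_FALLBACK);    // unsupported layers fall back to GPU

    auto engine = builder->buildEngineWithConfig(*network, *config);
    std::cout << (engine ? "DLA engine built" : "build failed") << std::endl;
    return 0;
}
```

A second engine built without setDefaultDeviceType stays on the GPU, and the two can be enqueued on separate CUDA streams so the DLA and GPU work concurrently.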

Thanks.