We are looking to run multiple containers on an Orin NX 16GB. Each of these containers will be processing video streams and running YOLO models. How do we ensure that these containers share resources fairly?
For example, we want to ensure that each container occupies no more than 40% of the GPU and 25% of the GPU memory.
Hi,
On Orin, the GPU is shared between tasks by time-slicing, and the share each task receives cannot be controlled.
For your use case, you can try running all the video streams in a single process, with one thread per stream.
If each thread uses its own CUDA stream, the tasks can run concurrently.
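A minimal sketch of that per-thread-stream layout is below. It assumes a CUDA-capable device and nvcc; the inference work itself is left as placeholder comments, since the actual YOLO pipeline is not shown in this thread.

```cpp
// Sketch: one worker thread per video stream, each with its own CUDA
// stream, so enqueued GPU work from different threads can overlap.
#include <cuda_runtime.h>
#include <thread>
#include <vector>

void worker(int id) {
    cudaStream_t stream;
    cudaStreamCreate(&stream);          // this thread's private stream

    // Enqueue this stream's work asynchronously, e.g. (placeholders):
    // cudaMemcpyAsync(dFrame, hFrame, frameBytes,
    //                 cudaMemcpyHostToDevice, stream);
    // yoloKernel<<<grid, block, 0, stream>>>(dFrame, dOut);

    cudaStreamSynchronize(stream);      // waits on this stream only
    cudaStreamDestroy(stream);
}

int main() {
    std::vector<std::thread> threads;
    for (int i = 0; i < 4; ++i)         // e.g. four video streams
        threads.emplace_back(worker, i);
    for (auto& t : threads)
        t.join();
    return 0;
}
```

Work submitted to different non-default streams has no implicit ordering between streams, which is what allows the GPU scheduler to overlap it.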
Thanks.
Thanks @AastaLLL. I have no control over the other container once it's deployed; I can only request configuration changes on it.
What would happen if the two containers are trying to consume maximum GPU at the same time? Will one of them fail, or will they just wait for the other one to finish?
Hi,
CUDA tasks will wait in the scheduler's queue.
But a GPU memory allocation might fail if the device runs out of memory.
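To illustrate the difference: queued kernels simply wait their turn, but an allocation that exceeds free device memory returns an error code that each process should check. A hedged sketch (the 4 GiB size is just an illustrative value that may exceed free memory on a 16GB Orin shared with other containers):

```cpp
// Sketch: cudaMalloc reports out-of-memory via its return code rather
// than blocking, so the caller must detect the failure and back off.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    void*  buf   = nullptr;
    size_t bytes = 4ULL << 30;  // 4 GiB request, may not fit
    cudaError_t err = cudaMalloc(&buf, bytes);
    if (err != cudaSuccess) {
        std::printf("allocation failed: %s\n", cudaGetErrorString(err));
        // e.g. free cached buffers, reduce batch size, or retry later
        return 1;
    }
    cudaFree(buf);
    return 0;
}
```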
Thanks.
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.