Hello,
We have a CCTV system that uses NVIDIA GPUs for video decoding. Our current requirement is to monitor each GPU's decoder and memory usage and, once usage on a GPU reaches 80%, automatically route new streams to the next available GPU.
We have implemented GPU monitoring using NVML, but when multiple streams are initiated simultaneously, they all tend to go to the same GPU. We are looking for an effective strategy or best practices to distribute the streams evenly across multiple GPUs when they are opened concurrently.
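One possible direction, sketched under assumptions and not something we have running: route every stream open through a single lock-protected selector that also counts streams it has already assigned but that the NVML readings do not yet reflect. NVML utilization is sampled over a period, so streams opened at the same moment can all see the same stale value. `GpuPicker`, `read_usage`, and `EST_STREAM_COST` are hypothetical names, and the 5% per-stream cost is a placeholder guess. In a real system `read_usage` might wrap NVML calls such as `nvmlDeviceGetDecoderUtilization` and `nvmlDeviceGetMemoryInfo`.

```python
import threading
from typing import Callable, Dict, List, Optional, Tuple

THRESHOLD = 80.0        # percent, the limit from our requirement
EST_STREAM_COST = 5.0   # ASSUMPTION: rough decoder % one new stream adds


class GpuPicker:
    """Picks the least-loaded GPU, counting not-yet-visible assignments."""

    def __init__(
        self,
        gpu_ids: List[int],
        # Hypothetical callable: gpu_id -> (decoder %, memory %).
        read_usage: Callable[[int], Tuple[float, float]],
        est_cost: float = EST_STREAM_COST,
        threshold: float = THRESHOLD,
    ) -> None:
        self._gpu_ids = list(gpu_ids)
        self._read_usage = read_usage
        self._est_cost = est_cost
        self._threshold = threshold
        self._lock = threading.Lock()
        # Streams assigned to a GPU but not yet reflected in its readings.
        self._pending: Dict[int, int] = {g: 0 for g in self._gpu_ids}

    def acquire(self) -> Optional[int]:
        """Reserve a GPU for a new stream, or return None if all are full."""
        with self._lock:  # serialize decisions made by concurrent opens
            best: Optional[int] = None
            best_load = 0.0
            for gpu in self._gpu_ids:
                decoder_pct, memory_pct = self._read_usage(gpu)
                # Measured load plus the estimated cost of pending streams.
                projected = (max(decoder_pct, memory_pct)
                             + self._pending[gpu] * self._est_cost)
                if projected >= self._threshold:
                    continue
                if best is None or projected < best_load:
                    best, best_load = gpu, projected
            if best is not None:
                self._pending[best] += 1
            return best

    def confirm(self, gpu_id: int) -> None:
        """Drop one reservation once the stream shows up in the readings
        (or if opening the stream failed)."""
        with self._lock:
            if self._pending[gpu_id] > 0:
                self._pending[gpu_id] -= 1
```

Each stream would call `acquire()` before creating its decoder and `confirm(gpu_id)` a short while after the stream is running. With two idle GPUs reporting the same load, consecutive `acquire()` calls alternate between them instead of all landing on the first one.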
Any advice or suggestions on how to achieve this load balancing effectively would be greatly appreciated.
Thank you!