Please provide complete information as applicable to your setup.
• Hardware Platform (Jetson / GPU): GPU
• DeepStream Version: 5.1
We are using the deepstream-triton image available on NGC. We have created a pipeline using DeepStream (the nvinferserver plugin). We have noticed that only GPU-0 goes up to 100% utilization, whereas GPU-1 sits idle. Doesn't nvinferserver support load balancing?
Hello, thanks for the response. The documentation says "Device IDs of GPU to use for pre-processing/inference (single GPU support only)". Does that mean nvinferserver can run inference on either GPU-0 or GPU-1, but cannot balance the load across both GPUs the way standalone Triton does? Is that the correct understanding?
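For context, that single-GPU constraint corresponds to the `gpu_ids` field in the nvinferserver configuration, which takes exactly one device. A minimal sketch of such a config (the model name and repository path are placeholders, not values from this thread):

```
# nvinferserver config (text protobuf) -- illustrative sketch only
infer_config {
  unique_id: 1
  gpu_ids: [0]          # single GPU only; listing multiple GPUs is not supported
  max_batch_size: 4
  backend {
    trt_is {
      model_name: "my_model"         # placeholder model name
      version: -1
      model_repo {
        root: "/path/to/model_repo"  # placeholder repository path
        log_level: 2
      }
    }
  }
}
```

With this config, all pre-processing and inference for that plugin instance is pinned to the device listed in `gpu_ids`.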
NVIDIA Triton server has the capability to share the inference load across multiple GPUs. I have attached an image of a standalone Triton server where I ran inference on multiple images: GPU-0 is utilized at 28% and GPU-1 at 30%. (Screenshot: load_balancing)
But when I run DeepStream code with nvinferserver, only GPU-0 is utilized and GPU-1 sits idle. If nvinferserver is creating its own Triton server, why is it not doing load balancing?
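For comparison, standalone Triton spreads model instances across GPUs via the `instance_group` setting in the model's config.pbtxt. A sketch of the kind of multi-GPU setup described above:

```
# config.pbtxt (Triton model configuration, text protobuf) -- illustrative sketch
instance_group [
  {
    kind: KIND_GPU
    count: 1          # one execution instance per listed GPU
    gpus: [0, 1]      # place instances on both GPUs so Triton can balance requests
  }
]
```

With this, Triton creates an execution instance on each listed GPU and schedules incoming inference requests across them, which is why both GPUs show comparable utilization in the standalone case.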