GPU management when deploying DeepStream on Kubernetes

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU): GPU
• DeepStream Version: 6.1.1
• JetPack Version (valid for Jetson only):
• TensorRT Version:
• NVIDIA GPU Driver Version (valid for GPU only):
• Issue Type (questions, new requirements, bugs): question
• How to reproduce the issue? (This is for bugs. Include which sample app is used, the configuration file contents, the command line used, and other details for reproducing.)
• Requirement details (This is for new requirements. Include the module name, i.e. for which plugin or which sample application, and the function description.)

Hi, I see a similar question, but it was closed with no definite answer: How to manage thousands of video streams and feed them to DeepStream?

I wanted to ask: what is the recommended approach for using Kubernetes to run multiple instances of DeepStream apps across multiple nodes, with multiple GPUs available on each node?

How can I make the best use of the hardware at my disposal while avoiding repeated manual changes to Helm configs / DeepStream configs?

DeepStream is an SDK for developing inferencing applications. DeepStream applications run in Kubernetes just like any other application. For Kubernetes-specific questions, it seems you need to refer to the Kubernetes documentation.

I know what DeepStream is, thanks…

Let me clarify the situation I'm in:

I have previously used your video-analytics-demo Helm chart, version 0.1.5, as a guide. There I specified the GPU for the DS instance in values.yaml and used that value to set the same gpu-id via a modified created_config.py.

But as you can imagine, that is not a very flexible solution once more streams and containers need to be handled.

In the recent video-analytics-demo Helm chart, version 0.1.8 (DeepStream - Intelligent Video Analytics Demo | NVIDIA NGC), I see GPUs mentioned under

resources:
  limits:
    nvidia.com/gpu: 1

with gpu-id=0 used everywhere it applies.
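For context, this is roughly how that limit sits in a pod/container spec (a minimal sketch; the pod name, container name, and image tag below are placeholders I made up, not taken from the actual chart):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: deepstream-app            # placeholder name
spec:
  containers:
  - name: deepstream              # placeholder name
    image: nvcr.io/nvidia/deepstream:6.1.1-triton   # assumed tag, check NGC
    resources:
      limits:
        nvidia.com/gpu: 1         # device plugin allocates one whole GPU to this pod
```

With this, the Kubernetes scheduler (via the NVIDIA device plugin) picks which physical GPU the pod gets, rather than the chart hardcoding it.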

But it is still not clear to me whether there is a clean way to handle the situation where I have more pods (or containers, or ds-app instances; one DS instance is one pod/container here) to run than GPUs, across more than one node.

I would like to have bigger batches and fewer DS apps running, but that is not possible right now, as some post-processing of the bounding boxes appears to be CPU-bound.

To summarize:
I'm looking for advice that would help me extend your Helm charts to situations where multiple nodes each have multiple GPUs, but the number of pods to run is greater than the total number of GPUs.

I do not expect the streams to be reassigned to different GPUs during their lifetime, and I can assume my pods will need a reasonably constant amount of resources. I'm just hoping for something that would save me the initial guesswork and the manual modification of tens of config files; something that would start the tenth DS instance on the least busy GPU.

I don't think it's a very unusual use case, hence I'm asking for suggestions.

And why here and not on the Kubernetes forums? Because it's the DS config that requires gpu-id to be hardcoded in multiple places, even though it needs to be the same everywhere for the app to run.
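One thing that may help here (this is my understanding of the default NVIDIA device plugin behavior, so please verify for your setup): when a pod requests nvidia.com/gpu, the container only sees the GPU(s) allocated to it, enumerated starting from 0. If that holds, gpu-id=0 can stay hardcoded in every section of the DS config and will still resolve to whichever physical GPU the scheduler picked, e.g.:

```
# deepstream-app config fragment (illustrative; only the gpu-id lines matter here)
[source0]
enable=1
gpu-id=0

[streammux]
gpu-id=0

[primary-gie]
enable=1
gpu-id=0
```

That would remove the need to template gpu-id per pod at all; only the resource request varies.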

GPU time-slicing may help you have pods share GPUs:
https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/gpu-sharing.html
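Per the linked docs, time-slicing is configured via a ConfigMap that the GPU Operator's device plugin consumes; roughly like this (a sketch based on the documentation, so check the exact schema against your gpu-operator version):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config      # name is your choice
  namespace: gpu-operator
data:
  any: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
        - name: nvidia.com/gpu
          replicas: 4            # each physical GPU is advertised as 4 schedulable GPUs
```

You then point the device plugin at it through the ClusterPolicy (the devicePlugin.config.name / default fields), after which each node advertises replicas × physical GPUs as nvidia.com/gpu, letting more DeepStream pods than GPUs be scheduled while they share the hardware.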