• Hardware Platform (Jetson / GPU)
GPU
• DeepStream Version
6.0.1
• Issue Type (questions, new requirements, bugs)
Question
We have been developing a DeepStream-based video analytics application in Python, initially on Jetson Xavier NX because of its ease of use and low cost. In the last few months we have successfully “ported” (adapted, really) this application to also run on x86 with a dGPU. For this we use the excellent DeepStream containers on NGC.
But we do this on an old GPU server we had lying around, built around 2x Sandy Bridge Xeon CPUs and 3x RTX 2080 Ti cards. It runs, with ample headroom, but we think we will need more capacity when we start deploying our application to the rest of the company.
The thing is: every user will have to instantiate their own Docker container with its own instance of our application running inside, for various proprietary reasons. We are looking into Triton Inference Server for the future, to consolidate all inferencing, but for now every single container uses its own GStreamer pipeline with its own nvinfer and nvof elements and all the subsequent processing logic that we have implemented, roughly as in the sketch below.
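For context, each container runs something like the following (a minimal sketch: the RTSP URI and nvinfer config path are placeholders, and our real probes and downstream processing are reduced to a fakesink):

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib

Gst.init(None)

# One pipeline per container; all elements pinned to GPU 0 inside the container.
pipeline = Gst.parse_launch(
    "uridecodebin uri=rtsp://camera.example/stream ! nvvideoconvert ! mux.sink_0 "
    "nvstreammux name=mux batch-size=1 width=1920 height=1080 ! "
    "nvinfer gpu-id=0 config-file-path=pgie_config.txt ! "  # placeholder config
    "nvof gpu-id=0 ! "      # optical flow alongside inference, as in our app
    "fakesink sync=false"   # stands in for our processing logic
)

loop = GLib.MainLoop()
bus = pipeline.get_bus()
bus.add_signal_watch()
bus.connect("message::error", lambda *_: loop.quit())
bus.connect("message::eos", lambda *_: loop.quit())

pipeline.set_state(Gst.State.PLAYING)
try:
    loop.run()
finally:
    pipeline.set_state(Gst.State.NULL)
```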
We are going to need proper hardware that can easily run 30 or more of these containers. Is that even feasible? Currently one instance of this application uses about 5% GPU utilization and about 2 GB of GPU memory on an RTX 2080 Ti. Is it then safe to assume that I can scale this as long as I have memory and compute available per card? Or are there certain boundaries to keep in mind? Can I safely separate the application instances into containers and let the host system and drivers figure out all the resource sharing?
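To make the capacity question concrete, this is roughly how we check per-card headroom while adding containers (a sketch using the pynvml bindings from the nvidia-ml-py package, assumed installed):

```python
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # percent, sampled
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)         # bytes
        print(f"GPU {i}: {util.gpu}% util, "
              f"{mem.used / 2**30:.1f}/{mem.total / 2**30:.1f} GiB used")
finally:
    pynvml.nvmlShutdown()
```

This is how we arrived at the ~5% / ~2 GB-per-instance figures above.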
The big question is: what hardware should we invest in that gives us the best bang for our buck? I know that buying several H100 or A100 GPUs would probably be best, but that is also very expensive. Is there a current sweet spot with regard to DeepStream and inferencing where consumer-grade hardware is perhaps “better” in terms of price/performance ratio?
Also, a side question:
I run the application with all elements set to gpu-id=0 and then start the container with --gpus '"device={number}"' to pin the container to a specific GPU. Is this best practice? Can I let the application or Docker decide which GPU to use, for dynamic load balancing? A sketch of what we have in mind follows below.
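For illustration, something like this hypothetical host-side scheduler is what we imagine: pick the GPU with the most free memory via pynvml and pin a new container to it with the same --gpus flag we use today (the image name is a placeholder):

```python
import subprocess
import pynvml

def least_loaded_gpu() -> int:
    """Index of the GPU with the most free memory right now."""
    pynvml.nvmlInit()
    try:
        free = [
            pynvml.nvmlDeviceGetMemoryInfo(
                pynvml.nvmlDeviceGetHandleByIndex(i)).free
            for i in range(pynvml.nvmlDeviceGetCount())
        ]
        return max(range(len(free)), key=free.__getitem__)
    finally:
        pynvml.nvmlShutdown()

gpu = least_loaded_gpu()
subprocess.run(
    ["docker", "run", "-d",
     "--gpus", f"device={gpu}",        # single device, so no CSV quoting needed
     "our-analytics-image:latest"],    # placeholder image name
    check=True,
)
```

Of course this only balances at container start; within a card we would still rely on the driver to share resources between containers.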