Hardware buying choices

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU)
• DeepStream Version
• Issue Type (questions, new requirements, bugs)

We have been busy developing a DeepStream-based video analytics application in Python, initially using a Jetson Xavier NX because of its ease of use and low cost. In the last few months we have successfully “ported” (not really, but adapted) this application to also run on x86 with a dGPU. For this we use the excellent DeepStream containers on NGC.

But we do this on an old GPU server we had lying around, based on 2x Sandy Bridge Xeon CPUs and 3x RTX 2080 Ti cards. It runs and has ample headroom, but we think we will need more when we start deploying our application to the rest of the company.

The thing is: every user will have to instantiate their own Docker container with its own instance of our application running inside, for various proprietary reasons. We are looking into Triton Inference Server for the future, to consolidate all inferencing, but for now every single container uses its own GStreamer pipeline with its own nvinfer and nvof elements and all the subsequent processing logic that we have implemented.

We are going to need proper hardware that can easily run 30 or more of these containers. Is that even feasible? Currently one instance of this application uses about 5% GPU utilization and about 2 GB of GPU memory on an RTX 2080 Ti. Is it then safe to assume that I can scale this up as long as I have memory and compute (per card) available? Or are there certain boundaries to keep in mind? Can I safely separate the application into containers and let the host system and drivers figure out all the resource sharing?
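As a rough back-of-the-envelope check using the figures above (and assuming linear scaling, which real workloads rarely achieve exactly, since decoder engines and PCIe bandwidth can saturate first), GPU memory tends to become the limit before compute does. The 11 GB figure is the published RTX 2080 Ti memory size:

```python
import math

def max_instances(gpu_mem_gb, per_instance_mem_gb, per_instance_util_pct):
    """Estimate how many app instances fit on one GPU, assuming linear scaling."""
    by_memory = math.floor(gpu_mem_gb / per_instance_mem_gb)
    by_compute = math.floor(100 / per_instance_util_pct)
    return min(by_memory, by_compute)

# RTX 2080 Ti (11 GB), measured per instance: ~2 GB memory, ~5% utilization.
# Memory caps this at 5 instances per card, not the 20 that compute would allow.
print(max_instances(11, 2, 5))
```

By this estimate, reaching 30 instances on the current 3-card server would be memory-bound, which is worth factoring into the hardware choice (GPU memory per card matters as much as TOPS here).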

The big question is: what hardware should we invest in that gives us the best bang for our buck? I know that buying several H100 or A100 GPUs would probably be best, but that is also very expensive. Is there a current sweet spot with regard to DeepStream and inferencing where consumer-grade hardware might be “better” in terms of price/performance?

Also, a side question:
I run the application with all the elements set to GPU ID = 0 and then run the container with --gpus ‘“device={number}”’ to specify which GPU the container should run on. Is this best practice? Can I let the application or Docker decide which GPU to use, for dynamic load balancing?
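For what it's worth: Docker itself does not load-balance across GPUs; the --gpus flag only controls which devices the container can see, so the scheduling decision has to be made by whatever launches the containers. A minimal sketch of that launch-time decision (the helper names are hypothetical, and you would obtain the per-GPU instance counts or utilization yourself, e.g. via nvidia-smi or NVML):

```python
def pick_gpu(running_per_gpu):
    """Return the index of the GPU with the fewest running instances.

    `running_per_gpu` is a list like [3, 1, 2]: instance counts per GPU.
    """
    return min(range(len(running_per_gpu)), key=lambda i: running_per_gpu[i])

def gpus_flag(gpu_index):
    """Format the value passed to `docker run --gpus` to pin one GPU."""
    return f'"device={gpu_index}"'

# Example: GPU 1 currently runs the fewest containers, so pin the next one there.
idx = pick_gpu([3, 1, 2])
print(idx, gpus_flag(idx))
```

Keeping the in-container GPU ID fixed at 0 and selecting the physical device externally, as you describe, is a common pattern, since the container then only ever sees one GPU.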

You may refer to the table showing end-to-end application performance (from data ingestion, decoding, and image processing to inference) shared at NVIDIA DeepStream SDK | NVIDIA Developer to see if you can gain some ideas.

Hi @willemvdkletersteeg ,
1). Server
Officially, you need a data center server ( https://www.nvidia.com/en-us/data-center/data-center-gpus/qualified-system-catalog ) that has been certified by the NVQual tool.

2). codec capability

You can check NVIDIA VIDEO CODEC SDK | NVIDIA Developer to find the decoding/encoding capabilities of each GPU.

3). compute capability

We recommend using batched inference and lower inference precision, e.g. FP16 or INT8.
You can refer to Developer Guide :: NVIDIA Deep Learning TensorRT Documentation if you are using TensorRT.
Also, I think you can try 30 (or your target number of) streams on the 2080 Ti and check whether its compute capability is enough; then, based on that, check which other GPU cards have the INT8/FP16 TOPS to support your target stream count.
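Since the application already uses nvinfer, both of these recommendations are set in the nvinfer configuration file. A minimal sketch (values are illustrative; INT8 additionally requires a calibration file):

```ini
[property]
# Batch multiple streams per inference call instead of one at a time.
batch-size=8
# network-mode: 0=FP32, 1=INT8, 2=FP16
network-mode=2
# For INT8 (network-mode=1), also point to a calibration file:
# int8-calib-file=calib.table
```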

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.