Please provide complete information as applicable to your setup.
**• Hardware Platform (Jetson / GPU): GPU**
• DeepStream Version: 6.2
• TensorRT Version: Whichever version ships in the DS 6.2 Docker container; I’m also running Cloud Native Stack 10.2
• NVIDIA GPU Driver Version: 535.86.10
**• Issue Type (questions, new requirements, bugs): Question**
I have something of an open-ended question as I try to understand more about the DeepStream SDK.
I’m running the DeepStream reference app (“deepstream-app”) in a K8s cluster that also includes the Cloud Native Stack (and, therefore, the GPU Operator). The app is configured to run the “DsExample” plugin/filter immediately after the tracker, and DsExample has been built against the optimized implementation (“gstdsexample_optimized.cpp”).
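For reference, the DsExample group in my deepstream-app config looks roughly like the following (values here are illustrative rather than my exact settings):

```
[ds-example]
enable=1
# process the whole frame rather than per-object crops
full-frame=1
processing-width=1280
processing-height=720
unique-id=15
gpu-id=0
```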
For a given number of RTSP streams passed in, say 16, some GPUs (e.g. an RTX 3060 Ti) handle things just fine. In fact, I can throw 32 or even 48 streams at the RTX 3060 Ti: GPU utilization averages around 97% and per-stream FPS decreases in proportion to the number of streams (I’ve yet to make it fall over).
However, if I use a “weaker” GPU (e.g. an A2), it doesn’t take very long for the application to stop processing (i.e. the application keeps running, but the performance metrics show every stream grinding to a halt at 0 FPS). This happens with any number of streams greater than 8. I can appreciate that DsExample requires some horsepower when processing full frames, which is what I’ve configured it to do. I can also see via the GPU Operator metrics (DCGM exporter) that GPU utilization seems to hit 100% and then, shortly after, processing stops. If I limit the number of streams to 8, GPU utilization hovers around 96-97%.
With some debug code, I can see that “gst_dsexample_submit_input_buffer” stops being called, and the conditional wait on process_lock (entered when process_queue is empty) is never signalled, so it blocks forever.
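To illustrate the kind of check I mean (this is a sketch rather than my exact debug code, and the member names assume the stock gstdsexample_optimized.cpp layout), replacing the blocking wait with a timed wait makes the starvation visible instead of just hanging silently:

```c
/* Sketch only: assumes the GstDsExample members process_lock (GMutex),
 * process_cond (GCond) and process_queue (GQueue *) from the shipped
 * gstdsexample_optimized.cpp; adjust names if your copy differs. */
g_mutex_lock (&dsexample->process_lock);
while (g_queue_is_empty (dsexample->process_queue)) {
  gint64 deadline = g_get_monotonic_time () + 5 * G_TIME_SPAN_SECOND;
  if (!g_cond_wait_until (&dsexample->process_cond,
                          &dsexample->process_lock, deadline)) {
    /* g_cond_wait_until() returns FALSE on timeout */
    GST_WARNING_OBJECT (dsexample,
        "process_queue has been empty for 5s - submit_input_buffer is not feeding us");
  }
}
g_mutex_unlock (&dsexample->process_lock);
```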
I’m curious what it is about the A2 (or GPUs in general) that would cause “gst_dsexample_submit_input_buffer” to stop being called on the filter. It could very well be that the element is starved because no frames are making it through one or more upstream elements, but I’m not sure how to check that.
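If it helps frame the question: the only way I can think of to check for upstream starvation is to hang a buffer probe on DsExample’s sink pad (or the tracker’s src pad) and count what arrives. A rough, untested sketch of what I mean (the element name “dsexample0” is a guess; it would need to match whatever name the reference app actually gives the element):

```c
#include <gst/gst.h>

/* Log every 100th buffer that reaches the probed pad, so a stall upstream
 * of DsExample shows up as the counter no longer advancing. */
static GstPadProbeReturn
buffer_count_probe (GstPad *pad, GstPadProbeInfo *info, gpointer user_data)
{
  guint64 *count = (guint64 *) user_data;

  (*count)++;
  if (*count % 100 == 0)
    g_print ("%s:%s saw %" G_GUINT64_FORMAT " buffers\n",
        GST_DEBUG_PAD_NAME (pad), *count);
  return GST_PAD_PROBE_OK;
}

static void
attach_probe (GstElement *pipeline)
{
  static guint64 count = 0;
  /* "dsexample0" is a placeholder; use the actual element name
   * from your pipeline or the app's bin setup. */
  GstElement *dsexample =
      gst_bin_get_by_name (GST_BIN (pipeline), "dsexample0");

  if (dsexample != NULL) {
    GstPad *sinkpad = gst_element_get_static_pad (dsexample, "sink");
    gst_pad_add_probe (sinkpad, GST_PAD_PROBE_TYPE_BUFFER,
        buffer_count_probe, &count, NULL);
    gst_object_unref (sinkpad);
    gst_object_unref (dsexample);
  }
}
```

Alternatively, I suppose running with something like GST_DEBUG=GST_SCHEDULING:5 would show whether buffers stop flowing upstream, but the output volume with 8+ streams is hard to wade through.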
Thanks in advance for any help and/or insight!