IVA Performance Issue on A2 GPU

This might be a bit of a long shot, but I’m struggling to figure out how else to proceed with troubleshooting.

Here’s what I’m facing:

  • I have an IVA container based on DeepStream 6.2. It uses a custom YOLOv8 model and consumes RTSP streams.

  • The IVA is deployed via Helm onto a local K8s cluster. Single node.

  • On my dev box, which is running a 3060 Ti (plenty of RAM and a good CPU), the IVA works great. E.g. 4 concurrent streams processing at around 30 FPS.

  • The dev box has the driver installed natively (i.e. on the machine itself, not via the K8s cluster): Driver Version 525.105.17, CUDA Version 12.0. Cloud Native Core v7.0 was installed manually; this is NOT Cloud Native Stack. OS is Ubuntu 22.04 LTS.

  • On another box, which is running an A2 (w/ even more RAM and CPU power), the same IVA deployed in the same way runs at ~13 FPS for maybe 30-45s before the FPS drops to 0.

  • The big differences here: the A2 box runs a K8s cluster set up by an Ansible playbook for Cloud Native Stack (I’ve tried both v9.0 and v10.0), and the NVIDIA driver is NOT installed locally; it is handled entirely by the GPU Operator. The driver version is the same, except it runs CUDA 12.1. OS is also Ubuntu 22.04 LTS.
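Since the driver on the A2 box lives entirely inside the GPU Operator, one sanity check is to run nvidia-smi from inside the driver daemonset rather than on the host, to see the driver/CUDA versions the containers actually get. A rough sketch, assuming the GPU Operator's default namespace and daemonset name (verify both for your install):

```shell
# Namespace assumes a default GPU Operator install; confirm with
# `kubectl get ds -A | grep nvidia` before relying on these names.
NS=gpu-operator

# List the operator pods and confirm the driver/toolkit pods are Running.
kubectl get pods -n "$NS"

# Run nvidia-smi inside the driver daemonset; this is the view the
# DeepStream containers see, which can differ from the host's view.
kubectl exec -n "$NS" ds/nvidia-driver-daemonset -- nvidia-smi
```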

I figured that the A2 running the latest Cloud Native Stack would have no problem running the IVA, since it runs perfectly on the 3060 Ti. I recognize that the supporting K8s cluster is a bit different, but I can’t think of (or see) a reason why the performance would differ so wildly.

Running a single stream on the A2 box works just fine, but obviously that isn’t much help.

Can anyone recommend what else I can check, or have an idea as to why this is happening? The A2 GPU looks to be in good health and the output of nvidia-smi shows the GPU is running just fine.
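For context, the kind of check I mean is a longer-running one than a one-off nvidia-smi, in case clock or power throttling only shows up under sustained load. A minimal monitoring sketch (the field names are standard nvidia-smi query fields; stop it with Ctrl-C):

```shell
# Poll clocks, utilization, temperature, power and active throttle
# reasons every 2 seconds while the streams run; a throttle reason
# other than "Not Active" around the time FPS collapses is a clue.
FIELDS="timestamp,clocks.sm,clocks.mem,utilization.gpu,temperature.gpu,power.draw,clocks_throttle_reasons.active"
nvidia-smi --query-gpu="$FIELDS" --format=csv -l 2
```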

Thank you in advance and I apologize for the fairly vague topic.

ETA: The component latencies on the dev box running the 3060 Ti are an order of magnitude faster than what’s occurring on the A2 box. Same container.

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks

From the DeepStream point of view, the supported compute stack is:

  • Ubuntu 20.04
  • GStreamer 1.16.3
  • NVIDIA driver 525.85.12
  • CUDA 11.8
  • TensorRT

If the pipeline performance does not meet the theoretical capability, please try to find the bottleneck of the pipeline. The following methods may be used.

  1. Measure the latency of the DeepStream components to find out which component contributes the largest latency.
  2. Use a system-level performance tuning tool to profile the application at the process and thread level: Developer Tools :: NVIDIA Tools Extension (NVTX) Documentation
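As a concrete starting point for method 2: DeepStream components emit NVTX ranges, so Nsight Systems can capture them alongside CUDA and OS-thread activity. A rough sketch (the nsys flags are standard options; the app invocation is a placeholder for your own pipeline):

```shell
# Name of the capture file to produce (placeholder).
OUT=iva_profile

# Trace NVTX ranges, CUDA API calls and OS runtime (thread) activity
# for one run of the pipeline.
nsys profile -t nvtx,cuda,osrt -o "$OUT" --force-overwrite true \
    deepstream-app -c /path/to/your_config.txt

# Summarize the capture (also viewable in the Nsight Systems GUI).
nsys stats "$OUT".nsys-rep
```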

It will be most useful to dig into the root cause of the bottleneck.
