Performance Bottleneck on RTX 5080 with nvOCDR via Triton

I’m currently working with an RTX 5080 (Blackwell) device and have successfully set up a Triton Inference Server with the CUDA Toolkit installed. I’m using it primarily to run nvOCDR, and so far, everything works smoothly for both image and video inputs when tested manually.

However, when running video inference directly through Triton, I’ve noticed a significant performance bottleneck—particularly due to limited batch size handling in the current nvOCDR setup. From what I’ve observed, this greatly impacts real-time performance.

After reading through previous discussions, I came across a suggestion that DeepStream can work with Blackwell GPUs when using the Triton Inference Server as a backend instead of the native inference engine.

I’d like to know:

  1. Has anyone here deployed DeepStream with a Triton backend on an RTX 5080 or similar GPU (e.g., RTX 5090)?
  2. Can this integration overcome the batch size limitation observed with standalone Triton + nvOCDR?
  3. Are there sample pipelines or configuration templates for DeepStream + Triton specifically optimized for Blackwell GPUs?

Hi,

To deploy Triton Server, you can check the configuration details here: OCRNET parse function for DeepStream - #9 by Levi_Pereira

I have a repo that’s already set up for Triton Server: GitHub - levipereira/deepstream_ocr

I didn’t use OCDNet in this example; you’ll need to implement the text detection model in Triton as well if you want full text processing. In my example, I just deploy OCRNet as the primary inference. If you want to detect text first, use OCDNet as the primary inference and OCRNet as the secondary inference.
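
For illustration only, a rough gst-launch sketch of that two-stage layout (OCDNet as primary, OCRNet as secondary) could look like the following. The config file names are placeholders, and the secondary config would need its input_control set to operate on the primary's detected objects:

# Sketch only: pgie-ocdnet.txt and sgie-ocrnet.txt are hypothetical
# nvinferserver config files; the SGIE config must be set to operate on
# objects produced by the PGIE (process_mode / operate_on_gie_id).
gst-launch-1.0 \
  filesrc location=sample.h264 ! h264parse ! nvv4l2decoder ! \
  m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=720 ! \
  nvinferserver config-file-path=pgie-ocdnet.txt ! \
  nvinferserver config-file-path=sgie-ocrnet.txt ! \
  nvvideoconvert ! nvdsosd ! fakesink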

This repo shows an example of how to create a Triton Server: GitHub - levipereira/triton-server-yolo: This repository serves as an example of deploying the YOLO models on Triton Server for performance and testing purposes

For documentation, see: Deploying your trained model using Triton — NVIDIA Triton Inference Server

Hope this helps!

Hi Pereira,

Thank you for your helpful advice.
You mentioned that “DeepStream can work with Blackwell GPUs when using the Triton Inference Server as a backend instead of the native inference engine.” That caught my attention, and I’d like to understand how to make that work.

Currently, I have successfully deployed Triton Inference Server running both OCDNet and OCRNet models. However, I want to integrate this setup into DeepStream. The challenge is that I’m using a Blackwell GPU (RTX 5080), which is not supported by the default DeepStream inference engine.

If you’ve already configured the Triton Server, then you must use gst-nvinferserver instead of gst-nvinfer.

You can reference this nvinferserver configuration file as an example: deepstream_ocr/pgie-id-1-ocrnet.txt at master · levipereira/deepstream_ocr · GitHub
along with the pipeline script that goes with it: deepstream_ocr/pipeline.py at 818d2b5023311d22cdc966fb352f3352339e57f9 · levipereira/deepstream_ocr · GitHub
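
For orientation, here is a stripped-down sketch of what such a gst-nvinferserver config looks like when it points at a remote Triton instance over gRPC. The model name, URL, and preprocessing values below are placeholders; copy the real values from the repo file above:

infer_config {
  unique_id: 1
  gpu_ids: [0]
  max_batch_size: 1
  backend {
    triton {
      model_name: "ocrnet"      # must match the name in your Triton model repository
      version: -1               # -1 = latest loaded version
      grpc {
        url: "localhost:8001"   # Triton gRPC endpoint (reachable with --net=host)
      }
    }
  }
  preprocess {
    network_format: IMAGE_FORMAT_RGB
    tensor_order: TENSOR_ORDER_LINEAR
    normalize { scale_factor: 0.00392156862 }   # placeholder (1/255)
  }
  postprocess { other { } }     # raw tensor output, parsed downstream
}
input_control {
  process_mode: PROCESS_MODE_FULL_FRAME
}
output_control {
  output_tensor_meta: true
}
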
Important considerations:

  1. Use separate containers: You’ll need to deploy two separate containers since they require different TRT versions and dependencies.
  2. Avoid the bundled Triton Server: Don’t use the triton-server that comes with the DeepStream 7.1 container, as it’s not compatible for this use case.
  3. Deploy the correct Triton Server version: You must deploy the standalone Triton Server container, version 25.02 or newer, for proper functionality. (A quick readiness check is sketched after this list.)
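
Before wiring DeepStream to it, it is worth confirming that the standalone Triton instance is reachable and that the models are loaded. A quick check against Triton's standard HTTP endpoints (the model names ocdnet and ocrnet are assumptions; adjust to your repository):

# Server-level readiness: prints 200 when Triton is ready
curl -s -o /dev/null -w "%{http_code}\n" localhost:8000/v2/health/ready

# Per-model readiness
curl -s -o /dev/null -w "%{http_code}\n" localhost:8000/v2/models/ocdnet/ready
curl -s -o /dev/null -w "%{http_code}\n" localhost:8000/v2/models/ocrnet/ready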

Thank you Pereira for the detailed explanation.

I’ve already built and successfully deployed Triton Inference Server (v25.01/25.02) with two optimized TensorRT models: OCDNet and OCRNet. So basically, I probably just need to build DeepStream 7.1 inside a container, but I’m still unsure because most of the specifications in the documentation don’t match my PC.

Now, I’d like to clarify a few things regarding the DeepStream 7.1 container setup:

  1. Since I only have one GPU (an RTX 5080 with 16 GB), which is currently dedicated to running the Triton Server, is it still possible to use the same GPU for the DeepStream 7.1 container as well? I understand that DeepStream requires CUDA memory sharing and GPU access even if inference is performed remotely via gst-nvinferserver. My concern is whether both DeepStream and Triton can share the same GPU without conflict, or if a separate GPU is mandatory.

  2. Could you provide the correct installation method or Docker image for a DeepStream 7.1 container that supports the RTX 5080? Based on the official documentation, the required environment for RTX GPUs includes:

  • Ubuntu 22.04
  • GStreamer 1.20.3
  • NVIDIA driver 560.35.03 (for RTX GPUs)
  • CUDA 12.6
  • TensorRT 10.3.0.26

And my PC right now is:

  • Ubuntu 24.04
  • NVIDIA Driver 570.135.. (RTX 5080)
  • CUDA 12.8
  • TensorRT 10.9.30

Q1: Yes, both can share the same GPU.
Q2: DeepStream can be deployed the same way as on any other GPU.
The only restriction is that you cannot use gst-nvinfer; use gst-nvinferserver instead.

docker run \
        -it \
        --net=host \
        --ipc=host \
        --gpus all \
        nvcr.io/nvidia/deepstream:7.1-triton-multiarch
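
Once inside the DeepStream container, you can confirm that the Triton-backed plugin is available and pull the example pipeline (check the repo README for the exact launch command):

# Confirms the gst-nvinferserver element loads
gst-inspect-1.0 nvinferserver | head -n 20

# Example pipeline from the repo referenced above
git clone https://github.com/levipereira/deepstream_ocr.git
cd deepstream_ocr
# Point the nvinferserver config(s) at your running Triton instance; with
# --net=host, localhost:8001 reaches the Triton gRPC port.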

I don’t recommend installing anything directly on the server. Install everything via Docker to keep software components separated.

On the server, install only:

  • NVIDIA Driver
  • NVIDIA Container Toolkit
  • Docker

Then use Docker to deploy the environment. This approach maintains better isolation and makes the deployment more manageable.
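
A minimal sketch of that host-side setup on Ubuntu, assuming the NVIDIA driver is already installed (the Container Toolkit needs NVIDIA's apt repository added first, per the official install guide):

# Install Docker and the NVIDIA Container Toolkit
sudo apt-get update
sudo apt-get install -y docker.io nvidia-container-toolkit

# Register the NVIDIA runtime with Docker and restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Verify that containers can see the RTX 5080
docker run --rm --gpus all ubuntu nvidia-smi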

triton-server

docker run --gpus all \
 --name triton-server-25.02 \
 -it \
 --ipc=host \
 --shm-size=2g \
 --ulimit memlock=-1 \
 --ulimit stack=67108864 \
 -p8000:8000 \
 -p8001:8001 \
 -p8002:8002 \
 nvcr.io/nvidia/tritonserver:25.02-py3
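
The command above only starts the container; to actually serve OCDNet and OCRNet you also mount a model repository and launch tritonserver against it. A sketch, where ./model_repository is a placeholder path holding the ocdnet/ocrnet model directories and their config.pbtxt files:

# Same image as above, run detached with the model repository mounted.
# Remove the earlier container first (or pick a different --name).
docker run --gpus all \
 --name triton-server-25.02 \
 -d \
 --ipc=host \
 --shm-size=2g \
 --ulimit memlock=-1 \
 --ulimit stack=67108864 \
 -p8000:8000 -p8001:8001 -p8002:8002 \
 -v $(pwd)/model_repository:/models \
 nvcr.io/nvidia/tritonserver:25.02-py3 \
 tritonserver --model-repository=/models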

Thank you so much!
I’ll run the DeepStream Docker container as per your guidance.
By the way, I’m already running Triton Inference Server in a separate Docker container, so I believe this setup is safe and won’t interfere with other configurations.

Also, my Triton Server is running version 25.01. I’ll try testing the integration between DeepStream and Triton using this version first.
If it doesn’t work properly, I’ll rebuild Triton Server using version 25.02 as recommended.