Performance Bottleneck on RTX 5080 with nvOCDR via Triton

I’m currently working with an RTX 5080 (Blackwell) device and have successfully set up a Triton Inference Server with the CUDA Toolkit installed. I’m using it primarily to run nvOCDR, and so far, everything works smoothly for both image and video inputs when tested manually.

However, when running video inference directly through Triton, I’ve noticed a significant performance bottleneck—particularly due to limited batch size handling in the current nvOCDR setup. From what I’ve observed, this greatly impacts real-time performance.

After reading through previous discussions, I came across a suggestion that DeepStream can work with Blackwell GPUs when using the Triton Inference Server as a backend instead of the native inference engine.

I’d like to know:

  1. Has anyone here deployed DeepStream with a Triton backend on an RTX 5080 or similar GPU (e.g., RTX 5090)?
  2. Can this integration overcome the batch size limitation observed with standalone Triton + nvOCDR?
  3. Are there sample pipelines or configuration templates for DeepStream + Triton specifically optimized for Blackwell GPUs?

Hi,

To deploy Triton Server, you can check the configuration details here: OCRNET parse function for DeepStream - #9 by Levi_Pereira

I have a repo that’s already set up for Triton Server: GitHub - levipereira/deepstream_ocr

I didn’t use OCDNet in this example; you’ll need to implement the text detection model in Triton as well if you want full text processing. In my example, I just deploy OCRNet as the primary inference. If you want to detect text first, use OCDNet as the primary inference and OCRNet as the secondary inference.
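
For illustration only, a rough gst-launch sketch of that two-stage layout (OCDNet as primary, OCRNet as secondary) could look like the following. The config file names are placeholders, and the secondary config would need its input_control set to operate on the primary's detected objects:

# Sketch only: pgie-ocdnet.txt and sgie-ocrnet.txt are hypothetical
# nvinferserver config files; the SGIE config must be set to operate on
# objects produced by the PGIE (process_mode / operate_on_gie_id).
gst-launch-1.0 \
  filesrc location=sample.h264 ! h264parse ! nvv4l2decoder ! \
  m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=720 ! \
  nvinferserver config-file-path=pgie-ocdnet.txt ! \
  nvinferserver config-file-path=sgie-ocrnet.txt ! \
  nvvideoconvert ! nvdsosd ! fakesink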

This repo shows an example of how to create a Triton Server: GitHub - levipereira/triton-server-yolo: This repository serves as an example of deploying the YOLO models on Triton Server for performance and testing purposes

For documentation, see: Deploying your trained model using Triton — NVIDIA Triton Inference Server

Hope this helps!

Hi Pereira,

Thank you for your helpful advice.
You mentioned that “DeepStream can work with Blackwell GPUs when using the Triton Inference Server as a backend instead of the native inference engine.” That caught my attention, and I’d like to understand how to make that work.

Currently, I have successfully deployed Triton Inference Server running both OCDNet and OCRNet models. However, I want to integrate this setup into DeepStream. The challenge is that I’m using a Blackwell GPU (RTX 5080), which is not supported by the default DeepStream inference engine.

If you’ve already configured the Triton Server, then you must use gst-nvinferserver instead of gst-nvinfer.

You can reference this nvinferserver configuration file as an example: deepstream_ocr/pgie-id-1-ocrnet.txt at master · levipereira/deepstream_ocr · GitHub
along with the pipeline script that goes with it: deepstream_ocr/pipeline.py at 818d2b5023311d22cdc966fb352f3352339e57f9 · levipereira/deepstream_ocr · GitHub
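
For orientation, here is a stripped-down sketch of what such a gst-nvinferserver config looks like when it points at a remote Triton instance over gRPC. The model name, URL, and preprocessing values below are placeholders; copy the real values from the repo file above:

infer_config {
  unique_id: 1
  gpu_ids: [0]
  max_batch_size: 1
  backend {
    triton {
      model_name: "ocrnet"      # must match the name in your Triton model repository
      version: -1               # -1 = latest loaded version
      grpc {
        url: "localhost:8001"   # Triton gRPC endpoint (reachable with --net=host)
      }
    }
  }
  preprocess {
    network_format: IMAGE_FORMAT_RGB
    tensor_order: TENSOR_ORDER_LINEAR
    normalize { scale_factor: 0.00392156862 }   # placeholder (1/255)
  }
  postprocess { other { } }     # raw tensor output, parsed downstream
}
input_control {
  process_mode: PROCESS_MODE_FULL_FRAME
}
output_control {
  output_tensor_meta: true
}
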
Important considerations:

  1. Use separate containers: You’ll need to deploy two separate containers since they require different TRT versions and dependencies.
  2. Avoid the bundled Triton Server: Don’t use the triton-server that comes with the DeepStream 7.1 container, as it’s not compatible for this use case.
  3. Deploy the correct Triton Server version: You must deploy the standalone Triton Server container, version 25.02 or newer, for proper functionality. (A quick readiness check is sketched after this list.)
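
Before wiring DeepStream to it, it is worth confirming that the standalone Triton instance is reachable and that the models are loaded. A quick check against Triton's standard HTTP endpoints (the model names ocdnet and ocrnet are assumptions; adjust to your repository):

# Server-level readiness: prints 200 when Triton is ready
curl -s -o /dev/null -w "%{http_code}\n" localhost:8000/v2/health/ready

# Per-model readiness
curl -s -o /dev/null -w "%{http_code}\n" localhost:8000/v2/models/ocdnet/ready
curl -s -o /dev/null -w "%{http_code}\n" localhost:8000/v2/models/ocrnet/ready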

Thank you Pereira for the detailed explanation.

I’ve already built and successfully deployed Triton Inference Server (v25.01/25.02) with two optimized TensorRT models: OCDNet and OCRNet. So basically, I probably just need to build DeepStream 7.1 inside a container, but I’m still unsure because most of the specifications in the documentation don’t match my PC.

Now, I’d like to clarify a few things regarding the DeepStream 7.1 container setup:

  1. Since I only have one GPU (an RTX 5080 with 16 GB), which is currently dedicated to running the Triton Server, is it still possible to use the same GPU for the DeepStream 7.1 container as well? I understand that DeepStream requires CUDA memory sharing and GPU access even if inference is performed remotely via gst-nvinferserver. My concern is whether both DeepStream and Triton can share the same GPU without conflict, or if a separate GPU is mandatory.

  2. Could you provide the correct installation method or Docker image for a DeepStream 7.1 container that supports the RTX 5080? Based on the official documentation, the required environment for RTX GPUs includes:

  • Ubuntu 22.04
  • GStreamer 1.20.3
  • NVIDIA driver 560.35.03 (for RTX GPUs)
  • CUDA 12.6
  • TensorRT 10.3.0.26

And my PC right now is:

  • Ubuntu 24.04
  • NVIDIA Driver 570.135.. (RTX 5080)
  • CUDA 12.8
  • TensorRT 10.9.30

Q1: Yes, both can share the same GPU.
Q2: DeepStream can be deployed the same way as on any other GPU.
The only restriction is that you cannot use gst-nvinfer; use gst-nvinferserver instead.

docker run \
        -it \
        --net=host \
        --ipc=host \
        --gpus all \
        nvcr.io/nvidia/deepstream:7.1-triton-multiarch
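
Once inside the DeepStream container, you can confirm that the Triton-backed plugin is available and pull the example pipeline (check the repo README for the exact launch command):

# Confirms the gst-nvinferserver element loads
gst-inspect-1.0 nvinferserver | head -n 20

# Example pipeline from the repo referenced above
git clone https://github.com/levipereira/deepstream_ocr.git
cd deepstream_ocr
# Point the nvinferserver config(s) at your running Triton instance; with
# --net=host, localhost:8001 reaches the Triton gRPC port.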

I don’t recommend installing anything directly on the server. Install everything via Docker to keep software components separated.

On the server, install only:

  • NVIDIA Driver
  • NVIDIA Container Toolkit
  • Docker

Then use Docker to deploy the environment. This approach maintains better isolation and makes the deployment more manageable.
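
A minimal sketch of that host-side setup on Ubuntu, assuming the NVIDIA driver is already installed (the Container Toolkit needs NVIDIA's apt repository added first, per the official install guide):

# Install Docker and the NVIDIA Container Toolkit
sudo apt-get update
sudo apt-get install -y docker.io nvidia-container-toolkit

# Register the NVIDIA runtime with Docker and restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Verify that containers can see the RTX 5080
docker run --rm --gpus all ubuntu nvidia-smi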

triton-server

docker run --gpus all \
 --name triton-server-25.02 \
 -it \
 --ipc=host \
 --shm-size=2g \
 --ulimit memlock=-1 \
 --ulimit stack=67108864 \
 -p8000:8000 \
 -p8001:8001 \
 -p8002:8002 \
 nvcr.io/nvidia/tritonserver:25.02-py3
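
The command above only starts the container; to actually serve OCDNet and OCRNet you also mount a model repository and launch tritonserver against it. A sketch, where ./model_repository is a placeholder path holding the ocdnet/ocrnet model directories and their config.pbtxt files:

# Same image as above, run detached with the model repository mounted.
# Remove the earlier container first (or pick a different --name).
docker run --gpus all \
 --name triton-server-25.02 \
 -d \
 --ipc=host \
 --shm-size=2g \
 --ulimit memlock=-1 \
 --ulimit stack=67108864 \
 -p8000:8000 -p8001:8001 -p8002:8002 \
 -v $(pwd)/model_repository:/models \
 nvcr.io/nvidia/tritonserver:25.02-py3 \
 tritonserver --model-repository=/models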

Thank you so much!
I’ll run the DeepStream Docker container as per your guidance.
By the way, I’m already running Triton Inference Server in a separate Docker container, so I believe this setup is safe and won’t interfere with other configurations.

Also, my Triton Server is running version 25.01. I’ll try testing the integration between DeepStream and Triton using this version first.
If it doesn’t work properly, I’ll rebuild Triton Server using version 25.02 as recommended.