Inference chaining using DeepStream and Triton

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) Jetson AGX Orin 64 GB
• DeepStream Version 7.1.0
• JetPack Version (valid for Jetson only) 6.2
• TensorRT Version TensorRT v101000
• Triton Inference Server NVIDIA Release 25.05 (build 170551412) - Triton Server Version 2.58.0

Hello,

I have successfully tested a model on an AGX Orin 64 GB with Triton Inference Server and DeepStream.
Now, I would like to test the same model on a 2x AGX Orin 32 GB configuration.

For that purpose, I have split the ONNX model in half and converted each half into a TensorRT engine using trtexec.

I would like to know how to configure the DeepStream pipeline when using an ONNX model split in half. As I am using Triton Inference Server, I am using the nvinferserver plugin.

Here are extracts of the config.pbtxt files:

name: "model_part1"
platform: "tensorrt_plan"
max_batch_size: 1
default_model_filename: "model_part1.engine"
input [
  {
    name: "input"
    data_type: TYPE_FP32
    format: FORMAT_NCHW
    dims: [ 3, 720, 1280 ]
  }
]
output [
  {
    name: "/conv3_1/Conv_output_0"
    data_type: TYPE_FP32
    dims: [ 64,90,160 ]
  }
]

name: "model_part2"
platform: "tensorrt_plan"
max_batch_size: 1
default_model_filename: "model_part2.engine"
input [
  {
    name: "/conv3_1/Conv_output_0"
    data_type: TYPE_FP32
    dims: [ 64,90,160 ]
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 5, 720, 1280 ]
  }
]

I have tried to configure the second model as an SGIE with the option process_mode: PROCESS_MODE_CLIP_OBJECTS, but it doesn't seem to work.
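The relevant part of the second nvinferserver config I tried looks roughly like this (a sketch; the unique IDs are only illustrative):

# input_control of the second nvinferserver instance (the "SGIE" attempt)
input_control {
  process_mode: PROCESS_MODE_CLIP_OBJECTS   # operates on object crops produced by an upstream GIE
  operate_on_gie_id: 1                      # unique_id of the first nvinferserver instance
}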

Could you help me?

Kind regards

  1. Please refer to /opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-test2 for a PGIE + SGIE nvinferserver sample.
  2. If the app still doesn't work, what are the two models used for, respectively? How do you know that the outputs of the first model are correct? How do you do the preprocessing for the second model?

Hello,

  1. Please refer to /opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream-test2 for a PGIE + SGIE nvinferserver sample.

I tried to get some inspiration from the suggested example, but it is not the same scenario. I don't have a PGIE + SGIE; I have only one PGIE, split into two smaller parts. In other words, I don't have 2 different models, it is 1 model that I have split into 2 halves.

Therefore the output of the first model is the same tensor as the input of the second model (64x90x160), but it doesn't represent anything meaningful on its own.

Conceptually, I would like to run inference on the first model, get the output tensor (no postprocessing), and then feed the second model with this tensor (no preprocessing). I don't need anything in between.
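To make the "no postprocessing" side concrete, the first half would simply attach its raw output tensor to the frame metadata instead of running a parser. A minimal sketch of the nvinferserver settings I have in mind (paths, IDs and preprocessing values are placeholders):

infer_config {
  unique_id: 1
  gpu_ids: [0]
  max_batch_size: 1
  backend {
    triton {
      model_name: "model_part1"
      version: -1
      model_repo {
        root: "./triton_model_repo"   # placeholder path to the model repository
        strict_model_config: true
      }
    }
  }
  preprocess {
    network_format: IMAGE_FORMAT_RGB
    tensor_order: TENSOR_ORDER_LINEAR
    normalize {
      scale_factor: 1.0               # placeholder, depends on how the model was trained
    }
  }
  postprocess {
    other {}                          # no output parsing, keep the raw tensor
  }
}
input_control {
  process_mode: PROCESS_MODE_FULL_FRAME
}
output_control {
  output_tensor_meta: true            # attach the raw output tensor to the frame metadata
}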

  2. If the app still doesn't work, what are the two models used for, respectively? How do you know that the outputs of the first model are correct? How do you do the preprocessing for the second model?

As I said, I don't care about interpreting the output of the first model; I just need to get it and feed it directly to the second model without any preprocessing.

The reason for all of this is to build a cluster of Triton servers and be able to run big models by splitting them into smaller ones.

Thank you

Thanks for sharing! Please refer to the sample /opt/nvidia/deepstream/deepstream/sources/TritonBackendEnsemble. There the SGIE is a Triton ensemble model made of Secondary_VehicleMake and Secondary_VehicleTypes. You could use your two models as an ensemble model in the same way.
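As a rough sketch, reusing the tensor names from your config.pbtxt extracts (the ensemble name and the intermediate tensor name are placeholders, and both halves are assumed to be in the same model repository):

name: "model_ensemble"
platform: "ensemble"
max_batch_size: 1
input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 3, 720, 1280 ]
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 5, 720, 1280 ]
  }
]
ensemble_scheduling {
  step [
    {
      model_name: "model_part1"
      model_version: -1
      input_map { key: "input" value: "input" }
      output_map { key: "/conv3_1/Conv_output_0" value: "intermediate" }
    },
    {
      model_name: "model_part2"
      model_version: -1
      input_map { key: "/conv3_1/Conv_output_0" value: "intermediate" }
      output_map { key: "output" value: "output" }
    }
  ]
}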

The example you are referencing could indeed work if both half-models were hosted on the same Triton instance.

The idea here is for each GPU (AGX Orin) to run its own Triton Server instance. Therefore, an ensemble model is not a possibility in my case.

What I want to do is conceptually really simple: take one model, get its output tensor, and then feed another model with this tensor (simple model chaining, without altering the data in between).
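To make the intended deployment concrete: each nvinferserver instance would have its own config file and point at the Triton server running on its own Orin over gRPC, something along these lines (hostnames, ports and IDs are placeholders; the second instance would be analogous with model_part2 and the second Orin's address):

# config of the first nvinferserver instance (first half, served by Triton on the first Orin)
infer_config {
  unique_id: 1
  max_batch_size: 1
  backend {
    triton {
      model_name: "model_part1"
      version: -1
      grpc {
        url: "orin-1:8001"   # placeholder address of the Triton gRPC endpoint on the first Orin
      }
    }
  }
  postprocess {
    other {}                 # keep the raw output tensor, no parsing
  }
}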

Does DeepStream have no plugin or configuration for such a simple behaviour?

How did you split the ONNX model? What is the output format? Are there two ONNX models after splitting?