Deepstream Python Triton model - share tensors with zero copy "upstream"

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) GPU
• DeepStream Version 6.2 Triton
• TensorRT Version 8.5.2
• NVIDIA GPU Driver Version (valid for GPU only) Driver Version: 525.85.05 CUDA Version: 12.0
• Issue Type( questions, new requirements, bugs) Question
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)

I am using a Deepstream Triton deployment that includes:

  1. Python preprocessor
  2. TensorRT Model
  3. Python postprocessor

I am looking for a way to share tensors between the postprocessor and the preprocessor at runtime, "upstream", using the zero-copy mechanism supported by Triton. (To be clear, I specifically need to share tensors from the postprocessor back to the preprocessor, so it cannot simply be an extra input to the preprocessor.) Is there any API or interface I can call for that?
I know for sure that this happens "behind the scenes" in Triton, because it supports zero copy.

Thanks again


  1. What is the model used for?
  2. You might use the DeepStream nvinferserver plugin to do inference; nvinferserver supports preprocessing, inference, and postprocessing, and you only need to modify the configuration file. Regarding "share from the postprocessor to the preprocessor", you can use nvinferserver's IInferCustomProcessor interface, which supports this: the user can process the last frame's output tensor in inferenceDone() and feed it into the next frame's inference input tensor in extraInputProcess(). Please refer to the DeepStream sample: opt\nvidia\deepstream\deepstream\sources\TritonOnnxYolo\nvdsinferserver_custom_impl_yolo\nvdsinferserver_custom_process_yolo.cpp
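The feedback loop described in point 2 (cache the last frame's output in inferenceDone(), feed it back as an extra input in extraInputProcess()) can be sketched in plain Python/NumPy. This is an illustrative stand-in only: the class and method names below loosely mirror the IInferCustomProcessor callbacks but are not the actual nvinferserver C++ API, and the shapes are made up.

```python
import numpy as np

class FeedbackProcessor:
    """Illustrative stand-in for the IInferCustomProcessor pattern:
    inference_done() caches the last output tensor, and
    extra_input_process() copies it into the next frame's extra input.
    Names and shapes are hypothetical, not the real nvinferserver API."""

    def __init__(self, history_shape):
        # State carried across frames, e.g. a history vector;
        # starts zeroed before the first inference.
        self._last_output = np.zeros(history_shape, dtype=np.float32)

    def extra_input_process(self, extra_input: np.ndarray) -> None:
        # Feed the previous frame's output into this frame's extra input.
        np.copyto(extra_input, self._last_output)

    def inference_done(self, output: np.ndarray) -> None:
        # Cache this frame's output for the next frame.
        self._last_output = output.copy()


# Usage: frame N's output shows up as frame N+1's extra input.
proc = FeedbackProcessor(history_shape=(4,))
extra = np.empty(4, dtype=np.float32)

proc.extra_input_process(extra)                       # frame 0: history is zeros
proc.inference_done(np.arange(4, dtype=np.float32))   # fake model output
proc.extra_input_process(extra)                       # frame 1: sees frame 0's output
```

In the real C++ implementation the copy in extraInputProcess() targets a buffer owned by nvinferserver, so the custom processor only fills it rather than allocating its own.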

Hi! It's probably my fault here for not mentioning it: this is all on nvinferserver with Triton. I need to send some tensors upstream for my use case. My model uses a history vector over dynamic batches to identify certain events on cameras.

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks

  1. About "I need to share from the postprocessor to the preprocessor": do you mean you will feed the last frame's output tensor into the next frame's inference input tensor? If yes, please refer to nvinferserver's IInferCustomProcessor interface mentioned in my previous comment. Here is a better sample: opt\nvidia\deepstream\deepstream\sources\objectDetector_FasterRCNN\nvdsinfer_custom_impl_fasterRCNN\nvdsinferserver_custom_process.cpp, which shows how to get the output and feed it back into the input in the inferenceDone() function. Here is the doc: doc
  2. Do you want to use nvinferserver to do the preprocessing (C code; you need to modify the configuration file), or use Triton to do Python preprocessing? As you know, nvinferserver leverages Triton to do the inference, and Python preprocessing and postprocessing can be encapsulated into a model. Here is the doc: doc, and here is a sample: preprocess_py
  3. Could you elaborate on "share tensors with zero copy mechanism as supported by Triton"? nvinferserver has two modes, native and gRPC; here is the doc: doc. If you use gRPC mode, enable_cuda_buffer_sharing can share CUDA buffers; please refer to doc
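For reference, enabling CUDA buffer sharing in gRPC mode is a small change in the nvinferserver configuration file. The fragment below is a minimal sketch: the model name and URL are placeholders, and the availability of the enable_cuda_buffer_sharing field depends on your DeepStream/Triton version, so check the doc linked above for your release.

```
infer_config {
  backend {
    triton {
      model_name: "my_model"              # placeholder model name
      grpc {
        url: "localhost:8001"             # placeholder Triton gRPC endpoint
        enable_cuda_buffer_sharing: true  # share CUDA buffers instead of copying via host
      }
    }
  }
}
```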

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.