Please provide complete information as applicable to your setup.
• Hardware Platform (Jetson / GPU): GPU
• DeepStream Version: 6.2 (Triton)
• TensorRT Version: 8.5.2
• NVIDIA GPU Driver Version (valid for GPU only): 525.85.05, CUDA Version: 12.0
• Issue Type (questions, new requirements, bugs): Question
• How to reproduce the issue? (This is for bugs. Include which sample app is being used, the configuration file contents, the command line used, and other details for reproducing.)
I am using a DeepStream Triton deployment that includes:
I am looking for a way to share tensors using the zero-copy mechanism that Triton supports, in order to share tensors between the postprocessor and the preprocessor at runtime, "upstream". (To be clear: I specifically need to share from the postprocessor to the preprocessor, so it cannot just be an extra input to the preprocessor.) Is there any API or interface I can call for that?
I know for sure that this happens "behind the scenes" in Triton, because it supports zero copy.
You might use the DeepStream nvinferserver plugin to do inference; nvinferserver supports preprocessing, inference, and postprocessing, and you only need to modify the configuration file. Regarding "share from the postprocessor to the preprocessor": you might use nvinferserver's IInferCustomProcessor interface, which supports "User can process last frame's output tensor from inferenceDone() and feed into next frame's inference input tensor in extraInputProcess()". Please refer to the DeepStream sample: opt\nvidia\deepstream\deepstream\sources\TritonOnnxYolo\nvdsinferserver_custom_impl_yolo\nvdsinferserver_custom_process_yolo.cpp
Hi! It's probably my fault for not mentioning this: it is all running on nvinferserver with Triton, and I need to send some tensors upstream for my use case. My model uses a history vector on dynamic batches to identify certain events on cameras.
There has been no update from you for a while, so we are assuming this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.
About "I need to share from the postprocessor to the preprocessor": do you mean you will feed the last frame's output tensor into the next frame's inference input tensor? If yes, please refer to nvinferserver's IInferCustomProcessor interface mentioned in my previous comments. Here is a better sample: opt\nvidia\deepstream\deepstream\sources\objectDetector_FasterRCNN\nvdsinfer_custom_impl_fasterRCNN\nvdsinferserver_custom_process.cpp, which shows how to get the output in inferenceDone() and feed it back into the input. Here is the doc: doc
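For completeness, the custom processor is hooked into nvinferserver through the plugin's configuration file. A hedged sketch is below — the library path and factory-function name are placeholders, and the field spelling (including NVIDIA's `custom_process_funcion`) should be verified against the nvdsinferserver proto shipped with your DeepStream version:

```
infer_config {
  custom_lib {
    path: "/path/to/libnvdsinferserver_custom_process.so"
  }
  extra {
    custom_process_funcion: "CreateInferServerCustomProcess"
  }
}
```

nvinferserver loads the shared library and calls the named factory function to obtain your IInferCustomProcessor instance.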
Do you want to use nvinferserver to do the preprocessing (C++ code; you need to modify the configuration file), or use Triton to do Python preprocessing? As you know, nvinferserver leverages Triton to do inference, and Python preprocessing and postprocessing can be encapsulated into a model. Here is the doc: doc, and here is a sample: preprocess_py
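One common way to encapsulate Python pre/postprocessing "into a model" is a Triton ensemble that chains a python-backend preprocessing model with the inference model, so the client (or nvinferserver) sees a single model. A hedged sketch of such a config.pbtxt — all model names, tensor names, and shapes here are hypothetical:

```
name: "ensemble_detector"
platform: "ensemble"
input [
  { name: "RAW_IMAGE", data_type: TYPE_UINT8, dims: [ -1, -1, 3 ] }
]
output [
  { name: "DETECTIONS", data_type: TYPE_FP32, dims: [ -1, 7 ] }
]
ensemble_scheduling {
  step [
    {
      # python-backend model holding the preprocessing code
      model_name: "preprocess_py"
      model_version: -1
      input_map { key: "INPUT0" value: "RAW_IMAGE" }
      output_map { key: "OUTPUT0" value: "preprocessed" }
    },
    {
      # the actual inference model
      model_name: "detector"
      model_version: -1
      input_map { key: "input_tensor" value: "preprocessed" }
      output_map { key: "output_tensor" value: "DETECTIONS" }
    }
  ]
}
```

Triton passes the intermediate "preprocessed" tensor between the two steps internally, which is where its zero-copy sharing applies.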
Could you elaborate on "share tensors with zero copy mechanism as supported by Triton"? nvinferserver has two modes: one is native, the other is gRPC mode; here is the doc: doc. If using gRPC mode, enable_cuda_buffer_sharing can share CUDA buffers; please refer to the doc.
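As a sketch, enabling CUDA buffer sharing in gRPC mode is a configuration-file change in the nvinferserver Triton backend block. The URL is a placeholder, and the exact field nesting should be checked against your DeepStream version's nvdsinferserver proto:

```
infer_config {
  backend {
    triton {
      grpc {
        url: "localhost:8001"
        enable_cuda_buffer_sharing: true
      }
    }
  }
}
```

This only applies when the Triton server runs on the same machine as the pipeline, since the CUDA buffers are shared rather than serialized over gRPC.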