Enabling Zero-Copy Between DeepStream and Triton on Jetson Orin

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) Jetson
• DeepStream Version 7.1
• JetPack Version (valid for Jetson only) 6.2
• TensorRT Version 10.3.0.26
• NVIDIA GPU Driver Version (valid for GPU only)
• Issue Type( questions, new requirements, bugs) Question
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)

I’m working with a DeepStream 7.1 pipeline on Jetson Orin NX where I use nvinferserver for inference through Triton Inference Server. I would like to enable zero-copy to avoid unnecessary memory transfers and improve performance.

Could you please guide me on how to enable/configure zero-copy in this setup?

  • Is it supported on Jetson with nvinferserver?

  • Are there specific DeepStream or Triton configurations I need to set?

Thanks in advance!

nvinferserver supports CAPI and gRPC modes. Which mode do you want to use?

I’ll be using gRPC

Are the client and server on the same Jetson device? Either way, in gRPC mode, input tensors and inference results need to be passed over the gRPC protocol, hence memory copies and transfers are unavoidable.
On dGPU, the configuration option enable_cuda_buffer_sharing can be used to pass input tensors via CUDA buffer sharing.
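For reference, on dGPU that flag sits inside the grpc block of the nvinferserver config file; the model name and URL below are placeholders, so adjust them to your deployment:

```
infer_config {
  backend {
    triton {
      model_name: "your_model"   # placeholder
      version: -1
      grpc {
        url: "localhost:8001"
        # dGPU only; not supported on Jetson devices
        enable_cuda_buffer_sharing: true
      }
    }
  }
}
```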

Does that mean zero-copy with gRPC is not possible even if the client and server are on the same Jetson device?

Also, if I use CAPI, is it possible to do zero-copy on Jetson? The DeepStream 7.1 nvinferserver properties documentation says enable_cuda_buffer_sharing is not supported on Jetson devices.

And if I did want to use zero-copy on Jetson devices, what configuration would you suggest?

  1. Yes, currently zero-copy is not possible with gRPC. The nvinferserver plugin and its low-level library are open source.
  2. If using CAPI, there is no extra memory copy: the preprocessed GPU input tensors are passed directly to the Triton API. nvinferserver also supports keeping outputs in GPU memory via “output_mem_type”.
  3. If you are concerned about performance, you may use CAPI mode. If you want to do custom postprocessing on the inference GPU data, please refer to the sample deepstream-infer-tensor-meta-test.
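A minimal CAPI-mode config sketch (the model name and repository path are placeholders): the model repository is loaded in-process via model_repo, and output_mem_type requests that inference outputs stay in GPU memory:

```
infer_config {
  unique_id: 1
  backend {
    triton {
      model_name: "your_model"   # placeholder
      version: -1
      model_repo {
        root: "/path/to/triton_model_repo"   # placeholder
        strict_model_config: true
      }
    }
    # keep inference output tensors in GPU memory
    output_mem_type: MEMORY_TYPE_GPU
  }
}
```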

Is it possible to do zero-copy using the HTTP protocol on Jetson?

The DeepStream nvinferserver plugin only supports CAPI and gRPC modes.

Is it possible to achieve zero-copy on Jetson with two different Docker containers?

  1. DeepStream container runs the video analytics pipeline
  2. Triton server container runs the model

Is the above configuration possible? Or must we use CAPI, meaning linking the Triton .so libraries into the DeepStream application?

Our fundamental goal is to keep the DeepStream and the Triton separate.

@hatake_kakashi
Are the client and server on the same Jetson device? You may use nvinferserver gRPC mode to achieve points 1 and 2 you mentioned.
Regarding zero-copy, what processing do you mean? If you want to keep DeepStream and Triton separate, you have to use gRPC mode. As written above, in gRPC mode input tensors and inference results currently need to be passed over the gRPC protocol, hence memory copies and transfers are unavoidable. Why not use CAPI mode? Why do you need a standalone Triton docker? Thanks!
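For completeness, a sketch of the two-container gRPC setup (not zero-copy). The image tags, paths, and config names are assumptions for illustration, not tested against DS 7.1; host networking lets the DeepStream container reach Triton’s gRPC port 8001:

```shell
# Container 1: standalone Triton serving the model repository
docker run --rm --runtime nvidia --network host \
  -v /data/triton_model_repo:/models \
  nvcr.io/nvidia/tritonserver:24.08-py3-igpu \
  tritonserver --model-repository=/models

# Container 2: DeepStream pipeline with nvinferserver in gRPC mode,
# whose config points at url: "localhost:8001"
docker run --rm --runtime nvidia --network host \
  nvcr.io/nvidia/deepstream:7.1-triton-multiarch \
  deepstream-app -c /path/to/deepstream_app_config.txt
```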

Are the client and server on the same Jetson device?

Yes, they’re both on the same Jetson device.

You may use nvinferserver gRPC mode to achieve points 1 and 2 you mentioned

Earlier, we were under the impression that zero-copy would work with gRPC, but as you pointed out, that is not possible.

We have the following pipeline:

  • Our DeepStream pipeline is supposed to read the input frames
  • Pass them to Triton, which detects the objects and passes the detections back to DeepStream
  • DeepStream takes it from there and continues with the rest of the logic

Two most important priorities are:

  • It must be zero-copy
  • Both Triton and DeepStream need to be separate processes

If you want to keep DeepStream and Triton separate, you have to use gRPC mode

But since it doesn’t support zero-copy then it may not be an option for us.

Why not use CAPI mode?

We are not strictly against this, just want to keep the architecture simple. If this is the only way to achieve zero-copy then we will do it.

Why do you need a standalone Triton docker?

The reason behind this is we want to keep the pipeline (DeepStream) separate from the models (Triton) so that it is easier to maintain for us.

Could you tell me if that is possible? Or is the only way to do it using CAPI and .so file integration?

Hi @fanzh awaiting response. Thanks!

In gRPC mode, input tensors and inference results currently need to be passed over the gRPC protocol. Please use CAPI mode for higher performance.