Please provide complete information as applicable to your setup.
• Hardware Platform (Jetson / GPU): Jetson
• DeepStream Version: 7.1
• JetPack Version (valid for Jetson only): 6.2
• TensorRT Version: 10.3.0.26
• NVIDIA GPU Driver Version (valid for GPU only):
• Issue Type (questions, new requirements, bugs): Question
• How to reproduce the issue? (This is for bugs. Include which sample app is used, the configuration file contents, the command line used, and other details for reproducing.)
I’m working with a DeepStream 7.1 pipeline on a Jetson Orin NX, using nvinferserver for inference through Triton Inference Server. I would like to enable zero-copy to avoid unnecessary memory transfers and improve performance.
Could you please guide me on how to enable/configure zero-copy in this setup?
Is it supported on Jetson with nvinferserver?
Are there specific DeepStream or Triton configurations I need to set?
Are the client and server on the same Jetson device? In gRPC mode, input tensors and inference results need to be passed over the gRPC protocol, so memory copies and transfers are unavoidable.
On dGPU, the configuration option enable_cuda_buffer_sharing can be used to pass input tensors via CUDA buffer sharing.
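For reference, a minimal sketch of where enable_cuda_buffer_sharing sits in the nvinferserver protobuf config for gRPC mode on dGPU (the model name and URL below are placeholders, not taken from this thread):

```
infer_config {
  unique_id: 1
  gpu_ids: [0]
  backend {
    triton {
      model_name: "my_model"   # placeholder model name
      version: -1
      grpc {
        url: "localhost:8001"
        # dGPU only: share CUDA buffers with a Triton server on the same machine
        enable_cuda_buffer_sharing: true
      }
    }
  }
}
```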
Does that mean zero-copy over gRPC is not possible even if the client and server are on the same Jetson device?
Also, if I use the C API, is it possible to do zero-copy on Jetson? According to the DeepStream 7.1 nvinferserver properties documentation, enable_cuda_buffer_sharing is not supported on Jetson devices.
And if I did want to use zero-copy on Jetson devices, what configuration would you suggest?
Yes, currently zero-copy over gRPC is not possible. The nvinferserver plugin and its low-level library are open source.
If using the C API, there is no extra memory copy: the preprocessed GPU input tensors are passed directly to the Triton API. nvinferserver also supports keeping output in GPU memory via “output_mem_type”.
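A sketch of a C API (native) nvinferserver config that requests GPU output memory; the model name and repository path are placeholders, assuming a local model repository:

```
infer_config {
  unique_id: 1
  gpu_ids: [0]
  backend {
    triton {
      model_name: "my_model"          # placeholder model name
      version: -1
      # C API mode: Triton runs in-process against a local model repository
      model_repo {
        root: "/path/to/model_repo"   # placeholder path
        strict_model_config: true
      }
    }
    # keep output tensors in GPU memory for downstream custom postprocessing
    output_mem_type: MEMORY_TYPE_GPU
  }
}
```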
If you are concerned about performance, you may use C API mode. If you want to do custom postprocessing on the inference GPU data, please refer to the sample deepstream-infer-tensor-meta-test.
@hatake_kakashi
Are the client and server on the same Jetson device? You may use nvinferserver gRPC mode to achieve points 1 and 2 you mentioned.
Regarding zero-copy, what processing do you mean? If you want to keep DeepStream and Triton separate, you have to use gRPC mode. As written above, in gRPC mode input tensors and inference results currently need to be passed over the gRPC protocol, so memory copies and transfers are unavoidable. Why not use C API mode? Why do you need a standalone Triton docker? Thanks!
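If you do keep the standalone Triton container, the gRPC url in the nvinferserver config simply points at its endpoint; a sketch, where the host name and port are assumptions (and, as noted above, copies over gRPC still occur):

```
infer_config {
  unique_id: 1
  backend {
    triton {
      model_name: "my_model"   # placeholder model name
      version: -1
      grpc {
        # gRPC endpoint of the standalone Triton container (assumed host/port)
        url: "triton-host:8001"
      }
    }
  }
}
```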