Avoiding memory copies when connecting a DeepStream pipeline to a standalone local Triton Inference Server

Please provide complete information as applicable to your setup.

• Hardware Platform (GPU) A4000
• DeepStream Version 6.3 (from nvcr.io/nvidia/deepstream:6.3-gc-triton-devel container)
• NVIDIA GPU Driver Version (valid for GPU only) 535.161.07

My model runs successfully, but I want to optimize memory transfers. My goal is to pass input and output tensors between DeepStream and Triton via CUDA shared memory.

I have set up the Triton server to use gRPC and have set enable_cuda_buffer_sharing: true.
I have also set output_mem_type: MEMORY_TYPE_GPU.
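For context, here is a minimal sketch of where those settings live in the nvinferserver config (protobuf text format, as used by the gRPC sample configs). The model name and URL are placeholders; the field layout follows the Gst-nvinferserver configuration schema as I understand it:

```
infer_config {
  backend {
    triton {
      model_name: "Primary_Detector"      # placeholder model name
      grpc {
        url: "localhost:8001"             # Triton gRPC endpoint
        enable_cuda_buffer_sharing: true  # share CUDA buffers over gRPC
      }
    }
    # Ask for output tensors in GPU memory rather than host memory
    output_mem_type: MEMORY_TYPE_GPU
  }
}
```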

However, when I run the pipeline, Triton reports:
GRPC: unable to provide '<output tensor name>' in GPU, will use CPU

I thought I might have made a configuration mistake, so I ran one of the example pipelines that, in theory, uses gRPC with shared memory:

deepstream-app -c /opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app-triton-grpc/source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt -i /opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h264.mp4

But I got the same result; Triton reports the same error:

I0328 10:48:07.319988 3668929 tensorrt.cc:334] model Primary_Detector, instance Primary_Detector_0, executing 1 requests
I0328 10:48:07.320045 3668929 instance_state.cc:360] TRITONBACKEND_ModelExecute: Issuing Primary_Detector_0 with 1 requests
I0328 10:48:07.320058 3668929 instance_state.cc:409] TRITONBACKEND_ModelExecute: Running Primary_Detector_0 with 1 requests
I0328 10:48:07.320100 3668929 instance_state.cc:1437] Optimization profile default [0] is selected for Primary_Detector_0
I0328 10:48:07.320176 3668929 instance_state.cc:900] Context with profile default [0] is being executed for Primary_Detector_0
I0328 10:48:07.320722 3668929 infer_response.cc:167] add response output: output: conv2d_bbox, type: FP32, shape: [4,16,23,40]
I0328 10:48:07.320761 3668929 grpc_server.cc:2916] GRPC: unable to provide 'conv2d_bbox' in GPU, will use CPU
I0328 10:48:07.320852 3668929 grpc_server.cc:2927] GRPC: using buffer for 'conv2d_bbox', size: 235520, addr: 0x77612c277900
I0328 10:48:07.320933 3668929 pinned_memory_manager.cc:161] pinned memory allocation: size 235520, addr 0x7763b6000090
I0328 10:48:07.321199 3668929 infer_response.cc:167] add response output: output: conv2d_cov/Sigmoid, type: FP32, shape: [4,4,23,40]
I0328 10:48:07.321223 3668929 grpc_server.cc:2916] GRPC: unable to provide 'conv2d_cov/Sigmoid' in GPU, will use CPU
I0328 10:48:07.321359 3668929 grpc_server.cc:2927] GRPC: using buffer for 'conv2d_cov/Sigmoid', size: 58880, addr: 0x77612c2b1110
I0328 10:48:07.321378 3668929 pinned_memory_manager.cc:161] pinned memory allocation: size 58880, addr 0x7763b60398a0
I0328 10:48:07.321536 3668929 grpc_server.cc:4123] ModelInferHandler::InferResponseComplete, 0 step ISSUED
I0328 10:48:07.321573 3668929 grpc_server.cc:3047] GRPC free: size 235520, addr 0x77612c277900
I0328 10:48:07.321585 3668929 grpc_server.cc:3047] GRPC free: size 58880, addr 0x77612c2b1110
I0328 10:48:07.321876 3668929 grpc_server.cc:3677] ModelInferHandler::InferRequestComplete
I0328 10:48:07.321891 3668929 grpc_server.cc:3959] Process for ModelInferHandler, rpc_ok=1, 0 step COMPLETE
I0328 10:48:07.321904 3668929 grpc_server.cc:2837] Done for ModelInferHandler, 0

What am I doing wrong? Could someone help me out?

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.

Triton is open source. This log is from an older version of the Triton server module; in later versions this error no longer occurs.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.