Avoiding memory copies when connecting a DeepStream pipeline to a standalone local Triton Inference Server

Please provide complete information as applicable to your setup.

• Hardware Platform (GPU) A4000
• DeepStream Version 6.3 (from nvcr.io/nvidia/deepstream:6.3-gc-triton-devel container)
• NVIDIA GPU Driver Version (valid for GPU only) 535.161.07

My model runs successfully, but I want to optimize memory transfers. My goal is to pass input and output tensors between DeepStream and Triton via CUDA shared memory.

I have set up the Triton server to use gRPC and have set enable_cuda_buffer_sharing: true.
I have also set output_mem_type: MEMORY_TYPE_GPU.
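For context, here is a minimal sketch of where those settings live in the nvinferserver config (protobuf text format, as used by the gRPC sample configs). The model name and URL are placeholders; the field layout follows the Gst-nvinferserver configuration schema as I understand it:

```
infer_config {
  backend {
    triton {
      model_name: "Primary_Detector"      # placeholder model name
      grpc {
        url: "localhost:8001"             # Triton gRPC endpoint
        enable_cuda_buffer_sharing: true  # share CUDA buffers over gRPC
      }
    }
    # Ask for output tensors in GPU memory rather than host memory
    output_mem_type: MEMORY_TYPE_GPU
  }
}
```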

However, when I run the pipeline, Triton reports:
GRPC: unable to provide '<output tensor name>' in GPU, will use CPU

I thought I might have made a configuration mistake, so I ran one of the example pipelines that, in theory, uses gRPC with shared memory:

deepstream-app -c /opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app-triton-grpc/source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt -i /opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h264.mp4

But I got the same result; Triton reports the same error:

I0328 10:48:07.319988 3668929 tensorrt.cc:334] model Primary_Detector, instance Primary_Detector_0, executing 1 requests
I0328 10:48:07.320045 3668929 instance_state.cc:360] TRITONBACKEND_ModelExecute: Issuing Primary_Detector_0 with 1 requests
I0328 10:48:07.320058 3668929 instance_state.cc:409] TRITONBACKEND_ModelExecute: Running Primary_Detector_0 with 1 requests
I0328 10:48:07.320100 3668929 instance_state.cc:1437] Optimization profile default [0] is selected for Primary_Detector_0
I0328 10:48:07.320176 3668929 instance_state.cc:900] Context with profile default [0] is being executed for Primary_Detector_0
I0328 10:48:07.320722 3668929 infer_response.cc:167] add response output: output: conv2d_bbox, type: FP32, shape: [4,16,23,40]
I0328 10:48:07.320761 3668929 grpc_server.cc:2916] GRPC: unable to provide 'conv2d_bbox' in GPU, will use CPU
I0328 10:48:07.320852 3668929 grpc_server.cc:2927] GRPC: using buffer for 'conv2d_bbox', size: 235520, addr: 0x77612c277900
I0328 10:48:07.320933 3668929 pinned_memory_manager.cc:161] pinned memory allocation: size 235520, addr 0x7763b6000090
I0328 10:48:07.321199 3668929 infer_response.cc:167] add response output: output: conv2d_cov/Sigmoid, type: FP32, shape: [4,4,23,40]
I0328 10:48:07.321223 3668929 grpc_server.cc:2916] GRPC: unable to provide 'conv2d_cov/Sigmoid' in GPU, will use CPU
I0328 10:48:07.321359 3668929 grpc_server.cc:2927] GRPC: using buffer for 'conv2d_cov/Sigmoid', size: 58880, addr: 0x77612c2b1110
I0328 10:48:07.321378 3668929 pinned_memory_manager.cc:161] pinned memory allocation: size 58880, addr 0x7763b60398a0
I0328 10:48:07.321536 3668929 grpc_server.cc:4123] ModelInferHandler::InferResponseComplete, 0 step ISSUED
I0328 10:48:07.321573 3668929 grpc_server.cc:3047] GRPC free: size 235520, addr 0x77612c277900
I0328 10:48:07.321585 3668929 grpc_server.cc:3047] GRPC free: size 58880, addr 0x77612c2b1110
I0328 10:48:07.321876 3668929 grpc_server.cc:3677] ModelInferHandler::InferRequestComplete
I0328 10:48:07.321891 3668929 grpc_server.cc:3959] Process for ModelInferHandler, rpc_ok=1, 0 step COMPLETE
I0328 10:48:07.321904 3668929 grpc_server.cc:2837] Done for ModelInferHandler, 0

What am I doing wrong? Could someone help me out?

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks.

Triton is open source. This log is from an older version of the Triton server module; in later versions this error no longer occurs.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.