Please provide complete information as applicable to your setup.
• Hardware Platform (GPU) A4000
• DeepStream Version 6.3 (from nvcr.io/nvidia/deepstream:6.3-gc-triton-devel container)
• NVIDIA GPU Driver Version (valid for GPU only) 535.161.07
My model runs successfully, but I want to optimize memory transfers. My goal is to pass the input and output tensors between DeepStream and Triton via CUDA shared memory.
I have set up the connection to the Triton server over gRPC and set enable_cuda_buffer_sharing: true in the nvinferserver config.
I have also set output_mem_type: MEMORY_TYPE_GPU.
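For reference, the relevant part of my nvinferserver config (pbtxt) looks roughly like this; the model name and URL below are placeholders, not my exact values:

infer_config {
  backend {
    triton {
      model_name: "my_model"          # placeholder
      version: -1
      grpc {
        url: "localhost:8001"         # placeholder
        # share CUDA buffers with Triton over gRPC instead of copying through system memory
        enable_cuda_buffer_sharing: true
      }
    }
    # keep output tensors in GPU memory
    output_mem_type: MEMORY_TYPE_GPU
  }
}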
However, when I run the pipeline, Triton reports:
GRPC: unable to provide '<output tensor name>' in GPU, will use CPU
I thought I might have made a configuration mistake, so I ran one of the sample pipelines that, in theory, also uses gRPC with CUDA shared memory:
deepstream-app -c /opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app-triton-grpc/source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt -i /opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h264.mp4
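For clarity, that deepstream-app config selects the Triton gRPC path through its [primary-gie] group, which points at a separate nvinferserver pbtxt where the gRPC and memory settings live. A trimmed sketch of that group (the config-file name is my assumption of what this sample ships with; the actual file sits in the same configs directory):

[primary-gie]
enable=1
# plugin-type=1 selects nvinferserver (Triton) instead of nvinfer
plugin-type=1
# nvinferserver pbtxt that holds the grpc / enable_cuda_buffer_sharing settings
config-file=config_infer_plan_engine_primary.txt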
But the result was the same; Triton logs the same message:
I0328 10:48:07.319988 3668929 tensorrt.cc:334] model Primary_Detector, instance Primary_Detector_0, executing 1 requests
I0328 10:48:07.320045 3668929 instance_state.cc:360] TRITONBACKEND_ModelExecute: Issuing Primary_Detector_0 with 1 requests
I0328 10:48:07.320058 3668929 instance_state.cc:409] TRITONBACKEND_ModelExecute: Running Primary_Detector_0 with 1 requests
I0328 10:48:07.320100 3668929 instance_state.cc:1437] Optimization profile default [0] is selected for Primary_Detector_0
I0328 10:48:07.320176 3668929 instance_state.cc:900] Context with profile default [0] is being executed for Primary_Detector_0
I0328 10:48:07.320722 3668929 infer_response.cc:167] add response output: output: conv2d_bbox, type: FP32, shape: [4,16,23,40]
I0328 10:48:07.320761 3668929 grpc_server.cc:2916] GRPC: unable to provide 'conv2d_bbox' in GPU, will use CPU
I0328 10:48:07.320852 3668929 grpc_server.cc:2927] GRPC: using buffer for 'conv2d_bbox', size: 235520, addr: 0x77612c277900
I0328 10:48:07.320933 3668929 pinned_memory_manager.cc:161] pinned memory allocation: size 235520, addr 0x7763b6000090
I0328 10:48:07.321199 3668929 infer_response.cc:167] add response output: output: conv2d_cov/Sigmoid, type: FP32, shape: [4,4,23,40]
I0328 10:48:07.321223 3668929 grpc_server.cc:2916] GRPC: unable to provide 'conv2d_cov/Sigmoid' in GPU, will use CPU
I0328 10:48:07.321359 3668929 grpc_server.cc:2927] GRPC: using buffer for 'conv2d_cov/Sigmoid', size: 58880, addr: 0x77612c2b1110
I0328 10:48:07.321378 3668929 pinned_memory_manager.cc:161] pinned memory allocation: size 58880, addr 0x7763b60398a0
I0328 10:48:07.321536 3668929 grpc_server.cc:4123] ModelInferHandler::InferResponseComplete, 0 step ISSUED
I0328 10:48:07.321573 3668929 grpc_server.cc:3047] GRPC free: size 235520, addr 0x77612c277900
I0328 10:48:07.321585 3668929 grpc_server.cc:3047] GRPC free: size 58880, addr 0x77612c2b1110
I0328 10:48:07.321876 3668929 grpc_server.cc:3677] ModelInferHandler::InferRequestComplete
I0328 10:48:07.321891 3668929 grpc_server.cc:3959] Process for ModelInferHandler, rpc_ok=1, 0 step COMPLETE
I0328 10:48:07.321904 3668929 grpc_server.cc:2837] Done for ModelInferHandler, 0
What am I doing wrong? Could someone help me out?