DeepStream Triton gRPC example does not run with Deepstream Triton Docker images

feicccccccc · January 10, 2023, 6:09am

Hi all,

I am trying to run DeepStream and Triton Inference Server on different computers/nodes. The plan is to use a dedicated computer to handle inference and manage models, and multiple computers to handle multiple streams. I am able to open 2 containers on the same computer and successfully run the example. But when I run it on 2 computers, I get

ERROR: infer_grpc_client.cpp:223 Failed to register CUDA shared memory.

Can you advise how to connect DeepStream and Triton Inference Server through gRPC, and how to work through the tutorial?


• Hardware Platform (Jetson / GPU)
Computer 1: RTX3090
Computer 2: RTX3090
• DeepStream Version
Both computers: deepstream:6.1.1-triton
• NVIDIA GPU Driver Version (valid for GPU only)
515.86.01
• Issue Type (questions, new requirements, bugs)
bugs
• How to reproduce the issue? (This is for bugs. Including which sample app is used, the configuration file contents, the command line used and other details for reproducing)
Computer 1 (Server):

Run the docker image

docker run --gpus all -it --rm -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY=$DISPLAY --net=host --name=triton-server nvcr.io/nvidia/deepstream:6.1.1-triton

Inside the container

cd samples
./prepare_ds_triton_model_repo.sh
tritonserver --model-repository triton_model_repo/

Verify that the application can communicate through gRPC from a different container:

docker run --gpus all -it --rm -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY=$DISPLAY --net=host --name=triton-client nvcr.io/nvidia/deepstream:6.1.1-triton
cd /opt/nvidia/deepstream/deepstream-6.1/samples/configs/deepstream-app-triton-grpc
deepstream-app -c source30_1080p_dec_infer-resnet_tiled_display_int8.txt

Computer 2 (Client):

Run the docker image

docker run --gpus all -it --rm -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY=$DISPLAY --net=host --name=triton-client nvcr.io/nvidia/deepstream:6.1.1-triton

Inside the container, change the gRPC URL to the server IP:

cd /opt/nvidia/deepstream/deepstream-6.1/samples/configs/deepstream-app-triton-grpc
vim config_infer_plan_engine_primary.txt

grpc {
  url: "192.168.51.13:8001"
  # url: "localhost:8001"
  enable_cuda_buffer_sharing: true
}

deepstream-app -c source30_1080p_dec_infer-resnet_tiled_display_int8.txt
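Before launching deepstream-app on Computer 2, it can help to confirm that the server's gRPC port is reachable at all. The sketch below is a hypothetical stdlib-only Python helper (not part of DeepStream or Triton). It assumes a plain "host:port" URL such as the one in the config above, and it only checks TCP connectivity, not whether Triton is healthy or the model is loaded.

```python
import socket


def grpc_port_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port in `url` succeeds.

    Hypothetical helper: an open port does not prove that Triton is
    healthy or that Primary_Detector is loaded.
    """
    host, _, port = url.rpartition(":")  # "192.168.51.13:8001" -> host, "8001"
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, ...
        return False
```

For example, grpc_port_reachable("192.168.51.13:8001") run on Computer 2 should return True once tritonserver is up on Computer 1. For a real health check, the gRPC client in the tritonclient Python package provides is_server_live() and is_model_ready().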

Error Messages:

WARNING: infer_proto_utils.cpp:144 auto-update preprocess.network_format to IMAGE_FORMAT_RGB
INFO: infer_grpc_backend.cpp:169 TritonGrpcBackend id:1 initialized for model: Primary_Detector
ERROR: infer_grpc_client.cpp:223 Failed to register CUDA shared memory.
ERROR: infer_grpc_client.cpp:311 Failed to set inference input: failed to register CUDA shared memory region ‘inbuf_0x55558a0e0800’: failed to open CUDA IPC handle: invalid resource handle
ERROR: infer_grpc_backend.cpp:140 gRPC backend run failed to create request for model: Primary_Detector
ERROR: infer_trtis_backend.cpp:350 failed to specify dims when running inference on model:Primary_Detector, nvinfer error:NVDSINFER_TRITON_ERROR
0:00:00.127168214 1531 0x55558ab10920 ERROR nvinferserver gstnvinferserver.cpp:375:gst_nvinfer_server_logger:<primary_gie> nvinferserver[UID 1]: Error in specifyBackendDims() <infer_grpc_context.cpp:154> [UID = 1]: failed to specify input dims triton backend for model:Primary_Detector, nvinfer error:NVDSINFER_TRITON_ERROR
0:00:00.127178191 1531 0x55558ab10920 ERROR nvinferserver gstnvinferserver.cpp:375:gst_nvinfer_server_logger:<primary_gie> nvinferserver[UID 1]: Error in createNNBackend() <infer_grpc_context.cpp:210> [UID = 1]: failed to specify triton backend input dims for model:Primary_Detector, nvinfer error:NVDSINFER_TRITON_ERROR
0:00:00.127188907 1531 0x55558ab10920 ERROR nvinferserver gstnvinferserver.cpp:375:gst_nvinfer_server_logger:<primary_gie> nvinferserver[UID 1]: Error in initialize() <infer_base_context.cpp:79> [UID = 1]: create nn-backend failed, check config file settings, nvinfer error:NVDSINFER_TRITON_ERROR
0:00:00.127192287 1531 0x55558ab10920 WARN nvinferserver gstnvinferserver_impl.cpp:547:start:<primary_gie> error: Failed to initialize InferTrtIsContext
0:00:00.127194044 1531 0x55558ab10920 WARN nvinferserver gstnvinferserver_impl.cpp:547:start:<primary_gie> error: Config file path: /opt/nvidia/deepstream/deepstream-6.1/samples/configs/deepstream-app-triton-grpc/config_infer_plan_engine_primary.txt
0:00:00.127210681 1531 0x55558ab10920 WARN nvinferserver gstnvinferserver.cpp:473:gst_nvinfer_server_start:<primary_gie> error: gstnvinferserver_impl start failed
** ERROR: main:716: Failed to set pipeline to PAUSED
Quitting
ERROR from primary_gie: Failed to initialize InferTrtIsContext
Debug info: gstnvinferserver_impl.cpp(547): start (): /GstPipeline:pipeline/GstBin:primary_gie_bin/GstNvInferServer:primary_gie:
Config file path: /opt/nvidia/deepstream/deepstream-6.1/samples/configs/deepstream-app-triton-grpc/config_infer_plan_engine_primary.txt
ERROR from primary_gie: gstnvinferserver_impl start failed
Debug info: gstnvinferserver.cpp(473): gst_nvinfer_server_start (): /GstPipeline:pipeline/GstBin:primary_gie_bin/GstNvInferServer:primary_gie
App run failed


Many thanks!

yingliu · January 10, 2023, 6:56am

Why did you set enable_cuda_buffer_sharing to true? According to its description, it is meant for a local server:
Enable sharing of CUDA buffers with local Triton server for input tensors. If enabled, the input CUDA buffers are shared with the Triton server to improve performance. This feature should be enabled only when the Triton server is on the same machine.
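Because the CUDA IPC handle in the error above can only be opened by a process on the same machine, one way to avoid this mistake when switching between local and remote servers is to derive the flag from the URL. The following Python sketch is hypothetical (not a DeepStream tool). It assumes that only loopback hosts count as "same machine"; a server reached through the machine's own LAN IP would also be local but is not detected here. It renders only the grpc { } fragment shown in the post.

```python
import ipaddress

LOOPBACK_NAMES = {"localhost"}


def is_local_triton(url: str) -> bool:
    """Assumption: only loopback hosts are treated as the same machine."""
    host = url.rsplit(":", 1)[0]  # strip the ":8001" port
    if host in LOOPBACK_NAMES:
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # A hostname we cannot classify without DNS: treat it as remote.
        return False


def grpc_block(url: str) -> str:
    """Render the grpc { } fragment with sharing enabled only for local servers."""
    sharing = "true" if is_local_triton(url) else "false"
    return (
        "grpc {\n"
        f'  url: "{url}"\n'
        f"  enable_cuda_buffer_sharing: {sharing}\n"
        "}\n"
    )
```

Setting the flag to false for a remote URL has the same effect as commenting the line out, which is the fix suggested here.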

feicccccccc · January 10, 2023, 7:27am

Thanks! I commented out the line and it works now. However, the frame rate and GPU utilization are now much lower.

For example:
source30_1080p_dec_infer-resnet_tiled_display_int8.txt drops from 30 fps to less than 1 fps (different machines).

source30_1080p_dec_infer-resnet_tiled_display_int8.txt drops from 30 fps to less than 15 fps (same machine).

Is Triton Inference Server intended to run on the same host as DeepStream? It would be very hard to manage multiple streams with multiple Triton Inference Servers.

Thanks!

yingliu · January 10, 2023, 7:36am

What type is your network card: 1G, 2.5G or 10G?
You can use "iftop -i ethx" (replace ethx with your NIC name) to monitor the real-time bandwidth while the program is running.
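If iftop is not installed in the container, a rough per-interface rate can also be derived from the Linux /proc/net/dev counters. The Python sketch below assumes the standard /proc/net/dev layout (receive bytes in the first column after the interface name, transmit bytes in the ninth) and averages over the sampling window; unlike iftop, it does not break traffic down per connection.

```python
import time


def read_counters(text: str) -> dict:
    """Parse /proc/net/dev content into {iface: (rx_bytes, tx_bytes)}."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)  # also handles "eth0:12345" with no space
        fields = data.split()
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def sample_rates(iface: str, seconds: float = 1.0) -> tuple:
    """Return (rx, tx) in bytes/s for `iface`, averaged over `seconds`."""
    with open("/proc/net/dev") as f:
        before = read_counters(f.read())[iface]
    time.sleep(seconds)
    with open("/proc/net/dev") as f:
        after = read_counters(f.read())[iface]
    return tuple((a - b) / seconds for a, b in zip(after, before))
```

Multiplying the result by 8 gives bits per second, which can be compared directly with the NIC's nominal speed (1 Gbit/s is 125 MB/s).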

feicccccccc · January 10, 2023, 8:43am

Thanks, I am using a 1G network, and the network is indeed the bottleneck.
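A back-of-the-envelope check shows why a 1 Gbit/s link struggles here. The Python sketch below assumes, purely for illustration, that each inference request carries one uncompressed 1920×1080 RGB frame at 8 bits per channel. The tensor nvinferserver actually sends depends on the model's input dimensions and data type, so treat the result as an order-of-magnitude estimate that also ignores protocol overhead.

```python
def max_fps_per_stream(link_gbit_s: float, frame_bytes: int, streams: int) -> float:
    """Upper bound on per-stream fps if the network link is the only bottleneck."""
    link_bytes_s = link_gbit_s * 1e9 / 8  # Gbit/s -> bytes/s
    return link_bytes_s / frame_bytes / streams


FRAME_1080P_RGB = 1920 * 1080 * 3  # assumption: 8-bit RGB, no compression

for link in (1, 10, 100):
    fps = max_fps_per_stream(link, FRAME_1080P_RGB, streams=30)
    print(f"{link:>3} Gbit/s -> {fps:.2f} fps per stream")
```

Under these assumptions a 1 Gbit/s link tops out at about 0.67 fps per stream for 30 streams, the same order as the "less than 1 fps" reported above.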

yingliu · January 17, 2023, 4:05am

This is indeed the bottleneck. Is there anything else we can help with? We'll close this topic if no further support is needed.

feicccccccc · January 17, 2023, 4:29am

For separate machines, yes.

I tried running the server and client on the same machine with the same settings to eliminate the network bottleneck (which could be solved by buying a better switch). With enable_cuda_buffer_sharing: true commented out, I noticed a very large drop in frame rate for the source30_1080p_dec_infer-resnet_tiled_display_int8.txt configuration (from ~30 fps to ~15 fps, eventually lagging out). nvtop shows ~1-3 GB/s of transfer for both RX and TX.

Is it correct that if I disable enable_cuda_buffer_sharing, the raw image has to be transferred to the host and then back to the GPU for inference, which significantly lowers the fps? A naive calculation puts the transfer at around 2 × 5.6 GB/s, which is much lower than the theoretical limit of PCIe Gen 4.
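The 2 × 5.6 GB/s figure can be reproduced by assuming 30 streams of uncompressed 1920×1080 RGB frames (3 bytes per pixel) at 30 fps, counted once for the copy out of the GPU and once for the copy back in. The Python sketch below makes those assumptions explicit and compares the result with the nominal PCIe 4.0 x16 rate (16 GT/s per lane, 128b/130b encoding, per direction). Effects such as pinned vs pageable host memory are not modeled.

```python
def raw_video_rate(width: int, height: int, bytes_per_px: int, fps: float, streams: int) -> float:
    """Bytes per second for uncompressed frames across all streams."""
    return width * height * bytes_per_px * fps * streams


def pcie_gen4_bytes_s(lanes: int = 16) -> float:
    """Nominal per-direction PCIe 4.0 throughput: 16 GT/s per lane, 128b/130b."""
    return 16e9 * lanes * (128 / 130) / 8


one_way = raw_video_rate(1920, 1080, 3, 30, 30)  # e.g. GPU -> host
round_trip = 2 * one_way                         # out to the host and back to the GPU
print(f"one way:    {one_way / 1e9:.2f} GB/s")
print(f"round trip: {round_trip / 1e9:.2f} GB/s")
print(f"PCIe 4.0 x16 per direction: {pcie_gen4_bytes_s() / 1e9:.2f} GB/s")
```

This prints about 5.60 GB/s one way, 11.20 GB/s for the round trip, and 31.51 GB/s for the link. Since device-to-host and host-to-device copies use opposite directions of the link, each direction carries roughly 5.6 GB/s under these assumptions, well under the nominal limit.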

The purpose is to test whether I can separate the model from the DeepStream application, so that the model can be changed quickly without modifying the DeepStream application.

fanzh · January 17, 2023, 8:21am

Please see the description of enable_cuda_buffer_sharing in the Gst-nvinferserver page of the DeepStream 6.3 Release documentation.
As the documentation says, enable_cuda_buffer_sharing=true improves performance when the Triton server is on the same machine.

feicccccccc · January 17, 2023, 8:26am

Thanks for the reply. Yes, I understand that. I am exploring options for running it on different machines, where the bottleneck does not seem to be the network (10 GbE and even 100 GbE switches are available). I suspect the performance loss comes from the data transfer to GPU memory and wonder whether there is a way or option to optimize it.

fanzh · January 17, 2023, 9:25am

When the server and client are on the same machine, do you still see the "~30fps to ~15fps, and eventually lag out" issue if enable_cuda_buffer_sharing is true?

feicccccccc · January 17, 2023, 9:30am

No, it runs smoothly at around 26 fps, just like using TensorRT as the backend. The bottleneck then becomes the decoder, whose utilization stays at 100% all the time.

fanzh · January 17, 2023, 9:50am

There has been no update from you for a while, so we assume this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks

Here are some options to improve performance:

• Use sample_720p.mp4 as the source.
• Use the nvinferserver interval property; please refer to the Gst-nvinferserver page of the DeepStream 6.1.1 Release documentation.
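The effect of the interval property on inference load can be estimated with simple arithmetic. As we read the Gst-nvinferserver documentation, interval is the number of consecutive batches skipped for inference, so only one batch in every interval + 1 is sent to Triton. The Python sketch below applies that to the frame rate sent for inference; the 30 streams at 30 fps are the figures from this thread, and it assumes each batch holds one frame per stream.

```python
def inferred_frames_per_s(streams: int, fps: float, interval: int) -> float:
    """Frames sent for inference per second when `interval` batches are skipped
    after each inferred batch (assumes one frame per stream in every batch)."""
    if interval < 0:
        raise ValueError("interval must be >= 0")
    return streams * fps / (interval + 1)


for interval in (0, 1, 4):
    print(interval, inferred_frames_per_s(30, 30, interval))
```

With interval 0 every batch is inferred (900 frames/s for 30 streams at 30 fps); interval 1 halves that, and interval 4 sends one batch in five. The gRPC traffic to a remote server scales down by the same factor.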

system · February 13, 2023, 10:12am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
