CUDA shared memory registration fails when DeepStream sends inference requests to an external Triton server

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) : GPU
• DeepStream Version : 6.1
• JetPack Version (valid for Jetson only)
• TensorRT Version :
• NVIDIA GPU Driver Version (valid for GPU only) : 525.105.17
• Issue Type( questions, new requirements, bugs)
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)

The execution environment is as follows:

  1. Triton server version: 23.10
  2. DeepStream sends inference requests to a Triton server running in a separately started Docker container.
  3. In DeepStream's nvinferserver config (config.pbtxt), enable_cuda_buffer_sharing: true is set.
  4. When DeepStream makes a single inference request to one GPU, it executes normally.
  5. When multiple DeepStream pipelines make inference requests at the same time, the errors shown in item 6 occur, but over time the system stabilizes and the multiple pipelines run normally.
  6. ERROR: infer_grpc_client.cpp:223 Failed to register CUDA shared memory.
    ERROR: infer_grpc_client.cpp:311 Failed to set inference input: failed to register CUDA shared memory region 'inbuf_0x2be8300': failed to open CUDA IPC handle: invalid argument
    ERROR: infer_grpc_backend.cpp:140 gRPC backend run failed to create request for model: yolov8_pose
    ERROR: infer_trtis_backend.cpp:350 failed to specify dims when running inference on model:yolov8_pose, nvinfer error:NVDSINFER_TRITON_ERROR
  7. I want to prevent the errors in item 6 from occurring when multiple inference requests are made.
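For reference, the setting mentioned in step 3 lives in the gRPC section of the nvinferserver configuration (protobuf text format). This is a minimal sketch of the relevant fragment; the endpoint URL is a placeholder for this setup:

```
infer_config {
  backend {
    triton {
      model_name: "yolov8_pose"
      grpc {
        url: "localhost:8001"             # external Triton gRPC endpoint (placeholder)
        enable_cuda_buffer_sharing: true  # pass CUDA buffers via IPC handles instead of copying over gRPC
      }
    }
  }
}
```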

• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

  1. I want to prevent the errors above when making multiple inference requests.
  2. The documentation states that enable_cuda_buffer_sharing: true is valid when the Triton server runs inside the DeepStream Docker container, but I confirmed that the setup eventually operates normally even against an external Triton server. Please tell me how to prevent the above error from occurring.

The "enable_cuda_buffer_sharing" feature should be enabled only when the Triton server is on the same machine. Are the client and Triton server on the same machine? If yes, could you use deepstream-test1, which supports nvinferserver, to reproduce this issue? Thanks!

Are you saying that the Triton server needs to run inside the DeepStream Docker container when using DeepStream in Docker?
If so, as I wrote above, the Triton server was run as a separate Docker container.
As a result, a total of two Docker containers, DeepStream and Triton server, are in use.
I know that the enable_cuda_buffer_sharing feature works without problems when the Triton server runs inside the same DeepStream container.
However, in my case there is no Triton server inside the DeepStream container; with the external Triton server container it took time to stabilize, but it did work in the end.
I am still getting "ERROR: infer_grpc_client.cpp:223 Failed to register CUDA shared memory." and would like to resolve this error.

  1. Thanks for sharing! It seems the DeepStream client and Triton server are on the same machine but in different Docker containers. I will try to reproduce this "Failed to register CUDA shared memory" error.
  2. What do you mean by "makes one inference request to one GPU" and "when multiple DeepStream inference requests are made"? How can these steps be reproduced?

Please start the Docker containers with "--ipc host"; please refer to this topic. On one machine, I started two deepstream:6.4-triton-multiarch containers as the client and server respectively. deepstream-app ran well with enable_cuda_buffer_sharing=true.
If you still encounter "Failed to register CUDA shared memory", please use deepstream-app to reproduce this issue and share the detailed reproduction steps.
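For completeness, the suggestion above amounts to starting both containers in the host IPC namespace, so that CUDA IPC handles opened in one container can be resolved in the other. A hedged sketch of the commands, assuming the image tags from this thread (model repository path and config file are placeholders):

```shell
# Server container: Triton, sharing the host IPC namespace and GPUs
docker run --gpus all --ipc=host --rm -it \
    nvcr.io/nvidia/tritonserver:23.10-py3 \
    tritonserver --model-repository=/models

# Client container: DeepStream, also on the host IPC namespace
docker run --gpus all --ipc=host --rm -it \
    nvcr.io/nvidia/deepstream:6.1-triton \
    deepstream-app -c <config_file>
```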

I found a solution.
I had been running multiple DeepStream pipelines via Python's multiprocessing module.
This approach seems to have a problem: multiple processes under the same parent PID try to use the CUDA shared memory at the same time.
After changing that part to subprocess, each pipeline runs as a fully separate process, and the CUDA shared memory error appears to be resolved.
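The fix described above can be sketched as follows. This is a hypothetical illustration (the pipeline command and stream URIs are placeholders, not from the original post): each pipeline is launched with subprocess.Popen as an independent child process with a fresh interpreter, rather than as a multiprocessing worker forked from one parent, so each pipeline gets its own CUDA context and its own CUDA IPC handles.

```python
import subprocess
import sys

def launch_pipelines(stream_uris):
    """Start one fully independent child process per stream."""
    procs = []
    for uri in stream_uris:
        # In a real deployment the command would be something like
        # [sys.executable, "run_pipeline.py", uri] (hypothetical script name);
        # a stub child that prints its own PID keeps this sketch self-contained.
        p = subprocess.Popen(
            [sys.executable, "-c", "import os; print(os.getpid())"],
            stdout=subprocess.PIPE, text=True)
        procs.append(p)
    return procs

procs = launch_pipelines(["rtsp://cam1", "rtsp://cam2"])
# communicate() waits for each child and collects its stdout
child_pids = [int(p.communicate()[0]) for p in procs]
print(child_pids)
```

Because each child is a separate OS process rather than a fork of the parent, no two pipelines share the parent's CUDA state.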
Thank you for your interest.

