CUDA shared memory doesn't work (failed to open CUDA IPC handle: invalid device context)

• Hardware Platform (Jetson / GPU): GPU
• DeepStream Version: 7.1 (python)
• TensorRT Version: 10.3.0.26
• Docker Version: 27.5.0
• Docker Compose Version: 2.32.3
• Nvidia container toolkit Version: 1.17.3
• CUDA Version: 12.6
• NVIDIA GPU Driver Version (valid for GPU only): 555.58.02
• Issue Type( questions, new requirements, bugs): questions

Hello. I want to run inference in the nvinferserver module from DeepStream using CUDA memory sharing. The problem is with how the Triton Inference Server is started: if I start it with a docker run command, everything works. But when I start it through docker compose, passing what should be the same information and settings, an error occurs.
Can you tell me how to run Triton Inference Server via docker compose?

docker run command:
docker run --gpus '"device=0"' -it --rm -v /home/adels/Downloads/test_model_deepstream:/opt/model_repo -e DISPLAY=$DISPLAY --net=host nvcr.io/nvidia/deepstream:7.1-triton-multiarch
docker-compose file:
services:
  server-node-test:
    image: nvcr.io/nvidia/deepstream:7.1-triton-multiarch
    runtime: nvidia
    environment:
      - DISPLAY=${DISPLAY}
    volumes:
      - /home/adels/Downloads/test_model_deepstream:/opt/model_repo
    network_mode: host
    ipc: host
    entrypoint: tritonserver --model-repository=/opt/model_repo

error:
deepstream:
INFO: TritonGrpcBackend id:1 initialized for model: PigsCountingServiceOnnx_master
deepstream-1 | ERROR: Failed to register CUDA shared memory.
deepstream-1 | ERROR: Failed to set inference input: failed to register shared memory region: invalid args
deepstream-1 | ERROR: gRPC backend run failed to create request for model: PigsCountingServiceOnnx_master
deepstream-1 | ERROR: failed to specify dims when running inference on model:PigsCountingServiceOnnx_master, nvinfer error:NVDSINFER_TRITON_ERROR
deepstream-1 | 0:00:05.201936404 1 0x583353f516c0 ERROR nvinferserver gstnvinferserver.cpp:405:gst_nvinfer_server_logger: nvinferserver[UID 1]: Error in specifyBackendDims() <infer_grpc_context.cpp:165> [UID = 1]: failed to specify input dims triton backend for model:PigsCountingServiceOnnx_master, nvinfer error:NVDSINFER_TRITON_ERROR
deepstream-1 | 0:00:05.201964270 1 0x583353f516c0 ERROR nvinferserver gstnvinferserver.cpp:405:gst_nvinfer_server_logger: nvinferserver[UID 1]: Error in createNNBackend() <infer_grpc_context.cpp:230> [UID = 1]: failed to specify triton backend input dims for model:PigsCountingServiceOnnx_master, nvinfer error:NVDSINFER_TRITON_ERROR
deepstream-1 | 0:00:05.201992694 1 0x583353f516c0 ERROR nvinferserver gstnvinferserver.cpp:405:gst_nvinfer_server_logger: nvinferserver[UID 1]: Error in initialize() <infer_base_context.cpp:80> [UID = 1]: create nn-backend failed, check config file settings, nvinfer error:NVDSINFER_TRITON_ERROR

triton:
E0304 16:56:54.192522 1 shared_memory_manager.cc:259] "failed to open CUDA IPC handle: invalid device context"

nvinferserver settings:
"backend": {
  "triton": {
    "grpc": {
      "enable_cuda_buffer_sharing": true,
      "url": "0.0.0.0:8001"
    },
    "model_name": "PigsCountingServiceOnnx_master",
    "version": -1
  }
},
model config:
name: "PigsCountingServiceOnnx_master"
backend: "onnxruntime"
default_model_filename: "model.onnx"
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]

optimization {
  execution_accelerators {
    gpu_execution_accelerator : [{
      name : "tensorrt",
      parameters { key: "trt_engine_cache_enable" value: "1" },
      parameters { key: "trt_engine_cache_path" value: "/trt_cache" },
      parameters { key: "max_workspace_size_bytes" value: "10000000000" },
      parameters { key: "trt_builder_optimization_level" value: "3" },
      parameters { key: "precision_mode" value: "FP16" }
    }]
  }
}

I tried setting NVIDIA_VISIBLE_DEVICES=0, NVIDIA_DRIVER_CAPABILITIES=all, and CUDA_VISIBLE_DEVICES=0; it did not help.

tritonserver failed to start.

  1. Could you share the whole tritonserver running log?
  2. Could you add --gpus="device=0" in the docker-compose file, then try again? Thanks!
  1. Triton server logs: triton_logs.txt (9.4 KB)
  2. If I run docker compose up --build --gpus "device=0", I get the error "unknown flag: --gpus".
    If I use a docker-compose file with deploy: resources: reservations: devices: - capabilities: [ gpu ], the error is the same ("failed to open CUDA IPC handle: invalid device context").

Could you try adding --gpus "device=0" to the environment: part of the docker-compose file? Thanks!

I tried environment: - gpus='"device=0"'; environment: - gpus='device=0'; environment: - gpus="'device=0'". The error is the same ("failed to open CUDA IPC handle: invalid device context").
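For reference, Compose does not accept a --gpus flag, and it will not read one from environment: either; GPU selection in a compose file goes under deploy.resources. A minimal sketch (the device_ids value of "0" is an assumption matching the single-GPU setup described above):

```yaml
services:
  server-node-test:
    image: nvcr.io/nvidia/deepstream:7.1-triton-multiarch
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["0"]   # pin GPU 0, analogous to --gpus '"device=0"'
              capabilities: [gpu]
```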

To narrow down this issue, could you comment out entrypoint…, then, after building and starting a container, share the results of "nvidia-smi" and "tritonserver --model-repository=/opt/model_repo"? Thanks!

Hello. I found out the following: if I run docker compose without the entrypoint (I just put ["/bin/bash", "-c", "sleep infinity"]) and then run "tritonserver --model-repository=/opt/model_repo" inside the container, CUDA shared memory works. Any idea what this might be related to?

docker-compose.yaml
services:
  server-node-test:
    image: nvcr.io/nvidia/deepstream:7.1-triton-multiarch
    runtime: nvidia
    environment:
      - DISPLAY=${DISPLAY}
    volumes:
      - /home/adels/Downloads/test_model_deepstream:/opt/model_repo
    network_mode: host
    ipc: host
    entrypoint: ["/bin/bash", "-c", "sleep infinity"]
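One difference worth checking (an assumption, not confirmed in this thread): setting entrypoint: replaces the image's default entrypoint script, which NGC images use to prepare the environment before the server process starts. Running tritonserver manually inside a shell session would go through that setup, while entrypoint: tritonserver … would skip it. A sketch that keeps the default entrypoint and passes tritonserver as the command instead:

```yaml
services:
  server-node-test:
    image: nvcr.io/nvidia/deepstream:7.1-triton-multiarch
    runtime: nvidia
    environment:
      - DISPLAY=${DISPLAY}
    volumes:
      - /home/adels/Downloads/test_model_deepstream:/opt/model_repo
    network_mode: host
    ipc: host
    # command: runs after the image's default entrypoint,
    # so the image's startup script still executes first
    command: tritonserver --model-repository=/opt/model_repo
```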

  1. I can't reproduce this issue. Please refer to my test.
    a. There is only one A6000 on my device. In DS7.1, I created the TRT engine for Triton with /opt/nvidia/deepstream/deepstream-8.0/samples/prepare_ds_triton_model_repo.sh.
    b. On the host, I created docker-compose.yaml (396 Bytes). In the same directory, I ran "docker-compose up".
    c. tritonserver runs normally. Here is the log: tritonserver-0309.txt (9.3 KB).

  2. How many GPUs are in your device? Could you use the DeepStream native model Primary_Detector to test?