CUDA shared memory doesn't work (failed to open CUDA IPC handle: invalid device context)

• Hardware Platform (Jetson / GPU): GPU
• DeepStream Version: 7.1 (python)
• TensorRT Version: 10.3.0.26
• Docker Version: 27.5.0
• Docker Compose Version: 2.32.3
• Nvidia container toolkit Version: 1.17.3
• CUDA Version: 12.6
• NVIDIA GPU Driver Version (valid for GPU only): 555.58.02
• Issue Type( questions, new requirements, bugs): questions

Hello. I want to run inference in the nvinferserver module from DeepStream using CUDA memory sharing. The problem is with how the Triton Inference Server is started: if I start it with a docker run command, everything works. But when I start it through docker compose, passing what should be the same information and settings, an error occurs.
Can you tell me how to run Triton Inference Server via docker compose?

docker run command:
docker run --gpus '"device=0"' -it --rm -v /home/adels/Downloads/test_model_deepstream:/opt/model_repo -e DISPLAY=$DISPLAY --net=host nvcr.io/nvidia/deepstream:7.1-triton-multiarch
docker-compose file:
services:
  server-node-test:
    image: nvcr.io/nvidia/deepstream:7.1-triton-multiarch
    runtime: nvidia
    environment:
      - DISPLAY=${DISPLAY}
    volumes:
      - /home/adels/Downloads/test_model_deepstream:/opt/model_repo
    network_mode: host
    ipc: host
    entrypoint: tritonserver --model-repository=/opt/model_repo

error:
deepstream:
INFO: TritonGrpcBackend id:1 initialized for model: PigsCountingServiceOnnx_master
deepstream-1 | ERROR: Failed to register CUDA shared memory.
deepstream-1 | ERROR: Failed to set inference input: failed to register shared memory region: invalid args
deepstream-1 | ERROR: gRPC backend run failed to create request for model: PigsCountingServiceOnnx_master
deepstream-1 | ERROR: failed to specify dims when running inference on model:PigsCountingServiceOnnx_master, nvinfer error:NVDSINFER_TRITON_ERROR
deepstream-1 | 0:00:05.201936404 1 0x583353f516c0 ERROR nvinferserver gstnvinferserver.cpp:405:gst_nvinfer_server_logger: nvinferserver[UID 1]: Error in specifyBackendDims() <infer_grpc_context.cpp:165> [UID = 1]: failed to specify input dims triton backend for model:PigsCountingServiceOnnx_master, nvinfer error:NVDSINFER_TRITON_ERROR
deepstream-1 | 0:00:05.201964270 1 0x583353f516c0 ERROR nvinferserver gstnvinferserver.cpp:405:gst_nvinfer_server_logger: nvinferserver[UID 1]: Error in createNNBackend() <infer_grpc_context.cpp:230> [UID = 1]: failed to specify triton backend input dims for model:PigsCountingServiceOnnx_master, nvinfer error:NVDSINFER_TRITON_ERROR
deepstream-1 | 0:00:05.201992694 1 0x583353f516c0 ERROR nvinferserver gstnvinferserver.cpp:405:gst_nvinfer_server_logger: nvinferserver[UID 1]: Error in initialize() <infer_base_context.cpp:80> [UID = 1]: create nn-backend failed, check config file settings, nvinfer error:NVDSINFER_TRITON_ERROR

triton:
E0304 16:56:54.192522 1 shared_memory_manager.cc:259] "failed to open CUDA IPC handle: invalid device context"

nvinferserver settings:
"backend": {
  "triton": {
    "grpc": {
      "enable_cuda_buffer_sharing": true,
      "url": "0.0.0.0:8001"
    },
    "model_name": "PigsCountingServiceOnnx_master",
    "version": -1
  }
},
model config:
name: "PigsCountingServiceOnnx_master"
backend: "onnxruntime"
default_model_filename: "model.onnx"
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]

optimization {
  execution_accelerators {
    gpu_execution_accelerator : [{
      name : "tensorrt",
      parameters { key: "trt_engine_cache_enable" value: "1" },
      parameters { key: "trt_engine_cache_path" value: "/trt_cache" },
      parameters { key: "max_workspace_size_bytes" value: "10000000000" },
      parameters { key: "trt_builder_optimization_level" value: "3" },
      parameters { key: "precision_mode" value: "FP16" }
    }]
  }
}

I tried setting NVIDIA_VISIBLE_DEVICES=0, NVIDIA_DRIVER_CAPABILITIES=all, and CUDA_VISIBLE_DEVICES=0; it did not help.

tritonserver failed to start.

  1. Could you share the whole tritonserver running log?
  2. Could you add --gpus="device=0" in the docker-compose file, then try again? Thanks!
  1. Triton server logs: triton_logs.txt (9.4 KB)
  2. If I run docker compose up --build --gpus "device=0", I get the error "unknown flag: --gpus".
    If I use a docker-compose file with deploy: resources: reservations: devices: - capabilities: [ gpu ], the error is the same ("failed to open CUDA IPC handle: invalid device context").

Could you try adding --gpus "device=0" to the environment: part of the docker-compose file? Thanks!

I tried environment: - gpus='"device=0"'; environment: - gpus='device=0'; environment: - gpus="'device=0'". The error is the same ("failed to open CUDA IPC handle: invalid device context").
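For reference, Compose does not accept a --gpus flag, and it will not read one from environment: either; GPU selection in a compose file goes under deploy.resources. A minimal sketch (the device_ids value of "0" is an assumption matching the single-GPU setup described above):

```yaml
services:
  server-node-test:
    image: nvcr.io/nvidia/deepstream:7.1-triton-multiarch
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["0"]   # pin GPU 0, analogous to --gpus '"device=0"'
              capabilities: [gpu]
```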

To narrow down this issue, could you comment out entrypoint…, then, after building and starting a container, share the results of "nvidia-smi" and "tritonserver --model-repository=/opt/model_repo"? Thanks!

Hello. I found out the following: if I run docker compose without the entrypoint (I just put ["/bin/bash", "-c", "sleep infinity"]) and then run "tritonserver --model-repository=/opt/model_repo" inside the container, CUDA shared memory works. Any idea what this might be related to?

docker-compose.yaml
services:
  server-node-test:
    image: nvcr.io/nvidia/deepstream:7.1-triton-multiarch
    runtime: nvidia
    environment:
      - DISPLAY=${DISPLAY}
    volumes:
      - /home/adels/Downloads/test_model_deepstream:/opt/model_repo
    network_mode: host
    ipc: host
    entrypoint: ["/bin/bash", "-c", "sleep infinity"]
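One difference worth checking (an assumption, not confirmed in this thread): setting entrypoint: replaces the image's default entrypoint script, which NGC images use to prepare the environment before the server process starts. Running tritonserver manually inside a shell session would go through that setup, while entrypoint: tritonserver … would skip it. A sketch that keeps the default entrypoint and passes tritonserver as the command instead:

```yaml
services:
  server-node-test:
    image: nvcr.io/nvidia/deepstream:7.1-triton-multiarch
    runtime: nvidia
    environment:
      - DISPLAY=${DISPLAY}
    volumes:
      - /home/adels/Downloads/test_model_deepstream:/opt/model_repo
    network_mode: host
    ipc: host
    # command: runs after the image's default entrypoint,
    # so the image's startup script still executes first
    command: tritonserver --model-repository=/opt/model_repo
```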

  1. I can't reproduce this issue. Please refer to my test.
    a. There is only one A6000 on my device. In DS7.1, I created the TRT engine for Triton with /opt/nvidia/deepstream/deepstream-8.0/samples/prepare_ds_triton_model_repo.sh.
    b. On the host, I created docker-compose.yaml (396 Bytes). In the same directory, I ran "docker-compose up".
    c. tritonserver runs normally. Here is the log: tritonserver-0309.txt (9.3 KB).

  2. How many GPUs are in your device? Could you use the DeepStream native model Primary_Detector to test?