• Hardware Platform (Jetson / GPU): GPU
• DeepStream Version: 7.1 (python)
• TensorRT Version: 10.3.0.26
• Docker Version: 27.5.0
• Docker Compose Version: 2.32.3
• NVIDIA Container Toolkit Version: 1.17.3
• CUDA Version: 12.6
• NVIDIA GPU Driver Version (valid for GPU only): 555.58.02
• Issue Type( questions, new requirements, bugs): questions
Hello. I want to run inference with the nvinferserver plugin from DeepStream using CUDA shared memory. The problem is on the startup side of the Triton Inference Server: if I start it with a docker run command, everything works.
But when I start it through docker compose, passing the same settings as far as I can tell, an error occurs.
Can you tell me how to run Triton Inference Server via docker compose so that CUDA buffer sharing still works?
docker run command:
docker run --gpus '"device=0"' -it --rm -v /home/adels/Downloads/test_model_deepstream:/opt/model_repo -e DISPLAY=$DISPLAY --net=host nvcr.io/nvidia/deepstream:7.1-triton-multiarch
docker-compose file:
services:
  server-node-test:
    image: nvcr.io/nvidia/deepstream:7.1-triton-multiarch
    runtime: nvidia
    environment:
      - DISPLAY=${DISPLAY}
    volumes:
      - /home/adels/Downloads/test_model_deepstream:/opt/model_repo
    network_mode: host
    ipc: host
    entrypoint: tritonserver --model-repository=/opt/model_repo
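In case device selection matters here: is the compose-spec GPU reservation below the correct equivalent of --gpus '"device=0"' from the docker run command? This is an untested sketch of my understanding; I am not sure whether runtime: nvidia alone pins the container to device 0 the same way.

services:
  server-node-test:
    image: nvcr.io/nvidia/deepstream:7.1-triton-multiarch
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["0"]
              capabilities: [gpu]
    # ...rest of the service definition unchanged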
error:
deepstream:
INFO: TritonGrpcBackend id:1 initialized for model: PigsCountingServiceOnnx_master
deepstream-1 | ERROR: Failed to register CUDA shared memory.
deepstream-1 | ERROR: Failed to set inference input: failed to register shared memory region: invalid args
deepstream-1 | ERROR: gRPC backend run failed to create request for model: PigsCountingServiceOnnx_master
deepstream-1 | ERROR: failed to specify dims when running inference on model:PigsCountingServiceOnnx_master, nvinfer error:NVDSINFER_TRITON_ERROR
deepstream-1 | 0:00:05.201936404 1 0x583353f516c0 ERROR nvinferserver gstnvinferserver.cpp:405:gst_nvinfer_server_logger: nvinferserver[UID 1]: Error in specifyBackendDims() <infer_grpc_context.cpp:165> [UID = 1]: failed to specify input dims triton backend for model:PigsCountingServiceOnnx_master, nvinfer error:NVDSINFER_TRITON_ERROR
deepstream-1 | 0:00:05.201964270 1 0x583353f516c0 ERROR nvinferserver gstnvinferserver.cpp:405:gst_nvinfer_server_logger: nvinferserver[UID 1]: Error in createNNBackend() <infer_grpc_context.cpp:230> [UID = 1]: failed to specify triton backend input dims for model:PigsCountingServiceOnnx_master, nvinfer error:NVDSINFER_TRITON_ERROR
deepstream-1 | 0:00:05.201992694 1 0x583353f516c0 ERROR nvinferserver gstnvinferserver.cpp:405:gst_nvinfer_server_logger: nvinferserver[UID 1]: Error in initialize() <infer_base_context.cpp:80> [UID = 1]: create nn-backend failed, check config file settings, nvinfer error:NVDSINFER_TRITON_ERROR
triton:
E0304 16:56:54.192522 1 shared_memory_manager.cc:259] "failed to open CUDA IPC handle: invalid device context"
nvinferserver settings:
"backend": {
  "triton": {
    "grpc": {
      "enable_cuda_buffer_sharing": true,
      "url": "0.0.0.0:8001"
    },
    "model_name": "PigsCountingServiceOnnx_master",
    "version": -1
  }
},
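As I understand it, with enable_cuda_buffer_sharing the nvinferserver client allocates device memory and then registers its CUDA IPC handle with the server over gRPC, roughly like this standalone Python sketch (assumes the tritonclient[grpc] package; the region name and size are made up for illustration):

import tritonclient.grpc as grpcclient
import tritonclient.utils.cuda_shared_memory as cudashm

client = grpcclient.InferenceServerClient(url="0.0.0.0:8001")
client.unregister_cuda_shared_memory()  # start from a clean slate

# Allocate a CUDA shared memory region on device 0 and export its IPC handle
byte_size = 4 * 1024
shm_handle = cudashm.create_shared_memory_region("test_region", byte_size, 0)

# This registration step is what seems to fail in my pipeline
# ("failed to register shared memory region: invalid args")
client.register_cuda_shared_memory(
    "test_region", cudashm.get_raw_handle(shm_handle), 0, byte_size)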
model config:
name: "PigsCountingServiceOnnx_master"
backend: "onnxruntime"
default_model_filename: "model.onnx"
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [0]
  }
]
optimization {
  execution_accelerators {
    gpu_execution_accelerator: [{
      name: "tensorrt",
      parameters { key: "trt_engine_cache_enable" value: "1" },
      parameters { key: "trt_engine_cache_path" value: "/trt_cache" },
      parameters { key: "max_workspace_size_bytes" value: "10000000000" },
      parameters { key: "trt_builder_optimization_level" value: "3" },
      parameters { key: "precision_mode" value: "FP16" }
    }]
  }
}
I also tried setting NVIDIA_VISIBLE_DEVICES=0, NVIDIA_DRIVER_CAPABILITIES=all, and CUDA_VISIBLE_DEVICES=0, but it did not help.
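If it helps with debugging, I can also attach which GPU each container actually sees, e.g. (service name as in the compose file above):

docker run --gpus '"device=0"' --rm nvcr.io/nvidia/deepstream:7.1-triton-multiarch nvidia-smi -L
docker compose exec server-node-test nvidia-smi -L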